Project 1

The Leverage Effect in Monthly ETF Returns



In this project you will test out your new found tidyverse wrangling and analysis skills. You have all the core tools you will need to complete this, however along that way I will show you a couple of new functions. In the final part of the analysis I will show you how to generate an interesting scatter plot with the data - this is to give you a preview of the visualziation capabilities of ggplot2.

We will work through the project together in session by filling in the wrangler_assignment.R file, and the analysis_assignment.Rmd file.

You may find this project may be a bit challenging. If so, I recommend that you work through these again on your own, ideally starting from a blank .R and .Rmd file.


Analysis Objectives

Let’s first understand the project objectives at a high level.

  1. We will be analyzing 10 ETFs during 2018, and considering their performace in one month increments.

  2. Here are the symbols (these are the SPDR Sector Select Funds): XLY, XLP, XLE, XLF, XLV, XLI, XLB, XLRE, XLK, XLU.

  3. We will generate a report (in a data.frame) that calculates the monthly return and monthly volatility for each ETF and each month in 2018.

  4. We will then query our monthly report to determine which ETF had the highest return each month.

  5. To observe the presence of the leverage effect, we wll generate a scatter plot of monthly returns vs monthly volatilty. (We haven’t covered covered visualization, I’ll simply give you the code.)


Project Breakdown

Here is a step-by-step breakdown of how to perform the analysis task described above.

Wrangle (.R script)

Here are is a step-by-step guide to in the wrangler_assignment.R file. The numbers correspond to the commented outline numbers in the file.

  1. Load Packages: tidyverse, tidyquant, lubridate.

  2. Read-in the 2018 price data for the following symbols using tq_get(). Assign the pride data to a data.frame called df_etf.

symbols <- 
    c("XLY", "XLP", "XLE", "XLF", "XLV" 
      , "XLI", "XLB", "XLRE", "XLK", "XLU")
  1. Complete the following two data checks on df_etf
    1. use distinct() to check that all the symbols are in the data set
    2. use summarize() to determine the first and last dates in the data set to make sure they are in the correct range
  2. Use lubridate::year() and lubridate::month() to add a year and month columns to df_etf.
    1. as a minor data check, use distinct() to make sure there are 120 unique symbol, year, month combinations
  3. Generate the monthly performance report and call it df_report. This will look a lot like what we did in tutorial 9; it will involve select, group_by, mutate, and summarize. Label the calculated columns monthly_ret and monthly_vol.

  4. Use the write_csv() from the readr package to the data_output/ directory - call the file monthly_report.csv.


Analysis and Write-Up (.Rmd document)

These steps are to be completed in the analysis_assignment.Rmd file. The numbers correspond to the commented outline numbers in the file.

  1. Using the read_csv() function from the readr package, to import monthly_report.csv into a data.frame called df_report.

  2. With df_report, add a column called ret_rank and polulate it with the rank of the monthly returns for each month - this entails a grouped mutate using the rank() function. To find the top performer each month, filter df_report by ret_rank == 1.

  3. Here is th code to generate a scatter plot of monthly_ret vs monthly_vol. Would you say that the relationship is positive or negative?

df_report %>% 
    ggplot() + geom_point(aes(x = monthly_ret, monthly_vol))