In this project you will test out your new found tidyverse wrangling and analysis skills. You have all the core tools you will need to complete this, however along that way I will show you a couple of new functions. In the final part of the analysis I will show you how to generate an interesting scatter plot with the data - this is to give you a preview of the visualziation capabilities of ggplot2.
We will work through the project together in session by filling in the wrangler_assignment.R
file, and the analysis_assignment.Rmd
file.
You may find this project may be a bit challenging. If so, I recommend that you work through these again on your own, ideally starting from a blank .R and .Rmd file.
Let’s first understand the project objectives at a high level.
We will be analyzing 10 ETFs during 2018, and considering their performace in one month increments.
Here are the symbols (these are the SPDR Sector Select Funds): XLY, XLP, XLE, XLF, XLV, XLI, XLB, XLRE, XLK, XLU.
We will generate a report (in a data.frame
) that calculates the monthly return and monthly volatility for each ETF and each month in 2018.
We will then query our monthly report to determine which ETF had the highest return each month.
To observe the presence of the leverage effect, we wll generate a scatter plot of monthly returns vs monthly volatilty. (We haven’t covered covered visualization, I’ll simply give you the code.)
Here is a step-by-step breakdown of how to perform the analysis task described above.
Here are is a step-by-step guide to in the wrangler_assignment.R
file. The numbers correspond to the commented outline numbers in the file.
Load Packages: tidyverse, tidyquant, lubridate.
Read-in the 2018 price data for the following symbols using tq_get()
. Assign the pride data to a data.frame
called df_etf
.
symbols <-
c("XLY", "XLP", "XLE", "XLF", "XLV"
, "XLI", "XLB", "XLRE", "XLK", "XLU")
df_etf
distinct()
to check that all the symbols are in the data setsummarize()
to determine the first and last dates in the data set to make sure they are in the correct rangelubridate::year()
and lubridate::month()
to add a year
and month
columns to df_etf
.
distinct()
to make sure there are 120 unique symbol
, year
, month
combinationsGenerate the monthly performance report and call it df_report
. This will look a lot like what we did in tutorial 9; it will involve select
, group_by
, mutate
, and summarize
. Label the calculated columns monthly_ret
and monthly_vol
.
Use the write_csv()
from the readr package to the data_output/
directory - call the file monthly_report.csv
.
These steps are to be completed in the analysis_assignment.Rmd
file. The numbers correspond to the commented outline numbers in the file.
Using the read_csv()
function from the readr package, to import monthly_report.csv
into a data.frame
called df_report
.
With df_report
, add a column called ret_rank
and polulate it with the rank of the monthly returns for each month - this entails a grouped mutate
using the rank()
function. To find the top performer each month, filter df_report
by ret_rank == 1
.
Here is th code to generate a scatter plot of monthly_ret
vs monthly_vol
. Would you say that the relationship is positive or negative?
df_report %>%
ggplot() + geom_point(aes(x = monthly_ret, monthly_vol))