Tutorial 8

Calculating Daily Returns



In this tutorial, we move towards doing tasks that are useful in a finance context. Specifically, we calculate daily returns for SPY during the month of December 2018.

This simple returns calculation will serve as a window to various idioms and patterns in base R and tidyverse programming.

In order to acocomplish this we will need to discuss two preliminary programming topics:

  1. conditional statements: if-else

  2. iterating with for loops


Loading Packages

1. As usual, let’s begin by loading the packages that we will need.

library(tidyverse)
library(tidyquant)


Conditional Statements: if-else

1. Conditional statements are ubiquitous in programming and data analysis.

2. Here is a toy example to demonstrate the syntax of if-else statements in R.

payoff <- NA
strike <- 50
upx <- 45

if ((upx - strike) > 0){
    payoff <- upx - strike
} else {
  payoff <- 0  
}
    
payoff
[1] 0

Code Challenge: Modify the code above by setting upx to 55 and rerun it. What is the resulting value of payoff?


Reading-In Data

1. Let’s use tq_get() to grab the December 2018 SPY prices; we will use these prices to calculate our daily returns.

df_spy <- tq_get("SPY", get = "stock.prices", from = "2018-12-01"
                 , to = "2019-01-01")

df_spy %>% head()
# A tibble: 6 x 7
  date        open  high   low close    volume adjusted
  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
1 2018-12-03  280.  280.  278.  279. 103176300     276.
2 2018-12-04  278.  279.  270.  270. 177986000     268.
3 2018-12-06  266.  270.  262.  270. 204185400     267.
4 2018-12-07  269.  271.  263.  264. 161018900     261.
5 2018-12-10  263.  265.  259.  264. 151445900     261.
6 2018-12-11  268.  268.  262.  264. 121504400     261.


Iterating with for Loops

1. A for loop allows you to repeatedly run a block of code a fixed number of times.

2. A pattern that occurs often is iterating through a vector of consecutive integers.

3. The following code iterates through the numbers 1 through 5 and prints them to the screen:

for (ix in 1:5){
    print(ix)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

4. A pattern that is common in data analysis is iterating through the column of a data frame.

5. The following code iterates through df_spy row by row and to prints the date value for each row.

for (ix in 1:nrow(df_spy)){
    print(df_spy$date[ix])
}
[1] "2018-12-03"
[1] "2018-12-04"
[1] "2018-12-06"
[1] "2018-12-07"
[1] "2018-12-10"
[1] "2018-12-11"
[1] "2018-12-12"
[1] "2018-12-13"
[1] "2018-12-14"
[1] "2018-12-17"
[1] "2018-12-18"
[1] "2018-12-19"
[1] "2018-12-20"
[1] "2018-12-21"
[1] "2018-12-24"
[1] "2018-12-26"
[1] "2018-12-27"
[1] "2018-12-28"
[1] "2018-12-31"

Code Challenge: Write a for loop that prints the first five dates in df_spy.


Calculating Returns - for loop

1. We can now combine our knowledge of conditional execution and iteration to calculate the daily returns of SPY during the month of December 2018.

2. Our code will encapsulate the following two principles about returns:

  1. On the first day in our data set, the return will be set to 0.

  2. On all subsequent days the simple return is defined as (curr_price / prev_price) - 1.

3. Let’s begin with a couple of preliminary steps.

4. First, we will sort our data frame by date since our calculation requires the prices to be in chronological order. This is done with the arrange() function as follows:

df_spy <- df_spy %>% arrange(date)
df_spy %>% head()
# A tibble: 6 x 7
  date        open  high   low close    volume adjusted
  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
1 2018-12-03  280.  280.  278.  279. 103176300     276.
2 2018-12-04  278.  279.  270.  270. 177986000     268.
3 2018-12-06  266.  270.  262.  270. 204185400     267.
4 2018-12-07  269.  271.  263.  264. 161018900     261.
5 2018-12-10  263.  265.  259.  264. 151445900     261.
6 2018-12-11  268.  268.  262.  264. 121504400     261.

5. Second, let’s add a column called ret and initialize it with NA values. In R, an NA value represents missing data.

df_spy <- df_spy %>% mutate(ret = NA)
df_spy %>% head()
# A tibble: 6 x 8
  date        open  high   low close    volume adjusted ret  
  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <lgl>
1 2018-12-03  280.  280.  278.  279. 103176300     276. NA   
2 2018-12-04  278.  279.  270.  270. 177986000     268. NA   
3 2018-12-06  266.  270.  262.  270. 204185400     267. NA   
4 2018-12-07  269.  271.  263.  264. 161018900     261. NA   
5 2018-12-10  263.  265.  259.  264. 151445900     261. NA   
6 2018-12-11  268.  268.  262.  264. 121504400     261. NA   

6. Now we are ready to complete the task of calculating the daily returns.

7. The following code iterates through the df_spy, and performs a different calculation depending on whether or not it’s the first date.

for (ix in 1:nrow(df_spy)){
    
    if (ix == 1){
        # for the first date just set to zero
        df_spy$ret[ix] <- 0
    } else {
        # grabbing prices from data frame
        curr_price <- df_spy$close[ix]
        prev_price <- df_spy$close[ix - 1]
        df_spy$ret[ix] <- (curr_price / prev_price) - 1
    }
}

df_spy %>% head()
# A tibble: 6 x 8
  date        open  high   low close    volume adjusted       ret
  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>     <dbl>
1 2018-12-03  280.  280.  278.  279. 103176300     276.  0       
2 2018-12-04  278.  279.  270.  270. 177986000     268. -0.0324  
3 2018-12-06  266.  270.  262.  270. 204185400     267. -0.00152 
4 2018-12-07  269.  271.  263.  264. 161018900     261. -0.0232  
5 2018-12-10  263.  265.  259.  264. 151445900     261.  0.00190 
6 2018-12-11  268.  268.  262.  264. 121504400     261.  0.000227

Coding Challenge: Calculate the one month return over the entirety of December 2018. (Bonus points if you can do it with using division. Hint: the prod() function calculates the product of an array of numbers.)


Calculating Returns - vectorized base

1. While our for loop solution above works, it is not an ideal. Moreover, it’s not idiomatic of data analysis programming.

2. By utilizing a few tidyverse functions we can write code that is easier to read.

3. Our key ingredient will be the lag() function in dplyr.

4. The lag() function take in a vector, and returns a vector of the same length with all the entries pushed down one spot.

5. Let’s observe the effect that lag() has when applied to the first 4 element of the date column of df_spy.

df_spy$date[1:4]
[1] "2018-12-03" "2018-12-04" "2018-12-06" "2018-12-07"
lag(df_spy$date[1:4])
[1] NA           "2018-12-03" "2018-12-04" "2018-12-06"

6. Let’s apply it similarly to the close column:

df_spy$close[1:4]
[1] 279.30 270.25 269.84 263.57
df_spy$close[1:4] %>% lag()
[1]     NA 279.30 270.25 269.84

7. The following code exploits the lag() and calculates all the daily returns in a single line of code (rather than a for loop). This type of vectorized code is more idiomatic data analysis in R.

(df_spy$close / lag(df_spy$close)) - 1
 [1]            NA -0.0324023931 -0.0015171286 -0.0232359513  0.0018970292
 [6]  0.0002272049  0.0050353461 -0.0003390191 -0.0184647628 -0.0196183821
[11] -0.0010964873 -0.0149757212 -0.0162779475 -0.0261763202 -0.0264229376
[16]  0.0505248664  0.0076773664 -0.0012899867  0.0087588214

8. Using base R techniques, let’s create a new column in df_spy called ret_vec and populate it with the daily returns.

df_spy$ret_vec <- (df_spy$close / lag(df_spy$close)) - 1
df_spy %>% 
    select(date, close, ret, ret_vec) %>% 
    head()
# A tibble: 6 x 4
  date       close       ret    ret_vec
  <date>     <dbl>     <dbl>      <dbl>
1 2018-12-03  279.  0         NA       
2 2018-12-04  270. -0.0324    -0.0324  
3 2018-12-06  270. -0.00152   -0.00152 
4 2018-12-07  264. -0.0232    -0.0232  
5 2018-12-10  264.  0.00190    0.00190 
6 2018-12-11  264.  0.000227   0.000227


Calculating Returns - tidyverse

1. The vectorized base solution is an improvement in many ways, but it is a little hard to read. There are multiple $ signs and multiple references to df_spy. This style of code gets especially confusing as your analysis grows in complexity.

2. Combining tidyverse functions with the pipe allow us affect similar vectorized code in a succinct and readable fashion.

3. The following code utilizes tidyverse idioms to add a returns column to df_spy. We will call our new return calculation ret_tidy.

df_spy <- 
    df_spy %>% mutate(ret_tidy = (close / lag(close)) - 1) 

4. Notice that ret, ret_vec, and ret_tidy all match.

df_spy %>% 
    select(date, ret, ret_vec, ret_tidy) %>% 
    head()
# A tibble: 6 x 4
  date             ret    ret_vec   ret_tidy
  <date>         <dbl>      <dbl>      <dbl>
1 2018-12-03  0         NA         NA       
2 2018-12-04 -0.0324    -0.0324    -0.0324  
3 2018-12-06 -0.00152   -0.00152   -0.00152 
4 2018-12-07 -0.0232    -0.0232    -0.0232  
5 2018-12-10  0.00190    0.00190    0.00190 
6 2018-12-11  0.000227   0.000227   0.000227

Code Challenge: Apply tidyverse idioms add a column to df_spy called ret_adj. This column will contain daily returns using the adjusted prices.


Further Reading

R4DS - 5.3 - Arrange rows with arrange()

R4DS - 5.4 - Select columns with select()

R4DS - 5.5 - Add new variables with mutate()

R4DS - 21.1 - Introduction

R4DS - 21.2 - For loops

R4DS - 21.3 - For loop variations