Tutorial 4

Summarizing Columns - Calculating VWAP



In this tutorial we calculate the volume weighted average close price (VWAP) for IWM over the month of December 2018.

In order to do this, we will need to learn how to perform summarizing calculations on the columns of a data.frame. Our tool for this will be the summarize() function in the dplyr package.


Loading Packages

1. Let’s begin by loading the packages that we will need.

library(tidyverse)
library(tidyquant)


Reading-In Data

1. Let’s use tq_get() to read in the IWM data for the month of December 2018.

df_iwm <- tq_get("IWM", get = "stock.prices", from = "2018-12-01", 
                 to = "2019-01-01")
head(df_iwm)
# A tibble: 6 x 7
  date        open  high   low close   volume adjusted
  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 2018-12-03  154.  154.  152.  154. 23379800     153.
2 2018-12-04  154.  154.  147.  148. 41077900     147.
3 2018-12-06  145.  147.  143.  147. 37581200     146.
4 2018-12-07  147.  148.  143.  144. 28525100     143.
5 2018-12-10  144.  145.  141.  144. 30776600     143.
6 2018-12-11  146.  146.  142.  143. 29229300     143.

2. In the code above, we utilized the head() function to only view the first few rows of df_IWM. This is useful for larger data.frames.


Coding Challenge: What code would you write to view all of df_IWM?


Summarizing Calculations (Aggregations)

1. Recall that the columns of df_iwm are vectors.

2. A summarizing calculation takes in a vector of numbers and returns a single number.

3. The length() function is one of the simplest example of this.

4. The following code calculates the length of the the close column of df_iwm.

length(df_iwm$close)
[1] 19

5. The following code calculates the sum of the close prices.

sum(df_iwm$close)
[1] 2633.15

Code Challenge: Use sum() and length() to calculate the average close price of IWM during December 2018. Verify your answer by comparing it to the mean function.

Dicussion Topic: Based on your own personal data analysis needs, what are other summarizing functions that you hope R has built in?


Calculating VWAP (base)

1. Let’s put together our new found data.frame and aggregation skills towards the task of calculating the VWAP of IWM during December 2018.

2. Recall that VWAP is a weighted average of the close prices defined as: \[\begin{align*} \sum_{i} \frac{v_i}{v_m} \cdot c_i \end{align*}\]

where we have that:

  • \(i\) ranges over the days of the month
  • \(v_{i}\) is the volume on day \(i\)
  • \(v_{m}\) is the total volume over the month
  • \(c_{i}\) is the close price for day \(i\)

3. As a first step, let’s calculate the total volume over the month and assign that value to a variable total_volume.

total_volume <- sum(df_iwm$volume)
total_volume
[1] 682896600

4. Next, let’s capture the weight value \(\frac{v_i}{v_m}\) for each date by creating a column vwap_weight which stores that value:

df_iwm <- 
    mutate(df_iwm, vwap_weight = volume / total_volume)
head(df_iwm)
# A tibble: 6 x 8
  date        open  high   low close   volume adjusted vwap_weight
  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>       <dbl>
1 2018-12-03  154.  154.  152.  154. 23379800     153.      0.0342
2 2018-12-04  154.  154.  147.  148. 41077900     147.      0.0602
3 2018-12-06  145.  147.  143.  147. 37581200     146.      0.0550
4 2018-12-07  147.  148.  143.  144. 28525100     143.      0.0418
5 2018-12-10  144.  145.  141.  144. 30776600     143.      0.0451
6 2018-12-11  146.  146.  142.  143. 29229300     143.      0.0428

5. By definition, the vwap_weight column should sum to 1; let’s verify this.

sum(df_iwm$vwap_weight)
[1] 1

6. Next, we will create a column called weighted_close which contains the vwap_weight multiplied by the close.

df_iwm <- 
    mutate(df_iwm, weighted_close = vwap_weight * close)

7. The VWAP is now simply the sum of weighted_close column.

sum(df_iwm$weighted_close)
[1] 137.5021

Code Challenge: Calculate the VWAP with a single line of code, using only the original columns of df_iwm. (Hint: my solution involves a combination of $, component-wise calculations, and aggregation.)


Introdution to summarize()

1. Before we tackle the VWAP calculation with the tidyverse, let’s try a simpler exercise to introduce the summarize() function.

2. Suppose that we want to calculate the follow for our IWM data: a. average close price b. average daily volume

3. The summarize() function gives us a compact way of perform both calculations simultaneously, and conveniently returns the results as a data.frame.

summarize(df_iwm, avg_close = mean(close), avg_volume = mean(volume))
# A tibble: 1 x 2
  avg_close avg_volume
      <dbl>      <dbl>
1      139.  35941926.

Code Challenge: Use summarize() to calculate the average adjusted price of IWM.


Calculating VWAP (tidyverse)

1. Our previous base R approach to calculating VWAP took several steps, and involved saving intermediate values.

2. In the Code Challenge that followed that section, we saw that we could perform that calculation in a single line of base R code.

3. The base R solutiona work fine, but they are a little ugly.

4. The following tidyverse code uses the summarize() function to succinctly calculate the VWAP without all the intermediate steps. And the code is more readable.

summarize(df_iwm, vwap =  sum((volume / sum(volume)) * close))
# A tibble: 1 x 1
   vwap
  <dbl>
1  138.

Code Challenge: Modify the above code to add the volume weighted average adjusted price.