In this tutorial we calculate the volume weighted average close price (VWAP) for IWM over the month of December 2018.
In order to do this, we will need to learn how to perform summarizing calculations on the columns of a data.frame
. Our tool for this will be the summarize()
function in the dplyr package.
1. Let’s begin by loading the packages that we will need.
library(tidyverse)
library(tidyquant)
1. Let’s use tq_get()
to read in the IWM data for the month of December 2018.
df_iwm <- tq_get("IWM", get = "stock.prices", from = "2018-12-01",
to = "2019-01-01")
head(df_iwm)
# A tibble: 6 x 7
date open high low close volume adjusted
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018-12-03 154. 154. 152. 154. 23379800 153.
2 2018-12-04 154. 154. 147. 148. 41077900 147.
3 2018-12-06 145. 147. 143. 147. 37581200 146.
4 2018-12-07 147. 148. 143. 144. 28525100 143.
5 2018-12-10 144. 145. 141. 144. 30776600 143.
6 2018-12-11 146. 146. 142. 143. 29229300 143.
2. In the code above, we utilized the head()
function to only view the first few rows of df_IWM
. This is useful for larger data.frames
.
Coding Challenge: What code would you write to view all of df_IWM
?
1. Recall that the columns of df_iwm
are vectors.
2. A summarizing calculation takes in a vector of numbers and returns a single number.
3. The length()
function is one of the simplest example of this.
4. The following code calculates the length of the the close
column of df_iwm
.
length(df_iwm$close)
[1] 19
5. The following code calculates the sum of the close
prices.
sum(df_iwm$close)
[1] 2633.15
Code Challenge: Use sum()
and length()
to calculate the average close price of IWM during December 2018. Verify your answer by comparing it to the mean
function.
Dicussion Topic: Based on your own personal data analysis needs, what are other summarizing functions that you hope R has built in?
1. Let’s put together our new found data.frame
and aggregation skills towards the task of calculating the VWAP of IWM during December 2018.
close
prices defined as:
\[\begin{align*}
\sum_{i} \frac{v_i}{v_m} \cdot c_i
\end{align*}\]
where we have that:
volume
over the monthclose
price for day \(i\)3. As a first step, let’s calculate the total volume over the month and assign that value to a variable total_volume
.
total_volume <- sum(df_iwm$volume)
total_volume
[1] 682896600
4. Next, let’s capture the weight value \(\frac{v_i}{v_m}\) for each date by creating a column vwap_weight
which stores that value:
df_iwm <-
mutate(df_iwm, vwap_weight = volume / total_volume)
head(df_iwm)
# A tibble: 6 x 8
date open high low close volume adjusted vwap_weight
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018-12-03 154. 154. 152. 154. 23379800 153. 0.0342
2 2018-12-04 154. 154. 147. 148. 41077900 147. 0.0602
3 2018-12-06 145. 147. 143. 147. 37581200 146. 0.0550
4 2018-12-07 147. 148. 143. 144. 28525100 143. 0.0418
5 2018-12-10 144. 145. 141. 144. 30776600 143. 0.0451
6 2018-12-11 146. 146. 142. 143. 29229300 143. 0.0428
5. By definition, the vwap_weight
column should sum to 1; let’s verify this.
sum(df_iwm$vwap_weight)
[1] 1
6. Next, we will create a column called weighted_close
which contains the vwap_weight
multiplied by the close
.
df_iwm <-
mutate(df_iwm, weighted_close = vwap_weight * close)
7. The VWAP is now simply the sum of weighted_close
column.
sum(df_iwm$weighted_close)
[1] 137.5021
Code Challenge: Calculate the VWAP with a single line of code, using only the original columns of df_iwm
. (Hint: my solution involves a combination of $
, component-wise calculations, and aggregation.)
summarize()
1. Before we tackle the VWAP calculation with the tidyverse, let’s try a simpler exercise to introduce the summarize()
function.
2. Suppose that we want to calculate the follow for our IWM data: a. average close
price b. average daily volume
3. The summarize()
function gives us a compact way of perform both calculations simultaneously, and conveniently returns the results as a data.frame
.
summarize(df_iwm, avg_close = mean(close), avg_volume = mean(volume))
# A tibble: 1 x 2
avg_close avg_volume
<dbl> <dbl>
1 139. 35941926.
Code Challenge: Use summarize()
to calculate the average adjusted
price of IWM.
1. Our previous base R approach to calculating VWAP took several steps, and involved saving intermediate values.
2. In the Code Challenge that followed that section, we saw that we could perform that calculation in a single line of base R code.
3. The base R solutiona work fine, but they are a little ugly.
4. The following tidyverse code uses the summarize()
function to succinctly calculate the VWAP without all the intermediate steps. And the code is more readable.
summarize(df_iwm, vwap = sum((volume / sum(volume)) * close))
# A tibble: 1 x 1
vwap
<dbl>
1 138.
Code Challenge: Modify the above code to add the volume weighted average adjusted
price.