In our previous discussion of data subsetting we saw, how to access one or more rows of a data.frame
based on their position within the data.frame
.
As it turns out, positional row access has limited application.
A far more useful technique is to access all the rows of data.frame
the meet certain criteria.
This is known as filtering. Our main tool will be filter()
from dplyr.
1. Let’s load the packages we will need for this tutorial.
library(tidyverse)
library(tidyquant)
1. Let’s use tq_get()
to download price data for SPY, IWM, QQQ, and DIA for the month of December 2018.
symbols <- c("SPY", "IWM", "QQQ", "DIA")
df_etf <- tq_get(symbols, get = "stock.prices"
, from = "2018-12-01", to = "2019-01-01")
df_etf
# A tibble: 76 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SPY 2018-12-03 280. 280. 278. 279. 103176300 276.
2 SPY 2018-12-04 278. 279. 270. 270. 177986000 268.
3 SPY 2018-12-06 266. 270. 262. 270. 204185400 267.
4 SPY 2018-12-07 269. 271. 263. 264. 161018900 261.
5 SPY 2018-12-10 263. 265. 259. 264. 151445900 261.
6 SPY 2018-12-11 268. 268. 262. 264. 121504400 261.
7 SPY 2018-12-12 267. 269 265. 265. 97976700 263.
8 SPY 2018-12-13 267. 267. 264. 265. 96662700 263.
9 SPY 2018-12-14 263. 264. 260. 260. 116961100 258.
10 SPY 2018-12-17 259. 261. 254. 255. 165492300 253.
# … with 66 more rows
Observations:
Notice in our tq_get()
call that the first argument is a vector of four symbols, rather than just a single symbol.
This is a feature of many R functions - they know how to behave for a single input, or for a vector of multiple inputs.
1. Suppose we want to grab only the rows for IWM.
2. In order to do this we would use the filter()
function from dplyr as follows:
filter(df_etf, symbol == "IWM")
# A tibble: 19 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IWM 2018-12-03 154. 154. 152. 154. 23379800 153.
2 IWM 2018-12-04 154. 154. 147. 148. 41077900 147.
3 IWM 2018-12-06 145. 147. 143. 147. 37581200 146.
4 IWM 2018-12-07 147. 148. 143. 144. 28525100 143.
5 IWM 2018-12-10 144. 145. 141. 144. 30776600 143.
6 IWM 2018-12-11 146. 146. 142. 143. 29229300 143.
7 IWM 2018-12-12 145. 147. 145. 145. 27894800 144.
8 IWM 2018-12-13 145. 146. 143. 143. 23205400 142.
9 IWM 2018-12-14 142. 143. 140. 141. 24037200 140.
10 IWM 2018-12-17 140. 141. 136. 137. 36468500 137.
11 IWM 2018-12-18 138. 139. 136. 137. 42277200 136.
12 IWM 2018-12-19 137. 139. 133. 134 47676600 134.
13 IWM 2018-12-20 134. 134. 130. 132. 57908100 131.
14 IWM 2018-12-21 132. 133. 128. 128. 59404800 128.
15 IWM 2018-12-24 128. 128. 126. 126. 28575700 126.
16 IWM 2018-12-26 126. 132. 126. 132. 40182700 132.
17 IWM 2018-12-27 130. 132. 128. 132. 39527900 132.
18 IWM 2018-12-28 132. 135. 132. 133. 35994400 133.
19 IWM 2018-12-31 134. 134. 132. 134. 29173400 134.
Observations:
When checking for equality in filter()
we must use the double equals ==
.
Our filtering doesn’t affect the original data.frame
.
3. As you may have guessed, we can rewrite above code with a %>%
:
df_etf %>% filter(symbol == "IWM")
# A tibble: 19 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IWM 2018-12-03 154. 154. 152. 154. 23379800 153.
2 IWM 2018-12-04 154. 154. 147. 148. 41077900 147.
3 IWM 2018-12-06 145. 147. 143. 147. 37581200 146.
4 IWM 2018-12-07 147. 148. 143. 144. 28525100 143.
5 IWM 2018-12-10 144. 145. 141. 144. 30776600 143.
6 IWM 2018-12-11 146. 146. 142. 143. 29229300 143.
7 IWM 2018-12-12 145. 147. 145. 145. 27894800 144.
8 IWM 2018-12-13 145. 146. 143. 143. 23205400 142.
9 IWM 2018-12-14 142. 143. 140. 141. 24037200 140.
10 IWM 2018-12-17 140. 141. 136. 137. 36468500 137.
11 IWM 2018-12-18 138. 139. 136. 137. 42277200 136.
12 IWM 2018-12-19 137. 139. 133. 134 47676600 134.
13 IWM 2018-12-20 134. 134. 130. 132. 57908100 131.
14 IWM 2018-12-21 132. 133. 128. 128. 59404800 128.
15 IWM 2018-12-24 128. 128. 126. 126. 28575700 126.
16 IWM 2018-12-26 126. 132. 126. 132. 40182700 132.
17 IWM 2018-12-27 130. 132. 128. 132. 39527900 132.
18 IWM 2018-12-28 132. 135. 132. 133. 35994400 133.
19 IWM 2018-12-31 134. 134. 132. 134. 29173400 134.
Code Challenge: Grab all the SPY prices from df_etf
- make sure to use %>%
.
1. Suppose we want to grab the prices for the December 2018 regular option expiration which was 12/21/2018.
2. We can again do this with the filter()
function:
df_etf %>% filter(date == "2018-12-21")
# A tibble: 4 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SPY 2018-12-21 247. 250. 240. 241. 255345600 240.
2 IWM 2018-12-21 132. 133. 128. 128. 59404800 128.
3 QQQ 2018-12-21 153. 154. 147. 148. 141129400 147.
4 DIA 2018-12-21 229. 232. 224. 224. 10242700 223.
Code Challenge: Grab the prices from New Years Eve - make sure to use %>%
.
1. What if we want grab all the prices for both QQQ and DIA?
2. We can do so by using the %in%
operator in our filter()
condition:
df_etf %>%
filter(symbol %in% c("QQQ", "DIA"))
# A tibble: 38 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 QQQ 2018-12-03 173. 173. 170. 172. 50771700 172.
2 QQQ 2018-12-04 171. 172. 166. 166. 70594700 165.
3 QQQ 2018-12-06 162. 167. 162. 167. 71715500 166.
4 QQQ 2018-12-07 166. 167. 161. 161. 80432200 161.
5 QQQ 2018-12-10 161. 164. 159. 163. 73960800 162.
6 QQQ 2018-12-11 166. 166. 162. 164. 59058300 163.
7 QQQ 2018-12-12 166 168. 165. 165. 53780000 164.
8 QQQ 2018-12-13 166. 167. 164. 165. 46378500 164.
9 QQQ 2018-12-14 163. 164. 161. 161. 56547400 160.
10 QQQ 2018-12-17 160. 162. 156. 157. 74834100 157.
# … with 28 more rows
Code Challenge: Grab all prices for SPY, IWM, and QQQ.
1. Suppose that we are only concerned with prices after 12/15/2018.
2. We can isolate these prices as follows:
df_etf %>% filter(date > "2018-12-15")
# A tibble: 40 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SPY 2018-12-17 259. 261. 254. 255. 165492300 253.
2 SPY 2018-12-18 257. 258. 253. 255. 134515100 252.
3 SPY 2018-12-19 255. 259. 249. 251. 214992800 249.
4 SPY 2018-12-20 250. 252. 245. 247. 252053400 245.
5 SPY 2018-12-21 247. 250. 240. 241. 255345600 240.
6 SPY 2018-12-24 239. 241. 234. 234. 147311600 233.
7 SPY 2018-12-26 236. 246. 234. 246. 218485400 245.
8 SPY 2018-12-27 243. 248. 239. 248. 186267300 247.
9 SPY 2018-12-28 250. 251. 246. 248. 153100200 247.
10 SPY 2018-12-31 250. 250. 247. 250. 144299400 249.
# … with 30 more rows
Code Challenge: Grab all prices from before 12/7/2018.
1. Suppose we want all prices for SPY and IWM that come after Christmas.
2. This is achieved easily by piping together two filters
.
df_etf %>%
filter(symbol %in% c("SPY", "IWM")) %>%
filter(date > "2018-12-25")
# A tibble: 8 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SPY 2018-12-26 236. 246. 234. 246. 218485400 245.
2 SPY 2018-12-27 243. 248. 239. 248. 186267300 247.
3 SPY 2018-12-28 250. 251. 246. 248. 153100200 247.
4 SPY 2018-12-31 250. 250. 247. 250. 144299400 249.
5 IWM 2018-12-26 126. 132. 126. 132. 40182700 132.
6 IWM 2018-12-27 130. 132. 128. 132. 39527900 132.
7 IWM 2018-12-28 132. 135. 132. 133. 35994400 133.
8 IWM 2018-12-31 134. 134. 132. 134. 29173400 134.
Code Challenge: Isolate the price of QQQ on 12/31/2018 using two filter()
calls piped together.
R4DS - 5.2 - Filter rows with filter()