The purpose of this tutorial is to give a brief tour of R by getting you to type some R code.
The code that you write will accomplish two finance-related tasks:
Don’t worry if what you type seems foreign. As long as you are running code and getting output (even errors), you are making progress.
Type the following and then press Ctrl + Shift + Enter.
rnorm(5)
[1] -0.04990784 0.24567051 -0.51225640 0.27608265 -0.67427157
What did we do?
We typed code that called the function rnorm() with the input 5.
We ran the code by pressing Ctrl + Shift + Enter.
What happened?
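As a small extension of the call above, here is a minimal sketch of giving a function more than one input. It assumes the standard mean and sd arguments of rnorm(), and it uses set.seed() with an arbitrary seed (123) so that the same numbers are drawn every time the code runs. The specific values 3, 100, and 15 are chosen only for illustration.

```r
# Fix the random seed so the draws are reproducible (123 is an arbitrary choice)
set.seed(123)

# Draw 3 random numbers from a normal distribution
# with mean 100 and standard deviation 15
rnorm(3, mean = 100, sd = 15)
```

Without the extra inputs, rnorm() falls back on its defaults of mean 0 and standard deviation 1, which is why the numbers printed earlier are all close to zero.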
Code Challenge: Generate a set of 10 random numbers and print them to the screen.
1. R is free and open source.
2. Anyone can extend R by creating packages of functionality.
3. There are thousands of packages freely available on CRAN.
4. The power of R comes from this huge ecosystem of packages.
5. Proficiency in R entails knowing which packages will be useful for you.
1. If you want to use a package, you first have to install it on your machine.
2. This is done easily from RStudio by running: install.packages("package_name")
3. Type this code to install the packages that we will need throughout this course.
install.packages("tidyverse")
install.packages("tidyquant")
install.packages("lubridate")
4. These packages now live on your machine.
5. In order to use them in an R session, you have to load them with the library() function.
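If you are unsure whether the installation worked, one way to check is sketched below. It relies on the base R function requireNamespace(), which is not part of the original steps and is shown here only as an optional check. Each call reports whether the named package can be found on your machine.

```r
# requireNamespace() returns TRUE if the package can be found and loaded,
# and FALSE otherwise; quietly = TRUE suppresses the startup text
requireNamespace("tidyverse", quietly = TRUE)
requireNamespace("tidyquant", quietly = TRUE)
requireNamespace("lubridate", quietly = TRUE)
```

A result of FALSE for any package suggests rerunning the corresponding install.packages() call.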
1. Whenever you sit down to do some analysis, you will first load the packages that you are going to need.
2. This is done with the following command: library(package_name)
3. Let’s load the two packages we are going to need for this tutorial.
library(tidyverse)
library(tidyquant)
4. When you run the code you will see a bunch of text get printed - don’t worry about this for now.
1. In R, most data analysis takes the form of applying functions to input data. The result of the function call is usually more data.
2. For example, we can use the tq_get() function in the tidyquant package to grab price data from Yahoo Finance.
3. The following function call retrieves historical SPY price data for 2014 through 2018 and then prints it to the screen.
tq_get("SPY", get = "stock.prices", from = "2014-01-01", to = "2019-01-01")
# A tibble: 1,258 x 7
date open high low close volume adjusted
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2014-01-02 184. 184. 182. 183. 119636900 165.
2 2014-01-03 183. 184. 183. 183. 81390600 165.
3 2014-01-06 183. 184. 182. 182. 108028200 164.
4 2014-01-07 183. 184. 183. 183. 86144200 165.
5 2014-01-08 183. 184. 183. 184. 96582300 165.
6 2014-01-09 184. 184. 183. 184. 90683400 166.
7 2014-01-10 184. 184. 183. 184. 102026400 166.
8 2014-01-13 184. 184. 181. 182. 149892000 164.
9 2014-01-14 182. 184. 182. 184. 105016100 166.
10 2014-01-15 184. 185. 184. 185. 98525800 167.
# … with 1,248 more rows
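The idea that a function takes data as input and returns more data can also be sketched without an internet connection. The example below uses only base R functions (c(), diff(), and mean()), and the four prices are made up for illustration; they are not real SPY data.

```r
# c() combines values into a single vector; these prices are made up
c(100, 102, 101, 105)

# diff() takes that vector as input and returns the change
# from each value to the next: the result is more data
diff(c(100, 102, 101, 105))

# The output of one function can be the input of another:
# mean() averages the changes returned by diff()
mean(diff(c(100, 102, 101, 105)))
```

The tq_get() call works the same way, only with a ticker symbol and dates as inputs and a table of prices as the result.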
Code Challenge: Copy and paste the tq_get() call from above, and then modify it to grab data only for the month of December 2018.
1. Assigning values to variables is an important part of data analysis.
2. A variable can contain data that is as simple as a single character, or as complicated as a five-million-row data set.
3. In R, we use <- to assign a value to a variable. (Keyboard shortcut: Alt + “-”.)
4. The following code assigns the 5-year data set of SPY prices to the variable df_spy.
df_spy <-
  tq_get(x = "SPY", get = "stock.prices", from = "2014-01-01", to = "2018-12-31")
1. You can view the contents of a variable by running the name of the variable.
df_spy
# A tibble: 1,257 x 7
date open high low close volume adjusted
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2014-01-02 184. 184. 182. 183. 119636900 165.
2 2014-01-03 183. 184. 183. 183. 81390600 165.
3 2014-01-06 183. 184. 182. 182. 108028200 164.
4 2014-01-07 183. 184. 183. 183. 86144200 165.
5 2014-01-08 183. 184. 183. 184. 96582300 165.
6 2014-01-09 184. 184. 183. 184. 90683400 166.
7 2014-01-10 184. 184. 183. 184. 102026400 166.
8 2014-01-13 184. 184. 181. 182. 149892000 164.
9 2014-01-14 182. 184. 182. 184. 105016100 166.
10 2014-01-15 184. 185. 184. 185. 98525800 167.
# … with 1,247 more rows
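Assignment and printing can be practiced on a smaller scale as well. The sketch below is a self-contained illustration using base R only. The names ticker and df_demo, and the dates and prices inside df_demo, are made up for this example and do not come from Yahoo Finance.

```r
# A variable holding a single character string
ticker <- "SPY"

# A variable holding a small data set (made-up values)
df_demo <- data.frame(
  date  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06")),
  close = c(100, 102, 101)
)

# Running the name of a variable prints its contents
ticker
df_demo

# Variables can be passed to functions; nrow() counts the rows of a data set
nrow(df_demo)
```

Once a value is stored in a variable, it can be reused in later code without typing or downloading it again.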
1. Visualization is an important tool in data analysis.
2. The ggplot2 package, which is a part of the tidyverse, makes plotting easy.
3. This code references the price data that we have in df_spy and then plots it.
ggplot(data = df_spy) + geom_line(mapping = aes(x = date, y = close))
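A plot is built up in layers joined by +, so further pieces can be added in the same way. The sketch below assumes ggplot2 is installed (it comes with the tidyverse) and uses a small made-up data set named df_demo instead of df_spy, so that it runs without downloading anything. The labs() layer, the plot title, and the variable name p are additions for illustration.

```r
library(ggplot2)

# A small made-up data set with the same column names used above
df_demo <- data.frame(
  date  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07")),
  close = c(100, 102, 101, 105)
)

# Start with the data, add a line layer, then add a title and axis labels
p <- ggplot(data = df_demo) +
  geom_line(mapping = aes(x = date, y = close)) +
  labs(title = "Demo closing prices", x = "Date", y = "Close")

# Running the name of the variable displays the plot
p
```

Storing the plot in a variable is optional; it simply makes it possible to display the plot again or add more layers later.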
Code Challenge: Copy the above ggplot() function call, and then modify the code to graph the adjusted prices instead of the close prices.