A look at the adjusting closing price of stocks from January 1, 2015 to September 28, 2015

Overview

The work presented in this report is a coding problem assignment that was giving to me during the hiring process for a data engineer position. Because the assignment was kind cool, I decided to make a report out of it.

The task of this assignment was to find the S&P stock with the highest adjusting close price standard deviation for the period January 1, 2015 until September 28, 2015.

Tools used

Python and Pandas
R and rvest for scraping the stocks list

The solution

As mentioned previously, the purpose of this assignment was to find the S&P stock with the highest close price standard deviation during a given period of time. So, the first part of the problem is to find the stocks list, since it was not given and it is kind of silly to write them in the code (there a bit over 500 stocks). To do this, I used R's rvest web scraping package.

Scraping the stocks list

The data was scraped from the Wikipedia page List of S&P 500 companies. If you see the page, it has a table with the 505 common stocks. R's rvest package works by specifying the css selector that matches the data we want. To find the css selector I wanted, I used the SelectorGadget widget. If you use the gadget in the page I linked before, you will see that the selector tag containing the stock code is tr:nth-child(i) td:nth-child(1), where i is the position of the stock (starting from 2) in the table.

This is the R script for scraping the stocks.

library(rvest)
stocks <- data.frame(stock = character(), stringsAsFactors = FALSE)
stocks.site <- read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
for (i in 2:506) {
stock.symbol <- stocks.site %>% 
                  html_node(paste0("tr:nth-child(", i, ") td:nth-child(1) .text")) %>%
                  html_text()
stocks[i - 1, 1] <-stock.symbol
}

write.table(stocks, file = 'stocks_list.txt', col.names = FALSE, row.names = FALSE,
            quote = FALSE)

Now that we have scraped the data, it's time for writing the solution.

Finding the highest adjusting close price standard deviation

The first step is to load the required libraries. As I mentioned at the beginning, we'll be using the Pandas library.

import datetime
import pandas as pd
import pandas.io.data

Read the data and create an empty list that will have the stocks.

# Load the stocks codes from an external file
stocks_file = open('stocks_list.txt', 'r')
stocks_list = []

Specify the time period

# Note: the stock market is closed on New Year's Day
start_date = datetime.datetime(2015, 1, 1)
end_date = datetime.datetime(2015, 9, 28)

To keep track of the largest standard deviation and its stock, I used a dictionary that will be updated each time the program finds a standard deviation greater than the current one.

# This dict has the current stock with the highest std
current_highest_adjclose = {'stock': 'placeholder', 'stdev': -1}

Now that the structures have been created, it's time to add the stocks to stocks_list.

# Add the stocks to a new list while removing \n
for line in stocks_file:
    stocks_list.append(line.strip('\n'))

The next piece of code is the main part of the script. It is a loop that iterates through the stocks list and do the following:

It looks for the stock stock using pd.io.data.get_data_yahoo ,which returns a dataframe. From said dataframe, we are interested in the 'Adj Close' column.
Then it calculates the standard deviation of the column.
It checks if the current standard deviation is greater than the one in the dictionary. If it is greater, it updates the value with the new standard deviation and it also updates the name of the stock. If the standard deviation is smaller, we continue.

for stock in stocks_list:
    print stock
    try:
        s = pd.io.data.get_data_yahoo(stock, start=start_date,
                                      end=end_date)['Adj Close']
        current_standard_deviation = s.std()
        if current_standard_deviation > current_highest_adjclose['stdev']:
            current_highest_adjclose['stock'] = stock
            current_highest_adjclose['stdev'] = current_standard_deviation
    except:
        'Something happened with ', stock

Finally, the stock and its standard deviation are printed.

print 'Highest \'Adj Close\' is: %f (%s)' % (
    current_highest_adjclose['stdev'],
    current_highest_adjclose['stock'])

Conclusion

So there you have it! I thought this exercise was kind of cool, mostly because it is a bit different than your standard programming assignment. Even though this one is very simple, it capture the essence of a real and possible case you could encounter while working with data. At the moment of writing, I am thinking of adding new stuff to the script, such as visualizations or further univariate analysis using some of the standard deviations or the adjusted close price.