7 min read

Stock market analysis with Tableau 1/3

Written by Jels De Maeght
Friday 28 August, 2020

Analyse your stock market data in Tableau

 

I hereby call on all stock data enthusiasts to follow along with this three-part miniseries on designing reporting around your stock portfolio in Tableau.
We'll go through all the steps to get from having zero data to automating the data gathering. To wrap up, we'll build a dashboard to keep an eye on our performance.

Three parts, of which you'll read the first one today.

  • PART I : Gathering the Data
  • PART II: Leveraging Tableau Prep for automating the 'Gathering the data' - step
  • PART III: Building stock market specific visualisations and a portfolio Management Dashboard

At the start of COVID, I finally started working on a personal project of mine: tracking my stock portfolio in Tableau. (This is just a hobby btw 😉) Of course, before we can bring analysis to our portfolio, we need data. And in most cases, lots of it!

Step 1: Import all necessary packages

import bs4 as bs
import datetime as dt
import os
import pandas as pd
import numpy as np
import pandas_datareader.data as pdr
import yfinance as yf
import pickle
import requests

To use Python for stock market analysis in Tableau, we need to be sure all the packages our script relies on are available. That means they obviously have to be installed on your computer and importable from the script.

Some of these packages are very specific to the tasks at hand:

  • bs4 (Beautiful Soup): for web scraping
  • pandas_datareader: gives our script the capability to access remote data over the web
  • yfinance: connects to the Yahoo Finance database through their API

For example (how it looks in Terminal on Mac):

[screenshot: downloading the Python packages]
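If any of these packages are missing, you can install them from the terminal with pip. A minimal sketch, assuming a standard Python 3 setup (note that the PyPI package name for bs4 is beautifulsoup4):

```shell
# install the packages used in this series; lxml is the parser bs4 uses below
pip install beautifulsoup4 lxml pandas numpy pandas-datareader yfinance requests
```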

Step 2: Retrieving company tickers

Next, we'll compose a list of companies. To retrieve the financial information in a uniform way, we'll need each company's ticker: its official name on the stock market itself, some kind of abbreviation. In this case we've chosen to retrieve the full list of the S&P 500.

The following script goes to the Wikipedia page's source code and picks up the table where all the tickers are saved. Once the script finds that table, it collects all the tickers from column one and stores them in a list.
That list of tickers is then put in a pickle, so later on we can simply retrieve our list of tickers without going through bs4 and the Wikipedia page again.

def save_sp500_tickers():
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    # resp.text = the source code of the web page itself
    # 'lxml' = the parser that translates that block of source code
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    # identifier for the specific table
    table = soup.find('table', {'class': 'wikitable sortable'})

    # collect the tickers from column one, skipping the header row
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        # split off the trailing newline so clean tickers go into the pickle
        tickers.append(ticker.split('\n')[0])

    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)

    return tickers
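The pickle step can be illustrated with a small, self-contained round trip; the three tickers below are just a hypothetical stand-in for the full S&P 500 list the function saves:

```python
import pickle

# write a few tickers to a pickle, mirroring what save_sp500_tickers() does
tickers = ['MMM', 'AOS', 'ABT']
with open('demo_tickers.pickle', 'wb') as f:
    pickle.dump(tickers, f)

# later on we can read the list straight back, no scraping needed
with open('demo_tickers.pickle', 'rb') as f:
    loaded = pickle.load(f)

print(loaded)  # ['MMM', 'AOS', 'ABT']
```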

Step 3: Retrieving pricing information for tickers

The following part of our script is a new function that will eventually pick up our basic information about all our S&P500 stocks

  • Date (EOD data) => so we'll generate a new line for every day
  • Open and close prices
  • Adjusted closing prices (stock prices corrected for potential stock splits, which makes comparing prices uniform across different years and between all the different companies)
  • Volumes

The script starts by picking up our pickle containing all the S&P 500 tickers, then makes a folder on our drive to store all the files it is going to generate.
Next, we define a date range for the data we want to retrieve (don't make it too big in the beginning, because otherwise we're talking lots of data).

Finally, it uses these tickers as an input to go online and pick up the previously described information for all the tickers and stores that information in a CSV file (per ticker).

def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        # remember => we are reading bytes now instead of writing
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)

    # next steps are for storing the data we are pulling from Yahoo
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')

    start = dt.datetime(2010, 1, 1)
    end = dt.datetime.now()

    # route pandas_datareader's Yahoo calls through yfinance (needed once)
    yf.pdr_override()

    for ticker in tickers:
        # just in case your connection breaks, we'd like to save our progress!
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = pdr.get_data_yahoo(ticker, start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

get_data_from_yahoo()

If all goes well, after running the script you should see 500 CSV files and confirmation that your script has run successfully…
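A quick way to verify the run is to count the CSV files in the output folder. The sketch below fakes a demo folder with two files so it is self-contained; point it at your real stock_dfs/ folder after running get_data_from_yahoo():

```python
import os

# stand-in output folder with a couple of empty ticker files
os.makedirs('stock_dfs_demo', exist_ok=True)
for t in ['MMM', 'AOS']:
    open('stock_dfs_demo/{}.csv'.format(t), 'w').close()

# count the per-ticker CSV files the download loop produced
n = len([f for f in os.listdir('stock_dfs_demo') if f.endswith('.csv')])
print('{} CSV files found'.format(n))  # expect ~500 on a real run
```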

The following goes on 500 times until the final [Done]:

[screenshot: importing the stock market tickers]

Result of our script making a directory for our CSVs:

[screenshot: the script's output directory]

One of the 500 CSVs in detail:

[screenshot: one of the generated CSV files]

We can now automate a bunch of things, like picking up the start date and making it flexible to the date of each stock's IPO, so we pick up all the historical data for each individual member of the S&P 500.

We could also look at importing an already existing CSV file in which we've saved our personal portfolio of maybe 20 stocks, and pick up those tickers instead of scraping the web for them...
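That swap could look like the sketch below. The portfolio file is hypothetical, assuming a CSV with a 'Ticker' column; the StringIO stand-in keeps the example self-contained, so replace it with pd.read_csv('portfolio.csv') for a real file:

```python
import io
import pandas as pd

# hypothetical portfolio file with a Ticker column
portfolio_csv = io.StringIO(
    "Ticker,Shares\n"
    "AAPL,10\n"
    "MSFT,5\n"
    "KO,20\n"
)
portfolio = pd.read_csv(portfolio_csv)

# this list can feed get_data_from_yahoo() in place of the scraped one
tickers = portfolio['Ticker'].tolist()
print(tickers)  # ['AAPL', 'MSFT', 'KO']
```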

Lots and lots of fun stuff we can do to fine-tune right now, but I think my goal for this first part is done. I'd love to see you guys again as readers of PART II, where we'll leverage Tableau Prep for some of the data prep and automation stuff, EXCITING!!

Jels.