Analyzing 'South Park' for Stock Market Strategies - The Data

Is a South Park Stock Market trading strategy viable?

In the following, we will use Python to build a list of all the public companies mentioned in South Park episodes. We’ll scrape Wikipedia for this purpose.

Following this, we’ll try to use this “alt-data” to generate a South Park Stock Market trading strategy.

Step 1: Get a list of all the individual episode wikis

The full list of episodes is available, split across several tables, on this page: https://en.wikipedia.org/wiki/List_of_South_Park_episodes.

While manually taking every link is an option, it’s an excruciating and time-consuming one. Luckily, we do have a better way of doing this!

Using the requests library, alongside BeautifulSoup, we can download the page and parse its HTML:

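import requests
from bs4 import BeautifulSoup
# Download the episode list and parse it with Python's built-in HTML parser
url = 'https://en.wikipedia.org/wiki/List_of_South_Park_episodes'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')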

However, our “soup” contains a lot more than we need, so let’s try and find our specific info.

Looking at the page source, we notice that our desired information is situated in blocks with the following specifics:

table_class = "wikitable plainrowheaders wikiepisodetable"
span_class = "bday dtstart published updated"

Using the information above, and a bit more tweaking, we arrive at a final code that loops through the tables in the page that have the appropriate class, and through their rows. The information is then gathered in a dictionary.

dict_sp = {}
i = 1
for table in soup.find_all("table", class_=table_class):
    for tr in table.find_all('tr', class_="vevent"):
        if 'id' in tr.find('th').attrs.keys():
            link = tr.find('a', href=True, title=True)
            # Store each episode under its own key instead of overwriting dict_sp
            dict_sp[i] = {'date': tr.find('span', class_=span_class).get_text(),
                          'episode_id': tr.find('th').attrs['id'],
                          'episode_wiki_link': link.attrs['href'],
                          'episode_title': link.attrs['title']}
            i = i + 1

An entry in our resulting dict looks like this:

{'date': '1997-08-13', 'episode_id': 'ep1', 'episode_wiki_link': '/wiki/Cartman_Gets_an_Anal_Probe', 'episode_title': 'Cartman Gets an Anal Probe'}
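To try the row-parsing logic without downloading anything, the same selectors can be run on a small hand-written HTML fragment. The sketch below is only an illustration: the fragment is a simplified stand-in for Wikipedia's markup, and the function name parse_episode_rows, the guards against missing elements, and the choice to key the result by episode_id are assumptions that are not part of the original code.

```python
from bs4 import BeautifulSoup

TABLE_CLASS = "wikitable plainrowheaders wikiepisodetable"
SPAN_CLASS = "bday dtstart published updated"

# Simplified, hand-written stand-in for one episode table (not real page source)
SAMPLE_HTML = """
<table class="wikitable plainrowheaders wikiepisodetable">
  <tr><th>No.</th><th>Title</th><th>Original air date</th></tr>
  <tr class="vevent">
    <th id="ep1">1</th>
    <td><a href="/wiki/Cartman_Gets_an_Anal_Probe"
           title="Cartman Gets an Anal Probe">Cartman Gets an Anal Probe</a></td>
    <td>(<span class="bday dtstart published updated">1997-08-13</span>)</td>
  </tr>
</table>
"""


def parse_episode_rows(html):
    """Return {episode_id: info} for every episode row found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    episodes = {}
    for table in soup.find_all("table", class_=TABLE_CLASS):
        for tr in table.find_all("tr", class_="vevent"):
            th = tr.find("th")
            link = tr.find("a", href=True, title=True)
            date = tr.find("span", class_=SPAN_CLASS)
            # Skip rows that lack any of the pieces we need
            if th is None or "id" not in th.attrs or link is None or date is None:
                continue
            episodes[th["id"]] = {
                "date": date.get_text(),
                "episode_id": th["id"],
                "episode_wiki_link": link["href"],
                "episode_title": link["title"],
            }
    return episodes


episodes = parse_episode_rows(SAMPLE_HTML)
print(episodes["ep1"])
```

Note that the header row is ignored because it does not carry the "vevent" class, which is the same mechanism the main loop relies on.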

Step 2: Crawl through each episode wiki and find all the wiki links in the pages

We first define a helper function:

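import numpy as np
def find_wiki_hrefs_in_soup(soup):
    wiki_hrefs = []
    for elem in soup.find_all("a", href=True, title=True):
        elem_href = elem.attrs['href']
        # '/wiki/Page'.split("/") gives ['', 'wiki', 'Page'], so check index 1
        parts = elem_href.split("/")
        if len(parts) > 1 and parts[1] == 'wiki':
            wiki_hrefs.append(elem_href)
    return np.unique(wiki_hrefs)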
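The link test depends on the shape of Wikipedia's internal hrefs: splitting '/wiki/South_Park' on "/" gives ['', 'wiki', 'South_Park'], so the second element tells us whether the link points to a wiki page. A small standalone sketch of that check follows. The function name is_wiki_article_href and the extra filter on namespace prefixes such as "File:" are assumptions added for illustration; the original helper does not filter namespaces.

```python
# Assumed list of non-article namespaces to skip; not part of the original code
NON_ARTICLE_PREFIXES = ("File:", "Special:", "Help:", "Category:",
                        "Wikipedia:", "Template:", "Talk:", "Portal:")


def is_wiki_article_href(href):
    """Return True for internal links of the form '/wiki/<Page_title>'."""
    parts = href.split("/")
    # Internal article links split into ['', 'wiki', '<Page_title>', ...]
    if len(parts) < 3 or parts[0] != "" or parts[1] != "wiki":
        return False
    # Drop namespaced pages such as '/wiki/File:Logo.png'
    return not parts[2].startswith(NON_ARTICLE_PREFIXES)


for href in ["/wiki/South_Park", "/wiki/File:Logo.png",
             "#cite_note-1", "https://example.com/wiki/x"]:
    print(href, is_wiki_article_href(href))
```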

Using it, we go through each item of our dictionary, apply the function, and append the new info to our dictionary. We also add a small sleep in between calls, so that we don’t overwhelm our environment and also so that we don’t get blocked by Wikipedia.

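import time
for key in dict_sp.keys():
    # episode_wiki_link already starts with '/', so the base URL has no trailing slash
    url_ep = 'https://en.wikipedia.org' + dict_sp[key]['episode_wiki_link']
    page_ep = requests.get(url_ep)
    soup_ep = BeautifulSoup(page_ep.content, 'html.parser')
    dict_sp[key]['links'] = find_wiki_hrefs_in_soup(soup_ep)
    time.sleep(0.1)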
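The "fetch, then pause" pattern can be isolated from the network call, which makes it easy to try out offline. The following is a minimal sketch under stated assumptions: crawl_with_delay and its fetch parameter are hypothetical names, and the stub fetcher stands in for a real download.

```python
import time

BASE_URL = "https://en.wikipedia.org"


def crawl_with_delay(paths, fetch, delay=0.1):
    """Call fetch(url) for each path, sleeping `delay` seconds between calls."""
    results = {}
    for n, path in enumerate(paths):
        if n > 0:
            time.sleep(delay)  # pause between requests, not before the first one
        results[path] = fetch(BASE_URL + path)
    return results


def stub_fetch(url):
    # Stand-in for a real download so the example runs without network access
    return "<html>" + url + "</html>"


pages = crawl_with_delay(["/wiki/South_Park", "/wiki/Handicar"], stub_fetch, delay=0.1)
print(list(pages.keys()))
```

With the requests library, the fetch argument could be something like lambda url: requests.get(url).content, which keeps the delay logic unchanged.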

Now, our dictionary looks like this (when converted to a Pandas DataFrame):
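The conversion itself can be sketched as follows, assuming each entry is a nested dictionary with one key per column. The sample entry reuses the episode shown earlier; its 'links' values are illustrative placeholders rather than real scraped output, and the names sample_dict_sp and df_sp are hypothetical.

```python
import pandas as pd

# One entry in the shape built in Steps 1 and 2; 'links' is a placeholder list
sample_dict_sp = {
    1: {
        "date": "1997-08-13",
        "episode_id": "ep1",
        "episode_wiki_link": "/wiki/Cartman_Gets_an_Anal_Probe",
        "episode_title": "Cartman Gets an Anal Probe",
        "links": ["/wiki/Comedy_Central", "/wiki/South_Park"],
    },
}

# orient="index" turns each top-level key into a row and each inner key into a column
df_sp = pd.DataFrame.from_dict(sample_dict_sp, orient="index")
print(df_sp[["date", "episode_id", "episode_title"]])
```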

Step 3: See if any of the links match public companies

While the most straightforward approach would be to start crawling through all the links in all the episodes, this would not be ideal, as a lot of the links are repeated (i.e. almost all pages include a link to https://en.wikipedia.org/wiki/South_Park ). Thus, for the sake of efficiency, we first generate a list of all unique URLs:

all_urls = []
for key in dict_sp.keys():
    # Look up the links of the current episode, not of the whole dictionary
    current_urls = dict_sp[key]['links']
    all_urls = np.unique([*all_urls, *current_urls])
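As an alternative to calling np.unique on every iteration, the same de-duplication can be done with a plain Python set. This is a swapped-in technique rather than the approach used above, and the function name collect_unique_links and the sample data are assumptions made for illustration.

```python
def collect_unique_links(episodes):
    """Return a sorted list of every distinct link across all episodes."""
    unique = set()
    for info in episodes.values():
        unique.update(info.get("links", []))
    return sorted(unique)


# Illustrative data: two episodes that share a link
sample_episodes = {
    1: {"links": ["/wiki/South_Park", "/wiki/Comedy_Central"]},
    2: {"links": ["/wiki/South_Park", "/wiki/Colorado"]},
}
print(collect_unique_links(sample_episodes))
```

Each shared link, such as the one to the South Park page, appears only once in the result, so it is crawled only once in the next step.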

With this list in hand, we can start crawling. We notice that, for every public company, Wikipedia has a vcard (an infobox table) containing the ticker as well as a link to https://en.wikipedia.org/wiki/Public_company. We’ll use this to check whether a link refers to a public company and to get the corresponding ticker. Using the page source, we find the class of the vcard table and define it below, together with the link that must be present for a match.

vcard_class = "infobox vcard"
key_match = '/wiki/Public_company'

In a new dictionary, we will collect the vcard links for each of our 10k+ URLs (if the page has a vcard!).

dict_url = {}
err = []
for wiki_url in all_urls:
    url = 'https://en.wikipedia.org' + wiki_url
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    try:
        # soup.find returns None when a page has no vcard, which raises AttributeError
        dict_url[wiki_url] = find_wiki_hrefs_in_soup(soup.find("table", class_=vcard_class))
    except AttributeError:
        err.append("No vcard for " + wiki_url)
    time.sleep(0.1)
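The loop above only collects the links found in each vcard; deciding which pages are public companies still requires checking each entry against key_match. A minimal sketch of that filtering step is shown below. The function name find_public_company_pages and the sample dictionary are assumptions for illustration, not the original code or data.

```python
KEY_MATCH = "/wiki/Public_company"


def find_public_company_pages(vcard_links_by_url, key_match=KEY_MATCH):
    """Return the URLs whose vcard links include the public-company page."""
    return sorted(url for url, hrefs in vcard_links_by_url.items()
                  if key_match in hrefs)


# Illustrative vcard links for two pages; only the first links to key_match
sample_dict_url = {
    "/wiki/Amazon_(company)": ["/wiki/Public_company", "/wiki/Nasdaq"],
    "/wiki/Colorado": ["/wiki/Denver"],
}
print(find_public_company_pages(sample_dict_url))
```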

After keeping only the pages whose vcard links to key_match, we end up with a list of 49 URLs. Some manual tinkering is still required for the tickers, because the vcards use very different formats, ticker names differ across regions, some securities trade over the counter, and some companies have duplicate pages such as the ones below:

'https://en.wikipedia.org/wiki/ViacomCBS', 'https://en.wikipedia.org/wiki/Viacom_(2005%E2%80%93present)',
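One way to organize this manual clean-up is a pair of hand-maintained lookup tables: one that folds duplicate pages into a single canonical page, and one that assigns a ticker to each canonical page. The sketch below shows the idea under stated assumptions: the names CANONICAL_PAGE, MANUAL_TICKERS and tickers_for_links are hypothetical, and the entries are a small illustrative subset rather than the full mapping.

```python
# Hand-maintained lookups (illustrative subset, not the full mapping)
CANONICAL_PAGE = {
    "/wiki/Viacom_(2005%E2%80%93present)": "/wiki/ViacomCBS",
}
MANUAL_TICKERS = {
    "/wiki/ViacomCBS": "VIAC",
    "/wiki/Amazon_(company)": "AMZN",
}


def tickers_for_links(links):
    """Map an episode's links to a sorted list of distinct tickers."""
    tickers = set()
    for link in links:
        page = CANONICAL_PAGE.get(link, link)  # fold duplicates together
        ticker = MANUAL_TICKERS.get(page)      # None for non-company pages
        if ticker is not None:
            tickers.add(ticker)
    return sorted(tickers)


print(tickers_for_links([
    "/wiki/ViacomCBS",
    "/wiki/Viacom_(2005%E2%80%93present)",
    "/wiki/South_Park",
]))
```

Because both Viacom pages resolve to the same canonical page, the episode gets a single ticker instead of a duplicate.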

Final Results

Any South Park fan could name a few episodes in which public companies were mentioned. For me, the three most notable examples are:

Handicar - Uber, Lyft, and Tesla

Unfulfilled - Amazon

Doubling Down - Beyond Meat

Doing a quick check, we see that we’ve correctly mapped all three examples above!

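import pandas as pd
result = pd.DataFrame.from_dict(dict_sp_stocks, orient='index')
# Filter on the title column so that only the three matching rows are returned
result[result['episode_title'].isin(['Handicar', 'Doubling Down', 'Unfulfilled'])]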

episode_title	episode_date	tickers
Handicar	2014-10-15	[‘LYFT’ ‘TSLA’ ‘UBER’]
Doubling Down	2017-11-08	[‘BYND’]
Unfulfilled	2018-12-05	[‘AMZN’]

Full results below:

episode_title	episode_date	tickers
The Wacky Molestation Adventure	2000-12-13	[‘DENN’]
Osama bin Laden Has Farty Pants	2001-11-07	[‘DIS’]
Red Man’s Greed	2003-04-30	[‘CPB’]
Something Wall-Mart This Way Comes	2004-11-03	[‘WMT’]
Make Love, Not Warcraft	2006-10-04	[‘BBY’]
Britney’s New Look	2008-03-19	[‘CHH’]
Over Logging	2008-04-16	[‘SBUX’]
Pandemic 2: The Startling	2008-10-29	[‘BBY’]
The Ring	2009-03-11	[‘DIS’]
Margaritaville	2009-03-25	[‘AXP’]
Dead Celebrities	2009-10-07	[‘CMG’ ‘MCD’ ‘TWTR’]
W.T.F.	2009-10-21	[‘WWE’]
The F Word	2009-11-04	[‘HOG’]
The Tale of Scrotie McBoogerballs	2010-03-24	[‘TWTR’]
It’s a Jersey Thing	2010-10-13	[‘TWTR’]
Mysterion Rises	2010-11-03	[‘NKE’]
Coon vs. Coon and Friends	2010-11-10	[‘NKE’]
Crème Fraîche	2010-11-17	[‘PGR’]
HumancentiPad	2011-04-27	[‘AAPL’ ‘BBY’]
T.M.I.	2011-05-18	[‘FDX’]
1%	2011-11-02	[‘RRGB’]
Raising the Bar	2012-10-03	[‘WMT’]
Insecurity	2012-10-10	[‘AMZN’ ‘UPS’]
A Scause for Applause	2012-10-31	[‘NKE’]
Obama Wins!	2012-11-07	[‘DIS’]
Black Friday	2013-11-13	[‘SNE’]
A Song of Ass and Fire	2013-11-20	[‘MSFT’ ‘SNE’]
Titties and Dragons	2013-12-04	[‘MSFT’ ‘RRGB’]
Handicar	2014-10-15	[‘LYFT’ ‘TSLA’ ‘UBER’]
Freemium Isn’t Free	2014-11-05	[‘TWTR’]
Grounded Vindaloop	2014-11-12	[‘BBY’]
Rehash	2014-12-03	[‘TWTR’]
HappyHolograms	2014-12-10	[‘TWTR’]
You’re Not Yelping	2015-10-14	[‘YELP’]
Skank Hunt	2016-09-21	[‘TWTR’]
The Damned	2016-09-28	[‘TWTR’]
White People Renovating Houses	2017-09-13	[‘TWTR’]
Franchise Prequel	2017-10-11	[‘NFLX’]
Sons a Witches	2017-10-25	[‘ROST’]
Doubling Down	2017-11-08	[‘BYND’]
Super Hard PCness	2017-11-29	[‘NFLX’]
Unfulfilled	2018-12-05	[‘AMZN’]
Bike Parade	2018-12-12	[‘AMZN’]
Band in China	2019-10-02	[‘AAPL’ ‘DIS’]
The Pandemic Special	2020-09-30	[‘BBW’ ‘VIAC’]

End notes

We’ve managed to meet our current goal: getting all the data we need for developing a strategy, in a fairly quick and simple way, thanks to Requests and BeautifulSoup. If you’re also interested in how this data can be used to develop the “South Park Stock Market” trading strategy, and whether or not it performs better than an S&P 500 (Standard & Poor’s 500) index fund, jump over to this post.