Analyzing 'South Park' for Stock Market Strategies - The Data

Is a South Park Stock Market trading strategy viable?

In the following, we will use Python to build a list of every public company mentioned in South Park episodes. We’ll scrape Wikipedia for this purpose.

Following this, we’ll try to use this “alt-data” to generate a South Park Stock Market trading strategy.

Step 1: Get a list of all the individual episode wikis

The full list of episodes is available on Wikipedia, split across several tables: https://en.wikipedia.org/wiki/List_of_South_Park_episodes.

While manually taking every link is an option, it’s an excruciating and time-consuming one. Luckily, we do have a better way of doing this!

Using the requests library, alongside BeautifulSoup, we can extract all the info on the URL:

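import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_South_Park_episodes'
page = requests.get(url)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(page.content, 'html.parser')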

However, our “soup” contains a lot more than we need, so let’s try and find our specific info.

Looking at the page source, we notice that our desired information is situated in blocks with the following specifics:

table_class = "wikitable plainrowheaders wikiepisodetable"
span_class = "bday dtstart published updated"

Using the information above, and a bit more tweaking, we arrive at a final code that loops through the tables in the page that have the appropriate class, and through their rows. The information is then gathered in a dictionary.

dict_sp = {}
i = 1
for table in soup.find_all("table", class_=table_class):
    for tr in table.find_all('tr', class_="vevent"):
        # Only rows whose header cell carries an id (e.g. "ep1") are episodes
        if 'id' in tr.find('th').attrs.keys():
            link = tr.find('a', href=True, title=True)
            # Store each episode under its own key instead of overwriting dict_sp
            dict_sp[i] = {'date': tr.find('span', class_=span_class).contents[0],
                          'episode_id': tr.find('th').attrs['id'],
                          'episode_wiki_link': link.attrs['href'],
                          'episode_title': link.attrs['title']}
            i = i + 1
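To see what this kind of loop extracts without hitting Wikipedia, here is a minimal sketch run against a hand-written HTML fragment. The fragment is an assumption that only imitates the class names and attributes described above, and the real page markup may differ or change over time. For brevity the sketch keys the result by episode id rather than by a running counter.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating one episode row (an assumption,
# not a copy of the live Wikipedia markup).
sample_html = """
<table class="wikitable plainrowheaders wikiepisodetable">
  <tr class="vevent">
    <th id="ep1">1</th>
    <td><a href="/wiki/Cartman_Gets_an_Anal_Probe"
           title="Cartman Gets an Anal Probe">Cartman Gets an Anal Probe</a></td>
    <td><span class="bday dtstart published updated">1997-08-13</span></td>
  </tr>
  <tr class="vevent"><th>row without an id</th></tr>
</table>
"""

table_class = "wikitable plainrowheaders wikiepisodetable"
span_class = "bday dtstart published updated"

sample_soup = BeautifulSoup(sample_html, "html.parser")
episodes = {}
for table in sample_soup.find_all("table", class_=table_class):
    for tr in table.find_all("tr", class_="vevent"):
        header = tr.find("th")
        # Skip rows whose header cell has no id attribute
        if header is None or "id" not in header.attrs:
            continue
        link = tr.find("a", href=True, title=True)
        episodes[header["id"]] = {
            "date": tr.find("span", class_=span_class).get_text(),
            "episode_wiki_link": link["href"],
            "episode_title": link["title"],
        }

print(episodes)
```

The second row is skipped because its header cell has no id, which is the same guard the loop above relies on.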

An entry in our resulting dict looks like this:

{'date': '1997-08-13', 'episode_id': 'ep1', 'episode_wiki_link': '/wiki/Cartman_Gets_an_Anal_Probe', 'episode_title': 'Cartman Gets an Anal Probe'}

Next, we collect the internal Wikipedia links found on each episode’s page. We first define a helper function:

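import numpy as np

def find_wiki_hrefs_in_soup(soup):
    wiki_hrefs = []
    for elem in soup.find_all("a", href=True, title=True):
        elem_href = elem.attrs['href']
        # '/wiki/Foo'.split('/') gives ['', 'wiki', 'Foo'], so test the second piece
        parts = elem_href.split("/")
        if len(parts) > 1 and parts[1] == 'wiki':
            wiki_hrefs.append(elem_href)
    return np.unique(wiki_hrefs)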
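The helper's filter depends on how a site-relative href splits on "/". The small sketch below uses made-up hrefs (the list and the name `is_wiki_href` are illustrative, not part of the original code). It shows that the "wiki" piece sits at index 1, because the leading slash produces an empty first element, and that a prefix test is a similar, shorter check.

```python
hrefs = [
    "/wiki/South_Park",                            # internal article link
    "https://example.org/wiki/x",                  # external link
    "#cite_note-1",                                # in-page anchor
    "/w/index.php?title=South_Park&action=edit",   # internal, but not an article
]

# Show the pieces each href splits into
for href in hrefs:
    print(href.split("/"))


def is_wiki_href(href):
    """Return True for site-relative links of the form /wiki/<page>."""
    return href.startswith("/wiki/")


internal = [h for h in hrefs if is_wiki_href(h)]
print(internal)
```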

Using it, we go through each item of our dictionary, apply the function, and append the new info to our dictionary. We also add a small sleep in between calls, so that we don’t overwhelm our environment and also so that we don’t get blocked by Wikipedia.

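import time

for key in dict_sp.keys():
    # episode_wiki_link already starts with '/', so the base URL has no trailing slash
    url_ep = 'https://en.wikipedia.org' + dict_sp[key]['episode_wiki_link']
    page_ep = requests.get(url_ep)
    soup_ep = BeautifulSoup(page_ep.content, 'html.parser')
    dict_sp[key]['links'] = find_wiki_hrefs_in_soup(soup_ep)
    time.sleep(0.1)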
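The "sleep between calls" idea can be factored into a small reusable loop. The sketch below is one possible shape, not the code used for this post: `crawl_politely`, `fake_fetch`, and the sample paths are hypothetical names, and the fetcher is a stand-in so the example runs offline.

```python
import time


def crawl_politely(paths, fetch, delay=0.1, sleep=time.sleep):
    """Call fetch(path) for each path, pausing `delay` seconds between calls.

    `fetch` is any callable that returns the result for one path.
    Failures are recorded instead of aborting the whole crawl.
    """
    results, errors = {}, []
    for n, path in enumerate(paths):
        if n > 0:
            sleep(delay)  # pause between requests, not before the first one
        try:
            results[path] = fetch(path)
        except Exception as exc:  # keep going; inspect `errors` afterwards
            errors.append((path, repr(exc)))
    return results, errors


# Stand-in fetcher (hypothetical) so the sketch needs no network access
def fake_fetch(path):
    if path == "/wiki/Missing":
        raise ValueError("no such page")
    return ["/wiki/South_Park", path]


pauses = []
results, errors = crawl_politely(
    ["/wiki/Handicar", "/wiki/Missing", "/wiki/Unfulfilled"],
    fetch=fake_fetch,
    sleep=pauses.append,  # record the pauses instead of actually sleeping
)
print(results)
print(errors)
```

In a real run, `fetch` would wrap the `requests.get` call and the link-extraction helper, and `sleep` would be left at its default.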

Now, our dictionary looks like this (when converted to a Pandas DataFrame):
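The conversion itself can be sketched as follows, assuming the dictionary maps a key to one record per episode. The single sample record reuses the entry shown earlier, and its `links` list is made up and shortened for display.

```python
import pandas as pd

# One hand-made record in the assumed shape of the episode dictionary
sample = {
    1: {
        "date": "1997-08-13",
        "episode_id": "ep1",
        "episode_wiki_link": "/wiki/Cartman_Gets_an_Anal_Probe",
        "episode_title": "Cartman Gets an Anal Probe",
        "links": ["/wiki/Comedy_Central", "/wiki/South_Park"],  # made-up subset
    },
}

# orient="index" turns each outer key into a row and each inner key into a column
df = pd.DataFrame.from_dict(sample, orient="index")
print(df[["episode_title", "date", "links"]])
```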

While the most straightforward approach would be to crawl through all the links in all the episodes, this would not be ideal, as many of the links are repeated (e.g. almost all pages include a link to https://en.wikipedia.org/wiki/South_Park). Thus, for the sake of efficiency, we first generate a list of all unique URLs:

all_urls = []
for key in dict_sp.keys():
    current_urls = dict_sp[key]['links']
    # np.unique both merges and de-duplicates the two lists
    all_urls = np.unique([*all_urls, *current_urls])
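The same de-duplication can also be done with a plain Python set, which avoids re-sorting the growing list on every iteration. This is an alternative sketch rather than the approach used above, and `episode_links` is a made-up stand-in for the real per-episode data.

```python
# Made-up stand-in for the per-episode link lists
episode_links = {
    1: {"links": ["/wiki/South_Park", "/wiki/Comedy_Central"]},
    2: {"links": ["/wiki/South_Park", "/wiki/Walmart"]},
}

unique_urls = set()
for entry in episode_links.values():
    unique_urls.update(entry["links"])  # repeated links are absorbed by the set

all_urls_sorted = sorted(unique_urls)  # sort once at the end for a stable order
total = sum(len(entry["links"]) for entry in episode_links.values())
print(f"{total} links collapse to {len(all_urls_sorted)} unique URLs")
```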

With this list, we can start crawling. We notice that for every public company, Wikipedia has a vcard infobox that contains a link to https://en.wikipedia.org/wiki/Public_company, as well as the ticker. We’ll use this to check whether our links refer to a public company, and to get the corresponding ticker. Using the page source, we find the class of the vcard table and define it as below, together with the required match to the Public Company URL.

vcard_class = "infobox vcard"
key_match = '/wiki/Public_company'

In a new dictionary, we will collect the vcard information for all of our 10k+ URLs (if they have one!).

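dict_url = {}
err = []
for wiki_url in all_urls:
    url = 'https://en.wikipedia.org' + wiki_url
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    try:
        # Key the result by page so earlier entries are not overwritten
        dict_url[wiki_url] = find_wiki_hrefs_in_soup(soup.find("table", class_=vcard_class))
    except AttributeError:
        # soup.find returned None: the page has no vcard table
        err.append("No vcard for " + wiki_url)
    time.sleep(0.1)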
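Once the infobox links are collected, checking for a public company reduces to a membership test against the Public Company link. The sketch below shows that filter on a hand-made stand-in for the collected data. The pages and links in `sample_dict_url` are illustrative assumptions, not scraped results.

```python
key_match = "/wiki/Public_company"

# Hand-made stand-in: page -> links found in its infobox (illustrative only)
sample_dict_url = {
    "/wiki/Walmart": ["/wiki/New_York_Stock_Exchange", "/wiki/Public_company"],
    "/wiki/Comedy_Central": ["/wiki/Paramount_Global"],
    "/wiki/Beyond_Meat": ["/wiki/Nasdaq", "/wiki/Public_company"],
}

# Keep only pages whose infobox links to the Public Company article
public_pages = sorted(
    page for page, links in sample_dict_url.items() if key_match in links
)
print(public_pages)
```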

After keeping only the pages whose vcard links to the Public Company page, we end up with a list of 49 URLs. Some manual tinkering is still required for the tickers, due to the very different formats used in the vcards, differences in ticker names across regions, over-the-counter securities, and duplicate pages such as the two below:

'https://en.wikipedia.org/wiki/ViacomCBS',
'https://en.wikipedia.org/wiki/Viacom_(2005%E2%80%93present)',
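One way to handle both the manual ticker clean-up and the duplicate pages is a hand-maintained mapping from company page to ticker, applied to each episode's links. The sketch below is an assumption about how this could be done, not the exact procedure used here. `url_to_ticker` and `sample_episodes` are hypothetical names holding a small illustrative subset, and the sample link lists are not the real scraped ones.

```python
# Hand-maintained mapping from company page to ticker (illustrative subset).
# Both Viacom pages map to the same ticker, which collapses the duplicate.
url_to_ticker = {
    "/wiki/ViacomCBS": "VIAC",
    "/wiki/Viacom_(2005%E2%80%93present)": "VIAC",
    "/wiki/Amazon_(company)": "AMZN",
}

# Stand-in for the per-episode links collected earlier (made up)
sample_episodes = {
    "Unfulfilled": ["/wiki/Amazon_(company)", "/wiki/South_Park"],
    "The Pandemic Special": [
        "/wiki/ViacomCBS",
        "/wiki/Viacom_(2005%E2%80%93present)",
    ],
}

episode_tickers = {}
for title, links in sample_episodes.items():
    # A set removes repeated tickers; sorting gives a stable order
    tickers = sorted({url_to_ticker[link] for link in links if link in url_to_ticker})
    if tickers:  # keep only episodes that mention at least one public company
        episode_tickers[title] = tickers

print(episode_tickers)
```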

Final Results

Any South Park fan could name a few episodes in which public companies were mentioned. For me, these are the three most notable examples:

Handicar - Uber, Lyft, and Tesla

Unfulfilled - Amazon

Doubling Down - Beyond Meat

Doing a quick check, we see that we’ve correctly mapped all three examples above!

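import pandas as pd

result = pd.DataFrame.from_dict(dict_sp_stocks, orient='index')
# Filter on the title column; calling isin on the whole frame would only mask cells
result[result['episode_title'].isin(['Handicar', 'Doubling Down', 'Unfulfilled'])]

| episode_title | episode_date | tickers |
| --- | --- | --- |
| Handicar | 2014-10-15 | ['LYFT' 'TSLA' 'UBER'] |
| Doubling Down | 2017-11-08 | ['BYND'] |
| Unfulfilled | 2018-12-05 | ['AMZN'] |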

Full results below:

| episode_title | episode_date | tickers |
| --- | --- | --- |
| The Wacky Molestation Adventure | 2000-12-13 | ['DENN'] |
| Osama bin Laden Has Farty Pants | 2001-11-07 | ['DIS'] |
| Red Man’s Greed | 2003-04-30 | ['CPB'] |
| Something Wall-Mart This Way Comes | 2004-11-03 | ['WMT'] |
| Make Love, Not Warcraft | 2006-10-04 | ['BBY'] |
| Britney’s New Look | 2008-03-19 | ['CHH'] |
| Over Logging | 2008-04-16 | ['SBUX'] |
| Pandemic 2: The Startling | 2008-10-29 | ['BBY'] |
| The Ring | 2009-03-11 | ['DIS'] |
| Margaritaville | 2009-03-25 | ['AXP'] |
| Dead Celebrities | 2009-10-07 | ['CMG' 'MCD' 'TWTR'] |
| W.T.F. | 2009-10-21 | ['WWE'] |
| The F Word | 2009-11-04 | ['HOG'] |
| The Tale of Scrotie McBoogerballs | 2010-03-24 | ['TWTR'] |
| It’s a Jersey Thing | 2010-10-13 | ['TWTR'] |
| Mysterion Rises | 2010-11-03 | ['NKE'] |
| Coon vs. Coon and Friends | 2010-11-10 | ['NKE'] |
| Crème Fraîche | 2010-11-17 | ['PGR'] |
| HumancentiPad | 2011-04-27 | ['AAPL' 'BBY'] |
| T.M.I. | 2011-05-18 | ['FDX'] |
| 1% | 2011-11-02 | ['RRGB'] |
| Raising the Bar | 2012-10-03 | ['WMT'] |
| Insecurity | 2012-10-10 | ['AMZN' 'UPS'] |
| A Scause for Applause | 2012-10-31 | ['NKE'] |
| Obama Wins! | 2012-11-07 | ['DIS'] |
| Black Friday | 2013-11-13 | ['SNE'] |
| A Song of Ass and Fire | 2013-11-20 | ['MSFT' 'SNE'] |
| Titties and Dragons | 2013-12-04 | ['MSFT' 'RRGB'] |
| Handicar | 2014-10-15 | ['LYFT' 'TSLA' 'UBER'] |
| Freemium Isn’t Free | 2014-11-05 | ['TWTR'] |
| Grounded Vindaloop | 2014-11-12 | ['BBY'] |
| Rehash | 2014-12-03 | ['TWTR'] |
| HappyHolograms | 2014-12-10 | ['TWTR'] |
| You’re Not Yelping | 2015-10-14 | ['YELP'] |
| Skank Hunt | 2016-09-21 | ['TWTR'] |
| The Damned | 2016-09-28 | ['TWTR'] |
| White People Renovating Houses | 2017-09-13 | ['TWTR'] |
| Franchise Prequel | 2017-10-11 | ['NFLX'] |
| Sons a Witches | 2017-10-25 | ['ROST'] |
| Doubling Down | 2017-11-08 | ['BYND'] |
| Super Hard PCness | 2017-11-29 | ['NFLX'] |
| Unfulfilled | 2018-12-05 | ['AMZN'] |
| Bike Parade | 2018-12-12 | ['AMZN'] |
| Band in China | 2019-10-02 | ['AAPL' 'DIS'] |
| The Pandemic Special | 2020-09-30 | ['BBW' 'VIAC'] |

End notes

We’ve met our current goal: getting all the data we need to develop a strategy, in a fairly quick and simple way, thanks to Requests and BeautifulSoup. If you’re also interested in how this data can be used to develop the “South Park Stock Market” trading strategy, and whether it performs better than an S&P 500 (Standard & Poor’s 500) index fund, jump over to this post.