The Consequence Eradicator™
MHacks 11 (and the customary apologies)
I have been kind of terrible about keeping up this blog. Sorry about that. That said, a lot has happened since the last blog post almost a full year ago. For one thing, I am a college kid now!
I also had a great time as a tech intern for Capital One over the summer (something I'm definitely going to be writing about very soon).
But let's get back to this story.
MHacks 11 was held at the University of Michigan from October 12 - 14. I convinced my friend, David, to fly up to Ann Arbor all the way from Charlottesville for the event.
What follows is the story of how David (UVA '19), Renee (also U-M '22), and I did our parts in contributing to the eerie convergence between the Simpsons and reality.
Conrad: the Consequence Eradicator
You don't need to watch the whole video (though I highly encourage it). Basically, Lisa and a team of programmers write an artificial intelligence (Conrad) to predict the outcomes of social media posts.
Thusly inspired, we made a Chrome extension which predicts the reactions a post on Facebook would get.
Step 1: Data Collection
Fortunately for us, @minimaxir on GitHub had a relatively large dataset of public posts from large Facebook pages, along with their reactions. The dataset is linked here. I only found out later that he had also done our exact same project, but whatever.
Anyway, that was data collection.
Step 2: Data Preprocessing
We chose to look only at posts with text content (so we ignored shared links, photos, videos, etc.). We also only considered posts that were less than 1000 characters long and had more than 11 non-like reacts.
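As a rough illustration, the filter described above could look something like the sketch below. It is plain Python over a list of dicts and is not necessarily the code from the repo. The column names (status_type, status_message, num_loves and so on) and the "status" value for text-only posts are assumptions about the dataset's layout, and the sample posts are made up.

```python
MAX_CHARS = 1000          # keep posts shorter than this
MIN_NON_LIKE_REACTS = 11  # keep posts with MORE than this many non-like reacts

# Assumed column names for the non-like reaction counts.
NON_LIKE_COLUMNS = ["num_loves", "num_wows", "num_hahas", "num_sads", "num_angrys"]


def non_like_reacts(post):
    """Total number of reactions on a post, excluding plain likes."""
    return sum(int(post[col]) for col in NON_LIKE_COLUMNS)


def keep_post(post):
    """True for text-only posts under 1000 characters with more than 11 non-like reacts."""
    text = post.get("status_message") or ""
    return (
        post.get("status_type") == "status"  # assumed marker for text-only posts
        and 0 < len(text) < MAX_CHARS
        and non_like_reacts(post) > MIN_NON_LIKE_REACTS
    )


def make_post(status_id, status_type, message, loves=0, wows=0, hahas=0, sads=0, angrys=0):
    """Build a made-up post row for the example."""
    return {
        "status_id": status_id, "status_type": status_type, "status_message": message,
        "num_loves": loves, "num_wows": wows, "num_hahas": hahas,
        "num_sads": sads, "num_angrys": angrys,
    }


posts = [
    make_post("a", "status", "Big news today!", loves=10, wows=2),  # kept
    make_post("b", "photo", "Look at this", loves=50),              # not text-only
    make_post("c", "status", "x" * 1200, loves=50),                 # too long
    make_post("d", "status", "Quiet day", loves=3),                 # too few reacts
    make_post("e", "status", "Right on the line", loves=11),        # exactly 11, dropped
]

kept = [p for p in posts if keep_post(p)]
print([p["status_id"] for p in kept])
```

Only post "a" survives: the thresholds are strict, so a post with exactly 11 non-like reacts is dropped.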
Step 3: Modelling
We spent most of Saturday doing preprocessing, and then eventually realized we still had to actually build a model. We started off by trying a random forest regression with bag of words. Results were not great.
After tweaking the hyperparameters a bit, it became clear that results weren't improving.
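For context, a bag-of-words encoding only counts how often each word appears in a post. The sketch below is a stdlib-only illustration of that encoding (the tokenizer and the two example sentences are made up, and the regression step is left out). It shows one known limitation: two posts with the same words in a different order get identical feature vectors.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


posts = ["The dog bit the man", "The man bit the dog"]

# Shared vocabulary across all posts, in a fixed (sorted) order.
vocabulary = sorted({word for post in posts for word in tokenize(post)})
vectors = [bag_of_words(post, vocabulary) for post in posts]

print(vocabulary)
print(vectors)
```

Both sentences map to the same vector even though they mean different things, so any model trained on these features cannot tell them apart.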
That's when we took the leap to using doc2vec to generate sentence embeddings. This meant that, instead of just using word frequency, we used a much more complex model to encode posts which took into account word order. This yielded much better results, especially after I also normalized the target output.
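The post doesn't spell out how the target was normalized. One common choice, assumed here purely for illustration, is to turn raw reaction counts into shares that sum to 1, so that a post from a huge page doesn't dominate just because it has more reactions overall. The reaction names and counts below are made up.

```python
def normalize_reactions(counts):
    """Convert raw reaction counts into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # No reactions at all: return zeros rather than dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


counts = {"love": 30, "wow": 10, "haha": 40, "sad": 15, "angry": 5}
shares = normalize_reactions(counts)
print(shares)
```

With targets on a common scale, the model predicts the mix of reactions rather than their absolute volume.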
Step 4: Building the Extension
At this point, it was the morning of demos, and while we had a halfway decent model, the Chrome extension was non-existent.
So came the mad scramble, in which I was literally writing code as we walked to the IM building. I wrote a small Flask API backend, which the Chrome extension was supposed to call.
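To give a sense of how small such a backend can be, here is a sketch of a prediction endpoint. It deliberately uses the standard library's WSGI interface instead of Flask so it runs without installing anything (Flask apps are WSGI apps underneath). The /predict path, the JSON payload shape, and the predict_reactions placeholder are all hypothetical; the real model call would go where the placeholder is.

```python
import io
import json

REACTIONS = ["love", "wow", "haha", "sad", "angry"]


def predict_reactions(text):
    """Hypothetical stand-in for the trained model: equal share for every reaction."""
    share = 1.0 / len(REACTIONS)
    return {name: share for name in REACTIONS}


def app(environ, start_response):
    """Minimal WSGI app: POST JSON {"text": ...} to /predict, get reaction shares back."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps(predict_reactions(payload.get("text", ""))).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(method, path, data=b""):
    """Call the WSGI app in-process, the way a server would, and return (status, JSON body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


status, prediction = call_app("POST", "/predict", json.dumps({"text": "Hello"}).encode("utf-8"))
print(status, prediction)
```

To serve it for real, the same app object can be passed to wsgiref.simple_server.make_server("127.0.0.1", 5000, app) and run with serve_forever(); the port here is arbitrary.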
We ran into a roadblock: cross-origin requests were forbidden. I tried to overcome this by editing my /etc/hosts file to redirect some ancillary Facebook domain. That almost worked, except Facebook also expected HTTPS-only requests, so I was stuck with having to do the correct thing and use Chrome's messaging protocol.
By this point, I'm writing the code while we are standing at our demo table.
It turned out to actually be super easy to use the messaging protocol, but that didn't stop me from screwing up 11 times before I finally got everything working.
This was me when I finally got the Chrome extension to work.
Unfortunately, I have no screenshots of the actual app working, but if you want to run the janky code for yourself, it's all open source on GitHub.
I learned a lot through this hackathon. It was the first hackathon project that I did with a heavy data science workflow.