The Consequence Eradicator™

MHacks 11 (and the customary apologies)

I have been kind of terrible about keeping up this blog. Sorry about that. That said, a lot has happened since the last blog post almost a full year ago. For one thing, I am a college kid now!

Go Blue!

I also had a great time as a tech intern for Capital One over the summer (something I’m definitely going to be writing about very soon).

But let’s get back to this story.

MHacks 11 was held at the University of Michigan from October 12 to 14. I convinced my friend, David, to fly up to Ann Arbor all the way from Charlottesville for the event.

What follows is the story of how David (UVA ’19), Renee (also U-M ’22), and I did our parts in contributing to the eerie convergence between The Simpsons and reality.

Conrad: the Consequence Eradicator

You don’t need to watch the whole video (though I highly encourage it). Basically, Lisa and a team of programmers write an artificial intelligence (Conrad) to predict the outcomes of social media posts.

Thus inspired, we made a Chrome extension that predicts the reactions a post on Facebook would get.

Step 1: Data Collection

Fortunately for us, @minimaxir on GitHub had a relatively large dataset of public posts from large Facebook pages, along with their reactions. The dataset is linked here. I only found out later that he had also done our exact same project, but whatever.

Anyway, that was data collection.

Step 2: Data Preprocessing

We chose to look only at posts with text content (so we ignored shared links, photos, videos, etc.). We also considered only posts less than 1000 characters long that had more than 11 non-like reactions.
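For the curious, the filtering looked roughly like this in pandas. The file name is a placeholder, and I’m going from memory on the column names in minimaxir’s CSVs (status_type, status_message, and the per-reaction counts), so treat it as a sketch rather than the exact code we ran.

```python
import pandas as pd

# Placeholder file name; minimaxir's facebook-page-post-scraper exports CSVs
# with columns along these lines (status_type, status_message, num_loves, ...).
df = pd.read_csv("facebook_page_posts.csv")

# Text-only posts: ignore shared links, photos, videos, etc.
df = df[df["status_type"] == "status"].dropna(subset=["status_message"])

# Less than 1000 characters long...
df = df[df["status_message"].str.len() < 1000]

# ...and more than 11 non-like reactions.
react_cols = ["num_loves", "num_wows", "num_hahas", "num_sads", "num_angrys"]
df = df[df[react_cols].sum(axis=1) > 11]
```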

Step 3: Modelling

We spent most of Saturday doing preprocessing, and then eventually realized we still had to actually build a model. We started off by trying a random forest regression with bag of words. Results were not great.
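That first attempt was something along these lines with scikit-learn (the hyperparameters here are placeholders, not what we actually tried, and it reuses df and react_cols from the preprocessing sketch):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Bag of words: each post becomes a vector of word counts.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(df["status_message"])
y = df[react_cols].values  # raw reaction counts as the regression target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("R^2 on held-out posts:", forest.score(X_test, y_test))
```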

After tweaking the hyperparameters a bit, it became clear the results weren’t improving.

That’s when we took the leap to using gensim and doc2vec to generate sentence embeddings. This meant that, instead of just using word frequency, we encoded posts with a much richer model that took word order into account.

This yielded much better results, especially after I also normalized the target output.
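A rough sketch of that pipeline, assuming the gensim 4.x API (vector_size and dv, rather than the older size and docvecs) and guessing that “normalized the target output” meant turning raw reaction counts into per-post proportions:

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from sklearn.ensemble import RandomForestRegressor

# doc2vec gives each post a fixed-length embedding that is sensitive to
# word order and context, not just to which words appear.
corpus = [TaggedDocument(simple_preprocess(text), [i])
          for i, text in enumerate(df["status_message"])]
d2v = Doc2Vec(corpus, vector_size=100, epochs=20, min_count=2)
X = np.vstack([d2v.dv[i] for i in range(len(corpus))])

# Normalize the target: counts become per-post proportions, so the model
# predicts the mix of reactions instead of the raw volume.
counts = df[react_cols].to_numpy(dtype=float)
y = counts / counts.sum(axis=1, keepdims=True)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
```

Predicting proportions also means the model only has to guess how people would react, not how many people a given page reaches.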

Step 4: Building the Extension

At this point, it was the morning of demos, and while we had a halfway decent model, the Chrome extension was non-existent.

So came the mad scramble, in which I was literally writing code as we walked to the IM building. I wrote a small Flask API backend, which the Chrome extension was supposed to call.
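The real thing is in the repo, but the backend amounted to something like this (the /predict route name and JSON shape are my guesses here, and it leans on d2v, forest, react_cols, and simple_preprocess from the modelling sketch):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"text": "draft of the post"} from the extension.
    text = request.get_json(force=True).get("text", "")
    vec = d2v.infer_vector(simple_preprocess(text))
    proportions = forest.predict([vec])[0]
    return jsonify(dict(zip(react_cols, proportions.tolist())))

if __name__ == "__main__":
    app.run(port=5000)
```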

Initially, I didn’t want to deal with the Chrome messaging protocol, so I tried to make the call directly from the content script (which ran as JavaScript code on the client with no extra permissions).

We ran into a roadblock: cross-origin requests were forbidden. I tried to get around this by editing my /etc/hosts file to redirect some ancillary Facebook domain to localhost. This almost worked, except Facebook also expected HTTPS-only requests, so I was stuck doing the correct thing and using Chrome’s messaging protocol.

By this point, I’m writing the code while we are standing at our demo table.

It turned out to actually be super easy to use the messaging protocol, but that didn’t stop me from screwing up 11 times before I finally got everything working.

Literally Me Celebrating

This was me when I finally got the Chrome extension to work.

Wrap-Up

Unfortunately, I have no screenshots of the actual app working, but if you want to run the janky code for yourself, it’s all open source on GitHub.

I learned a lot through this hackathon. It was the first hackathon project that I did with a heavy data science workflow.

This hackathon was a blast, and David and Renee were awesome teammates.