Paranoid Android, or How I Built a Twitter Bot and Lost a Little Bit of Faith In Humanity

The setup

I lied – this post isn’t going to be about how I built the Twitter bot. There are dozens of halfway-decent tutorials scattered across the web, and even a couple of genuinely good ones, so you’re welcome to exercise your DuckDuckGo-Fu or Google-Fu and figure it out.

Instead, this will be about my experience wading into the collective dumpster fire/garbage pit that is Twitter with some simple automation.

Now, don’t get me wrong – there are a LOT of great things about Twitter. I primarily use it to keep up with others who are into vintage/retro computing and gaming. It’s short and to the point, it offers up-to-the-minute info from all across the planet, and it gives you easy access to your favorite <insert profession here>-ist. But, like its cousin Facebook, it’s still sometimes hard to escape the political aspects of it.

…and there’s no doubt it’s host to garbage. It’s literally the lowest common denominator of the Internet, rivaling the storied chans of yesteryear.

Why did I do it?

So, anyway, the bot. I’ve programmed a few Twitter bots over the years, starting about 2013-’14 or so. Usually just for fun or to report on sensor data from an Arduino project, that sort of thing.

I’ve been noticing lately that, to me, Twitter has taken a decidedly more negative tone overall. I started thinking about doing some sentiment analysis to see if that was truly the case or if my perspective has changed and I’m just looking at it differently than I used to. I’m no statistician, but I thought I might be able to answer the question well enough for myself. I also figured I could throw it into my toolbox for when I want to do a little bit of stock speculation, or figure out if I’m interacting with a bot-net, since those have definitely become more sophisticated over the last couple of years.

I spent the first couple of days of my Christmas break getting familiar with the Twitter streaming API and accessing it via Tweepy. Then I started putting the pieces together to pull tweets from the real-time API triggered by certain words/names/phrases – I think my first word set was ‘Hillary Clinton’, ‘Donald Trump’, ‘Michael Flynn’, ‘Mattis’ and ‘Syria’ because those were what was hot in the news. I set it loose collecting tweets in real time based on those words, used TextBlob to assign a subjectivity/polarity rating to each tweet based on a regular dictionary (so my own biases would be minimized), then added that data, plus some Twitter-provided metadata, to a MySQL database to save on space and speed up later analysis.
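For the curious, the score-and-store step looks roughly like the sketch below. To keep it runnable with nothing but the standard library, it swaps in stand-ins: sqlite3 instead of MySQL, and a toy word list instead of TextBlob's dictionary. The table layout, the tweet dict keys and the function names are all made up for illustration and are not what the bot actually used.

```python
import sqlite3

# Toy lexicon: a stand-in for a real sentiment dictionary (assumption).
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}


def polarity(text):
    """Average lexicon score of the known words in text; 0.0 if none match."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def open_store(path=":memory:"):
    """Open the database (sqlite3 here, not MySQL) and create the table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " tweet_id TEXT PRIMARY KEY,"
        " screen_name TEXT,"
        " text TEXT,"
        " polarity REAL)"
    )
    return conn


def store_tweet(conn, tweet):
    """Score one tweet dict and insert it; a repeated tweet id is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
        (tweet["id"], tweet["screen_name"], tweet["text"],
         polarity(tweet["text"])),
    )
    conn.commit()
```

In a real collector, store_tweet would be called from whatever callback the streaming client fires for each incoming tweet.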

Came back 24 hours later, checked the database and had 987,000 tweets. Jesus H Christ.

I realized I could fork paths at this point and choose my own adventure: start reading and refresh my statistics brain to figure out the most appropriate analysis to run on the tweets associated with those terms; program the actual, y’know, sentiment analysis portion of the program; or use the scraper to have a little fun.

Fuck it, I’m on vacation, let’s have some fun.

I noticed a couple of things in the 987K tweets I’d already collected, namely that there were a lot of tweets with identical text that were not retweets (which to me indicates either a complete lack of originality or, more likely, bot-nets) and that there were a literal metric shit-ton of either political dog whistles or fecal-flecked dehumanizing terms being flung about in these tweets.
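Spotting that kind of copy-paste activity doesn't take much. Here's a minimal sketch, assuming each collected tweet is a dict with hypothetical "text", "screen_name" and optional "is_retweet" keys. The normalization step (lowercasing, dropping URLs and @handles) is my own addition for illustration; an exact-match comparison would work too, it'd just miss near-identical copies.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop URLs and @handles, and collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"@\w+", "", text)
    return " ".join(text.split())


def find_copypasta(tweets, min_accounts=2):
    """Map normalized text -> set of accounts that posted it, skipping retweets."""
    accounts_by_text = defaultdict(set)
    for tweet in tweets:
        # Retweets are expected to share text, so they are not interesting here.
        if tweet.get("is_retweet") or tweet["text"].startswith("RT @"):
            continue
        accounts_by_text[normalize(tweet["text"])].add(tweet["screen_name"])
    return {
        text: names
        for text, names in accounts_by_text.items()
        if len(names) >= min_accounts
    }
```

Anything that comes back with a large set of accounts behind one piece of text is worth a closer look, though it proves nothing on its own.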

I don’t really consider myself a SJW, but I do tend to speak up if something strikes me as not humane, ethical or right. I thought, “Man, it’d be fun to throw some of this garbage back at them and maybe open their eyes a bit.” That’s assuming the account on the other side is operated by a human, right? If not, it’d be fun to drown out a competing propaganda bot.

So, I did that. I reprogrammed the bot to read streaming tweets, pick out the ones containing common right-wing political dog whistles (you can see a sample list of terms in the References below), replace the dog whistle text with the implied terms, then tweet the revised text back to the original poster and whomever they @’d.

And, disclaimer: don’t get me wrong, I had fun with it which means some terms are not necessarily 1-to-1 replacements.
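The replacement step itself is about as simple as it sounds. Below is a rough sketch using a handful of pairs from the References list; the regex approach, the REPLACEMENTS name and the rewrite function are illustrative assumptions, not the bot's actual code. The lookarounds keep it from matching inside longer words or hashtags.

```python
import re

# A few term -> replacement pairs taken from the References list.
REPLACEMENTS = {
    "witch hunt": "fair and impartial investigation",
    "deep state": "Rule of Law",
    "illegals": "economic/humanitarian-crisis refugees",
    "illegal aliens": "economic/humanitarian-crisis refugees",
    "#MAGA": "#IAmACollaborator",
}

# Longest terms first so a longer phrase wins over any shorter overlapping one.
_TERMS = sorted(REPLACEMENTS, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![\w#])(" + "|".join(re.escape(t) for t in _TERMS) + r")(?!\w)",
    re.IGNORECASE,
)
_LOOKUP = {term.lower(): new for term, new in REPLACEMENTS.items()}


def rewrite(text):
    """Return (revised_text, number_of_replacements) for one tweet."""
    return _PATTERN.subn(lambda m: _LOOKUP[m.group(1).lower()], text)
```

A tweet only gets sent back when the count is greater than zero. Like the real thing, this only catches terms spelled exactly as listed.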

What did I learn?

  • That it worked great. By “worked great,” I mean my bot account got shut down about 4-5 times within minutes of switching it on. I was sympathetic to the plight of the users I was @’ing back – I mean, technically, I was violating Twitter ToS by @’ing the retweeted accounts without them @’ing me first. No big deal though: I adjusted my rate limits, stripped the Twitter handles from the revised texts, fixed a couple of other minor annoyances, and got reactivated each time. This helped to minimize any offense to their pearl-clutching sensibilities.
  • In line with that, I’m going out on a limb and assuming that people do not like to read their own words when you strip out all the dehumanizing, dog whistle, jingoistic garbage that they blithely post.
  • I had an odd error that kept on killing my bot. From a technical perspective, it’s really easy to tell from the errors thrown when you’ve been rate-limited by the API and they’ve booted you, or when they’ve suspended your write access because of complaints and you can’t tweet anything out – I can code for those and work around them. The one I’m writing about took a bit of sleuthing – it turns out the accounts that typically post this garbage are fond of emojis in their usernames and/or bios (all publicly available data that was being collected by my bot and seemed like it might come in handy for sentiment analysis). Most emojis are encoded in 4 bytes in UTF-8, rather than the 1 to 3 bytes you see with most other text characters, so they were causing a couple of my string slicing functions to choke. Once I adjusted the modules I was using to process the text, I was golden.
    • Side note: I have a hunch there’s a huge correlation between the number of 4-byte-encoded characters associated with a Twitter account and the likelihood that it’s spouting out fringe content, regardless of left vs. right sensibilities of the content, and/or it’s a moderated/bot account masquerading as a real person.
  • Twitter is quick – QUICK – to ban individual bot accounts if complaints are received.
    • Despite mainstream misconceptions, bots are not against the site’s ToS – in fact, they’re useful and help get real-time data spread for all sorts of admirable reasons. However, if the bots are part of a bot-net, in my experience Twitter Support is horribly slow to review. I don’t know if this is due to the political leanings of the company or the reviewers – I’d like to think not – or the huge number of reports I’m sure they receive, or what, but based on the number of duplicate, non-retweeted, tweets I saw come across my screen, there is definitely some coordinated bot-net activity going on here with regard to these particular terms. Yet, you can report them and they stay up with zero break in service.
    • Reviewing the analytics for my bot account, I wonder if the bot-nets themselves are being directed to report it, based on the huge number of impressions some of the tweets receive. In other words, if 1K bots from a bot-net are filing complaints using headless Selenium to bypass the API or something, rather than 1-2 individual users, it’d cause my account to get taken down sooner because it’d appear a huge number of real people were making the complaints. Another thing worth experimenting with in the future in any case.
  • The dog whistle bot is dumb and just replaces x with y, but only if x is spelled according to a dict list I manually compiled. But, y’know, the things it analyzes are dumb too. I never knew there were so many different ways to misspell common 4th or 5th grade level words. Should have known, but never stopped to think about it. Education should be a bigger focus in this country, fer sher.
  • People will make huge cognitive leaps to justify mistreatment of others. As an example, the screenshot below is one of the revised tweets I sent out – the original implies that “the wall (TM)” should be put up for humanitarian reasons, to funnel immigrants to designated choke points because incoming migrants are too stupid to know where to go (bot-replaced text is bolded).
    • Original text: Some_handle Some_handle – Stop spreading idiotic propaganda. No one has ever proposed a barrier across the entire border. If for one moment, you people would think about the uneducated illegals who think remote areas are the only place to cross. A wall could save lives, guide them to areas with help.
    • Revised text: Some_handle Some_handle – Stop spreading idiotic propaganda. No one has ever proposed a barrier across the entire border. If for one moment, you people would think about the uneducated economic/humanitarian-crisis refugees who think remote areas are the only place to cross. A wall could save lives, guide them to areas with help.
Screenshot of bot output
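Going back to the emoji bug in the list above: I can't show the exact failure here, but the sketch below covers one plausible way 4-byte characters bite you, namely cutting a UTF-8 byte string in the middle of a character. The function names are made up for illustration. It also counts 4-byte characters, which is all you'd need to start poking at the hunch in the side note.

```python
def utf8_width(ch):
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def count_four_byte_chars(text):
    """Count code points above U+FFFF, which take 4 bytes in UTF-8."""
    return sum(1 for ch in text if ord(ch) > 0xFFFF)


def truncate_utf8(text, max_bytes):
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # The byte slice may end mid-character; errors="ignore" drops that fragment.
    return encoded.decode("utf-8", errors="ignore")
```

A naive byte slice followed by a strict decode would raise UnicodeDecodeError whenever the cut lands inside an emoji, which is the sort of thing that quietly kills a long-running bot.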

In closing

To get back to my headline: It’s easy to see how this type of activity can be weaponized. Some would roll their eyes at the notion, but if you have a couple million, hell, couple hundred thousand, bot accounts tweeting out this stuff on a regular schedule, that’s enough to get the intentions behind the tweets inserted into the public dialogue – from there it all takes on a life of its own.

We see and hear it every day now.

Cue faith lost soundtrack – something by Enya sounds nice.

References

  1. Dog Whistle Bot – I only run her sporadically, but when I do, new tweets are spit out every few minutes.
  2. Dog whistle terms and their replacements (not all of them are in the dict at one time and I infrequently rotate them out):
  • illegals / illegal alien(s) -> economic/humanitarian-crisis refugees
  • state(‘)s(‘) rights -> institutionalized hate
  • deep state -> Rule of Law
  • secure our borders -> keep brown people out
  • Barack Hussein Obama -> Barack “Scary Sounding Middle Name” Obama
  • witch hunt -> fair and impartial investigation
  • Crooked Hillary/Hillery / Killery/Kilery/Killary/Kilary -> the Legitimate 45th POTUS
  • libs / leftist(s) -> people I disagree with
  • RINOs -> Republicans I disagree with
  • shut down the government -> screw over veterans
  • #MAGA -> #IAmACollaborator
  • #BuildTheWall / #BuildThatWall -> #IPayNoTaxes
  • #adenochrome -> #IEatLeadPaintChips
  • #WWG1WGA -> #IncelsUnite
  • #WalkAway -> #RussianBot
