Uncovering fake news bots

May 15, 2019

Having the right information at the right time can make you rich or save your life, but in today’s grim reality, the information space around the globe is polluted with fake news. People now build careers on producing false information and spreading it online. And real people are only part of the story: This industry also employs thousands of social media bots to amplify its reach. At this year’s Security Analyst Summit, researchers from Recorded Future gave a talk on ways to expose those bots.

Why fake news bots are still a thing

Nobody likes bots, but social media companies really hate them, because bots make social networks less attractive to real people. For example, Twitter periodically identifies massive numbers of bots and banishes them (leaving real people whining about losing followers). That means social networks have their own ways of detecting bots. But their efforts are not enough to wipe the bots out completely.

Social networks don’t disclose their algorithms, but it’s safe to say their efforts are based on detecting abnormal behavior. The most obvious example: If an account tries to publish a hundred posts a minute, that’s certainly a bot. Or, if an account only retweets other accounts and never posts anything of its own, that’s also most likely a bot.
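The two heuristics above can be sketched in a few lines of Python. This is only an illustration, not how any social network actually does it: the `(timestamp, is_retweet)` input format, the function names, and the `min_posts` cutoff are assumptions made for the example. The limit of a hundred posts a minute comes from the text.

```python
from datetime import datetime, timedelta


def max_posts_per_minute(timestamps):
    """Return the largest number of posts inside any 60-second window."""
    ordered = sorted(timestamps)
    best = 0
    start = 0
    for end, ts in enumerate(ordered):
        # Move the window start forward until it is less than 60 s before ts.
        while ts - ordered[start] >= timedelta(seconds=60):
            start += 1
        best = max(best, end - start + 1)
    return best


def looks_automated(posts, rate_limit=100, min_posts=20):
    """posts: list of (timestamp, is_retweet) tuples for one account.

    rate_limit mirrors the "hundred posts a minute" example; min_posts is
    an assumed cutoff so that tiny accounts are not judged on retweets alone.
    """
    if not posts:
        return False
    if max_posts_per_minute([ts for ts, _ in posts]) >= rate_limit:
        return True
    only_retweets = all(is_retweet for _, is_retweet in posts)
    return only_retweets and len(posts) >= min_posts
```

Real detection systems have to be far more cautious than this, for the false-positive reasons discussed next.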

But the creators of bots are constantly learning to modify their bots so that they can slip past social media services’ detection techniques. And social media services cannot afford too many false positives; mistakenly banning a lot of real people would cause outrage, so they have to be cautious. That means a certain number of bots go undetected.

To dig deeper into how the bots behave, Recorded Future picked one characteristic that singles out a particular group of bots: in this case, posting about terror events that are mentioned only on Twitter. If an account does that, it’s probably a bot (or it is retweeting a bot). Now let’s take a look at what else these accounts have in common.

How fake news bots behave

First of all, the terror events these accounts were mentioning actually happened, and the articles about them were hosted on somewhat respectable websites (websites that did not demonstrably spread fake news). One small but important detail: The events happened years ago, something the accounts did not mention. Linking to respectable media keeps Twitter’s bot-detection algorithms placated, and that is why the mastermind behind the bots chose that strategy.

Second, in the case of this particular bot network, the account owners pretended to be based in the US but were talking mostly about European countries. Having this information allowed Recorded Future to identify more than 200 accounts that shared that trait and to dig deeper into the other similarities and connections between them.

For example, researchers charted the accounts’ activity and found that many of those bots were active only during the same periods of time. Some of the accounts were banned this May, but new ones with the same behavior were then created, and they are still operational.
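One way to make "active during the same periods" measurable is to reduce each account to the set of hours in which it posted and compare those sets. The sketch below uses Jaccard similarity for the comparison. That choice, the hourly bucket size, and the 0.8 threshold are assumptions for illustration; the talk did not describe the researchers' exact method.

```python
from datetime import datetime
from itertools import combinations


def active_hours(timestamps):
    """Reduce post times to the set of (date, hour) buckets with activity."""
    return {(ts.date(), ts.hour) for ts in timestamps}


def jaccard(a, b):
    """Share of buckets the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def coinciding_pairs(activity, threshold=0.8):
    """activity: dict mapping an account name to a list of datetimes.

    Returns (account, account, score) for every pair whose active hours
    overlap at least as much as the assumed threshold.
    """
    buckets = {name: active_hours(times) for name, times in activity.items()}
    pairs = []
    for a, b in combinations(sorted(buckets), 2):
        score = jaccard(buckets[a], buckets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```

Overlap alone proves little, since people in the same time zone also tend to post at similar hours, so a match here is best treated as one signal among several.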

Another similarity is that all of the accounts relied on a number of URL shorteners to post their not-entirely-fake news. The URL shorteners were used to provide those behind the bots with some analytics — how many times each of the links was clicked, for example. They were not the usual shorteners people would normally use, like t.co or goo.gl, but some nonpublic ones created for the sole purpose of gathering analytics. By the way, all of these shorteners have a surprisingly similar orange and white minimalistic design. Use of these shorteners could also link those accounts to each other.
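To illustrate how shared shorteners can tie accounts together, here is a minimal sketch that groups accounts by the domain of the links they post and keeps only domains used by several accounts. The `(account, url)` input format and the `.example` domains in the usage are hypothetical. The set of well-known shorteners contains only the two named above and would need extending in practice.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Public shorteners named in the article; extend as needed.
COMMON_SHORTENERS = {"t.co", "goo.gl"}


def accounts_by_link_domain(posts):
    """posts: iterable of (account, url). Returns domain -> set of accounts."""
    groups = defaultdict(set)
    for account, url in posts:
        host = urlparse(url).hostname  # lowercased by urlparse, None if absent
        if host is None:
            continue
        if host.startswith("www."):
            host = host[4:]
        groups[host].add(account)
    return groups


def shared_unusual_domains(posts, min_accounts=2):
    """Keep domains that are not common shorteners and are used by
    at least min_accounts different accounts."""
    groups = accounts_by_link_domain(posts)
    return {
        domain: accounts
        for domain, accounts in groups.items()
        if domain not in COMMON_SHORTENERS and len(accounts) >= min_accounts
    }


posts = [
    ("acct_a", "https://short.example/abc"),
    ("acct_b", "https://short.example/xyz"),
    ("acct_c", "https://t.co/123"),
    ("acct_d", "https://t.co/456"),
    ("acct_e", "https://news.example/story"),
]
linked = shared_unusual_domains(posts)
```

With this sample data, `linked` contains only `short.example`, shared by `acct_a` and `acct_b`. The same grouping also surfaces ordinary sites that several accounts link to, so an analyst would still have to check which of the resulting domains are actually shorteners.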

WHOIS data for the websites of these shorteners shows that all of them are hosted on the Microsoft Azure cloud platform and registered anonymously. Coincidence? Probably not. More similarities exist among the accounts, although of course the campaigns differ. But in general, examining one bot account, finding peculiarities, and then searching for other accounts with the same peculiarities is an effective way to expose bot networks.

The fake news bot checklist

We’ve prepared a small list of features typical of bots. Accounts used in one network or campaign usually have several of these features in common. So, accounts on the same social media site are probably bots if they:

  • Have similarities in handles or names;
  • Were created on exactly the same date;
  • Post links to the same sites;
  • Use the same phrasing;
  • Make the same grammar mistakes;
  • Follow each other or another similar account;
  • Use the same tools such as URL shorteners;
  • Are active only during certain coinciding periods of time;
  • Have similarities in bios;
  • Use generic images or faces of other people (easily searchable on Google) as avatars.

Of course, a single similarity shared by several accounts doesn’t mean they should be considered bots. Certainly not. But if the accounts share more than one such similarity (or, to avoid false positives, more than four or five), then there’s a high probability that you’re looking at yet another social media bot network.
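The counting rule can be sketched as follows, assuming each account has already been reduced to a set of observed traits written as strings such as `"created:2019-03-02"`. That encoding, the function name, and the sample traits are hypothetical. The default of five shared traits is the "more than four" reading of the threshold in the text.

```python
from itertools import combinations


def suspicious_pairs(fingerprints, min_shared=5):
    """fingerprints: dict mapping an account name to a set of trait strings.

    Returns (account, account, shared_traits) for every pair that has at
    least min_shared checklist traits in common.
    """
    flagged = []
    for a, b in combinations(sorted(fingerprints), 2):
        common = fingerprints[a] & fingerprints[b]
        if len(common) >= min_shared:
            flagged.append((a, b, sorted(common)))
    return flagged


fingerprints = {
    "acct_a": {"created:2019-03-02", "shortener:short.example", "bio:news fan",
               "avatar:stock-17", "phrase:breaking now", "follows:hub_1"},
    "acct_b": {"created:2019-03-02", "shortener:short.example", "bio:news fan",
               "avatar:stock-17", "phrase:breaking now", "follows:hub_2"},
    "acct_c": {"created:2019-03-02", "bio:gardener", "avatar:cat"},
}
flagged = suspicious_pairs(fingerprints)
```

Here `acct_a` and `acct_b` share five traits and are flagged, while `acct_c` shares only a creation date with them and is not. Lowering `min_shared` catches more bots at the price of more false positives.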

Tinker, tailor, soldier, bot

The Recorded Future team’s research shows that using behavioral analysis can still work for identifying bots. Researchers find a couple of bots, see something special in their behavior, and then search for other accounts that behave the same way. That helps them to identify other bots and to find more similarities that they can add to their search criteria to find other accounts used in adjacent campaigns.
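The loop of "find a few bots, note what is special, search for accounts that share it, repeat" can be sketched as a simple expansion from seed accounts. This is a toy model and not the researchers' tooling: the trait strings, the `min_shared` value, and the idea of pooling every trait seen so far are assumptions for illustration.

```python
def expand_network(seeds, fingerprints, min_shared=2):
    """Grow a set of suspected accounts from a few known ones.

    seeds: names of accounts already believed to be bots.
    fingerprints: dict mapping an account name to a set of trait strings.
    An account is added when it shares at least min_shared traits with the
    pool of traits seen on accounts flagged so far; its own traits then
    join the pool for the next pass.
    """
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        # Pool every trait observed on the accounts flagged so far.
        pool = set().union(*(fingerprints[name] for name in known))
        for name, traits in fingerprints.items():
            if name not in known and len(traits & pool) >= min_shared:
                known.add(name)
                changed = True
    return known


fingerprints = {
    "seed": {"shortener:short.example", "bio:news fan"},
    "b": {"shortener:short.example", "bio:news fan",
          "created:2019-03-02", "phrase:breaking now"},
    "c": {"created:2019-03-02", "phrase:breaking now",
          "shortener:other.example", "avatar:stock-17"},
    "d": {"shortener:other.example", "avatar:stock-17"},
    "e": {"bio:gardener"},
}
network = expand_network({"seed"}, fingerprints)
```

Starting from `seed`, the passes pick up `b`, then `c`, then `d`, and leave `e` out. Because the pool grows with every pass, a loose threshold can snowball into unrelated accounts, which is one reason to keep a human analyst in the loop.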

This, of course, is ongoing (and probably never-ending) work, because new bots appear every day, and they have their own behavioral patterns. One cannot identify all bots with just one set of behavioral rules, but using behavioral analysis at least helps identify all parts of certain bot networks — and helps social media sites take them down, making social media a better place for humans.

And of course, everybody on social media should be aware that bots exist, are plentiful, and shouldn’t be trusted.