Analyzing Disinformation at Scale

From DevSummit
Revision as of 17:32, 14 November 2019 by Josh

Tracking Disinformation Online


List of tools:

  • Indiana University is working on tools
  • Tools that measure the volume of bot traffic and the bot score of a given account
  • Mapping
  • Scraping and API tools
  • If you’re building tools, you’re encouraged to create a Colab version so that others can use them.

Interests in this topic / skill level

  • Currently working with a group that crawls government webpages for changes and writes reports when there are questionable ones, e.g. the EPA removing “climate change”. Interested in other tools.
  • Professional background in natural language processing. Fascism doesn’t kick its own ass.
  • California Department of Public Health; runs population-level campaigns at massive scale on vaccines and other health information. Does some data work but is not an expert.
  • Interested in how to refute obvious misinformation. Has looked at scams on Indiegogo and Kickstarter, but not at misinformation.
  • Interested in the conflict between end-to-end encrypted systems and punching the Nazis.
  • Not a fascist; can write Python and do OSINT.
  • Interested in methodologies and how these things are tracked.
  • Company does outsourced nonprofit IT. Clients run into disinformation a lot. Looking for coaching and for how to train people. Also has family members who have suffered from it.
  • Works in communications. Believes they should be better informed as a citizen.
  • Interested in vaccine misinformation; is a coder but doesn’t know how to start. Notes that despite news about state-backed misinformation, the tools themselves aren’t discussed.
  • The way we talk and think about privacy technology is super fucked up. It’s clear that technology requires governance.
  • Works on disinformation on the policy side and in elections.
  • Works in communications with activists in Latin America. Wants to bring these tools back to Latin American contexts.
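
The first bullet describes crawling government pages and reporting questionable changes. As a minimal sketch of the comparison step only, assuming two snapshots of the same page have already been fetched and reduced to plain text, Python's standard `difflib` can list the removed lines and flag those that mention watched terms. The function names and sample text are hypothetical, not the group's actual tooling.

```python
import difflib


def removed_lines(old_text, new_text):
    """Lines in the old snapshot that no longer appear in the new one."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes deleted lines with "- ", added lines with "+ ",
    # unchanged lines with "  " and intraline hints with "? ".
    return [line[2:] for line in diff if line.startswith("- ")]


def flag_removed_terms(old_text, new_text, watch_terms):
    """Map each watched term to the removed lines that mentioned it."""
    flagged = {}
    for line in removed_lines(old_text, new_text):
        for term in watch_terms:
            if term.lower() in line.lower():
                flagged.setdefault(term, []).append(line)
    return flagged


# Hypothetical before/after snapshots of one page, already reduced to text.
old = "Our mission\nAddressing climate change is a priority\nContact us"
new = "Our mission\nAddressing our priorities\nContact us"
print(flag_removed_terms(old, new, ["climate change"]))
# prints {'climate change': ['Addressing climate change is a priority']}
```

A real monitor would also need to strip page chrome and tolerate reflowed text before diffing; the sketch only covers the comparison.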

Won’t cover here

Outside the scope of this conversation, despite organizations doing good work in these areas:

  • Fact checking
  • Civil discourse initiatives
  • Open Source Intelligence (OSINT): tools for digging through public data, e.g. Google Dorking and Maltego. These don’t scale: they are low signal-to-noise, require lots of manual input, and make it easy to make mistakes, for example when trying to identify Nazis.
  • Dox defense. Pay for DeleteMe if you have the money. It’s fine.


  • Doxing: taking people’s private information and posting it publicly with the intent of causing them harm. For example, publishing an activist’s home address and Amazon account to a right-wing forum where a leaderless group of people set about destroying their life.
  • Bot vs. organic accounts: bot accounts are automated accounts that artificially like and share a central account’s posts in order to amplify and spread a message. Organic accounts are run by real people. Bot accounts can be hard to identify because they may delete all of their messages after an action.

Hot Take

Extremely top-down interventions are unable to deal with the next generation of threats, which include state intervention and grassroots leaderless movements like 8chan. These actors are highly adaptive, and a hierarchical organization trying to keep up with their changes will get run over.

To be adaptive, we need more people working on the problem, which means we need more accessible tools. That will happen by getting people to use them and iteratively smoothing the rough edges.

Currently working on

Start with a social platform such as Gab, Reddit, 4chan, or Twitter. Choose variables, such as a hashtag or an account over time. Then choose from various forms of analysis, and finally produce some kind of report.
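
The four steps (platform, variable, analysis, report) can be sketched in Python. This is a minimal illustration, assuming posts have already been collected into a hypothetical `Post` record; the field names, the hashtag regex, and the per-day count used as the analysis are illustrative choices, not the project's actual design.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

HASHTAG = re.compile(r"#(\w+)")


@dataclass
class Post:  # hypothetical record shape
    platform: str  # e.g. "gab", "reddit", "4chan", "twitter"
    author: str
    created: datetime
    text: str


def hashtags(text):
    """Lower-cased set of hashtags in a post, without the leading '#'."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def daily_volume(posts, tag):
    """Analysis step: number of posts per day that use `tag`."""
    counts = Counter(p.created.date() for p in posts if tag in hashtags(p.text))
    return dict(sorted(counts.items()))


def report(platform, tag, volume):
    """Report step: a plain-text summary, one line per day."""
    lines = [f"#{tag} on {platform}"]
    lines += [f"{day.isoformat()}: {n}" for day, n in volume.items()]
    return "\n".join(lines)


def run_pipeline(posts, platform, tag):
    selected = [p for p in posts if p.platform == platform]  # 1. platform
    tag = tag.lower().lstrip("#")                            # 2. variable
    volume = daily_volume(selected, tag)                     # 3. analysis
    return report(platform, tag, volume)                     # 4. report


# Hypothetical sample data.
posts = [
    Post("twitter", "a", datetime(2019, 11, 1, 9), "Look #Vaccines"),
    Post("twitter", "b", datetime(2019, 11, 1, 12), "#vaccines again"),
    Post("twitter", "c", datetime(2019, 11, 2, 8), "#vaccines"),
    Post("reddit", "d", datetime(2019, 11, 1, 8), "#vaccines"),
]
print(run_pipeline(posts, "twitter", "#Vaccines"))
```

Keeping each step a separate function is what would let other analyses (co-occurrence, account activity) be swapped in at step 3 without touching the rest.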

Ingesting lots of different data from social platforms, then building an API to make it easy to get content back out, e.g. all of the tweets from a given person. Then building very simple analysis tools and a website frontend to make it more accessible.
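
A query layer over ingested posts can start as little more than an indexed table. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the `posts` schema and the `ingest` / `posts_by_author` helpers are hypothetical names for illustration, not the project's API.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a hypothetical posts schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (platform TEXT, author TEXT, created TEXT, text TEXT)"
    )
    # Index the columns the "all posts from this person" query filters on.
    conn.execute("CREATE INDEX idx_author ON posts (platform, author)")
    return conn


def ingest(conn, records):
    """Ingest step: records are (platform, author, created_iso, text) tuples."""
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", records)
    conn.commit()


def posts_by_author(conn, platform, author):
    """Query step: every stored post by one account, oldest first."""
    rows = conn.execute(
        "SELECT created, text FROM posts"
        " WHERE platform = ? AND author = ? ORDER BY created",
        (platform, author),
    )
    # ISO 8601 timestamps sort correctly as plain strings.
    return [{"created": created, "text": text} for created, text in rows]


conn = open_store()
ingest(conn, [
    ("twitter", "alice", "2019-11-02T10:00:00", "second"),
    ("twitter", "alice", "2019-11-01T09:00:00", "first"),
    ("twitter", "bob", "2019-11-01T09:30:00", "other"),
    ("gab", "alice", "2019-11-01T08:00:00", "gab post"),
])
print(posts_by_author(conn, "twitter", "alice"))
```

Filtering on platform as well as author matters because the same handle on different platforms need not be the same person.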

Is worried that the tool will be abused. Still debating how open to make it. For example, don’t want to publish bot-detection algorithms because that could show bot developers how to avoid detection.

Is not building tools for censoring. Is building tools for activists and journalists.

These tools are reactive, not necessarily preventive.


These things happen very fast. After every mass shooting, the same pattern plays out across a cluster of alternative news sites: one writes a story, which is then shared and linked as if it were true. It is then legitimized by coverage from a more reputable outlet, then Breitbart, then Fox News, then President Trump.

Researchers working on this include Kate Starbird and Jeremy Blackburn.

Example audiences are investigative journalists, activists who are trying to understand people, and scholars who are trying to understand an emerging field.

Sex, hate clicks, and conspiracies tend to be very viral because they trigger reactions. Conspiracies always travel faster than their repudiations; it’s rare for a Snopes article to go as viral as the thing it’s refuting. Refutations can sometimes feed into the conspiracy too, for example through the claim that Snopes is Soros-funded.

Challenges and Questions

It’s not possible to scrape WhatsApp, which means misinformation there can’t easily be tracked. WhatsApp is a huge conspiracy vector in India and Latin America. It limits how many people can be in a group or list, but audiences are split across many groups and lists and used to spread political and racist propaganda. There is a Wikipedia page on WhatsApp murders.

Who is the target audience of this proposed set of tools? Is it designed for professional researchers or for casual users trying to refute conspiracies? Both, but that’s challenging because it might be too casual for researchers working under Institutional Review Boards.

A challenge of building democratized tools is that people don’t always understand the limitations of those tools.

  • For example, there is a community of people who use Gephi, which creates beautiful network maps. Those graphs are used in research, but the assumptions and limitations that go into creating them are often not disclosed. For example, a map can place President Trump at the center of a conspiracy theory like QAnon, but that’s just a side effect of everyone tweeting at the president.
  • Another example is a tool that scrapes the links people post but doesn’t provide any context for why they were posted. In one case, a Jewish organization was sharing anti-Semitic links, presumably in a critical context, but the scraped links alone can’t show that. Similarly, Richard Spencer and the Southern Poverty Law Center have a high degree of overlap.
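
The Gephi caveat is easy to reproduce. This stdlib Python sketch builds a mention graph from invented posts and scores each account by normalised in-degree (distinct accounts mentioning it, divided by n - 1). It is not Gephi's implementation, and the handle `@president` and the posts are hypothetical. The most-tagged account comes out on top even though a critic and a reporter tag it too, so the score measures attention, not membership.

```python
from collections import defaultdict


def mention_edges(posts):
    """Yield (author, mentioned_account) pairs from (author, text) posts."""
    for author, text in posts:
        for word in text.split():
            handle = word[1:].strip(".,:;!?").lower()
            if word.startswith("@") and handle:
                yield author.lower(), handle


def in_degree_centrality(edges):
    """Normalised in-degree: distinct accounts mentioning a node / (n - 1)."""
    edges = {(src, dst) for src, dst in edges if src != dst}  # dedupe, drop self-mentions
    nodes = {node for edge in edges for node in edge}
    indegree = defaultdict(int)
    for _, dst in edges:
        indegree[dst] += 1
    denom = max(len(nodes) - 1, 1)
    return {node: indegree[node] / denom for node in nodes}


# Hypothetical posts: believers, a critic and a reporter all tag @president.
posts = [
    ("q1", "@president #qanon"),
    ("q2", "@president @q1 wwg1wga"),
    ("critic", "@president this is nonsense"),
    ("reporter", "@president comment?"),
]
centrality = in_degree_centrality(mention_edges(posts))
print(max(centrality, key=centrality.get))  # prints president
```

Any layout that sizes or centers nodes by a score like this will put the most-mentioned account in the middle of the picture, whatever the reason people mention it.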

What happens when it’s adopted by bad actors? For example, Norwegian neo-Nazis created their own decentralized Scuttlebutt servers where they post awful stuff. They can be banned, but that may not be effective because of the nature of the technology.

Is this being built with a very particular set of users in mind? For example, EDGI builds for a specific group of researchers and journalists who deliver a specific kind of product for publication. It’s important that journalists get something dry and direct that they can report on without significant work. EDGI produces those reports and hones them based on feedback from journalists to reduce the effort needed to publish.

This analysis tool is being built for people: the team is developing user personas, and this session is user research too. Rather than automating what gets looked at, the aim is to trust people’s curiosities and devotions.

User research

  • When wandering around Facebook and coming across a post that’s bullshit, wants to be able to plug in some details from the post and get back info to refute it (Who is this person? Who do they associate with? Who funds them?). Example tools: influence network, source funding, and a list of co-occurring hashtags to identify outlets that are manipulating that hashtag.
  • For academic research, it would be helpful to map the spread of health information across different categories and messages, and to see which messages are more effective at disseminating misinformation, e.g. “this message was seen by X people on these networks.”
  • How does one get to the truth? A paper came out saying it’s 100% NOT worthwhile to argue with anti-vaxxers on the internet. But if someone asks a good-faith question, there is value in engaging with them. Engaging with neo-Nazis is not worth the effort. Could use sentiment analysis.
  • What could these tools do more actively in real time? For example, a bot one could sign up for that watches your own activity, like a buddy looking over your shoulder.
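
The first user-research bullet asks for a list of co-occurring hashtags. A minimal sketch, assuming post texts are already collected: count the other hashtags that appear in the same post as a target tag. The function name and sample posts are hypothetical, and a high count alone does not show that an outlet is manipulating a hashtag; that judgment still needs someone looking at who is posting.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def co_occurring_hashtags(texts, target, top=10):
    """Most common hashtags that appear in the same post as `target`."""
    target = target.lower().lstrip("#")
    counts = Counter()
    for text in texts:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if target in tags:
            counts.update(tags - {target})  # count each other tag once per post
    return counts.most_common(top)


# Hypothetical sample posts.
texts = [
    "#vaxxed #bigpharma",
    "#Vaxxed #BigPharma #truth",
    "#vaxxed #bigpharma",
    "#bigpharma only",
    "#unrelated",
]
print(co_occurring_hashtags(texts, "#vaxxed"))
```

Grouping the same counts by posting account, rather than across all posts, is one way to start looking for the outlets the bullet mentions.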