The Science of Disinformation on Social Media

Published Mar 30, 2021

Social media analyst Erin McAweeney pulls back the curtain on how disinformation spreads across Facebook and Twitter.

Guest byline

Erin McAweeney is a senior analyst at Graphika, focusing on online social movements, election integrity, and health misinformation. Her recent work uses open-source social media data to map the global spread of mis- and disinformation related to the Covid-19 pandemic and vaccines.

In this episode

Colleen and Erin talk about:

  • how we can use science to track disinformation on social media
  • how anti-vaccination, QAnon, and climate denial clusters infiltrate each other
  • what story the data can tell

Timing and cues

Opener (0:00-0:30)
Intro (0:30-3:21)
Interview part 1 (3:21-12:41)
Break (12:41-13:39)
Interview part 2 (13:39-24:15)
Segment throw (24:15-25:13)
Ending segment (25:13-28:24)
Outro (28:24-29:00)

Related links

Vaccine FAQ segment: Casey Kalman
Editing: Colleen MacDonald
Editing and music: Brian Middleton
Research and writing: Jiayu Liang and Pamela Worth
Executive producer: Rich Hayes
Host: Colleen MacDonald

Full transcript

Colleen: On March 16th, 8 people, 6 of whom were Asian women, were killed in a shooting spree across 3 Asian-owned spas around Atlanta. These attacks, which come at a time when COVID-19 disinformation has already inflamed anti-Asian hate crimes, are fueled by the United States’ longstanding legacy of systemic racism.

Our goal at Got Science? is to show how science can make the world a better place. That means we must acknowledge that Black, Indigenous, People of Color have been harmed, and systemic racism has been upheld, in the name of science.

One particularly egregious example is the Tuskegee experiment, which ran from 1932 to 1972. Researchers wanting to study the progression of syphilis recruited hundreds of Black men who were not informed about the true purpose of the study. They were given placebo treatments and became subject to medical experimentation disguised as “free medical care.” 15 years into the study, penicillin had become widely available as a safe and effective treatment for syphilis—but the study participants with the disease were not treated.

This infamous ethical breach has created an understandable distrust of public health officials in Black communities. But there are bad actors online who have nothing to contribute to conversations about racism in public health, and who only seek to exploit this distrust on social media. Today the hashtag #TuskegeeExperiments is being used to spread fear and disinformation among Black communities surrounding vaccines for COVID-19. During a pandemic that already disproportionately impacts Black people, this targeted disinformation can do a lot of damage.

But I’m getting a little ahead of myself. Our guest, Erin McAweeney, can explain what's happening much better than I can. Erin is a senior research analyst at Graphika, a company that does social media analysis. She studies data from online conversations to understand how conversations get manipulated and disinformation spreads.

She explains how the anti-vax community uses #TuskegeeExperiments and other methods to target Black communities online and spread disinformation. And Erin’s social media disinformation research isn’t limited to vaccines. She’s also studying the rise of QAnon, wildfire conspiracies, and climate change deniers.

I ask Erin what kinds of stories the data can tell, what it looks like when disinformation spreads into vulnerable communities, and why she still believes social media is a platform worth protecting.

Colleen: Erin, thanks for joining me on the podcast.

Erin: Thanks for having me. I'm excited to be here.

Colleen: So, disinformation is rampant on social media. I think that's a fair statement to make. And you work at a company that does social media analysis. Can you tell me what that is and, kind of, how you go about doing that?

Erin: Yeah. So, I work at a company called Graphika. We are a network analysis firm. And what that means is we build out maps or landscapes of a conversation online. And we use these maps to detect attempts to manipulate online conversations. And this manipulation can be foreign interference in democratic processes, fringe conspiracies becoming more mainstream, or detecting how health misinformation might spread through a network into vulnerable populations. And we do this by, first of all, collecting a lot of social media data. This tends to be Twitter data. And we will build a network based on shared interests, shared behaviors, and shared followers between accounts. And so, you might have seen a network graph before. And sometimes it'll just look like a hairball. And it's kind of chaotic and it's hard to tell what's going on.

And our graphs instead are clustered, not just based on content sharing. So sometimes you'll see a network graph that is based on who's retweeting who, and that can end up looking, again, like that sort of hairball. We will create network graphs based on similar interests, and behaviors, and follows. And so, you start to get these very well-defined clusters of accounts. You know, if we're talking about health misinformation, that can be accounts that are following similar influencers that tend to spread COVID-19 mis- and disinformation, or you'll get a set of accounts that are commonly sharing a particular source of anti-vaccine articles.
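To make the idea of clustering accounts by shared follows concrete, here is a minimal sketch. It is not Graphika's method, which the interview does not specify. It assumes a toy dictionary of accounts and the accounts they follow, uses Jaccard overlap as the similarity measure, and treats connected components above a hypothetical threshold as clusters. All account names, the `threshold` value, and the function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy data: account -> set of accounts it follows.
follows = {
    "a1": {"inf_x", "inf_y", "news_1"},
    "a2": {"inf_x", "inf_y"},
    "a3": {"inf_x", "inf_y", "news_2"},
    "b1": {"sci_1", "sci_2"},
    "b2": {"sci_1", "sci_2", "news_1"},
}


def jaccard(a, b):
    """Overlap of two follow sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_shared_follows(follows, threshold=0.5):
    """Link accounts whose follow sets overlap by at least `threshold`
    (an assumed cutoff), then return connected components as clusters."""
    neighbors = {acct: set() for acct in follows}
    for u, v in combinations(follows, 2):
        if jaccard(follows[u], follows[v]) >= threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)

    clusters, seen = [], set()
    for start in follows:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = cluster_by_shared_follows(follows)
print(clusters)
```

Grouping on what accounts follow, rather than on who retweets whom, is what separates the two toy groups here even though `a1` and `b2` share one followed account.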

Colleen: Let's use an actual example to illustrate how this happens. And I think anti-vaccination misinformation is so prevalent right now. Tell me how you would go about analyzing those conversations.

Erin: Yeah, definitely. So, I would take a map, like I just described, that Graphika will build. That map will be scoped around hashtags that are common to the anti-vaccine conversation in the COVID-19 context. That will be, for instance, Bill Gates' bioweapon or mandatory vaccination. So, we collect accounts that are engaging with that conversation through these hashtags. A map will be built off of that, and then I can start to identify, you know, within this network of the anti-vaccine conversation that we've built, we'll start to identify various clusters of anti-vaccination accounts. So what we can think of as the anti-vaccination community online.

And once we have those clusters identified, that community identified, we can really start to explore, first of all, from a network standpoint. So, I would first look at the structure of that cluster. That could be influencers within that cluster. Who is central to that cluster? That's based on who has the most followers within that cluster, maybe who is bridging, what accounts are bridging from one cluster to another. So, for example, if we're worried about content spreading from the anti-vaccination group into, say, a black community or a cluster of healthcare workers, these are communities that are vulnerable to anti-vaccination rhetoric. There are certain accounts that we can see that help bridge that content and facilitate that flow of information between that problematic community, that anti-vax community, to that vulnerable population.
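One simple way to picture "bridging" accounts is to count, for each account in one cluster, how many ties it has into another cluster. The sketch below does only that, under stated assumptions: the cluster labels, the tie list, and the function name `bridge_accounts` are hypothetical, and real analyses would likely use richer centrality measures than a raw count.

```python
from collections import Counter

# Hypothetical toy data: a cluster label per account, and undirected ties
# (for example, follows or retweets) between accounts.
cluster_of = {
    "av1": "anti_vax", "av2": "anti_vax", "av3": "anti_vax",
    "hw1": "health_workers", "hw2": "health_workers",
}
ties = [
    ("av1", "av2"), ("av2", "av3"),
    ("av3", "hw1"), ("av3", "hw2"),
    ("av1", "hw1"), ("hw1", "hw2"),
]


def bridge_accounts(cluster_of, ties, source, target):
    """Rank accounts in `source` by how many ties they have into `target`."""
    counts = Counter()
    for u, v in ties:
        # Check the tie in both directions because it is undirected.
        for a, b in ((u, v), (v, u)):
            if cluster_of[a] == source and cluster_of[b] == target:
                counts[a] += 1
    return counts.most_common()


bridges = bridge_accounts(cluster_of, ties, "anti_vax", "health_workers")
print(bridges)
```

In this toy data, `av3` has the most cross-cluster ties, so it would be the first candidate to monitor as a bridge.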

Colleen: So how would you see that happening? Would you then see in those vulnerable communities, that hashtag being used?

Erin: Yeah, exactly. This can be based on content ranging from articles being shared, again, to certain hashtags that might be associated with a campaign. Since we have identified accounts as bridges, we'll start to closely monitor those accounts and pay special attention to those accounts that might be, if they're using certain hashtags that are targeting say, the black community. You know, hashtag Tuskegee experiments is one that is a well-worn example and one that we come across a lot, that is commonly used to spread fear and misinformation, targeting the black community surrounding vaccination. It could be articles that are targeting that community. You know, and we're able to... Because we have these communities outlined and then clustered, we can start to pay attention to the types of content that is flowing through those different communities. So if there is an uptick in articles from a problematic domain or an uptick in a hashtag that might be carrying over from that anti-vax group, then we consider that to be a spreading of anti-vax rhetoric into adjacent communities.
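The "uptick" Erin describes can be sketched as a per-cluster share of posts that carry a watched hashtag, compared from one period to the next. This is an illustration under assumptions, not the actual monitoring pipeline: the post records, the placeholder tag `watched_tag`, and the `min_increase` cutoff are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (period, cluster, hashtags used in the post).
posts = [
    ("2021-01", "health_workers", {"nursing"}),
    ("2021-01", "health_workers", {"covid19"}),
    ("2021-01", "health_workers", {"covid19", "nursing"}),
    ("2021-01", "health_workers", {"watched_tag"}),
    ("2021-02", "health_workers", {"watched_tag"}),
    ("2021-02", "health_workers", {"watched_tag", "covid19"}),
    ("2021-02", "health_workers", {"watched_tag"}),
    ("2021-02", "health_workers", {"nursing"}),
    ("2021-02", "anti_vax", {"watched_tag"}),  # other clusters are ignored
]


def tag_share_by_period(posts, cluster, tag):
    """Fraction of a cluster's posts in each period that use `tag`."""
    total, tagged = defaultdict(int), defaultdict(int)
    for period, post_cluster, hashtags in posts:
        if post_cluster != cluster:
            continue
        total[period] += 1
        if tag in hashtags:
            tagged[period] += 1
    return {period: tagged[period] / total[period] for period in sorted(total)}


def upticks(shares, min_increase=0.2):
    """Periods where the share rose by at least `min_increase` (assumed cutoff)."""
    periods = sorted(shares)
    return [
        cur for prev, cur in zip(periods, periods[1:])
        if shares[cur] - shares[prev] >= min_increase
    ]


shares = tag_share_by_period(posts, "health_workers", "watched_tag")
print(shares, upticks(shares))
```

Using a share rather than a raw count keeps a busy month from looking like an uptick when the hashtag's relative presence has not changed.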

Colleen: What sort of story does the data tell?

Erin: You know, it tells a different story every time. I worked on a project with the Labs group. They work on more theory and then applied science to help researchers and analysts answer questions. So I worked with that group to look at the spread and convergence of QAnon throughout the summer with groups in our COVID-19 map. There was a lot of reporting at that time of how the pandemic helped to accelerate membership to QAnon, the conspiracy group, and there was a huge uptake of both use of language and hashtags associated with QAnon that happened online during the summer. This was when the Black Lives Matter protests were going on. This is when there was a lot of rhetoric and disinformation about Antifa going on. And also this is when misinformation, and disinformation, and conspiracies concerning the pandemic continued to spread.

And so there was a lot of reporting and researchers that had noted this major increase in QAnon-related content being shared online. So I worked with researchers in the Graphika Labs group to track how... to really see on a network level, how did this happen? How did QAnon within the context of COVID-19, how did it spread? Was it from a network perspective? Where did it start? Where is it now within the COVID-19 conversation? And when I say “it,” I mean the cluster of QAnon accounts, or essentially the QAnon community that we have mapped online. So we used a series of COVID-19 maps that were created every month since the beginning of the pandemic. And we were able to look at six months of data just around who was talking about COVID-19 and what the networks were each month. And you could see this fringe group of QAnon accounts. Before, back in February and March, when we had our first COVID-19 maps, we even saw that there was a small cluster within the larger Trump support group. Over time, this fringe group came out of that Trump support cluster and it became its own group that started to become increasingly more central to that network.

And how we did this was a mix of, again, network analysis, natural language processing, and a method called cultural bridging. And using that natural language processing, we saw a huge uptake of language related to QAnon, not only in that growing group, but also in those adjacent groups that were within that COVID-19 map. And this really happened right around when the "Plandemic" documentary was released. And this is something that many researchers and reporters have theorized accelerated, helped accelerate, conspiratorial thought and content being shared. And we could see that in our maps in the data that was produced from this natural language processing method, we could see a huge increase in QAnon language and, again, most worrisome, not just within the group itself, but we could see it spreading throughout adjacent groups within the COVID-19 map. So it's a story but it is an unsettling story, which tends to be the stories that we see at Graphika in this line of work.


Colleen: How can you tell if an account is real or fake, and how do bots play into this?

Erin: So, networks can tell us a lot about potentially coordinated or fake behavior among fake accounts. Sometimes we'll come across a tight-knit cluster that may be unusual for a network, and that will lead us to a set of accounts to further investigate, whether we see behaviors that might lead us to believe that an account is fake. For example, all of the accounts within this cluster are created on the same day, or they might all have the same profile photo, or they may only have, say, a few friends, and those few friends are all within that tight-knit cluster, or they're sharing what we call copypasta messages. So clearly just the same sentence or sentences that are copy and pasted between one account and another pushing out and trying to amplify a similar message.

So, one of those things taken individually cannot immediately identify whether something's a bot or whether something's a fake account. But taken together, you know, through an investigative process and through investigative methods, we can start to paint a picture and have a better confidence around whether an account or a set of accounts is fake, has a malicious intent, or is trying to amplify misinformation or disinformation. It's hard to differentiate, I will say, between a normal conversation, because normal online conversations are bizarre. And we can't just assume that a bizarre conversation must be malicious or must be a troll. So there will always be this element of mixed methods of like having a human that can manually investigate, that goes into identifying perhaps a botnet or a set of paid bot accounts, or paid fake accounts run by a few people. As much as I would love to have the bot button that I can just hit and it will light up all of the bots, we can't do that. And there will always be that downside to over-quantifying really messy online human behavior.
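The point that no single signal identifies a fake account, but several together justify a closer look, can be sketched as follows. This is an illustrative assumption-laden example, not a bot detector: the account records, the three signal names, and the `min_signals` cutoff are hypothetical, and the output is only a list of candidates for the manual investigation Erin describes.

```python
from collections import Counter

# Hypothetical toy data for accounts inside one tight-knit cluster.
accounts = [
    {"id": "u1", "created": "2020-05-01", "photo": "p1",
     "text": "Wake up, people!  Share this now"},
    {"id": "u2", "created": "2020-05-01", "photo": "p1",
     "text": "wake up, people! share this now"},
    {"id": "u3", "created": "2020-05-01", "photo": "p2",
     "text": "Wake up, people! Share this now "},
    {"id": "u4", "created": "2017-11-23", "photo": "p3",
     "text": "Anyone have a good soup recipe?"},
]


def normalize(text):
    """Lowercase and collapse whitespace so near-identical copies match."""
    return " ".join(text.lower().split())


def review_candidates(accounts, min_signals=2):
    """Return accounts sharing at least `min_signals` traits with others.
    The result is a queue for human review, not a verdict."""
    fields = {
        "same_creation_day": lambda a: a["created"],
        "same_profile_photo": lambda a: a["photo"],
        "copypasta": lambda a: normalize(a["text"]),
    }
    signals = {a["id"]: [] for a in accounts}
    for name, key in fields.items():
        counts = Counter(key(a) for a in accounts)
        for a in accounts:
            # A trait counts as a signal only if another account shares it.
            if counts[key(a)] > 1:
                signals[a["id"]].append(name)
    return {acct: s for acct, s in signals.items() if len(s) >= min_signals}


flagged = review_candidates(accounts)
print(flagged)
```

Requiring more than one shared trait is a deliberate choice here: two unrelated people can easily create accounts on the same day, so a single match is left alone.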

Colleen: Do you have a sense of what percentage of bots are out there compared to real accounts? I'm trying to understand how serious the bot problem is.

Erin: You know, I've seen estimates around specific conversations. There have been papers that have, of course, come out around how many bots are in the climate denial conversation? How many bots are in the anti-vaccine conversation? But again, I kind of take those estimates with a grain of salt, because going through 100,000 accounts by hand to check whether it appears that there's coordination across those accounts just really is almost impossible to do, to do it thoroughly and to do it well. And again, there's downsides to doing that. We have seen people who aren't first language English speakers because maybe there is some, like, semantic variation in how they're typing. They might be identified as a bot or just weird. If someone is tweeting over a certain volume per day, that might be totally authentic. There's no, like, hard and set threshold of how much somebody can be tweeting before they're identified as a bot. And so, I think taking the care to manually go through and use these investigative processes to identify to a certain confidence level, whether what we see is fake or inauthentic or coordinated, protects everybody on the internet. I think it protects people from being maybe wrongfully de-platformed that are a part of a genuine grassroots cause. And that is the opposite of what we're trying to do.

Colleen: You've done a lot of work looking at climate disinformation across networks. Can you tell me about that work and what you're seeing?

Erin: Yes. So, for a year now, we've worked with a coalition of groups, the Union of Concerned Scientists being one of them. And we have mapped the climate conversation landscape. This includes clusters of climate deniers, and it includes groups of pro-climate science and pro-environmental groups, organizations, and individuals that are uniquely interested in the climate conversation. So, on the climate denial front, I will say what's unique to this group is that it is such a small group. And it really appears that their main objective is to make it seem like there's outsized support for this fringe belief that climate change isn't real or that climate science is false. And making it appear that there is a false equivalency between the pro-environmental organizations and accounts and pro-climate science accounts, and the argument against that, which is so small and diminished compared to that pro-science online group that a lot of their behavior just is centered around amplifying and pushing out climate denial content and making it appear as if there is this outsized support.

So, I think what's most worrisome that we've seen over the last year, given this goal, is that they’ve become increasingly tied to, and this is on a network level, they've become increasingly tied to conservative and conspiratorial groups online. So, the most recent map that we've done, there was a large QAnon group within the climate conversation that we've never seen before. And on an individual level, we've seen influencers like Naomi Seibt start to embrace QAnon, as far as the conspiracies around child trafficking, the political conspiracies, as well as using that to support her climate denial stances. And when this small climate denial group appears to have support from these adjacent groups, we've seen in the past that when an instance takes place where all of their priorities align, I think the fires over the summer on the West Coast are a great example of this. When those fires were going on, the Black Lives Matter protests were also happening, along with a lot of unfounded conspiracies concerning Antifa.

And when these fires started, we saw both from the QAnon networks and from conservative networks start to push the conspiracy that Antifa had started these fires. And this was supported by articles from the right-wing media ecosystem and decontextualized videos. And all of this content spread rapidly. And, of course, the climate denial group also started participating in amplifying the drumbeat of Antifa and wildfire conspiracies. The climate denial groups were saying that Antifa is the real climate alarmism and pushing their rhetoric around climate alarmism.

And really, the insular borders that we see around the climate denial group that usually is quite tight-knit, quite sort of an echo chamber in which they're really just sharing one another's content, those borders started to open up to conservative influencers, like Andy Ngo, who is a known Antifa provocateur, often spreads misinformation and disinformation around Antifa. We saw him starting to retweet a climate denier who was calling Antifa the real climate alarmism. And so, that behavior is problematic and certainly needs to be curbed because we can see how easily it will devolve into chaos. And we wouldn't really be able to see this without that network perspective of understanding how groups adjacent to one another can start to amplify one another's messages once their goals align.

Colleen: How would you like to see your work impact the social media landscape in the next three to five years?

Erin: I truly believe that the internet can still be a tool for marginalized voices to be heard, to be given a bigger megaphone. I hope with this work that we, you know, continue to shine light on the dark corners of the internet, in order to fight the manipulation and the deceit that's happening online to open up those spaces for what I believe is still an incredibly powerful tool for marginalized voices. And in a practical sense, in the next three to five years, I really see the field moving towards a more formalized method, some more standards because it's... what's a good metaphor, building the plane as we take off, or whatever that metaphor is.

Colleen: So you spend a lot of time in the darker corners of social media. What social media do you use for fun or as a palate cleanser?

Erin: Oh, the eyebleach subreddit tends to be my go-to after a long day of work. Yeah, I actually don't use social media all that much. I'm pretty quiet. I haven't touched my Facebook in like years. And I'm pretty quiet on Instagram. I think I use the messaging capabilities of these tools to stay in touch with, like, friends and family. And I would say that's the major extent of my... And to, like, creep on Twitter. I am constantly on Twitter. I never post on Twitter. But I am just a fly on the wall reading everything people post on Twitter.

Colleen: So, somehow this does not surprise me at all that you are not a big social media user. Before I did this interview, I was a little reluctant because I think what you do is terrifying. But it's been really fun talking to you today, super informative. And it is encouraging to know that what's going on online is being looked at so carefully because when we gather that information and that data, that's, sort of, the first step to solving some of the problems with it.

Erin: Yeah, absolutely.

Colleen: It's been really, really interesting. Thanks so much for joining me.

Erin: Well, thanks for having me and giving me the time and space to talk about what Graphika does. I really enjoy talking to you as well.


Related resources