Spammers, hackers, political propagandists, and other nefarious users have always tried to game the systems that social media sites put in place to protect their platforms. It’s a never-ending battle; as companies like Twitter and Facebook become more sophisticated, so do the trolls. And so last week, after Facebook shared new details about a tool it built to analyze text found in images like memes, some people began brainstorming how to thwart it.
Social media companies are under tremendous pressure from lawmakers, journalists, and users to be more transparent about how they decide what content should be removed and how their algorithms work, especially after they’ve made a number of high-profile mistakes. While many companies are now more forthcoming, they’ve also been reluctant to reveal too much about their systems because, they say, ill-intentioned actors will use the information to game them.
Last Tuesday, Facebook did reveal the details of how it uses a tool called Rosetta to help automatically detect things like memes that violate its hate speech policy or images that spread already debunked hoaxes; the company says Rosetta processes one billion public images and videos uploaded to Facebook each day.
Propagators of the false right-wing conspiracy theory QAnon took interest after “Q”—the anonymous leader who regularly posts nonsensical “clues” for followers—linked to several news articles about the tool, including WIRED’s. Rosetta works by detecting the words in an image and then feeding them through a neural network that parses what they say. The QAnon conspiracy theorists created memes and videos with deliberately obscured fonts, wonky text, or backwards writing, which they believe might trick Rosetta or disrupt this process. Many of the altered memes were first spotted on 8chan by Shoshana Wodinsky, an intern at NBC News.
It’s not clear whether any of these tactics will work (or how seriously they have even been tested), but it’s not hard to imagine that other groups will keep trying to get around Facebook. It’s also incredibly difficult to build a machine-learning system that’s foolproof. Automated tools like Rosetta might get tripped up by wonky text or hard-to-read fonts. A group of researchers from the University of Toronto’s Citizen Lab found that the image-recognition algorithms used by WeChat, the most popular social network in China, could be tricked by changing a photo’s properties, like its coloring or the way it was oriented. Because the system couldn’t detect that text was present in the image, it couldn’t process what it said.
It’s hard to create ironclad content-moderation systems in part because it’s difficult to map out what they should accomplish in the first place. Anish Athalye, a PhD student at MIT who has studied attacks against AI, says it’s difficult to account for every type of behavior a system should protect against, or even how that behavior manifests itself. Fake accounts might behave like real ones, and denouncing hate speech can look like hate speech itself. It’s not just the challenge of making the AI work, Athalye says. “We don’t even know what the specification is. We don’t even know the definition of what we’re trying to build.”
When researchers do discover their tools are susceptible to a specific kind of attack, they can recalibrate their systems to account for it, but that doesn’t entirely solve the problem.
“The most common approach to correct these mistakes is to enlarge the training set and train the model again,” says Carl Vondrick, a computer science professor at Columbia University who studies machine learning and vision. “This could take between a few minutes or a few weeks to do. However, this will likely create an arms race where one group is trying to fix the model and the other group is trying to fool it.”
Another challenge for platforms is deciding how transparent to be about how their algorithms work. Often when users, journalists, or government officials have asked social media companies to reveal their moderation practices, platforms have argued that disclosing their tactics will embolden bad actors who want to game the system. The situation with Rosetta looks like good evidence for their argument: Before details of the tool were made public, conspiracy theorists ostensibly weren’t trying to get around it.
But content moderation experts say there are still greater benefits to being open, even if transparency can allow some bad actors to manipulate the system in the short term. “Attacks reveal the limits of the current system and show the designers how to make it stronger. Keeping it away from challenges can mean that its weaknesses won’t get adequately tested until the moment it’s most important that it works,” says Tarleton Gillespie, author of Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media.
“The expectation that they be perfect is part of the problem. That they be flawless; that they be impervious to gaming—it’s a losing proposition for everyone concerned,” says Sarah T. Roberts, an information studies professor at UCLA who studies content moderation.
Content-filtering systems have been gamed as long as they’ve been online. “It’s not a new phenomenon that some users will try to elude or exploit systems designed to thwart them,” says Gillespie. “This is not so different than search engine optimization, trying to trend on Twitter, or misspelling ‘Britney Speers’ on [peer-to-peer] networks to avoid record label copyright lawyers.”
Today many popular platforms, such as Instagram and Snapchat, are dominated by images and videos. Memes in particular have become a prominent vehicle for spreading political messages. In order to keep itself free of things like hate speech, violent threats, and fake news, Facebook needed to find a way to comprehensively process all of the visual data uploaded to its sites each day. And bad actors will continue to search for new ways to outsmart those systems.
Effectively moderating a sprawling internet platform is often framed as a problem with a single solution, but the reality is more complex. “Maybe we have to think of content moderation as an ongoing effort that must evolve in the face of innovative adversaries and changing cultural values,” Gillespie says. “This is not to give platforms a pass, it’s to hold them to the right standard of how to actually address what is a constantly developing challenge, and be both transparent and effective.”