AI Is Still Bad at Detecting Hate Speech—Here’s A Way to Make It Better
Using data sourced from women journalists—one of the prime targets of online harassment—researchers are developing a unique approach with potentially wide implications for mitigating toxic content
Last month, social media platforms lost yet another high-profile battle against hate speech when trolls unleashed a torrent of racist abuse against three British soccer players after their team lost a championship game.
The major platforms said it wasn’t for lack of trying. Twitter, for one, reported deleting over 1,000 tweets and shutting down “a number of accounts.” Yet many posts initially overwhelmed detection efforts, in some instances because the trolls simply used emojis instead of words.
Designing tools for detecting speech that is vicious or derogatory remains one of the thorniest problems for AI researchers.
That’s because one of the biggest obstacles to training AI models to adequately detect it has been the speech itself, which can involve work-arounds such as the aforementioned emojis and which routinely comes with precious few of the kinds of contextual clues that the tools need to distinguish toxic speech from more innocuous forms. Lacking our innate understanding of language, machines still need to be told exactly what’s what, even when it’s a clear no-brainer to a human.
To get at the root of this problem, a pair of Columbia researchers developed a unique approach—sourcing real-world incidents of verbal abuse directly from, and specifically labeled by, the recipients themselves—to better inform models about the nuances of online harassment. And to do that, the team focused on a community that has suffered disproportionately from online harassment: women journalists.
To understand how they did it, Columbia Engineering magazine caught up with the team’s two lead researchers: Julia Hirschberg and Susan McGregor. Hirschberg, the Percy K. and Vida L. W. Hudson Professor of Computer Science and head of the Spoken Language Processing Group, is a pioneer in computational linguistics, while McGregor, an associate research scholar with the Data Science Institute at Columbia, has long focused on security and privacy issues affecting journalists and media organizations.
In the edited conversation below, Hirschberg and McGregor reflect on how they developed their particular methodology, how the process can give agency back to those who are targeted, and how it can inform a new way forward for broader detection.
Q: What inspired the two of you to begin collaborating on detecting abusive speech?
Susan McGregor: I just published a book on information security for journalists; I’ve been thinking for a long time about what a serious problem online harassment is for journalists, and particularly for women journalists, from really serious physical security threats like stalking and doxing to harm to their mental health and productivity. It’s imposing a huge cost on the industry as well. We’re risking losing a lot of talent as a result of this problem. In the course of my book research, the people that I talked to found the available tools for detecting abusive speech, particularly the ones provided by social media platforms, just inadequate, to the point of being useless. In conducting some of my previous work, I had already seen how poor a lot of the existing datasets were. So I pitched this project to Julia with the idea that if we could get really good data, we could create a tool that would give women who are targets of harassment something useful that they could do with their data: an opportunity for them to make something constructive out of what was almost universally a negative experience.
Julia Hirschberg: The data is at the heart of the problem. My lab started to work on hate speech in 2012. It was one of the earlier things that anyone had done on the issue. One thing that we discovered was that there is no one-size-fits-all solution for hate speech—it is very different for people who are in the group, who can say some things that people who are not in the group cannot say, for instance. We wrote a paper on that subject and had some students who worked with us who became very interested. But from the beginning, the problem really has been getting the data.
Q: One of the things that makes your approach so promising is that you’re getting your data directly from the journalists themselves. But from a research perspective, that also seems to be one of the biggest challenges you’ve faced.
JH: Our data is basically a large corpus of toxic tweets. Early on, we decided to take a unique approach to collecting this data: it’s donated to us directly by the data owners, and we are asking them to personally annotate it so that we have the best possible framework for the language. That means the biggest challenge is getting the data and getting our journalists to annotate it. Often the question is, do they have time? A big problem with journalists over the last year was that they were really busy covering the COVID-19 pandemic, as well as last year’s presidential election and the racial justice protests. We still don’t have enough data.
SM: As far as I know, we are the first set of researchers to directly engage journalists in the annotation process and in the data acquisition process. We’re not scraping the data, we are acquiring it. We had one person who donated her data early on and gave us the pilot information that we needed to design the annotation platform and build the technologies that we use for it. Our platform presents the tweets in context—so you actually see a tweet in the thread in which it originally appeared—and then it only shows annotation options on the ones that were sent to you. All of that was an enormous amount of work that the students did last year.
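The display rule McGregor describes (show the whole thread for context, but offer annotation options only on tweets sent to the participant) can be illustrated with a minimal Python sketch. This is not the team's platform: the `ThreadTweet` class, its field names, and the `can_annotate` and `render_thread` helpers are all hypothetical, invented here only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadTweet:
    """One tweet in a thread. All field names here are hypothetical."""
    tweet_id: str
    author: str
    text: str
    sent_to: frozenset = frozenset()  # handles the tweet was addressed to
    label: Optional[str] = None       # would be filled in by the annotator


def can_annotate(tweet: ThreadTweet, participant: str) -> bool:
    """Assumed rule: only tweets sent to the participant get annotation options."""
    return participant in tweet.sent_to and tweet.author != participant


def render_thread(thread: List[ThreadTweet], participant: str) -> List[str]:
    """Show every tweet for context, marking the ones open for annotation."""
    lines = []
    for tweet in thread:
        tag = "ANNOTATE" if can_annotate(tweet, participant) else "context"
        lines.append(f"[{tag}] @{tweet.author}: {tweet.text}")
    return lines


# Made-up example thread, seen from the point of view of "journalist".
thread = [
    ThreadTweet("1", "journalist", "My new story is out."),
    ThreadTweet("2", "reader", "Thanks for covering this.", frozenset({"journalist"})),
    ThreadTweet("3", "bystander", "Replying to someone else.", frozenset({"reader"})),
]
for line in render_thread(thread, "journalist"):
    print(line)
```

Keeping the context tweets visible but read-only is the point of the design as described: the annotator judges a message with the surrounding conversation in view, and only labels what was actually directed at her.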
JH: My former postdoc Sarah Ita Levitan, who is now on the faculty at Hunter College, has played a crucial role throughout the project.
Q: Some interesting things seem to have emerged from your unique dataset. For instance, from a detection standpoint, “understanding” the words is half the battle: first the tool has to be able to “see” the words and connect them to the target. What have you learned about how the major platforms have been trying to block content and about the inventive work-arounds that harassers are using?
SM: We learned that Twitter was a platform that was especially problematic because, while there are some ways in which the platforms try to limit harassment tactics, there are also some pretty straightforward ways harassers can get around them. For example, if you block a harasser on Twitter and they want to see your tweets anyway, they can just look at Twitter in incognito mode and then they can see your tweets again. And typically they will take a screenshot of a tweet that you posted and tweet it out themselves in order to directly or indirectly encourage their followers to go after you on Twitter—if not more broadly. So a lot of harassment follows this “screen capture” strategy.
Q: How does that inform your approach to building detection models?
JH: We are using the standard natural language processing (NLP) techniques, such as keyword filters. But in addition to a keyword filter, we’ve learned from our participants about many other ways to identify hate speech, and homing in on heuristics, like the screen captures, has been extremely useful in helping us to understand what kinds of things we should be paying attention to when we build our models on this data. But we don’t yet know what features are going to be the most useful until we have the data and try it out—which is true of all speech and NLP tasks.
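To make the combination of a keyword filter with a behavioral heuristic more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the researchers' model: the term list is a harmless placeholder, the `Tweet` fields are hypothetical, and the "screen capture" rule (a blocked account posting an image while naming the target) is only one guess at how the pattern described in the interview might be approximated.

```python
import re
from dataclasses import dataclass

# Placeholder terms for illustration; a real list would come from annotated data.
FLAGGED_TERMS = {"idiot", "liar"}


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record."""
    author: str
    text: str
    has_image: bool = False


def keyword_hit(text: str) -> bool:
    """Return True if any flagged term appears as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(FLAGGED_TERMS)


def possible_screen_capture(tweet: Tweet, target: str, blocked: set) -> bool:
    """Assumed heuristic: a blocked account posts an image and names the target."""
    names_target = target.lower() in tweet.text.lower()
    return tweet.author in blocked and tweet.has_image and names_target


def signals(tweet: Tweet, target: str, blocked: set) -> dict:
    """Collect each signal separately so a later model can weigh them."""
    return {
        "keyword": keyword_hit(tweet.text),
        "screen_capture": possible_screen_capture(tweet, target, blocked),
    }


# Made-up examples.
blocked_accounts = {"troll1"}
hostile = Tweet("troll1", "Look at what reporter_jane posted, what a LIAR", has_image=True)
friendly = Tweet("reader", "Great reporting today")
print(signals(hostile, "reporter_jane", blocked_accounts))
print(signals(friendly, "reporter_jane", blocked_accounts))
```

Returning the signals as separate features, rather than a single yes-or-no verdict, fits Hirschberg's caveat that it is not yet known which features will prove most useful until they are tried on the data.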
Q: Much of the effort to combat abusive speech has been reactive, with third parties trying to remove toxic content as quickly as possible, for example. Your research illustrates the need to take a proactive approach: giving targets some agency in designing new tools for preventing the content from ever reaching them.
SM: One of the most promising things about a proactive approach like ours is that it allows journalists to make the most valuable use of these platforms without being distracted by the harassing messages they receive. For journalists, being on Twitter is essentially an employment requirement. It’s a really valuable tool for identifying and contacting sources, as well as promoting their work. And journalists want to hear from their real audience on Twitter, to interact and engage with people who have legitimate comments and critiques about their work. The problem is that when you’re getting vast quantities of harassing tweets—sometimes thousands in a day—you simply can’t look at all of them. Just to weed out the legitimate interactions from the harassment is an entire job itself. Without a proactive solution, it damages the platforms as tools. If the tool isn’t usable because you’re getting inundated with harassment, then as a journalist, you’re just not able to do your job as well.
JH: Not only is our approach proactive, it’s meant to be individualized. Some people don’t care as much about abusive messages as others do, which is why the tool has to be tailored to the individual. That is also another reason we need more participants, so we can better hone the tool. If any women journalists would like to contribute their data to help expand the tool, we have a sign-up form, and we’d love to hear from you.