Want AI that flags hateful content? Build it.

Participating developers will make two different models. The first, for those with intermediate skill sets, is one that identifies hateful images; the second, considered an advanced challenge, is a model that attempts to fool the first one. “That actually mimics how it works in the real world,” says Chowdhury. “The do-gooders make one approach, and then the bad guys make an approach.” The goal is to engage machine-learning researchers on the topic of mitigating extremism, which may lead to the creation of new models that can effectively screen for hateful images.

A core challenge of the project is that hate-based propaganda can depend heavily on its context. Someone without a deep understanding of certain symbols or signifiers may not be able to tell what even qualifies as propaganda for a white nationalist group.

“If [the model] never sees an example of a hateful image from a part of the world, then it’s not going to be any good at detecting it,” says Jimmy Lin, a professor of computer science at the University of Waterloo, who is not associated with the bounty program.

This effect is amplified around the world, since many models lack broad knowledge of cultural contexts. That’s why Humane Intelligence decided to partner with a non-US organization for this particular challenge. “Most of these models are often fine-tuned to US examples, which is why it’s important that we’re working with a Nordic counterterrorism group,” says Chowdhury.

Lin, though, warns that the solution to these issues may have to go further than any algorithmic changes. “We have models that generate fake content. Well, can we develop other models that can detect fake generated content? Yes, that is certainly one approach to it,” says Lin. “But I think overall, in the long run, training, literacy, and education efforts are actually going to be more beneficial and have a longer lasting impact. Because you’re not going to be subjected to this cat-and-mouse game.”

The challenge will run until November 7, 2024. Two winners will be selected, one for the intermediate challenge and another for the advanced, and will receive $4,000 and $6,000, respectively. Additionally, participants will have their models reviewed by Revontulet, which may decide to add the models to its current suite of tools to combat extremism.
