Emojis are making it harder for tech giants to track down online abuse


Abusive online posts are less likely to be identified if they feature emojis, new research suggests.

Some algorithms designed to track down hateful content – including a Google product – are not as effective when these symbols are used.

Harmful posts can end up being missed altogether while acceptable posts are mislabelled as offensive, according to the Oxford Internet Institute.

After England lost in the Euro 2020 final, Marcus Rashford, Bukayo Saka and Jadon Sancho received a torrent of racist abuse on social media, much of it featuring monkey emojis.

The start of the Premier League brings fears that more will follow unless social media companies can better filter out this content.

Many of the systems currently used are trained on large databases of text that rarely feature emojis. They may therefore struggle when they encounter the symbols in posts online.
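A minimal sketch of why this happens, using a hypothetical vocabulary rather than any real moderation system: a tokeniser built from emoji-free training text maps any emoji to an "unknown" token, discarding exactly the signal that makes the post abusive.

```python
# Vocabulary "learned" from emoji-free training text (illustrative only).
vocab = {"you", "are", "a", "great", "player"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and replace any token not in the
    vocabulary (including every emoji) with '<unk>'."""
    return [tok if tok in vocab else "<unk>" for tok in text.lower().split()]

# An abusive post that relies on an emoji loses its harmful signal...
print(tokenize("you are a 🐒"))            # ['you', 'are', 'a', '<unk>']
# ...and becomes indistinguishable from any post with an unfamiliar word.
print(tokenize("you are a great player"))  # all tokens recognised
```

To such a model, the racist post and a harmless one differ only by a generic unknown token, so the abuse is invisible.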

Sky News analysis showed Instagram accounts posting racist abuse that featured emojis were less than a third as likely to be shut down as those posting hateful messages that contained only text.

To help tackle this problem, researchers created a database of almost 4,000 sentences – most of which included emojis being used offensively.

This database was used to train an artificial intelligence model to understand which messages were and were not abusive.

By using humans to guide and tweak the model, it was better able to learn the underlying patterns that indicate if a post is hateful.
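The human-in-the-loop cycle described above can be sketched as follows. This is an illustrative toy, not the researchers' code: the "model" simply memorises labelled text, and at each round annotators write an emoji-based example that fools the current model, label it, and fold it back into the training set.

```python
def train(examples):
    """Stand-in for a real fine-tuning step: memorise labelled text."""
    return dict(examples)

def classify(model, text):
    # Unseen text defaults to benign, mimicking a model's blind spot.
    return model.get(text, "not hateful")

training_set = [("plain hateful text", "hateful")]

for round_no in range(3):
    model = train(training_set)
    # Annotators craft a post the current model misclassifies...
    tricky = f"hateful post using emoji, round {round_no} 🐒"
    assert classify(model, tricky) == "not hateful"  # it fools the model
    # ...then label it and add it to the data for the next round.
    training_set.append((tricky, "hateful"))

model = train(training_set)
print(classify(model, "hateful post using emoji, round 2 🐒"))  # hateful
```

Each round deliberately targets the model's current weaknesses, which is why this approach surfaces emoji-based patterns that a static training set misses.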

The researchers tested the model on abuse related to race, gender, gender identity, sexuality, religion and disability.

They also examined different ways that emoji can be used offensively. This included using an emoji to denote a group – a rainbow flag to represent gay people, for example – alongside hateful text.

Perspective API, a Google-backed project that offers software designed to identify hate speech, was just 14% effective at recognising hateful comments of this type in the database.

This tool is widely used, and currently processes over 500 million requests per day.
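For context, scoring a comment with Perspective API typically looks like the sketch below. The endpoint and request shape follow Google's public documentation; the API key is a placeholder you would obtain from Google Cloud, and the example emoji comment is illustrative.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: requires a Google Cloud API key
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

# Request body per the Perspective API docs: the comment text plus the
# attributes to score (here, overall toxicity).
payload = {
    "comment": {"text": "you are a 🐒"},
    "requestedAttributes": {"TOXICITY": {}},
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a valid key, the call below returns a score between 0 and 1 under
# response["attributeScores"]["TOXICITY"]["summaryScore"]["value"].
# urllib.request.urlopen(request)  # needs network access and a real key
```

The Oxford finding is that for emoji-based abuse of this kind, the returned toxicity score often falls below a flagging threshold.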

The researchers’ model delivered close to a 30% improvement in correctly identifying hateful and non-hateful content, and up to an 80% improvement relating to some types of emoji-based abuse.

Yet even this technology will not be fully effective. Many comments are hateful only in particular contexts – next to a picture of a black footballer, for example.

And problems with hateful images were highlighted in a recent report by the Woolf Institute, a research group examining religious tolerance. They showed that – even when using Google’s SafeSearch feature – 36% of the images shown in response to the search “Jewish jokes” were antisemitic.

The evolving use of language makes this task even more difficult.

Research from the University of São Paulo showed that one algorithm rated Twitter accounts belonging to drag queens as more toxic than some white supremacist accounts.

That was because the technology failed to recognise that language used by someone about their own community might be more offensive if used by someone else.

Incorrectly categorising non-hateful content has significant downsides.

“False positives risk silencing the voices of minority groups,” said Hannah Rose Kirk, lead author of the Oxford research.

Solving the problem is made more difficult by the fact that social media companies tend to guard their software and data tightly – meaning the models they use are not available for scrutiny.

“More can be done to keep people safe online, particularly people from already-marginalised communities,” Ms Kirk added.

The Oxford researchers are sharing their database online, enabling other academics and companies to use it to improve their own models. – Sky News