Google’s AI hate speech detector is easily fooled by a few typos or the insertion of words like “love”

Researchers at Aalto University in Finland found that, across seven systems used for hate-speech detection, simple keyword filters and complex AI models are equally vulnerable to workarounds.

New Scientist reports that some words were particularly effective at masking hateful content because of their strong positive connotations. For example, a sentence that Google’s Perspective assigned a “toxicity” score of 0.99, with 1 being peak obscenity, could be reduced to 0.15 simply by adding the word “love”.
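To make the two workarounds concrete, here is a minimal toy sketch in Python. It is not Perspective and not any of the seven systems the researchers examined: the word list, the weights and the function names are all invented for illustration, and "badword" stands in for an actual slur. It only shows why an exact-match keyword filter misses a misspelled word, and why a scorer that sums per-word weights can be pulled down by a strongly positive word.

```python
import math
import re

# Hypothetical weights for a toy bag-of-words scorer. The numbers are
# invented for illustration and are not taken from any real system.
WEIGHTS = {"badword": 4.0, "love": -5.0}
BIAS = -1.0


def tokens(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_filter(text):
    """Flag the text if any token exactly matches a positively weighted word."""
    return any(WEIGHTS.get(t, 0.0) > 0 for t in tokens(text))


def toy_score(text):
    """Logistic score in (0, 1) computed from the summed word weights."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens(text))
    return 1.0 / (1.0 + math.exp(-z))


original = "you are a badword"
with_typo = "you are a badw0rd"       # one character swapped
with_love = "you are a badword love"  # positive word appended

for text in (original, with_typo, with_love):
    print(f"{text!r}: flagged={keyword_filter(text)}, score={toy_score(text):.2f}")
```

In this toy, the typo slips past the exact-match filter entirely, while the appended "love" leaves the keyword match intact but drags the summed score down. Real classifiers are far more complex, so this is only an intuition for the kind of weakness described, not a reproduction of the reported numbers.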

Image: Council of Europe