London Analytics Article - Profanity Words API

Posted: June 2, 2022 Posted By: Abhinay Mehta

If you have user-generated content, your users will (occasionally, mostly?) say naughty words on your website, social media, reviews, comments, etc. That’s one of the blessings/curses of being anonymous on the internet: you can talk like a drunken sailor.

Fortunately, we at londonanalytics.co.uk are something of experts at cursing, so when we had to build a solution to detect bad language in user-generated (British) text, we were already the domain experts.

So we built an easy-to-use API: you don’t need to be a Machine Learning expert to use the latest Data Science techniques for detecting profanity.

Example With Python

(OMG this was so much fun to test!)

This prints the following:

Under the hood there are a few techniques being used: a machine learning model makes the prediction, and separate processes try to figure out the list of bad words. So not every positive prediction will necessarily come with an accompanying list of bad words. Let’s take a look at an example of this scenario:

This prints the following:

Our machine learning model has detected a lot of anger and hate in that sentence, so it has decided there is probably some profanity in there somewhere, even though we couldn’t find a specific word.
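If you're consuming results like these, it helps to treat the prediction and the word list as two separate signals. Here's a minimal sketch of how client code might do that. The `ProfanityResult` class, its `is_profane` and `bad_words` fields, and the `describe` helper are all hypothetical names made up for illustration; they are not the actual response format of the API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfanityResult:
    # Hypothetical response shape (an assumption for illustration only).
    is_profane: bool                                     # the model's prediction
    bad_words: List[str] = field(default_factory=list)   # may be empty


def describe(result: ProfanityResult) -> str:
    """Combine the two independent signals into one readable verdict."""
    if not result.is_profane:
        return "clean"
    if result.bad_words:
        return "profane: " + ", ".join(result.bad_words)
    # The model flagged the text, but no specific word was found.
    return "profane: no specific word identified"


print(describe(ProfanityResult(is_profane=True, bad_words=["bloody"])))
print(describe(ProfanityResult(is_profane=True)))
print(describe(ProfanityResult(is_profane=False)))
```

The design choice in this sketch is to trust the prediction first and treat the word list as optional extra detail, so a flagged sentence with an empty list is still reported as profane.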

We’re of course very polite people, so let’s test a sentence we would actually write:

This prints the following:

Feel free to contact us and play with the API. There’s so much fun to be had here!