Named Entity Recognition for British text

Posted: May 1, 2022   Posted By: Abhinay Mehta

Using Named Entity Recognition (NER) to automatically detect important pieces of information in text documents, such as personally identifiable information (PII), can be useful for several reasons:

  • To anonymize text
  • To help comply with regulatory requirements such as GDPR
  • To label training data for Machine Learning projects
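To make the anonymization use case concrete, here is a minimal sketch of how detected entities could be replaced with placeholders. It assumes entities arrive as non-overlapping (start, end, label) character spans; the `redact` helper, the span format and the label names are hypothetical choices made for illustration, not the output format of any particular tool.

```python
from typing import List, Tuple

# Hypothetical entity format: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def redact(text: str, entities: List[Entity]) -> str:
    """Replace each entity span with a [LABEL] placeholder.

    Assumes the spans do not overlap. Spans are processed from the end
    of the text backwards so earlier offsets stay valid after each edit.
    """
    out = text
    for start, end, label in sorted(entities, reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


sample = "Jane Smith lives at SW1A 2AA."
spans = [(0, 10, "PERSON"), (20, 28, "POSTCODE")]
print(redact(sample, spans))  # [PERSON] lives at [POSTCODE].
```

Working backwards through the spans avoids having to recalculate offsets after each replacement.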

In this series of articles, we’ve discussed several techniques for performing NER and automatically finding useful words and phrases in text. We explored popular open source tools and techniques for identifying names of people, email addresses, phone numbers, credit card numbers, names of places, etc.

Most of the efforts in the A.I. community to solve this problem for English text target an American audience. As a result, if you use most of the available tools on British text data, the results are usually not as good as on text with US-style addresses, phone numbers, names, etc.
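A small sketch shows why US-centric rules miss British formats. The regular expressions below are deliberately simplified illustrations written for this example (they are not taken from any specific tool and are not complete validators), and the sample sentences use fictional numbers.

```python
import re

# Simplified, illustrative patterns only; real formats have more variants.
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
US_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")
UK_PHONE = re.compile(r"(?:\+44\s?|\b0)\d{2,4}\s?\d{3,4}\s?\d{3,4}\b")
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")

PATTERNS = {
    "US_PHONE": US_PHONE,
    "US_ZIP": US_ZIP,
    "UK_PHONE": UK_PHONE,
    "UK_POSTCODE": UK_POSTCODE,
}


def find_all(text: str) -> dict:
    """Return every match for each pattern, keyed by pattern name."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}


us_text = "Call Jane at 415-555-0132 or write to 1600 Pennsylvania Ave NW, Washington, DC 20500."
uk_text = "Call Jane on 020 7946 0958 or write to 10 Downing Street, London SW1A 2AA."

print(find_all(us_text))
print(find_all(uk_text))
```

On the British sentence, the US phone and ZIP patterns find nothing, while the UK patterns pick up the phone number and postcode. Statistical NER models trained mostly on American text can show a similar bias, although less visibly than hand-written rules.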

Introducing London Analytics

We work in the UK, so we regularly deal with British text data. Out-of-the-box solutions weren’t accurate enough for us, so we had to build our own NER models and APIs.

That made us think: might those APIs be useful for other UK folks too?

So we’re making our beta API, London Analytics, available for other people to try out. Just contact us for an API key and give it a test drive.

Example With Python

This prints the following:

Some of the types of data currently supported:

And a lot more.

Conclusion

If you deal with British text data and want to try our API, please get in touch and we’ll give you access. The API is in beta, so expect more improvements and features.