Continuing our exploration of Google’s pre-trained model BERT (github), we now check how well it finds entities in a sentence.

What is NER?

In any text, some terms are more informative and unique in context than others. Named Entity Recognition (NER), also known as entity extraction or entity chunking, is the process in which an algorithm extracts real-world noun entities from text and classifies them into predefined categories such as person, place, time, organization, etc.

Importance of NER in NLP

Natural Language Processing includes various tasks such as Machine Translation, Question Answering, Sentiment Analysis, Part-of-speech (POS) Tagging, etc. for better understanding and processing of language. NER is also one of these NLP tasks. It is a sub-task of Information Extraction (IE) in Natural Language Processing. Many blogs, articles, and other long pieces of content are posted on websites, web portals, and social media every day. NER is the right tool for finding the people, organizations, places, times, and other information mentioned in an article, extracting the key points from long descriptions, and categorizing them. NER can also be used in NLP tasks such as text summarization, information retrieval, question answering, semantic parsing, and coreference resolution.

What is BERT?

BERT (Bidirectional Encoder Representations from Transformers) is a general-purpose language model trained on a large dataset. This pre-trained model can be fine-tuned and used for different tasks such as sentiment analysis, question answering, sentence classification, and others. BERT is the state-of-the-art method for transfer learning in NLP.

For our demo, we used the BERT-base uncased model, as provided by HuggingFace, as the base model. It has 110M parameters, 12 layers, a hidden size of 768, and 12 attention heads.

Datasets for NER

There are many datasets for fine-tuning the BERT model in a supervised way. The most basic dataset is CONLL 2003, which concentrates on four types of named entities: persons, locations, organizations, and miscellaneous entities. CONLL 2003 follows the BIO schema, and each line contains four columns separated by a single space: the word, its part-of-speech tag, its chunk tag, and its named entity tag.

BIO (Beginning, Inside, Outside) is a common format for tagging sentence tokens in NER. The B- prefix marks the first token of an entity chunk, the I- prefix marks a token inside a chunk, and the O tag marks a token that does not belong to any entity.

Let’s take an example, for the input “Joseph Wu Is Chairman of Taiwan’s Mainland Affairs Council”, the entities would be:
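
One plausible BIO tagging of this sentence can be sketched in Python. The tokenization and the PER/LOC/ORG tags below are illustrative assumptions in the CONLL style, not output copied from the dataset or from our model, and `bio_to_spans` is a hypothetical helper name.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            words.append(token)              # continue the open entity
            continue
        if words:                            # any other tag closes the open entity
            spans.append((label, " ".join(words)))
            label, words = None, []
        if tag.startswith("B-"):             # start a new entity
            label, words = tag[2:], [token]
    if words:                                # flush an entity that ends the sentence
        spans.append((label, " ".join(words)))
    return spans


# Assumed tokenization and tags for the example sentence (illustrative only).
tokens = ["Joseph", "Wu", "Is", "Chairman", "of", "Taiwan", "'s",
          "Mainland", "Affairs", "Council"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O",
        "B-ORG", "I-ORG", "I-ORG"]

entities = bio_to_spans(tokens, tags)
for label, text in entities:
    print(f"{label}: {text}")
```

Under these assumed tags, the sentence yields one person, one location, and one organization.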


We converted this dataset into one with only two columns: the word and its named entity tag.
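
A minimal sketch of such a conversion is shown below, assuming the input is a list of space-separated four-column lines (word, POS tag, chunk tag, NER tag) with blank lines between sentences. The function name `to_two_columns` and the sample lines are illustrative, not our actual conversion script.

```python
def to_two_columns(lines):
    """Keep only the word and the NER tag from four-column CONLL-style lines."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            out.append("")                   # blank line marks a sentence boundary
            continue
        if line.startswith("-DOCSTART-"):
            continue                         # document separator, not a token
        columns = line.split(" ")
        out.append(f"{columns[0]} {columns[-1]}")   # first column = word, last = NER tag
    return out


# Illustrative sample in the four-column layout.
sample = [
    "EU NNP B-NP B-ORG",
    "rejected VBD B-VP O",
    "German JJ B-NP B-MISC",
    "call NN I-NP O",
    "",
    "Peter NNP B-NP B-PER",
    "Blackburn NNP I-NP I-PER",
]

converted = to_two_columns(sample)
print("\n".join(converted))
```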

The CONLL 2003 dataset has only 4 entities. To increase the number of entity categories, we merged it with 4 other datasets: OntoNotes 5.0, GMB (Groningen Meaning Bank), NAACL 2019, and WNUT 2017.

CONLL 2003

Entities: Miscellaneous, Person, Location, Organization.


OntoNotes 5.0

Entities: Organization, Art Work, Numbers in word, Numbers, Quantity, Person, Location, Geopolitical Entity, Time, Date, Facility, Event, Law, Nationalities or religious or political groups, Language, Currency, Percentage, Product.

GMB (Groningen Meaning Bank)

Entities: Natural Phenomenon, Person, Geographical, Organization, Art Work, Event, Time, Geopolitical.

NAACL 2019

Entities: Organization, Person, Location, Geopolitical, Facility, Vehicles.


WNUT 2017

Entities: Location, Person, Product, Groups, Corporations, Creative.

These datasets all had different formats, so we merged and converted them into a single format. We did not use all five datasets in full. Instead, we selected a part of each based on the number of entities, to generate an unbiased dataset.
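
One way to bring differently labelled datasets into a single format is to map each dataset's entity names onto a shared label set while keeping the B-/I- prefix. The sketch below is an assumption about how this could be done, not our actual merge script: the unified names, the partial mappings in `LABEL_MAPS`, and the choice to turn unmapped entities into `O` are all hypothetical.

```python
# Hypothetical, partial mappings from dataset-specific names to shared names.
LABEL_MAPS = {
    "conll2003": {"PER": "PERSON", "LOC": "LOCATION", "ORG": "ORGANIZATION"},
    "gmb": {"per": "PERSON", "geo": "LOCATION", "org": "ORGANIZATION", "tim": "TIME"},
}


def normalize_tag(tag, dataset):
    """Rewrite one BIO tag into the shared label set."""
    if tag == "O":
        return "O"
    prefix, _, name = tag.partition("-")     # "B-per" -> ("B", "-", "per")
    unified = LABEL_MAPS[dataset].get(name)
    # Assumption: entity types without a mapping are dropped to "O".
    return f"{prefix}-{unified}" if unified else "O"


def merge(corpora):
    """Merge {dataset_name: [[(word, tag), ...], ...]} into one list of sentences."""
    merged = []
    for dataset, sentences in corpora.items():
        for sentence in sentences:
            merged.append([(word, normalize_tag(tag, dataset)) for word, tag in sentence])
    return merged


corpora = {
    "conll2003": [[("Peter", "B-PER"), ("Blackburn", "I-PER")]],
    "gmb": [[("London", "B-geo"), ("today", "B-tim")]],
}
merged = merge(corpora)
print(merged)
```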

The final merged dataset has more than 40K sentences and a total of 17 entities with 45 tags (as per the BIO schema).


BERT is a powerful NLP model, but using it for NER without fine-tuning it on an NER dataset won’t give good results.

So, once the dataset was ready, we fine-tuned the BERT model.

We used our merged dataset to fine-tune the model to detect entities and classify them into 22 entity classes.
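
Under the BIO schema, each entity class contributes a B- tag and an I- tag, and all non-entity tokens share a single O tag, so 22 classes correspond to 2 × 22 + 1 = 45 tags. The sketch below builds such a tag list and the label-to-id mapping a token classifier would need. The class names are placeholders, since the actual names are not listed here.

```python
def build_bio_labels(classes):
    """Return the BIO tag list for the given entity classes: O plus B-/I- per class."""
    labels = ["O"]
    for name in classes:
        labels += [f"B-{name}", f"I-{name}"]
    return labels


# Placeholder class names (hypothetical), only the count matters here.
classes = [f"CLASS_{i}" for i in range(1, 23)]

labels = build_bio_labels(classes)
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(classes), "classes ->", len(labels), "tags")
```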

In the evaluation of the fine-tuned model, we got an accuracy of 93.11%.
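
As a rough illustration of what such a number can mean, the sketch below computes token-level accuracy: the fraction of tokens whose predicted tag equals the gold tag. This is an assumption about the metric, and the tags are made-up examples. Token-level accuracy tends to look high on NER data because most tokens carry the O tag, so entity-level precision, recall, and F1 are often reported alongside it.

```python
def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pairs = [
        (g, p)
        for gold_sentence, pred_sentence in zip(gold, predicted)
        for g, p in zip(gold_sentence, pred_sentence)
    ]
    correct = sum(g == p for g, p in pairs)
    return correct / len(pairs)


# Made-up gold and predicted tags for two short sentences.
gold = [["B-PER", "I-PER", "O"], ["O", "B-LOC"]]
predicted = [["B-PER", "O", "O"], ["O", "B-LOC"]]

accuracy = token_accuracy(gold, predicted)
print(f"{accuracy:.2%}")
```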


If you are eager to see how the NER system works and how accurate our trained model’s results are, have a look at our demo:

Bert Based Named Entity Recognition Demo

To test the demo, provide a sentence in the Input text section and hit the submit button. In a few seconds, you will get results showing the words and their entities.

The fine-tuned model used in our demo can find the following entities:

  • Person
  • Facility
  • Location
  • Organization
  • Work Of Art
  • Event
  • Date
  • Time
  • Nationality / Religious / Political group
  • Law Terms
  • Product
  • Percentage
  • Currency
  • Language
  • Quantity
  • Ordinal Number
  • Cardinal Number

We would love to get your feedback on our demo. Do check out our demo of the BERT-based Named Entity Recognition system and let us know what you think in the comment section below.

Make your own NER using BERT + CONLL

We have created a Colab file that you can use to easily build your own NER system:

BERT Based NER on Colab

It includes training and fine-tuning of BERT on the CONLL dataset using the transformers library by HuggingFace.
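
One step that fine-tuning BERT for NER usually requires is aligning word-level tags with subword pieces, because BERT's WordPiece tokenizer can split one word into several pieces. A common convention, sketched below, labels only the first piece of each word and gives the remaining pieces the index -100, which PyTorch's `CrossEntropyLoss` ignores by default. The Colab file may handle this differently, and `toy_wordpiece` is a hypothetical stand-in for a real tokenizer so that the example runs without any downloads.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def toy_wordpiece(word, max_len=4):
    """Hypothetical stand-in for a WordPiece tokenizer: fixed-size chunks."""
    pieces = [word[i:i + max_len] for i in range(0, len(word), max_len)]
    return [pieces[0]] + ["##" + piece for piece in pieces[1:]]


def align_labels(words, tags, label2id, tokenize=toy_wordpiece):
    """Expand word-level tags to piece-level label ids."""
    pieces, label_ids = [], []
    for word, tag in zip(words, tags):
        sub_pieces = tokenize(word)
        pieces.extend(sub_pieces)
        label_ids.append(label2id[tag])                           # first piece keeps the tag
        label_ids.extend([IGNORE_INDEX] * (len(sub_pieces) - 1))  # rest are ignored in the loss
    return pieces, label_ids


label2id = {"O": 0, "B-PER": 1, "I-PER": 2}
pieces, label_ids = align_labels(
    ["Joseph", "Wu", "spoke"], ["B-PER", "I-PER", "O"], label2id
)
print(list(zip(pieces, label_ids)))
```

With a real tokenizer, the same alignment idea applies. Only the `tokenize` argument changes.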

Further Roadmap

We believe in the philosophy that “there is always scope for improvement!”

This is the initial version of the NER system we have created using BERT, and we have already planned many improvements to it.

  • Add as many entities as possible, to categorize entities more specifically.
  • Find or prepare a good dataset in other languages, then fine-tune a model for those languages.
  • Fine-tune the model for domain-specific datasets like medical, political, education, etc.

Purchase BERT Based NER

If you liked our demo and want to set up the same on your own server, then you can purchase it.

The basic version with 4 entities can be created easily using the Colab file shared above, so if that is all you need, there is no need to purchase. If you want more entities, you can buy our model, which is fine-tuned on 5 datasets.

Find more details on Buy BERT based Named Entity Recognition (NER) fine-tuned model and PyTorch based Python + Flask code.


We are thankful to Google Research for releasing BERT, HuggingFace for open-sourcing the PyTorch transformers library, and Kamalraj for his fantastic work on BERT-NER.

If you are looking for a custom BERT-based NER, contact us or send an email to letstalk@pragnakalp.com to avail our Natural Language Processing services.


2 Replies to “BERT Based Named Entity Recognition (NER) Tutorial and Demo”

  1. Your tutorial and demo for the BERT Based NER system is excellent! The BERT Based NER on Colab is very useful. I want to fine tune the BERT Based NER for Gujarati language using WikiAnn dataset from the Huggingface. Will you, please, give me any suggestion about it? Thanks a lot!

    1. Hello Chandrakant Bhogayata,

      As discussed in this blog, your dataset should be in the BIO schema same as the CONLL dataset. The dataset for the Gujarati language will thus be available to you by using the hugging face method. Then, use the BERT NER Colab file given in the blog and follow the given instructions in the Colab file.

      You will mostly need to replace the Gujarati labelled dataset files in the “data” directory after cloning the Github repository and then fine-tune the model as instructed in the Colab file. After completion of training, the fine-tuned model should recognize entities like location, name and person for Gujarati sentences.
