BERT-based QnA, Information Extractor and Closed-domain Chatbot

We have been working on a BERT-based QnA system for more than a year now. What started as a small experiment to create a QnA demo using BERT has grown into a big project, thanks to the strong capabilities of the BERT NLP model.

Let’s see how we have progressed thus far!

Version 1

The first version of our QnA was a very basic one. You input one paragraph, pose some questions, and the system finds the answers in the paragraph.
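
For readers curious about how a BERT reader typically picks an answer out of a paragraph: models fine-tuned on SQuAD-style data usually score every token as a possible answer start and a possible answer end, and the answer is the highest-scoring valid span. The sketch below illustrates only that selection step. The function name `best_span`, the `max_len` cap, and all the scores are invented for illustration; this is not the demo's actual code.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest start + end score."""
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # The end may not come before the start, and spans are capped at max_len tokens.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Toy paragraph with made-up scores (a real model would produce these).
tokens = ["BERT", "was", "released", "by", "Google", "in", "2018", "."]
start_scores = [0.1, 0.0, 0.2, 0.1, 0.3, 0.4, 5.0, 0.0]
end_scores = [0.0, 0.1, 0.1, 0.0, 0.5, 0.1, 6.0, 0.2]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

With these toy scores the selected span is the single token "2018", which is what a question such as "When was BERT released?" would be expected to return.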

The demo of the first version of QnA:

More details about how it was created can be found in the case study:

It is currently available in 13 languages: English, Hindi, Arabic, Spanish, French, German, Chinese, Portuguese, Italian, Turkish, Russian, Korean and Japanese. The limitation of this demo is that it takes around 20 seconds to find the answer in a paragraph of 1,000 characters.

Version 2

We improved upon the first version and managed to find the answer within 3-4 seconds, a huge jump from version 1 in terms of response time. As the responses are fast, we call this version a closed-domain chatbot using BERT.

The demo can be accessed at

More details can be found in the case study:

This closed-domain chatbot is available in 12 languages. Again, the limitation is that the longer the paragraph, the longer it takes to find the answer.

We have also prepared API access for this chatbot. Anyone who wants to use it from their own system can call the API, passing a paragraph and a question, and the API responds with the answer.
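
As a rough idea of what such a call could look like from Python, here is a minimal sketch. The endpoint URL, the JSON field names `paragraph` and `question`, and the `answer` field in the reply are all assumptions made for illustration; the real request format comes with your API access.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL provided with your API access.
API_URL = "https://example.com/qna"


def build_request(paragraph: str, question: str) -> urllib.request.Request:
    """Build a POST request carrying the paragraph and the question as JSON."""
    # The field names "paragraph" and "question" are assumed, not documented.
    body = json.dumps({"paragraph": paragraph, "question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(paragraph: str, question: str, timeout: float = 30.0) -> str:
    """Send the request and return the answer (assumes a {"answer": ...} reply)."""
    request = build_request(paragraph, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]


# Building the request does not touch the network; only ask() does.
req = build_request("BERT was released by Google in 2018.", "Who released BERT?")
print(req.get_method(), req.full_url)
```

Keeping request building separate from sending makes it easy to inspect or log the payload before any network call is made.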

Version 3

In our latest demo, we have eliminated the text length limitation.

You can check our large-text demo at

In this demo, the text length is nearly 500,000 characters and the result is generated in 2-3 seconds. We have used the book “India Under British Rule” as the context for this demo. (The link to the book and sample questions are provided on the demo page for reference.)
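
One common way to keep answer times flat on long texts is to split the text into passages, index them, and hand only the best-matching passages to the reader model. The sketch below illustrates that general idea with a simple TF-IDF ranking; it is not necessarily how the demo works internally, and the function names, passage size and toy text are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(text: str, size: int = 50) -> List[str]:
    """Cut the text into passages of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def top_passages(question: str, passages: List[str], k: int = 3) -> List[str]:
    """Return the k passages with the highest TF-IDF overlap with the question."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Smoothed inverse document frequency for each question term.
    idf = {}
    for term in set(tokenize(question)):
        df = sum(1 for d in docs if term in d)
        idf[term] = math.log((n + 1) / (df + 1)) + 1.0
    scores = [sum(d[t] * w for t, w in idf.items()) for d in docs]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:k]]


# Toy text standing in for a long book.
text = (
    "The first version answered questions from a single short paragraph. "
    "The second version returned answers within three to four seconds. "
    "The third version searches a whole book of text."
)
passages = split_passages(text, size=10)
best = top_passages("Which version searches a whole book?", passages, k=1)
print(best[0])
```

Only the selected passages would then be passed to the BERT reader, so the reading cost depends on `k` and the passage size rather than on the full length of the text.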

Interesting info

All of the above demos run without a GPU. If we add more RAM/CPU and a GPU, results will come back even faster and we can also extract answers from longer texts.

Possible Use Cases of Large-Text Version

As the result generation time has dropped dramatically, our machine-learning-based QnA system could be used in many places and scenarios. Some of them are:

  • Information extractor
    There is a lot of text-based data in many fields. Take the law or health sector, for example: they have tonnes of text-based data, so finding a specific piece of information in it can be cumbersome. With our system, once that data is indexed, users can find answers faster. They also don’t need to write queries; they can ask questions in natural language and the system will find the answers for them.
  • Small scale search engine
    It can be used to give instant answers on a website. A user looking for specific information on the website can open a chat bubble and ask a question. The system replies instantly, so the user doesn’t need to browse through many pages. They get the exact answer they are looking for and can also be taken to the page where further related information can be found.
  • QnA Builder
    It can also be used to make a QnA builder. From text, FAQs or raw information, our system can build a QnA that responds to users’ questions.
  • ML-based chatbot
    Currently most chatbots are based on decision trees: we have to define the whole conversation flow and the chatbot works accordingly. Using our system, we can build an ML-based chatbot that finds answers in the given text.

From the progress we have made so far, there seems to be a large number of use cases we can build on this system.

Further Roadmap

We are excited to continue working on this project and to improve it in various respects.

  1. First of all, we want to make our large-text demo multilingual. Currently it is available in English only.
  2. Improve the accuracy of the large-text demo. It is good, but it can be improved to a great extent.
  3. Improve our SQuAD-like datasets for all the languages we have: Hindi, Arabic, Spanish, French, German, Chinese, Portuguese, Italian, Turkish, Russian, Korean and Japanese.
  4. Create API-based access for customers.

Praises for our work

So far, we have received dozens of emails praising our work. Our multilingual QnA demos in particular got very good feedback from users in a number of countries.

Some notable references we have received:

Top Spot in Google Search Results!

As our case studies and demos became popular, we started ranking higher in Google search. For many keywords related to “BERT”, we are now in the top spot or in the top 5!

This is bringing a lot of NLP enthusiasts to our site.

We are committed to taking this project to a new level. As model training, research and development, and dataset creation are time- and resource-consuming tasks, we are actively seeking investors to fund our project. This is becoming a promising commercial product and we already have a couple of paying customers.

If you find it worth investing in, do get in touch with us at