Question Answering System in Python using BERT NLP

Latest Update (6th March, 2020)

THREE more languages added to our BERT QnA demo: RUSSIAN, KOREAN and JAPANESE.

NLP Based Question Answering System in RUSSIAN using BERT / Python

NLP Based Question Answering System in KOREAN using BERT / Python

NLP Based Question Answering System in JAPANESE using BERT / Python (Alpha version. Needs a lot of improvement.)

Latest Update (26th February, 2020)

One more language added to our BERT QnA demo: TURKISH.

NLP Based Question Answering System in TURKISH using BERT / Python

Latest Update (12th February, 2020)

Now, we have added one more language to our BERT QnA demo: ITALIAN.

NLP Based Question Answering System in ITALIAN using BERT / Python

Latest Update (10th February, 2020)

With improved datasets for Hindi, Arabic, Chinese, German, Portuguese, Spanish and French, the accuracy of those demos has improved!

Latest Update (17th December, 2019)

By popular demand, we have now published NLP Tutorial: Question Answering System using BERT + SQuAD on Colab TPU, which provides step-by-step instructions for fine-tuning a pre-trained BERT model on the SQuAD 2.0 dataset to set up a question answering system.

The code and fine-tuned model for an exact replica of our Question n Answering System Demo using BERT in Python + Flask can now be purchased.

Latest Update (10th December, 2019)

We have published our BERT QnA for Chinese Simplified language. You can test it out at: NLP Based Question Answering System in CHINESE using BERT / Python.

We have also published our BERT QnA for the German language. You can test it out at: NLP Based Question Answering System in GERMAN using BERT / Python.

Latest Update (14th September, 2019)

In our endeavor to build a BERT-based chatbot, we have had some success and reduced the inference time to a couple of seconds. For more details, please check our latest case study, Closed-domain Chatbot using BERT in Python.

Update (14th September, 2019)

Continuing our endeavour to make BERT-based QnA multi-lingual, we have created fine-tuned models for Hindi, Arabic, Spanish and French. Below are the links to try the demo in all the available languages. More languages will be added in the future.

NLP Based Question Answering System in ENGLISH using BERT / Python

NLP Based Question Answering System in HINDI using BERT / Python

NLP Based Question Answering System in ARABIC using BERT / Python

NLP Based Question Answering System in SPANISH using BERT / Python

NLP Based Question Answering System in FRENCH using BERT / Python

(Note: Currently, the multi-lingual SQuAD datasets and fine-tuned models we created are not published / open-sourced. Feel free to contact us if you think they can be used for any of your projects or a commercial product.)

What is a Question Answering system?

A Question Answering (QnA) model is one of the fundamental systems in Natural Language Processing. In QnA, a machine learning based system generates answers from a knowledge base or from text paragraphs for the questions posed as input. Various machine learning methods can be used to build Question Answering systems.


Create a machine learning based Question Answering system that takes a comprehension (a text passage) and questions as input, processes the comprehension, and prepares answers from it.

Using Natural Language Processing, we can achieve this objective. NLP helps the system identify and understand the meaning of sentences in their proper context.
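One common way to "prepare answers" from a comprehension is extractive: the model gives every token a score for being the start of the answer and a score for being the end, and the highest-scoring valid span is returned. This is how BERT is typically fine-tuned on SQuAD. The sketch below only illustrates that selection step and is not necessarily how our demo is implemented. The function name `best_answer_span`, the `max_answer_len` default and all scores are hypothetical, made up for illustration.

```python
from typing import List, Tuple


def best_answer_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Only spans where end >= start and whose length is at most
    max_answer_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # The end may not precede the start, and the span length is capped.
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy comprehension with made-up scores (a real model would produce these).
tokens = ["bert", "was", "developed", "by", "google"]
start_scores = [0.1, 0.0, 0.2, 0.3, 4.0]
end_scores = [0.0, 0.1, 0.1, 0.2, 3.5]

start, end = best_answer_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

In a real system the two score lists come from the fine-tuned model, one pair of scores per token of the comprehension.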

Implementation or Usage of QnA model in industry/project

  • To develop a common-sense reasoning model that mimics human reasoning.
  • To prepare FAQs from a knowledge base, product manual or documentation.
  • To create smart chatbots that can answer FAQs for different industries such as Healthcare, Travel, Agriculture, Education, Manufacturing, Online commerce, etc.


With the massive growth of the web, we have a large amount of data, but only some of the text data is annotated. A task in a field like Natural Language Processing needs a lot of annotated data for supervised learning, or unannotated data for unsupervised learning. Because annotated data is scarce, researchers have developed techniques for training general-purpose language representation models on the enormous amount of unannotated text on the web (known as pre-training). BERT is one such pre-trained model, developed by Google, which can be fine-tuned on new data to create NLP systems for question answering, text generation, text classification, text summarization and sentiment analysis. As BERT is trained on a huge amount of data, it makes the process of language modeling easier. The main benefit of using a pre-trained BERT model is a substantial improvement in accuracy compared to training on these datasets from scratch.

BERT builds upon recent work in pre-training contextual representations. It is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, which is why it performs better than previous methods. BERT represents each word using both its left and its right context, and it lends itself to domain adaptation. It is conceptually simple and empirically powerful. As per the BERT paper, with a proper language model training method, a Transformer (self-attention) based encoder can potentially be used as an alternative to previous language models.


An RNN (theoretically) gives infinite left context (the words to the left of the target word). But what we would like is to use both the left and the right context to see how well the word fits within the sentence.

RNNs are a network architecture used for tasks such as translation, and they process language sequentially. This sequential nature makes it difficult to fully exploit parallel processing units like TPUs. RNNs suffer from vanishing and exploding gradient problems. They also have short-term memory: they are not good at remembering their inputs over a long period.

A Transformer network, in contrast, applies a self-attention mechanism which scans through every word and assigns attention scores (weights) to the words. Transformers train more efficiently and capture long-distance dependencies better than recurrent neural network architectures.
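To make the idea of attention scores concrete, here is a minimal pure-Python sketch of scaled dot-product self-attention. It is a simplification under stated assumptions: a real Transformer first maps each word embedding through learned query, key and value projections and uses several attention heads, whereas this sketch uses the toy embeddings directly as queries, keys and values. The function names and the embeddings are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention over a whole sequence.

    Returns (outputs, weights): weights[i][j] is how much word i attends
    to word j, and outputs[i] is the weighted average of all value vectors.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this word against every word in the sequence.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w_j * v[i] for w_j, v in zip(w, values))
             for i in range(len(values[0]))]
        )
    return outputs, weights


# Three toy 2-dimensional "word embeddings" (made up for illustration).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(embeddings, embeddings, embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` covers the whole sequence at once, so no word has to wait for the previous one to be processed, which is what allows the computation to be parallelised.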


Using LSTM models restricts the prediction ability to a short range. BERT, in contrast, uses a “masked language model” (MLM) objective. The MLM objective lets the representation combine both the left and the right context, which makes it possible to pre-train a deep bidirectional Transformer.
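The masking step behind the MLM objective can be sketched as follows. In the BERT paper, about 15% of the input tokens are selected for prediction; of those, 80% are replaced with `[MASK]`, 10% with a random token and 10% are left unchanged. The helper name `mask_tokens`, the tiny vocabulary and the example sentence are hypothetical and only serve as an illustration.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked_tokens, labels) for masked language modelling.

    labels[i] holds the original token where a prediction is required
    and None everywhere else.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL_TOKENS:
            continue  # never mask the special tokens
        if rng.random() >= mask_prob:
            continue  # this token is not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels


vocab = ["the", "model", "answers", "questions", "from", "text"]
sentence = ["[CLS]", "the", "model", "answers", "questions", "[SEP]"]
masked, labels = mask_tokens(sentence, vocab, seed=0)
print(masked)
print(labels)
```

Because the model must predict a masked word from everything around it, it is free to look at the words on both sides, which a left-to-right language model cannot do.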


When applying fine-tuning based approaches to token-level tasks such as SQuAD question answering, it is crucial to incorporate context from both directions. OpenAI GPT, however, uses a left-to-right architecture, where every token can only attend to previous tokens in the self-attention layers of the Transformer.

  • GPT uses a sentence separator ([SEP]) and a classifier token ([CLS]) which are only introduced at fine-tuning time.
  • BERT learns [SEP] (separator token), [CLS] (classifier token) and sentence embeddings during pre-training.
  • GPT used the same learning rate of 5e-5 for all fine-tuning experiments. BERT chooses a task-specific fine-tuning learning rate that performs best on the development set.
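The difference between the two architectures can be pictured as an attention mask: a matrix in which entry (i, j) is 1 if token i is allowed to attend to token j. The short sketch below builds both masks; the function name `attention_mask` and the example tokens are hypothetical.

```python
from typing import List


def attention_mask(seq_len: int, bidirectional: bool) -> List[List[int]]:
    """mask[i][j] == 1 means token i may attend to token j."""
    return [
        [1 if bidirectional or j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


tokens = ["who", "developed", "bert", "?"]

left_to_right = attention_mask(len(tokens), bidirectional=False)  # GPT-style
both_directions = attention_mask(len(tokens), bidirectional=True)  # BERT-style

for name, mask in (("left-to-right", left_to_right),
                   ("bidirectional", both_directions)):
    print(name)
    for row in mask:
        print(" ", row)
```

In the left-to-right mask the first token sees only itself, while in the bidirectional mask every token sees the whole sequence, including the words that follow it.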
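For question answering, BERT packs the question and the comprehension into a single input sequence using these special tokens, with segment ids telling the two parts apart. The sketch below shows that layout under simplifying assumptions: the text is split on whitespace, whereas BERT actually uses WordPiece sub-word tokenization, and the helper name `pack_qa_input` is hypothetical.

```python
from typing import List, Tuple


def pack_qa_input(
    question_tokens: List[str], context_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Build [CLS] question [SEP] context [SEP] with matching segment ids.

    Segment 0 covers [CLS], the question and the first [SEP];
    segment 1 covers the context and the final [SEP].
    """
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids


# Whitespace splitting is a stand-in for real WordPiece tokenization.
question = "who developed bert ?".split()
context = "bert was developed by google".split()

tokens, segment_ids = pack_qa_input(question, context)
for tok, seg in zip(tokens, segment_ids):
    print(seg, tok)
```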


  • With lots of training data, training becomes quite difficult even with a GPU, so we used Google’s TPU for the fine-tuning task.
  • Inference initially took a very long time. We therefore tuned the hyperparameters so that the system stays accurate while returning results in a reasonable time: we maintained a log for each hyperparameter and chose the best-performing combination.
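The kind of hyperparameter log described above could be organised roughly as in the sketch below, which picks the most accurate run that still fits within an inference-time budget. This is an illustration, not our actual tooling: the selection rule, the function name `pick_best_run` and the record fields are assumptions, and every number in the log is a placeholder rather than a measurement. `max_seq_length` and `doc_stride` are used as example hyperparameters because they are options of the SQuAD fine-tuning script in Google's BERT repository.

```python
from typing import Dict, List, Optional

Run = Dict[str, float]


def pick_best_run(runs: List[Run], max_seconds: float) -> Optional[Run]:
    """Return the run with the highest F1 among those fast enough.

    Returns None when no run meets the inference-time budget.
    """
    eligible = [r for r in runs if r["inference_seconds"] <= max_seconds]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["f1"])


# Placeholder log entries: these are NOT real measurements.
runs = [
    {"max_seq_length": 384, "doc_stride": 128, "f1": 0.80, "inference_seconds": 60.0},
    {"max_seq_length": 256, "doc_stride": 128, "f1": 0.78, "inference_seconds": 20.0},
    {"max_seq_length": 128, "doc_stride": 64, "f1": 0.70, "inference_seconds": 8.0},
]

best = pick_best_run(runs, max_seconds=30.0)
print(best)
```

Keeping accuracy and latency side by side in one record makes the trade-off explicit: the most accurate configuration is not chosen if it is too slow to serve.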


Our version of QnA using BERT can be tested at BERT NLP QnA Demo using Python.

(As the system is hosted on a low-end server, it currently takes around 50 seconds to process the sample comprehension and prepare answers from it. With more resources, the process can be completed in less time.)

Future Roadmap/improvement plan

  • Train the model further on CoQA to get better accuracy
  • Improve the inference speed to make the system production-ready
  • Add multi-language support (for example, Hindi or Gujarati comprehensions should also work)
  • Investigate the linguistic phenomena that may or may not be captured by this system
  • Add voice assistant support