Latest Update (14/09/2019)
In our effort to build a BERT-based chatbot, we have had some success and reduced the inference time to a couple of seconds. For more details, please check our latest case study, Chatbot using BERT in Python.
In our effort to make BERT-based QnA multilingual, we have created fine-tuned models for Hindi, Arabic, Spanish and French. Below are the links to try the demo in all the available languages. More languages will be added in the future.
(Note: The multilingual SQuAD datasets and fine-tuned models we created are currently not being published or open-sourced. Feel free to contact us if you think they can be used in any of your projects or in a commercial product.)
What is a Question Answering system?
A Question Answering (QnA) model is one of the most basic systems in Natural Language Processing. In QnA, a machine learning based system generates answers to the questions posed as input, drawing on a knowledge base or text paragraphs. Various machine learning methods can be used to build Question Answering systems.
The objective is to create a Question Answering machine learning system that takes a comprehension passage and questions as input, processes the passage, and prepares answers from it.
We can achieve this objective using Natural Language Processing. NLP helps the system identify and understand the meaning of sentences in their proper context.
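To make "prepares answers from it" concrete, here is a minimal sketch of one common way extractive QnA systems pick an answer: the model scores every token as a possible answer start and answer end, and the best-scoring span is returned. The function name `best_answer_span`, the `max_answer_len` parameter and the toy scores are assumptions made for illustration; they are not taken from our system.

```python
def best_answer_span(tokens, start_scores, end_scores, max_answer_len=30):
    """Return the text of the (start, end) token span with the highest combined score."""
    best_score = float("-inf")
    best_span = (0, 0)
    for start, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(tokens))
        for end in range(start, last):
            score = s_score + end_scores[end]
            if score > best_score:
                best_score = score
                best_span = (start, end)
    start, end = best_span
    return " ".join(tokens[start:end + 1])


# Toy passage and invented scores; a real model would produce the scores
# for a question such as "Who released BERT?".
tokens = "BERT was released by Google in 2018".split()
start_scores = [0.1, 0.0, 0.2, 0.1, 3.0, 0.3, 0.5]
end_scores = [0.0, 0.1, 0.1, 0.2, 2.5, 0.1, 1.0]

answer = best_answer_span(tokens, start_scores, end_scores)
print(answer)
```

The exhaustive search over spans is fine for a short passage; longer passages are usually split into overlapping windows before scoring.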
Implementation or Usage of a QnA model in industry/projects
- To develop a Common sense reasoning model that mimics likes
- Prepare FAQs from a knowledge base, product manual or documentation.
- For creating a smart chatbot that can answer FAQs for different industries like Healthcare, Travel, Agriculture, Education, Manufacturing, Online commerce, etc.
With the massive growth of the web, we have a large amount of data, but only some of the text data is annotated. For a task in
BERT builds upon recent work in pre-training contextual representations. It is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus. BERT's contextual representation of each word uses both its left and its right context. BERT is conceptually simple and empirically powerful. It improves on previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP, and it supports domain adaptation. As per the BERT
BERT vs RNN
An RNN (theoretically) gives infinite left context (the words to the left of the target word). What we would like, however, is to use both the left and right contexts to see how well the word fits within the sentence.
RNNs are a network architecture used for tasks such as translation, and they process language sequentially. This sequential nature makes it difficult to fully exploit parallel processing units like TPUs. RNNs suffer from vanishing and exploding gradient problems. They also have short-term memory: they are not good at remembering their inputs over a long period.
A Transformer network, by contrast, applies a self-attention mechanism that scans through every word and assigns attention scores (weights) to the words. Transformers offer better training efficiency and superior performance in capturing long-distance dependencies.
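The attention scores mentioned above can be illustrated with a small sketch of scaled dot-product self-attention. It is a simplification: the same vectors serve as queries, keys and values, whereas a real Transformer first applies learned projections and uses several attention heads. The toy `embeddings` are made-up values.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this word against every word in the sentence, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights = softmax(scores)
        # The output is the attention-weighted average of all word vectors.
        out = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy 2-dimensional word vectors (invented for illustration).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Every word is scored against every other word in one pass, with no step-by-step recurrence, which is why this computation parallelizes well.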
BERT vs LSTM
Using LSTM models restricts the prediction ability to a short range. BERT, in contrast, uses a “masked language model” (MLM) objective. The MLM objective lets the representation combine both the left and the right context, which makes it possible to pre-train a deep bidirectional Transformer.
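As a rough illustration of the MLM objective, the sketch below selects about 15% of the tokens as prediction targets. Of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged, following the scheme described in the BERT paper. The function name `mask_tokens`, the word-level tokens and the tiny `vocab` are assumptions for illustration; real BERT works on WordPiece tokens.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels); labels hold the original token at
    positions the model must predict and None everywhere else."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # this position is not a prediction target
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = "[MASK]"           # 80%: replace with the mask token
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


sentence = "the man went to the store to buy milk".split()
toy_vocab = ["dog", "ran", "blue", "tree"]  # hypothetical vocabulary
masked, labels = mask_tokens(sentence, toy_vocab, mask_prob=0.15, seed=42)
print(masked)
print(labels)
```

Because a masked position can be predicted from the words on both sides of it, the model is free to look left and right at the same time.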
BERT vs OpenAI GPT
When applying fine-tuning based approaches to token-level tasks such as SQuAD question answering, it is crucial to incorporate context from both directions. OpenAI GPT, however, uses a left-to-right architecture, where every token can only attend to previous tokens in the self-attention layers of the Transformer.
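The difference between the two architectures can be pictured as an attention mask, where row i lists which positions token i is allowed to attend to (1 = allowed, 0 = blocked). The helper names below are hypothetical and the sketch ignores padding; it only shows the shape of the constraint.

```python
def left_to_right_mask(n):
    """GPT-style mask: token i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """BERT-style mask: every token may attend to every position."""
    return [[1] * n for _ in range(n)]


for row in left_to_right_mask(4):
    print(row)
print()
for row in bidirectional_mask(4):
    print(row)
```

In the left-to-right mask the upper triangle is zero, so a token in the middle of a passage never sees the words that follow it, which is the limitation for span-based tasks such as SQuAD.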
- GPT uses a sentence separator ([SEP]) and a classifier token ([CLS]) which are only introduced at fine-tuning time.
- BERT learns [SEP] (separator token), [CLS] (classifier token) and sentence embeddings during pre-training.
- GPT used the same learning rate of 5e-5 for all fine-tuning experiments. BERT chooses a task-specific fine-tuning learning rate that performs best on the development set.
- As we have lots of training data it becomes quite difficult to train even with a GPU, so we used Google’s TPU for
- The time taken for inference was very large. We therefore tuned the hyperparameters so that the system stays accurate while returning results in a reasonable time. To do this, we maintained a log for each hyperparameter and chose an optimized combination of hyperparameters.
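A hyperparameter log of the kind described above could be kept as simply as the sketch below. Everything in it is an assumption for illustration: the field names, the `pick_best` helper, and especially the numbers, which are placeholders and not measurements from our experiments.

```python
# Each entry records one run: the settings tried, the accuracy reached,
# and how long inference took. All values are placeholders.
runs = [
    {"max_seq_length": 384, "doc_stride": 128, "f1": 88.0, "inference_sec": 50.0},
    {"max_seq_length": 256, "doc_stride": 128, "f1": 86.5, "inference_sec": 20.0},
    {"max_seq_length": 192, "doc_stride": 64, "f1": 85.0, "inference_sec": 9.5},
    {"max_seq_length": 128, "doc_stride": 64, "f1": 82.0, "inference_sec": 8.0},
]


def pick_best(runs, max_inference_sec):
    """Return the most accurate run that fits the time budget, or None."""
    eligible = [r for r in runs if r["inference_sec"] <= max_inference_sec]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["f1"])


best = pick_best(runs, max_inference_sec=10.0)
print(best)
```

Filtering by a time budget first and then maximizing accuracy is one way to express the trade-off between accuracy and response time; a weighted score would be another.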
Our version of QnA using BERT can be tested at BERT NLP QnA Demo using Python.
(As the system is hosted on a server with a low-end configuration, it currently takes around 50 seconds to process the sample comprehension and prepare answers from it. With more resources, the process can be completed in less time.)
Future Roadmap/improvement plan
- Train the model further on CoQA to get better accuracy
- Improve the inference speed to make the system production ready
- Multi-language support (for example, Hindi or Gujarati comprehensions should also work)
- Investigate the linguistic phenomena that may or may not be captured by this system
- Add voice assistant support