NLP Tutorial: Question Answering System using BERT + SQuAD on Colab TPU

Our case study, Question Answering System in Python using BERT NLP, and our BERT-based question answering demo, developed in Python + Flask, became hugely popular, attracting hundreds of visitors per day. We received many appreciative emails praising the QnA demo, along with a number of questions about how we created it. To this day, we keep getting requests on how to develop such a QnA system using the BERT pre-trained model open-sourced by Google.

To start with, the README file in the official BERT GitHub repository provides a good amount of information about how to fine-tune the model on SQuAD 2.0, but we could see that developers were still facing issues. So we decided to publish a step-by-step tutorial on fine-tuning the BERT pre-trained model and running inference to extract answers from a given paragraph and questions on Colab using a TPU.

In this tutorial, we are not going to cover how to create a web-based interface using Python + Flask. We’ll cover only fine-tuning and inference on Colab using a TPU. You can create your own interface using Flask or Django. If you want the exact same demo as ours, we can provide it for a nominal charge. For more information, please refer to Buy Question n Answering Demo using BERT in Python + Flask.


This tutorial shows how to fine-tune BERT on SQuAD using Google Colab. For that we will use the BERT GitHub repository, which includes:
1) TensorFlow code for the BERT model architecture.
2) Pre-trained models for both the uncased and cased versions of BERT-Base and BERT-Large.

You can also refer to or copy our Colab file to follow the steps.

Steps to perform BERT Fine-tuning on Google Colab

1) Change Runtime to TPU

On the main menu, click Runtime and select “Change runtime type”. In the dialog that opens, select “TPU” from the hardware accelerator dropdown.

2) Clone the BERT github repository

BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. The method is described in the BERT academic paper. BERT has two stages: pre-training and fine-tuning.
Pre-training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one-time procedure. Google has released a number of pre-trained models, so most NLP researchers will never need to pre-train their own model from scratch.
Fine-tuning is inexpensive. All the results given in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU. For example, SQuAD can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%.

So our first step is to clone the BERT GitHub repository and then move into the cloned bert directory using the “cd” command.
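The clone step can be sketched in Python as follows. This is a minimal sketch, not the original notebook cell: the helper names `clone_command` and `clone_and_enter` are hypothetical, and the URL is the public google-research/bert repository. In a Colab cell the same thing is usually written as `!git clone <url>` followed by `%cd bert`.

```python
import os
import subprocess

# Public BERT repository on GitHub.
BERT_REPO_URL = "https://github.com/google-research/bert.git"


def clone_command(repo_url=BERT_REPO_URL, dest="bert"):
    """Return the argv list that clones `repo_url` into the directory `dest`."""
    return ["git", "clone", repo_url, dest]


def clone_and_enter(repo_url=BERT_REPO_URL, dest="bert"):
    """Clone the repository (skipped if `dest` already exists) and cd into it."""
    if not os.path.isdir(dest):
        subprocess.run(clone_command(repo_url, dest), check=True)
    os.chdir(dest)
    return os.getcwd()


# Show the command without running it; call clone_and_enter() to execute.
print(" ".join(clone_command()))
```

Calling `clone_and_enter()` once at the top of the notebook leaves the working directory inside the repository, so later steps can refer to `run_squad.py` by its bare name.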


3) Download the BERT Pretrained Model

BERT pretrained model list:

  • BERT-Large, Uncased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Large, Cased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Multilingual Uncased (Orig, not recommended): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

Google has released BERT-Base and BERT-Large models, each in an uncased and a cased version. Uncased means that the text is converted to lowercase before WordPiece tokenization, e.g., John Smith becomes john smith; the uncased model also strips out accent markers. Cased means that the true case and accent markers are preserved.

When using a cased model, make sure to pass --do_lower_case=False at the time of training.
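To make the uncased behaviour concrete, here is a small sketch of the normalization an uncased model applies before WordPiece tokenization: lowercasing, then stripping accent marks. It is an approximation written for illustration, not the tokenizer shipped in the BERT repository, and the function name `uncased_normalize` is hypothetical.

```python
import unicodedata


def uncased_normalize(text):
    """Lowercase the text and drop accent marks (Unicode category Mn)."""
    lowered = text.lower()
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", lowered)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(uncased_normalize("John Smith"))  # john smith
print(uncased_normalize("Zoë Café"))    # zoe cafe
```

A cased model skips this step entirely, which is why the lowercasing flag has to be switched off when one is used.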

You can download any model of your choice. We have used the BERT-Large-Uncased Model.

4) Download the SQuAD 2.0 Dataset

For the Question Answering task, we will be using the SQuAD 2.0 dataset.

SQuAD (Stanford Question Answering Dataset) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. You can download the dataset from the SQuAD website.
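The layout of a SQuAD 2.0 file can be illustrated with a tiny hand-written sample. The sketch below walks the nested structure (articles, paragraphs, questions), counts answerable and unanswerable questions, and checks that each answer span really occurs at its `answer_start` offset. The sample data and the function name `summarize_squad` are invented for illustration; the real files are much larger.

```python
# A made-up two-question sample in the SQuAD 2.0 layout.
SAMPLE = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "BERT was open-sourced by Google.",
            "qas": [
                {"id": "q1", "question": "Who open-sourced BERT?",
                 "is_impossible": False,
                 "answers": [{"text": "Google", "answer_start": 25}]},
                {"id": "q2", "question": "When was BERT retired?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}


def summarize_squad(dataset):
    """Return (answerable, unanswerable) counts and validate answer spans."""
    answerable = unanswerable = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    unanswerable += 1
                    continue
                answerable += 1
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    if context[start:start + len(ans["text"])] != ans["text"]:
                        raise ValueError("bad span for question " + qa["id"])
    return answerable, unanswerable


print(summarize_squad(SAMPLE))  # (1, 1)
```

To inspect a downloaded file instead, load it with `json.load` and pass the resulting dictionary to the same function.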

5) Set up your TPU environment

  • Verify that you are connected to a TPU device
  • Note your TPU address, which is needed at the time of fine-tuning
  • Perform Google Authentication to access your bucket
  • Upload your credentials to TPU to access your GCS bucket
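The first three steps above can be sketched as follows. This assumes the Colab TPU runtime exposes the TPU's host and port in the `COLAB_TPU_ADDR` environment variable, as TPU runtimes did when the BERT notebooks were published; the helper names are hypothetical. Uploading the credentials to the TPU depends on the TensorFlow version and is not shown.

```python
import os


def tpu_address_from_env(environ=os.environ):
    """Build the grpc:// TPU address from COLAB_TPU_ADDR, or fail clearly."""
    addr = environ.get("COLAB_TPU_ADDR")
    if not addr:
        raise RuntimeError(
            "No TPU found; set the Colab runtime type to TPU first.")
    return "grpc://" + addr


def authenticate_colab_user():
    """Run the Google sign-in flow so the notebook can reach your GCS bucket."""
    # google.colab is only importable inside a Colab runtime.
    from google.colab import auth
    auth.authenticate_user()


# Example with an explicit mapping instead of the real environment.
print(tpu_address_from_env({"COLAB_TPU_ADDR": "10.0.0.2:8470"}))
```

In Colab itself, call `tpu_address_from_env()` with no arguments and keep the result for the training step, then call `authenticate_colab_user()`.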

6) Create an output directory

Prerequisite: You will need a GCP (Google Cloud Platform) account and a GCS (Google Cloud Storage) bucket to run the Colab file. Please follow the Google Cloud documentation on how to create a GCP account and a GCS bucket. You have $300 of free credit to start with any GCP product.
You can create your GCS bucket from the Google Cloud console.

As we are using a Cloud TPU, we need to store the pre-trained model and the output directory in Google Cloud Storage. If you do not store them in a bucket, the run may fail with an error.

After training completes, you will find your fine-tuned model in the Google Cloud Storage bucket. For that, you need to provide your BUCKET name and OUTPUT DIRECTORY name.
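Combining the bucket name and output directory name into a `gs://` path can be sketched like this. The helper `gcs_uri` and the example names `my-bucket` and `bert_output` are placeholders, not values from the original notebook.

```python
def gcs_uri(bucket, *parts):
    """Join a bucket name and path parts into a gs:// URI."""
    bucket = bucket.strip("/")
    if not bucket or bucket.startswith("gs:"):
        raise ValueError("pass the bare bucket name, without the gs:// prefix")
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(["gs://" + bucket] + cleaned)


BUCKET = "my-bucket"        # placeholder: your bucket name
OUTPUT_DIR_NAME = "bert_output"  # placeholder: your output directory name

OUTPUT_DIR = gcs_uri(BUCKET, OUTPUT_DIR_NAME)
print(OUTPUT_DIR)  # gs://my-bucket/bert_output

# Assumption: with TensorFlow installed, the directory itself can then be
# created with tf.io.gfile.makedirs(OUTPUT_DIR).
```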

7) Move Pretrained Model to GCS Bucket

You need to move the pre-trained model to your GCS (Google Cloud Storage) bucket, as the local file system is not supported on TPU. If you don’t move your pre-trained model to the bucket, you may face an error.

The gsutil mv command allows you to move data between your local file system and the cloud, move data within the cloud, and move data between cloud storage providers.
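A sketch of the move step is shown below. It only builds and prints the `gsutil mv` command; the local directory name is an assumption (it matches the name the BERT-Large, Uncased archive unpacks to) and `gs://my-bucket` is a placeholder. In Colab the printed command can be run by prefixing it with `!`.

```python
import shlex


def gsutil_mv_command(source, destination):
    """Return the argv list for moving `source` to `destination` with gsutil."""
    return ["gsutil", "mv", source, destination]


# Assumed local model directory and a placeholder bucket.
cmd = gsutil_mv_command("uncased_L-24_H-1024_A-16", "gs://my-bucket")

# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```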

8) Training

Training is run with the run_squad.py script from the BERT repository. To train on a TPU, make sure that the use_tpu hyperparameter is set to True and that the TPU address found above is passed as tpu_name.
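A training invocation might be assembled as in the sketch below. The flag names are those defined by `run_squad.py`, and the hyperparameter values follow the SQuAD 2.0 example in the BERT README; the paths, bucket name and TPU address are placeholders, and the helper name is hypothetical. Treat it as a starting point rather than the exact command from the original notebook.

```python
def run_squad_train_args(model_dir, squad_dir, output_dir, tpu_address,
                         do_lower_case=True):
    """Build the argv list for fine-tuning BERT on SQuAD 2.0 on a TPU."""
    return [
        "python", "run_squad.py",
        f"--vocab_file={model_dir}/vocab.txt",
        f"--bert_config_file={model_dir}/bert_config.json",
        f"--init_checkpoint={model_dir}/bert_model.ckpt",
        f"--do_lower_case={do_lower_case}",   # False for cased models
        "--do_train=True",
        f"--train_file={squad_dir}/train-v2.0.json",
        "--do_predict=True",
        f"--predict_file={squad_dir}/dev-v2.0.json",
        "--train_batch_size=24",
        "--learning_rate=3e-5",
        "--num_train_epochs=2.0",
        "--max_seq_length=384",
        "--doc_stride=128",
        f"--output_dir={output_dir}",          # must be a gs:// path on TPU
        "--use_tpu=True",
        f"--tpu_name={tpu_address}",
        "--version_2_with_negative=True",      # SQuAD 2.0 has unanswerable questions
    ]


args = run_squad_train_args(
    model_dir="gs://my-bucket/uncased_L-24_H-1024_A-16",  # placeholder
    squad_dir="squad",                                     # placeholder
    output_dir="gs://my-bucket/bert_output",               # placeholder
    tpu_address="grpc://10.0.0.2:8470",                    # placeholder
)
print(" \\\n  ".join(args))
```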


Create Testing File

We create input_file.json as a blank file and then add the data to it in the SQuAD dataset format.

  • touch is used to create a file
  • %%writefile is used to write a file in Colab

You can pass your own questions and context in this file.
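As an alternative to typing the JSON by hand, the file can be generated from Python. This is a sketch: the helper names are hypothetical, the example paragraph is made up, and it assumes that at prediction time each question only needs an `id` and a `question` next to the paragraph's `context`.

```python
import json


def build_squad_input(context, questions, title="custom"):
    """Wrap one paragraph and its questions in the SQuAD JSON layout."""
    qas = [{"id": str(i), "question": q}
           for i, q in enumerate(questions, start=1)]
    return {
        "version": "v2.0",
        "data": [{
            "title": title,
            "paragraphs": [{"context": context, "qas": qas}],
        }],
    }


def write_input_file(path, context, questions):
    """Write the prediction input to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_squad_input(context, questions), f, indent=2)


example = build_squad_input(
    "BERT was open-sourced by Google.",
    ["Who open-sourced BERT?", "What did Google open-source?"],
)
print(json.dumps(example, indent=2))
```

Calling `write_input_file("input_file.json", context, questions)` produces the file used in the prediction step; the ids are what the predictions will be keyed by.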


To perform your own custom prediction, edit input_file.json to contain your paragraph and questions, then run the prediction command.
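A prediction-only invocation could look like the sketch below. Compared with training, `do_train` is switched off and `init_checkpoint` points at the fine-tuned checkpoint in your output directory. The checkpoint path is left as a parameter because its exact name depends on your training run; all paths and the TPU address here are placeholders, and the helper name is hypothetical.

```python
def run_squad_predict_args(model_dir, checkpoint, predict_file, output_dir,
                           tpu_address):
    """Build the argv list for running run_squad.py in prediction-only mode."""
    flags = {
        "vocab_file": f"{model_dir}/vocab.txt",
        "bert_config_file": f"{model_dir}/bert_config.json",
        "init_checkpoint": checkpoint,   # your fine-tuned checkpoint
        "do_train": False,
        "do_predict": True,
        "predict_file": predict_file,
        "max_seq_length": 384,
        "doc_stride": 128,
        "output_dir": output_dir,
        "use_tpu": True,
        "tpu_name": tpu_address,
        "version_2_with_negative": True,
    }
    return ["python", "run_squad.py"] + [f"--{k}={v}" for k, v in flags.items()]


predict_args = run_squad_predict_args(
    model_dir="gs://my-bucket/uncased_L-24_H-1024_A-16",  # placeholder
    checkpoint="gs://my-bucket/bert_output/model.ckpt",    # placeholder name
    predict_file="input_file.json",
    output_dir="gs://my-bucket/bert_output",               # placeholder
    tpu_address="grpc://10.0.0.2:8470",                    # placeholder
)
print(" \\\n  ".join(predict_args))
```

When the run finishes, `run_squad.py` writes `predictions.json` (question id mapped to answer text) and `nbest_predictions.json` into the output directory.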

To make it easier for you, we have already created a Colab file which you can copy to your Google Drive and execute. Access the Colab file: Question Answering System using BERT + SQuAD on Colab TPU.

If you have any further questions or doubts, please feel free to post them in the comments. We’ll get back to you.
