NLP Tutorial: Question Answering System using BERT + SQuAD on Colab TPU

Latest Update (6th March 2020)

Both of our demos, Question Answering System In Python Using BERT and Closed-Domain Chatbot Using BERT In Python, can now be purchased.

Visit Buy Question N Answering Demo Using BERT In Python + Flask or Buy Closed-Domain BERT Based Chatbot In Python + Flask or contact us at

Our case study Question Answering System in Python using BERT NLP and our BERT-based Question and Answering system demo, developed in Python + Flask, became hugely popular, garnering hundreds of visitors per day. We received many appreciative emails praising our QnA demo, along with a number of people asking how we created it. To this day, we keep getting requests on how to develop such a QnA system using the BERT pre-trained model open-sourced by Google.

To start with, the README file in the official BERT GitHub repository provides a good amount of information about how to fine-tune the model on SQuAD 2.0, but we could see that developers were still facing issues. So, we decided to publish a step-by-step tutorial to fine-tune the BERT pre-trained model and run inference to extract answers from a given paragraph and questions on Colab using TPU.

In this tutorial, we are not going to cover how to create a web-based interface using Python + Flask. We’ll just cover the fine-tuning and inference on Colab using TPU. You can create your own interface using Flask or Django. And if you want the exact same demo as ours, we can provide it for a nominal charge. For more information, please refer to Buy Question n Answering Demo using BERT in Python + Flask.


In this tutorial, we will see how to fine-tune BERT on SQuAD using Google Colab. For that, we will use the BERT GitHub repository, which includes:
1) TensorFlow code for the BERT model architecture.
2) Pre-trained models for both the uncased (lowercase) and cased versions of BERT-Base and BERT-Large.

You can also refer to or copy our Colab file to follow the steps.

Steps to perform BERT Fine-tuning on Google Colab

1) Change Runtime to TPU

On the main menu, click on Runtime and select Change runtime type, then set “TPU” as the hardware accelerator.

After clicking on “Change runtime type”, select TPU from the dropdown.

2) Clone the BERT github repository

BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. You can find the academic paper here:
BERT has two stages: pre-training and fine-tuning.
Pre-training is fairly expensive (four days on 4 to 16 Cloud TPUs), but it is a one-time procedure. The BERT authors have released a number of pre-trained models, so most NLP researchers will never need to pre-train their own model from scratch.
Fine-tuning is inexpensive. One can replicate all the results given in the paper in at most 1 hour on a single Cloud TPU, or a few hours on a GPU. For example, SQuAD can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%.

So our first step is to clone the BERT GitHub repository, as shown below. Then move into the bert directory using the “cd” command.

!git clone
cd bert


3) Download the Pretrained Model

BERT Pretrained Model List:

  • BERT-Large, Uncased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Large, Cased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Multilingual Uncased (Orig, not recommended): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

The BERT authors have released BERT-Base and BERT-Large models, each in uncased and cased versions. Uncased means that the text is converted to lowercase before WordPiece tokenization, e.g., John Smith becomes john smith. Cased, on the other hand, means that the true case and accent markers are preserved.

When using a cased model, make sure to pass --do_lower_case=False at the time of training.
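
To make the uncased/cased distinction concrete, here is a minimal sketch of the kind of normalization applied for uncased models before WordPiece tokenization: lowercasing followed by accent stripping. The function name normalize_uncased is our own, chosen for illustration; it is not part of the BERT repository, and the real tokenizer does more than this (for example whitespace cleanup and punctuation splitting).

```python
import unicodedata


def normalize_uncased(text):
    """Lowercase text and strip accent marks (illustrative helper, not BERT's API)."""
    text = text.lower()
    # NFD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn"), keeping the base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(normalize_uncased("John Smith"))   # john smith
print(normalize_uncased("Café Zürich"))  # cafe zurich
```

A cased model skips this step, so "John" and "john" remain different inputs.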

You can download any model of your choice. We have used the BERT-Large-Uncased Model.

# Unzip the pretrained model

4) Download the SQuAD 2.0 Dataset

For the Question Answering task, we will be using the SQuAD 2.0 dataset.

SQuAD (Stanford Question Answering Dataset) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

SQuAD 2.0 combines the 100,000+ questions in SQuAD 1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. You can download the dataset from the SQuAD site.

#Download the SQUAD train and dev dataset
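
To get a feel for the dataset layout described above, the following sketch counts answerable and unanswerable questions in SQuAD-format data. The helper name count_questions and the tiny inline sample are our own and are not real dataset content; the sample only mirrors the nesting of data, paragraphs and qas used by SQuAD 2.0.

```python
def count_questions(squad):
    """Return (answerable, unanswerable) question counts for SQuAD-format data."""
    answerable = 0
    unanswerable = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # SQuAD 2.0 marks unanswerable questions with is_impossible=true.
                if qa.get("is_impossible", False):
                    unanswerable += 1
                else:
                    answerable += 1
    return answerable, unanswerable


# Tiny hand-made sample in the SQuAD 2.0 layout (not real dataset content).
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "BERT was released by Google in 2018.",
            "qas": [
                {"id": "q1", "question": "Who released BERT?",
                 "is_impossible": False,
                 "answers": [{"text": "Google", "answer_start": 21}]},
                {"id": "q2", "question": "Who released GPT?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

print(count_questions(sample))  # (1, 1)
```

To run it on the downloaded training file, load train-v2.0.json with json.load and pass the resulting dict to count_questions.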

5) Set up your TPU environment

  • Verify that you are connected to a TPU device
  • Get your TPU address, which is used at the time of fine-tuning
  • Perform Google authentication to access your bucket
  • Upload your credentials to the TPU so it can access your GCS bucket

The code below covers the four points above:

import datetime
import json
import os
import pprint
import random
import string
import sys
import tensorflow as tf

assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!'
TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is => ', TPU_ADDRESS)

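from google.colab import auth
# Authenticate this Colab session with your Google account.
auth.authenticate_user()
with tf.Session(TPU_ADDRESS) as session:
  print('TPU devices:')
  pprint.pprint(session.list_devices())

  # Upload credentials to TPU.
  with open('/content/adc.json', 'r') as f:
    auth_info = json.load(f)
  tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.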

6) Create an output directory

Prerequisite: You will need a GCP (Google Cloud Platform) account and a GCS (Google Cloud Storage) bucket to run the Colab file. Please follow the Google Cloud documentation to create a GCP account and a GCS bucket. You get $300 of free credit to start with any GCP product; learn more about it at
You can create your GCS bucket from here

As we are using a Cloud TPU, we need to store the pre-trained model and the output directory in Google Cloud Storage. If you do not store them in a bucket, you may face the following error:

ERROR:tensorflow:Error recorded from training_loop: From /job:worker/replica:0/task:0:
Unsuccessful TensorSliceReader constructor: Failed to get matching files on uncased_L-24_H-1024_A-16/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: 'uncased_L-24_H-1024_A-16/bert_model.ckpt')
[[node checkpoint_initializer_14 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ ]]

You will get your fine-tuned model in the Google Cloud Storage bucket after training completes. For that, you need to provide your BUCKET name and OUTPUT DIRECTORY name.

BUCKET = 'bertnlpdemo' #@param {type:"string"}
assert BUCKET, '*** Must specify an existing GCS bucket name ***'
output_dir_name = 'bert_output' #@param {type:"string"}
BUCKET_NAME = 'gs://{}'.format(BUCKET)
OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET,output_dir_name)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))

7) Move Pretrained Model to GCS Bucket

The pre-trained model needs to be moved to the GCS (Google Cloud Storage) bucket, because the local file system is not supported on TPU. If you don’t move your pre-trained model to the bucket, you may face the error shown above.

The gsutil mv command allows you to move data between your local file system and the cloud, move data within the cloud, and move data between cloud storage providers.

!gsutil mv /content/bert/uncased_L-24_H-1024_A-16 $BUCKET_NAME
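
Before launching training, it can help to check that the paths the TPU has to read or write actually point at the bucket. The sketch below is a simple pre-flight check under our own assumptions: the helper name find_non_gcs_paths is hypothetical, and the choice of which paths to check (the initial checkpoint and the output directory) is our reading of the error shown earlier, not something the BERT scripts enforce.

```python
def find_non_gcs_paths(paths):
    """Return the names of entries whose path does not start with 'gs://'."""
    return [name for name, path in paths.items() if not path.startswith("gs://")]


# Example values mirroring the variables defined earlier in this tutorial.
BUCKET_NAME = "gs://bertnlpdemo"
paths = {
    "init_checkpoint": BUCKET_NAME + "/uncased_L-24_H-1024_A-16/bert_model.ckpt",
    "output_dir": BUCKET_NAME + "/bert_output",
}

bad = find_non_gcs_paths(paths)
assert not bad, "These paths must live in the GCS bucket: {}".format(bad)
print("All TPU-visible paths are on GCS.")
```

A local path such as uncased_L-24_H-1024_A-16/bert_model.ckpt would be reported by this check, which is the situation that produces the "File system scheme '[local]' not implemented" error.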

8) Training

Below is the command to run the training. To train on the TPU, make sure two hyperparameters are set: use_tpu must be True, and tpu_name must be the TPU address found in step 5.


!python \
  --vocab_file=$BUCKET_NAME/uncased_L-24_H-1024_A-16/vocab.txt \
  --bert_config_file=$BUCKET_NAME/uncased_L-24_H-1024_A-16/bert_config.json \
  --init_checkpoint=$BUCKET_NAME/uncased_L-24_H-1024_A-16/bert_model.ckpt \
  --do_train=True \
  --train_file=train-v2.0.json \
  --do_predict=True \
  --predict_file=dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --use_tpu=True \
  --tpu_name=grpc:// \
  --max_seq_length=384 \
  --doc_stride=128 \
  --version_2_with_negative=True \
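
The number in the final checkpoint name follows from the training settings. As a hedged sketch: the total number of training steps is roughly examples / batch size × epochs, truncated to an integer. The figure of 130,319 examples for the SQuAD 2.0 training set is an assumption based on the commonly reported dataset size, and the helper name estimate_train_steps is ours.

```python
def estimate_train_steps(num_examples, train_batch_size, num_train_epochs):
    """Estimate total training steps; the result is truncated, not rounded."""
    return int(num_examples / train_batch_size * num_train_epochs)


# Assumed size of the SQuAD 2.0 training set (number of questions).
SQUAD_V2_TRAIN_EXAMPLES = 130319

steps = estimate_train_steps(
    SQUAD_V2_TRAIN_EXAMPLES, train_batch_size=24, num_train_epochs=2.0)
print("model.ckpt-{}".format(steps))  # model.ckpt-10859
```

With the batch size of 24 and 2 epochs used in this tutorial, this gives the model.ckpt-10859 checkpoint that the prediction step loads. Changing either setting changes the suffix, so check your output directory for the actual file name.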

Create Testing File

We create input_file.json as an empty JSON file and then add data to it in the SQuAD dataset format.

  • touch is used to create the file
  • %%writefile is used to write the file contents in Colab

You can put your own questions and context in the file below.

!touch input_file.json
%%writefile input_file.json
{
    "version": "v2.0",
    "data": [
        {
            "title": "your_title",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "Who is current CEO?",
                            "id": "56ddde6b9a695914005b9628",
                            "is_impossible": ""
                        },
                        {
                            "question": "Who founded google?",
                            "id": "56ddde6b9a695914005b9629",
                            "is_impossible": ""
                        },
                        {
                            "question": "when did IPO take place?",
                            "id": "56ddde6b9a695914005b962a",
                            "is_impossible": ""
                        }
                    ],
                    "context": "Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14 percent of its shares and control 56 percent of the stockholder voting power through supervoting stock. They incorporated Google as a privately held company on September 4, 1998. An initial public offering (IPO) took place on August 19, 2004, and Google moved to its headquarters in Mountain View, California, nicknamed the Googleplex. In August 2015, Google announced plans to reorganize its various interests as a conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and will continue to be the umbrella company for Alphabet's Internet interests. Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the CEO of Alphabet."
                }
            ]
        }
    ]
}

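The same file can also be generated from Python, which makes it easy to give every question a unique ID. This is a minimal sketch under our own assumptions: the helper name build_squad_input is hypothetical, and uuid4 is just one convenient way to produce unique IDs; any unique strings would do.

```python
import json
import uuid


def build_squad_input(title, context, questions):
    """Build a SQuAD-style dict with one paragraph and a unique ID per question."""
    qas = [
        # uuid4().hex gives a random 32-character ID; any unique string works.
        {"question": q, "id": uuid.uuid4().hex, "is_impossible": ""}
        for q in questions
    ]
    return {
        "version": "v2.0",
        "data": [{"title": title, "paragraphs": [{"qas": qas, "context": context}]}],
    }


example = build_squad_input(
    title="your_title",
    context="Google was founded in 1998 by Larry Page and Sergey Brin.",
    questions=["Who founded google?", "When was Google founded?"],
)
print(json.dumps(example, indent=4))

# To save it for the prediction step:
# with open("input_file.json", "w") as f:
#     json.dump(example, f, indent=4)
```

Generating the file this way avoids hand-editing JSON and rules out accidentally reusing an ID across questions.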

Below is the command to run prediction on your own data: edit input_file.json to provide your paragraph and questions, then execute the command.

!python \
  --vocab_file=$BUCKET_NAME/uncased_L-24_H-1024_A-16/vocab.txt \
  --bert_config_file=$BUCKET_NAME/uncased_L-24_H-1024_A-16/bert_config.json \
  --init_checkpoint=$OUTPUT_DIR/model.ckpt-10859 \
  --do_train=False \
  --max_query_length=30  \
  --do_predict=True \
  --predict_file=input_file.json \
  --predict_batch_size=8 \
  --n_best_size=3 \
  --max_seq_length=384 \
  --doc_stride=128 \
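
Once prediction finishes, the answers are written to predictions.json in the output directory, as a JSON object mapping each question ID to its predicted answer text. The sketch below pairs those answers with the questions from input_file.json. The helper name pair_questions_with_answers is ours, and the sample prediction value is invented for illustration; it is not real model output.

```python
def pair_questions_with_answers(input_data, predictions):
    """Map each question text to the predicted answer for its ID."""
    results = {}
    for article in input_data["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # An empty string typically means the model predicted "no answer".
                results[qa["question"]] = predictions.get(qa["id"], "")
    return results


# Hypothetical contents; in practice load both files with json.load(...).
input_data = {"data": [{"paragraphs": [{"qas": [
    {"question": "Who founded google?", "id": "56ddde6b9a695914005b9629"},
]}]}]}
predictions = {"56ddde6b9a695914005b9629": "Larry Page and Sergey Brin"}

for question, answer in pair_questions_with_answers(input_data, predictions).items():
    print(question, "->", answer)
```

Because the lookup is by ID, this only works if every question in the input file has a unique ID.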

To make it easier for you, we have already created a Colab file which you can copy in your Google Drive and execute the commands. You can access the colab file at: Question Answering System using BERT + SQuAD on Colab TPU.

Feel free to comment your doubts/questions. We would be glad to help you.

If you are looking for Chatbot Development or Natural Language Processing services then do contact us or send your requirement at We would be happy to offer our expert services.


28 Replies to “NLP Tutorial: Question Answering System using BERT + SQuAD on Colab TPU”

  1. Without creating a bucket, are we able to run SQuAD or not?
    And you didn’t use any TensorFlow or PyTorch transformers. How is that possible?

    1. Hello Uma,
      We have to create the bucket on Google Storage as the code doesn’t work with local files.
      And we are using tensorflow. You will find that in requirements.txt
      tensorflow >= 1.11.0
      So, tensorflow is being installed and used to run the script.

      1. Hi , which exact version of Tensorflow are you using? I am using
        class AdamWeightDecayOptimizer(tf.train.Optimizer):
        AttributeError: module ‘tensorflow._api.v2.train’ has no attribute ‘Optimizer’

        1. Hello Matteo,
          We use tensorflow version 1.14. Please use that.
          The error you have mentioned will be solved if you use 1.14. Give it a try and let us know please.

  2. And how do you give the ID number in the newly created JSON file?
    How do we know which ID number we have to give?

    1. Hi Uma,
      We assigned the ID numbers randomly. You can use any IDs you like, but each question must have a unique ID.

  3. How did you generate the value for id in the testing file ?

    1. Hi Sonam,
      We assigned the ID numbers randomly. You can use any IDs you like, but each question must have a unique ID.

  4. Hi, Thanks for providing the code snippet it is very helpful. I had one doubt, where does we get the output/answers of the questions used in the test file? I tried to check in the output/ folder but was not able to find the output.

    1. Hello Yogesh,
      We are glad that you found our code snippet useful!

      Regarding your question, the output will be generated in the directory you passed in the --output_dir parameter. A directory named “output” will be created in the BERT folder on the Colab file system, and it contains three files:
      (i) eval.tf_record
      (ii) nbest_predictions.json
      (iii) predictions.json
      You need to check the predictions.json file for your answers.

      Please let us know if you need further assistance.

  5. Hi, thank you for this article it was extremely helpful.

    For a QA system with a UI, I assume the backend runs the file programmatically. I was wondering how the hyperparameters are passed along with the command to run the file.

    1. Thanks Sameer. We are glad that you found it useful.

      To run a Python file with parameters, you can use os.system(command). It lets you run the Python file as a command and pass the parameters along with it.

      Hope that answers your question!

      1. I have another question.

        Is the command for running the model input.json file as shown in the article only for SQuAD version 2.0 or can the same command be used for SQuAD v1.1 (with a different checkpoint file ofcourse). If not, could you tell me the command for v1.1.

        Thank you so much for your help.

        1. Hi Sameer,
          You can use the same command for inference on SQuAD v1.1 too.

  6. Hi. Thank you for your article.

    After training on SQuAD, we get 3 checkpoints with the following extensions:, .ckpt.index and .ckpt.meta. I wanted to ask which ckpt file to use while running “” for our input file.

    Thank you!

    1. Hi Hemant,
      While running you will need all 3 files, you need to pass “model.ckpt-7299” in the “” command. The script will utilize all 3 ckpt-7299 files automatically.

      Hope this answers your question. Please let us know if you have any other questions.

  7. Hi, thank you so much for this incredible tutorial! I’m confused as to how to find my tpu_name — I’ve set up the GCS bucket, but haven’t figured out where to find the tpu_name address?

    1. Hello C,
      You should get the TPU address in step “5) Set up your TPU environment”. Can you please try it and check?
      If you still don’t get the TPU address then take a screenshot of step 5 and post here please.

  8. hi thanks for the blog..its very useful..
    Can you please guide me how to train squad1.1 for bert.
    I am getting error as KeyError: ‘is_impossible’..

    1. Hi Anuj,
      We are glad that you found it useful. To train on SQuAD 1.1, you need to set the “version_2_with_negative” hyperparameter to False, i.e. --version_2_with_negative=False.

  9. Great article! Thank you for sharing. It would be great if this ran on TF 2 as 1.14 seems very outdated at this time.

    1. Hello Jean,

      If you want to run on TF 2, you can do that by replacing all the outdated method names in the BERT repo with their new equivalents.
      Do let us know how it works for you if you give it a try.

  10. Hey! Great work! I couldn’t figure it out from the BERT repo but this was perfect! Can you tell me how much time it’ll take to run and what F1 score does this code achieve? Thank you!

    1. Hi Bhargav,
      You may require about 2 hours on TPU. We haven’t measured the F1 score ourselves, but according to the official BERT repo, the F1 score for the BERT-Large, Uncased (Original) model is 91.0.
      Hope that helps!

  11. Hey! If I use Squad 2, I’m getting a warning “Failed to connect to the Tensorflow master. The TPU worker may not be ready (still scheduling) or …” and the training is stuck.
    In Squad 1.1, it’s stuck with this example “I0512 06:26:55.105924 140589080323968] start_position: 53
    INFO:tensorflow:end_position: 54
    I0512 06:26:55.106061 140589080323968] end_position: 54
    INFO:tensorflow:answer: february 1848
    I0512 06:26:55.106162 140589080323968] answer: february 1848” The training is again stuck. Can you help me on this?

    1. Hello Dharun,
      It looks like that TPU is not being allocated in your colab file. You can try resetting the runtime (From Menu RUNTIME -> Factory Reset Runtime) or use colab with any other Google account.

  12. How can I do this in a closed domain way ? Like i would like to get answers from my own document for the questions asked.

  13. Where can I find dataset in german for fine tuning the model?
