NLP Tutorial: Question Answering System using ELECTRA + SQuAD on Colab TPU

Following the massive popularity of the BERT pre-trained model, Google has released another model, ELECTRA. According to the official Google blog, ELECTRA is a more efficient NLP pre-training method.

Along with it, Google has open-sourced pre-trained models that can be fine-tuned for various Natural Language Processing (NLP) tasks such as question answering and sentiment analysis.

What is ELECTRA?

ELECTRA is a self-supervised language representation learning method developed by Google AI. ELECTRA stands for “Efficiently Learning an Encoder that Classifies Token Replacements Accurately”. It combines the benefits of language models (LMs) and masked language models (MLMs): ELECTRA models are trained to distinguish “real” input tokens from “fake” input tokens.

ELECTRA methodology

Existing pre-training methods fall into two categories. The first is language models (LMs) such as GPT, which process the input text left-to-right, predicting the next word given the previous context. The second is masked language models (MLMs) such as BERT, RoBERTa, and ALBERT. These models mask out some of the input tokens and predict them using context from both the left and the right. MLMs have the advantage of being bidirectional, but they predict only the masked subset of the input (typically 15% of the tokens), which reduces the amount learned from each sentence.
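To make the 15% figure concrete, here is a minimal sketch of MLM-style masking on a toy sentence. It is a simplification, not BERT's actual preprocessing: whitespace splitting stands in for a real tokenizer, every selected token becomes "[MASK]", and the function name `mask_tokens` and the sample sentence are our own illustrative choices.

```python
import random

MASK_RATE = 0.15  # fraction of tokens masked, as quoted in the text


def mask_tokens(tokens, mask_rate=MASK_RATE, seed=0):
    """Return (masked_tokens, masked_positions) for a toy MLM input."""
    rng = random.Random(seed)
    # Mask at least one token so short toy sentences still have a target.
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = ["[MASK]" if i in positions else tok for i, tok in enumerate(tokens)]
    return masked, positions


# Hypothetical sample sentence, split on whitespace instead of a real tokenizer.
tokens = (
    "the chef cooked the meal in the small kitchen "
    "behind the old house near the river"
).split()

masked, positions = mask_tokens(tokens)
print(" ".join(masked))
print(f"{len(positions)} of {len(tokens)} positions contribute to the MLM loss")
```

Only the masked positions give the model something to predict, so most of the sentence produces no training signal. This is the inefficiency that ELECTRA targets.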

ELECTRA uses a novel pre-training method called replaced token detection (RTD), which trains a bidirectional model (like an MLM) while learning from all input positions (like an LM). Inspired by generative adversarial networks (GANs), ELECTRA trains the model to distinguish “real” input data from “fake” input data. Instead of corrupting the input by replacing tokens with “[MASK]” as in BERT, this method corrupts the input by replacing some tokens with incorrect but plausible alternatives, which we can call fake tokens. For example, in the sentence ‘Rahul rides a new bicycle’, the ‘bicycle’ token is replaced with the ‘bike’ token.
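The bicycle example can be turned into the per-token labels that a replaced token detection model learns to predict. This is a rough sketch: the helper `rtd_labels` is our own name, and whitespace splitting stands in for a real tokenizer.

```python
def rtd_labels(original, corrupted):
    """Label each position 1 if the token was replaced, 0 if it is original."""
    if len(original) != len(corrupted):
        raise ValueError("original and corrupted must have the same length")
    # Comparing against the original means a replacement that happens to
    # match the original token is still labeled as original (0).
    return [int(o != c) for o, c in zip(original, corrupted)]


original = "Rahul rides a new bicycle".split()
corrupted = "Rahul rides a new bike".split()

labels = rtd_labels(original, corrupted)
for token, label in zip(corrupted, labels):
    print(f"{token:8s} {'replaced' if label else 'original'}")
```

Every position gets a label, so the model receives a training signal from all five tokens instead of only the corrupted one.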

In replaced token detection (RTD), the replacement tokens come from another neural network called the generator, which fills masked positions with “fake” tokens. The generator can be any small masked language model (MLM) and is trained jointly with a second model called the discriminator, which predicts for every token whether it is original or replaced. The setup is structured like a GAN (generative adversarial network), and the generator and discriminator share the same input word embeddings. However, because of the difficulty of applying GANs to text, the generator is trained with maximum likelihood to predict the masked words rather than adversarially. After pre-training, the generator is dropped and only the discriminator (the ELECTRA model) is fine-tuned on downstream tasks. According to Google, ELECTRA performs comparably to RoBERTa and XLNet while using less than 25% of their compute. Small ELECTRA models can be trained quickly on a single GPU, and ELECTRA models achieve state-of-the-art results on the SQuAD question answering benchmark.
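The joint training of generator and discriminator can be sketched as a single combined loss. The code below is an illustration in plain Python, not the official implementation. It assumes the generator contributes a masked-language-modelling loss over the masked positions, the discriminator contributes a binary cross-entropy loss over all positions, and the two are summed with a weight on the discriminator term. The default weight of 50 is the value we recall from the ELECTRA paper and should be treated as an assumption. The function names and the toy probabilities are hypothetical.

```python
import math


def generator_mlm_loss(true_token_probs):
    """Mean negative log-likelihood of the true token at masked positions only."""
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)


def discriminator_loss(replaced_probs, labels):
    """Mean binary cross-entropy over every input position.

    replaced_probs[i] is the predicted probability that token i was replaced;
    labels[i] is 1 for a replaced token and 0 for an original one.
    """
    total = 0.0
    for p, y in zip(replaced_probs, labels):
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def combined_loss(true_token_probs, replaced_probs, labels, disc_weight=50.0):
    """Generator loss plus weighted discriminator loss (weight is an assumption)."""
    return generator_mlm_loss(true_token_probs) + disc_weight * discriminator_loss(
        replaced_probs, labels
    )


# Toy numbers for a five-token sentence with one masked and replaced position.
true_token_probs = [0.4]                      # generator's probability of the true token
replaced_probs = [0.1, 0.1, 0.2, 0.1, 0.8]    # discriminator's "replaced" probabilities
labels = [0, 0, 0, 0, 1]

print("generator loss:    ", generator_mlm_loss(true_token_probs))
print("discriminator loss:", discriminator_loss(replaced_probs, labels))
print("combined loss:     ", combined_loss(true_token_probs, replaced_probs, labels))
```

Note that the generator term is ordinary maximum likelihood. Nothing in it rewards fooling the discriminator, which is what separates this setup from a true GAN.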

Question Answering (QnA) using ELECTRA

We tried building a question answering system with ELECTRA, and it was very easy because the official ELECTRA GitHub repository provides code to fine-tune a pre-trained model on the SQuAD 2.0 dataset.

SQuAD 2.0 is one of the most popular English-language question answering datasets. It contains over 100,000 question-answer pairs, plus more than 50,000 questions that cannot be answered from the accompanying passage.
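For readers who have not looked inside the dataset, the sketch below walks a tiny, hand-written record laid out like a SQuAD 2.0 JSON file: articles contain paragraphs, and each paragraph has a context plus a list of questions, where `is_impossible` marks questions that have no answer in the context. The record contents and the helper `iter_examples` are made up for illustration. Only the field names follow the dataset layout as we understand it.

```python
CONTEXT = "ELECTRA was developed by Google AI."

# Hypothetical miniature record that mimics the SQuAD 2.0 file layout.
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "ELECTRA",
            "paragraphs": [
                {
                    "context": CONTEXT,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Who developed ELECTRA?",
                            "is_impossible": False,
                            "answers": [
                                {
                                    "text": "Google AI",
                                    "answer_start": CONTEXT.index("Google AI"),
                                }
                            ],
                        },
                        {
                            "id": "q2",
                            "question": "When was ELECTRA retired?",
                            "is_impossible": True,
                            "answers": [],
                        },
                    ],
                }
            ],
        }
    ],
}


def iter_examples(squad):
    """Yield (id, question, answer_span_or_None) for every question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    # Unanswerable question: the context holds no answer span.
                    yield qa["id"], qa["question"], None
                else:
                    answer = qa["answers"][0]
                    start = answer["answer_start"]
                    # Recover the answer from the context by character offset.
                    yield qa["id"], qa["question"], context[start:start + len(answer["text"])]


examples = list(iter_examples(sample))
for qa_id, question, answer in examples:
    print(qa_id, "|", question, "|", answer if answer is not None else "<no answer>")
```

A real file can be loaded with `json.load` and passed to the same function. A QnA model fine-tuned on this data has to learn both to extract a span and to abstain when no span exists.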

To test the QnA system using Electra, we used Colab.

Here is the Colab file, "Question Answering System using ELECTRA + SQuAD 2.0 on Colab TPU", which you can copy to your Drive and run for your own tests.


To use the Colab file we have created, you will need the following things:

You will need a GCP (Google Cloud Platform) account and a GCS (Google Cloud Storage) bucket to run this Colab file.

Please follow the Google Cloud documentation to create a GCP account and a GCS bucket. You get $300 of free credit to get started with any GCP product. You can learn more about it at

You can create your GCS bucket from here


The accuracy of the ELECTRA-based question answering system is good. It generated almost the same answers as the BERT-based question answering system.

However, we expected it to be faster than the BERT-based QnA system, but it actually took longer to generate results.

