June 18, 2022

Introduction

Meta AI (formerly Facebook AI) has introduced OPT (Open Pre-trained Transformers), a suite of large language models ranging from 125M to 175B parameters. They can be used to generate creative text, solve simple math problems, answer reading comprehension questions, and tackle other natural language processing tasks.

We tried a few different things with the OPT models for text generation. By following the instructions below, you will learn how to use the OPT-350M model to generate the next 30 tokens of a given prompt.

Resource Requirements

We have tested the 125M, 350M, and 1.3B models on Google Colab Pro with one GPU, and the 2.7B model on an AWS instance with two T4 GPUs. We also tried to test the 6.7B model, but two T4 GPUs were not sufficient for it.

Colab Pro:
1.10 GB RAM
2 x vCPU
1 x T4 GPU

AWS Instance:
8.89 GB RAM
32 x vCPU
2 x T4 GPU

Prerequisites

Before you use the OPT models, make sure that all the required packages are installed on your system. Install them as follows; a quick sanity check you can run afterwards appears after step 5:

1. Install PyTorch

pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

2. Install Megatron

git clone https://github.com/patrickvonplaten/Megatron-LM.git
cd Megatron-LM
pip3 install six regex
pip3 install -e .

3. Install fairscale

pip3 install fairscale==0.4.1

4. Install metaseq

git clone https://github.com/patrickvonplaten/metaseq.git
cd metaseq
pip3 install -e .

5. Install transformers

pip3 install transformers
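Once all five packages are installed, it is worth confirming that they import cleanly and that the CUDA build of PyTorch is active before moving on. A minimal sanity check (our own addition, not part of the original setup):

import torch
import fairscale
import megatron
import metaseq
import transformers

# All imports should succeed, and on a GPU machine the CUDA check should
# print True; if anything fails, revisit the corresponding install step.
print(torch.__version__)          # expected: 1.10.1+cu113
print(torch.cuda.is_available())
print(fairscale.__version__, transformers.__version__)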

Clone the GitHub repository for the model

After the packages are installed successfully, it is time to clone the model repository. In this tutorial we are going to use the 350M model; you can clone the repository for the model you need as shown below (a quick check that the checkpoint downloaded correctly follows the clone commands):

To clone the repo for the 350M model:

git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_350m

To clone the repo for the 125M model:

git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_125m

To clone the repo for the 1.3B model:

git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_1300m

To clone the repo for the 2.7B model:

git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_2700m
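After cloning, you can quickly confirm the checkpoint is in place. Note that reshard.pt is the file the script below loads, while the tokenizer files (vocab.json and merges.txt) are written later by the script itself, so they may not appear yet. A small check, assuming you cloned the 350M repo into /content as the script does:

import os

# reshard.pt is the checkpoint that run_model.py loads for the 350M model.
model_dir = "/content/opt_metaseq_350m/model"
print(os.listdir(model_dir))
print("checkpoint present:", os.path.exists(os.path.join(model_dir, "reshard.pt")))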

Load the OPT model and use it for text generation

Now we are going to focus on how to load the model and generate the next 30 tokens. Create a Python file named "run_model.py" and paste the code below into it.

import os

import torch
from transformers import GPT2Tokenizer
from megatron.initialize import initialize_megatron
from metaseq import checkpoint_utils

path = "/content/opt_metaseq_350m/model"

# Model arguments taken from table 1 of the OPT paper:
# https://arxiv.org/pdf/2205.01068.pdf
initialize_megatron(args_defaults={
    "micro_batch_size": 1,
    "num_layers": 24,
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "max_position_embeddings": 2048,  # TODO check if it is the correct arg
    "encoder_seq_length": 2048  # TODO check if it is the correct arg
})

# BART uses the same byte-level BPE vocabulary as OPT, so its tokenizer
# files can be reused; save them next to the checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("facebook/bart-large")
tokenizer.save_pretrained(path)

checkpoint = checkpoint_utils.load_model_ensemble_and_task(
    [os.path.join(path, "reshard.pt")],
    # For sharded checkpoints (e.g. the 2.7B model), load all parts instead:
    # [os.path.join(path, "reshard-model_part-0.pt"), os.path.join(path, "reshard-model_part-1.pt")],
    arg_overrides={
        "vocab_filename": os.path.join(path, "vocab.json"),
        "merges_filename": os.path.join(path, "merges.txt"),
    }
)

model = checkpoint[0][0].eval()
model = model.to('cuda')

# Greedy decoding: repeatedly append the most likely next token.
start = 'Natural language processing is a subfield of'
indexed_tokens = tokenizer.encode(start)
for i in range(30):
    tokens_tensor = torch.tensor([indexed_tokens]).to('cuda')
    with torch.no_grad():
        outputs = model(tokens_tensor)
        predictions = outputs[0]
    predicted_index = torch.argmax(predictions[0, -1, :]).item()
    indexed_tokens = indexed_tokens + [predicted_index]

predicted_text = tokenizer.decode(indexed_tokens)
print("------------------------------------")
print(start)
print("------------------------------------")
print(predicted_text)
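Note that the num_layers, hidden_size, and num_attention_heads values above are specific to the 350M model. To run one of the other sizes we tested, swap in the matching values from table 1 of the OPT paper; a sketch (the dictionary itself is our own addition):

# Architecture hyperparameters per model size, from table 1 of
# https://arxiv.org/pdf/2205.01068.pdf
OPT_CONFIGS = {
    "125m": {"num_layers": 12, "hidden_size": 768, "num_attention_heads": 12},
    "350m": {"num_layers": 24, "hidden_size": 1024, "num_attention_heads": 16},
    "1.3b": {"num_layers": 24, "hidden_size": 2048, "num_attention_heads": 32},
    "2.7b": {"num_layers": 32, "hidden_size": 2560, "num_attention_heads": 32},
}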

Run the script with the following command:

torchrun run_model.py --pipeline-model-parallel-size 1 --tensor-model-parallel-size 1
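For the sharded 2.7B checkpoint on two GPUs, you also need to uncomment the reshard-model_part lines in run_model.py, launch one process per GPU, and raise the tensor-parallel size to match the number of shards. The invocation below is our best reconstruction of the right flags and may need adjusting for your setup:

torchrun --nproc_per_node 2 run_model.py --pipeline-model-parallel-size 1 --tensor-model-parallel-size 2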

Results

The result shows the words that the 350M model predicts should follow the sentence we provided; the output should look like the examples below.

We took the same text as input and tested it with the different model sizes. Here are the results.

Result of OPT-125M

Test 1:
————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of linguistics 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090

Test 2:
————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to thank everyone who participated!” exclaimed 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090

Test 3:
————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique whereby algorithms compute probabilities based upon probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from

Result of OPT-350M

————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of computer science that focuses on the development of computer programs that can be used to process and interpret text.
The term “language processing” is used

————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to thank you for all the wonderful things you do for us.
I love you.
I love you.
I love you.
I love

————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique that uses machine learning to classify data. The classification process is based on the classification of data by using a set of rules. The classification process is based

Result of OPT-1.3B

————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of machine learning that uses artificial intelligence techniques to analyze text documents to extract meaning from them. Natural language processing techniques are used to analyze text documents such as emails

————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to share some photos from our trip to Yosemite Valley yesterday afternoon.
We drove up Yosemite Valley Road from Yosemite Falls Road and parked at Yosemite Falls Road Parking

————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique that identifies patterns in data sets using statistical models. Classification algorithms classify data sets based on similarities between data sets. Classification algorithms classify data sets based on similarities

Result of OPT-2.7B

————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of artificial intelligence concerned with translating languages into understandable forms. Languages differ greatly across cultures and languages differ greatly across languages. Languages differ greatly across cultures and languages differ

————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to celebrate it with you!”
She smiled brightly at him.
He smiled back gratefully.
They walked together toward the lakefront path along Lake

————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique whereby computers classify objects based upon attributes extracted from images captured by cameras mounted atop automobiles. Classification algorithms classify objects based upon attributes extracted from images captured by cameras
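
The repeated phrases in several outputs above (for example the "4090 4090 …" loop from OPT-125M) are a known artifact of greedy decoding: the script always picks the single most likely next token, which easily gets stuck in cycles. A common remedy is top-k sampling; here is a minimal sketch (the function name and default parameters are our own) that reuses the model and tokenizer loaded in run_model.py:

import torch

def generate_top_k(model, tokenizer, prompt, steps=30, k=50, temperature=0.7):
    # Instead of argmax, sample the next token from the k most likely
    # candidates; this usually produces more varied text.
    indexed_tokens = tokenizer.encode(prompt)
    for _ in range(steps):
        tokens_tensor = torch.tensor([indexed_tokens]).to('cuda')
        with torch.no_grad():
            logits = model(tokens_tensor)[0][0, -1, :] / temperature
        top_logits, top_indices = torch.topk(logits, k)
        probs = torch.softmax(top_logits, dim=-1)
        next_token = top_indices[torch.multinomial(probs, 1)].item()
        indexed_tokens.append(next_token)
    return tokenizer.decode(indexed_tokens)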

We hope you liked our OPT model experiment and that you are now able to generate text by following the steps above. Comment below if you face any issues while following this guide. We are currently working on generating longer text with the OPT model and on fine-tuning it.

 
