![](http://www.pragnakalp.com/wp-content/uploads/2022/06/exploring-the-text-generation-with-opt.jpg)
Introduction
Meta AI (Facebook) has introduced OPT (Open Pre-trained Transformers), a suite of large language models ranging from 125M to 175B parameters. The models can be used to generate creative text, solve simple math problems, answer reading comprehension questions, and tackle other Natural Language Processing tasks.
We tried a few different things with the OPT models to generate text. By following the instructions below, you will learn how to use the OPT-350M model to generate the next words of a prompt, up to 30 words.
Resource Requirements
We have tested the 125M, 350M, and 1.3B models on Google Colab Pro with one GPU, and the 2.7B model on an AWS instance with 2 T4 GPUs. We also tried to test the 6.7B model, but 2 T4 GPUs were not sufficient for it.
Colab Pro:
10 GB RAM
2 x vCPU
1 x T4 GPU
AWS Instance:
89 GB RAM
32 x vCPU
2 x T4 GPU
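If you are not sure what your runtime provides, the short snippet below (a minimal sketch; it assumes psutil is available, which it is by default on Colab) prints the available RAM, vCPUs, and GPUs so you can compare them against the requirements above.

import os

import psutil  # assumption: preinstalled on Colab; otherwise pip install psutil
import torch

# Report the resources of the current runtime
print(f"RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")
print(f"vCPUs: {os.cpu_count()}")
print(f"GPUs: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")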
Prerequisites
Before you use the OPT models, you must ensure that all the required packages are installed on your system. Install them as follows (a quick sanity-check snippet follows the list):
1. Install Pytorch
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
2. Install Megatron
git clone https://github.com/patrickvonplaten/Megatron-LM.git
cd Megatron-LM
pip3 install six regex
pip3 install -e .
3. Install fairscale
pip install fairscale==0.4.1
4. Install metaseq
git clone https://github.com/patrickvonplaten/metaseq.git
cd metaseq
pip3 install -e .
5. Install transformers
pip install transformers
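Once everything is installed, a quick sanity check (a minimal sketch) confirms that all the packages can be imported and that a GPU is visible:

# Verify that the packages installed above import cleanly and a GPU is available
import torch
import fairscale
import transformers
import megatron
import metaseq

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())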
Clone the GitHub repository for the model
After the packages are installed successfully, it is time to clone the model repository. In this tutorial we are going to use the 350M model; you can clone the repo for your required model from the OPT models shown below:
To clone repo for 350M model:
git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_350m
To clone repo for 125M model:
git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_125m
To clone repo for 1.3B model:
git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_1300m
To clone repo for 2.7B model:
git lfs install
git clone https://huggingface.co/patrickvonplaten/opt_metaseq_2700m
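Cloning with git lfs can silently leave large files as small pointer files if git-lfs is not set up correctly, so it is worth confirming that the checkpoint actually downloaded. Below is a minimal check, assuming the 350M repo was cloned into /content as in the script that follows (vocab.json and merges.txt are created later by the script itself):

import os

# Adjust for the model repo you cloned
model_dir = "/content/opt_metaseq_350m/model"
ckpt = os.path.join(model_dir, "reshard.pt")
if os.path.exists(ckpt):
    # A real checkpoint is hundreds of MB or more; a few KB means an LFS pointer
    print(f"Checkpoint found: {ckpt} ({os.path.getsize(ckpt) / 1e9:.2f} GB)")
else:
    print(f"Checkpoint MISSING: {ckpt}")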
Load the OPT model and use it for text generation
Now we are going to focus on how to load the model and how to generate the next words, up to 30 words. Please create a Python file named “run_model.py” and paste the code below into it.
import os

import torch
from transformers import GPT2Tokenizer
from megatron.initialize import initialize_megatron
from metaseq import checkpoint_utils

# Paths to the cloned model repo and the cloned metaseq source
path = "/content/opt_metaseq_350m/model"
metaseq_path = "/content/metaseq"

# Model arguments taken from: https://arxiv.org/pdf/2205.01068.pdf | table 1
initialize_megatron(args_defaults={
    "micro_batch_size": 1,
    "num_layers": 24,
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "max_position_embeddings": 2048,  # TODO check if it is the correct arg
    "encoder_seq_length": 2048,  # TODO check if it is the correct arg
})

# Save the tokenizer files (vocab.json, merges.txt) next to the checkpoint,
# where the model loader expects to find them
tokenizer = GPT2Tokenizer.from_pretrained("facebook/bart-large")
tokenizer.save_pretrained(path)

# Load the metaseq checkpoint; for a sharded checkpoint (e.g. 2.7B on 2 GPUs),
# pass all reshard-model_part-*.pt files instead (see the commented line)
checkpoint = checkpoint_utils.load_model_ensemble_and_task(
    [os.path.join(path, "reshard.pt")],
    # [os.path.join(path, "reshard-model_part-0.pt"), os.path.join(path, "reshard-model_part-1.pt")],
    arg_overrides={
        "vocab_filename": os.path.join(path, "vocab.json"),
        "merges_filename": os.path.join(path, "merges.txt"),
    }
)

model = checkpoint[0][0].eval()
model.to('cuda')

# Greedily generate the next 30 tokens: at each step, feed all tokens so far
# through the model and append the most likely next token
start = 'Natural language processing is a subfield of'
indexed_tokens = tokenizer.encode(start)

for i in range(30):
    tokens_tensor = torch.tensor([indexed_tokens])
    tokens_tensor = tokens_tensor.to('cuda')

    with torch.no_grad():
        outputs = model(tokens_tensor)
        predictions = outputs[0]

    # Pick the most likely next token (greedy decoding)
    predicted_index = torch.argmax(predictions[0, -1, :]).item()
    # print(i, tokenizer.decode(predicted_index))
    indexed_tokens = indexed_tokens + [predicted_index]

predicted_text = tokenizer.decode(indexed_tokens)
print("------------------------------------")
print(start)
print("------------------------------------")
print(predicted_text)
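The loop above uses greedy decoding: at each step it simply takes the argmax over the logits of the last position. Greedy decoding tends to loop, which you can see in some of the outputs below. As an optional variation (a sketch, not part of the original script), the argmax line can be swapped for top-k sampling; the k and temperature values here are illustrative:

import torch

def sample_top_k(logits, k=50, temperature=0.7):
    # Keep only the k most likely next tokens and sample from the
    # renormalized distribution over them
    top_logits, top_indices = torch.topk(logits / temperature, k)
    probs = torch.softmax(top_logits, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return top_indices[choice].item()

# Inside the loop, instead of torch.argmax(...).item():
# predicted_index = sample_top_k(predictions[0, -1, :])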
Run the file with the following command:
torchrun run_model.py --pipeline-model-parallel-size 1 --tensor-model-parallel-size 1
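The 350M checkpoint is a single reshard.pt file, so one process with no model parallelism is enough. For a checkpoint sharded into two parts (as the commented reshard-model_part-*.pt line in the script hints, e.g. the 2.7B model on 2 GPUs), we would expect to launch two processes and set the tensor-parallel size to 2. The following invocation is a sketch along those lines, not a command we have verified:

torchrun --nproc_per_node=2 run_model.py --pipeline-model-parallel-size 1 --tensor-model-parallel-size 2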
Results
The result shows the words that the 350M model predicts should come next after the sentence we provided. The output should look like the examples below.
We took the same texts as input and tested them on the different models. Here are the results.
Result of OPT-125M
Test 1:
————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of linguistics 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090
Test 2:
————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to thank everyone who participated!” exclaimed 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090 4090
Test 3:
————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique whereby algorithms compute probabilities based upon probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from probabilities derived from
Result of OPT-350M
————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of computer science that focuses on the development of computer programs that can be used to process and interpret text.
The term “language processing” is used
————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to thank you for all the wonderful things you do for us.
I love you.
I love you.
I love you.
I love
————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique that uses machine learning to classify data. The classification process is based on the classification of data by using a set of rules. The classification process is based
Result of OPT-1.3B
————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of machine learning that uses artificial intelligence techniques to analyze text documents to extract meaning from them. Natural language processing techniques are used to analyze text documents such as emails
————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to share some photos from our trip to Yosemite Valley yesterday afternoon.
We drove up Yosemite Valley Road from Yosemite Falls Road and parked at Yosemite Falls Road Parking
————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique that identifies patterns in data sets using statistical models. Classification algorithms classify data sets based on similarities between data sets. Classification algorithms classify data sets based on similarities
Result of OPT-2.7B
————————————
Input: Natural language processing is a subfield of
————————————
Output: Natural language processing is a subfield of artificial intelligence concerned with translating languages into understandable forms. Languages differ greatly across cultures and languages differ greatly across languages. Languages differ greatly across cultures and languages differ
————————————
Input: Today is a beautiful day and I want to
————————————
Output: Today is a beautiful day and I want to celebrate it with you!”
She smiled brightly at him.
He smiled back gratefully.
They walked together toward the lakefront path along Lake
————————————
Input: Text classification is a machine learning technique
————————————
Output: Text classification is a machine learning technique whereby computers classify objects based upon attributes extracted from images captured by cameras mounted atop automobiles. Classification algorithms classify objects based upon attributes extracted from images captured by cameras
We hope you liked our OPT model experiment and that you are now able to generate text by following the steps above. Comment below if you face any issues while following this tutorial. We are further working on generating longer text with the OPT model and on fine-tuning it.