In a quest to replicate OpenAI's GPT-3 model, the researchers at EleutherAI have been releasing powerful language models. After GPT-Neo, the latest one is GPT-J, which has 6 billion parameters and performs on par with a GPT-3 model of similar size.
In terms of zero-shot learning, GPT-J's performance is considered the best among open-source language models.
As it was trained on a large corpus of code (from GitHub and StackExchange), GPT-J performs better than GPT-3 at writing code.
EleutherAI had already made a demo of GPT-J 6B available. As the model is open source, we can download it and run it on our own server.
Using https://github.com/kingoflolz/mesh-transformer-jax/ we could run GPT-J on our server, and we are planning to publish a Docker container of that setup so that anyone can download it and run GPT-J on their own server.
But we have been waiting for GPT-J to be included in the Huggingface repo so that we can use it directly via Huggingface.
The wait is finally over! Huggingface has added the GPT-J model to their repo. We have tried it and the results are very impressive.
The Huggingface GPT-J model can be accessed at https://huggingface.co/EleutherAI/gpt-j-6B.
In our experiment, we ran it both with and without a GPU. Huggingface makes it very easy to use the model. Let us take you through how to run it on your own server.
GPT-J with CPU (without GPU)
If you run GPT-J without a GPU, you will need a system with approximately 50 GB of RAM.
Once you have a system with the required RAM, and Python and the virtualenv library installed, follow the steps below:
1. Create & activate Virtual Environment
virtualenv env_cpu --python=python3
source env_cpu/bin/activate
2. Clone & setup the transformers repository
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
3. Install torch
pip3 install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
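Optionally, you can confirm that the CPU-only build of torch was installed correctly. A quick sanity check (our addition, not part of the original steps), run from within the activated environment:
import torch

print(torch.__version__)          # should print 1.9.0+cpu
print(torch.cuda.is_available())  # False is expected for the CPU-only build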
4. Now create a Python file and paste the block of code below.
from transformers import AutoTokenizer, AutoModelForCausalLM
import time

# GPT-J uses the GPT-2 tokenizer, so we load it directly from the "gpt2" repo
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
print("Model Loaded..!")

start_time = time.time()

# Tokenize the prompt and generate up to 150 tokens with sampling
input_text = "Google was founded by"
inputs = tokenizer(input_text, return_tensors="pt")
input_ids = inputs["input_ids"]
output = model.generate(
    input_ids,
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    max_length=150,
    temperature=0.8,
    use_cache=True,
    top_p=0.9,
)

end_time = time.time() - start_time
print("Total Time Taken => ", end_time)
print(tokenizer.decode(output[0]))
5. That’s it! Run the Python file and you will get the output.
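Note that the first run downloads the full model weights from the Huggingface hub, which takes a while. If you would rather keep a local copy so that later runs load from disk, a minimal sketch using the standard save_pretrained / from_pretrained calls looks like this (the "./gpt-j-6B" folder name is just an example):
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download once, then save to a local folder (the path is only an example)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.save_pretrained("./gpt-j-6B")
model.save_pretrained("./gpt-j-6B")

# Later runs can load from disk instead of downloading again
tokenizer = AutoTokenizer.from_pretrained("./gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("./gpt-j-6B")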
GPT-J with GPU
We used a T4 GPU on an AWS EC2 instance. When using a GPU, the required system RAM is approximately 38 GB.
The major difference when using a GPU is that we have to install torch with CUDA support and move the model and inputs to the GPU in the code.
Follow the first 2 steps from “GPT-J with CPU” and then:
3. Install torch with CUDA
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
Note: Install compatible torch version according to your platform from here https://pytorch.org/
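It also helps to confirm that torch can actually see the GPU before loading a 6-billion-parameter model. A quick check (our addition, not part of the original steps):
import torch

print(torch.cuda.is_available())      # expect True on a GPU instance
print(torch.cuda.get_device_name(0))  # e.g. the T4 on our EC2 instance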
4. Create a Python file and paste the block of code below.
from transformers import AutoTokenizer, GPTJForCausalLM
import torch
import time

# Load the tokenizer, and load the model in half precision (fp16) before moving it to the GPU
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to("cuda")
print("Model Loaded..!")

start_time = time.time()

# Tokenize the prompt and move the tensors to the GPU before generating
input_text = "Google was founded by"
inputs = tokenizer(input_text, return_tensors="pt")
input_ids = inputs["input_ids"].to("cuda")
output = model.generate(
    input_ids,
    attention_mask=inputs["attention_mask"].to("cuda"),
    do_sample=True,
    max_length=150,
    temperature=0.8,
    use_cache=True,
    top_p=0.9,
)

end_time = time.time() - start_time
print("Total Time => ", end_time)
print(tokenizer.decode(output[0]))
5. Run the file and you will get the output text generated by GPT-J.
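If you are curious how much GPU memory the fp16 model actually uses, torch exposes a couple of helpers. A minimal check (our addition), run right after generation:
import torch

# Rough view of GPU memory used by the model plus the generation pass
print("Allocated:", torch.cuda.memory_allocated() / 1024**3, "GB")
print("Peak:     ", torch.cuda.max_memory_allocated() / 1024**3, "GB")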
Multiple GPU Support
In our example above, we used a single T4 GPU. If you would like to use multiple GPUs, add them to your system and make the following change in the code so that the model utilizes all the GPUs available on the system.
from transformers import AutoTokenizer, GPTJForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Split the model's layers across all visible GPUs
model.parallelize()
As we can see, we just need to add “model.parallelize()” to the code.
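Generation then works the same as in the single-GPU example; the one thing to note with the standard parallelize() pattern is that the inputs should be placed on the first GPU ("cuda:0"), which is where the embedding layer sits. A short sketch (our own illustration, reusing the model and tokenizer from the snippet above):
# Assumes model.parallelize() has already been called as shown above
inputs = tokenizer("Google was founded by", return_tensors="pt").to("cuda:0")
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    max_length=150,
    temperature=0.8,
    top_p=0.9,
)
print(tokenizer.decode(output[0]))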
Go give it a try and run GPT-J on your own server! If you face any difficulty, do let us know in the comments.
We offer Natural Language Processing consulting services. Do reach out to us at letstalk@pragnakalp.com for any GPT-J, GPT-3, BERT or NLP-related project.