Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model created by OpenAI for text generation. GPT-3 has shown how capable a large language model can be at tasks such as question answering, summarization, semantic search, chatbots, and writing poetry or essays. We have already experimented with Question Answering using GPT-3, Ads Generation, Sentence Paraphrasing, and Intent Classification. Now let’s experiment with semantic search using the GPT-3 API endpoint provided by OpenAI.

OpenAI’s search API lets you run a semantic search over a set of documents. It scores each document by how semantically related it is to the query text and ranks the documents by that score.

Because access is API based, it is easy to use. We just provide text in the form of documents and then send a query. The API responds with the results matching the query, sorted by relevance score.

Below are the steps to use the OpenAI API for semantic search.

Installing OpenAI for semantic search

Here we are using Python for the API calls. However, you can also make a cURL request.

Let’s create a virtualenv with the following steps:

virtualenv env_gpt --python=python3
source env_gpt/bin/activate

Next, install the OpenAI Python package to use its API and engines.

pip install openai

Semantic search using GPT-3

To perform semantic search, we first need to upload our documents in the .jsonl file format. The following is a sample line in that format.

{"text": "Hello OpenAI", "metadata": "sample data"}

Next, we will create a .jsonl file for semantic search, name it sample_search.jsonl, and copy the following lines into it:

{"text": "The rebuilding of economies after the COVID-19 crisis offers a unique opportunity to transform the global food system and make it resilient to future shocks, ensuring environmentally sustainable and healthy nutrition for all. To make this happen, United Nations agencies like the Food and Agriculture Organization, the United Nations Environment Program, the Intergovernmental Panel on Climate Change, the International Fund for Agricultural Development, and the World Food Program, collectively, suggest four broad shifts in the food system.", "metadata": "Economic reset"}
{"text": "In the past few weeks healthcare professionals have been fully focussed caring for enormous numbers of people infected with COVID-19. They did an amazing job. Not in the least because healthcare professionals and leaders have been using continues improvement as part of their accreditation program for many years. It has become part of their DNA. This has enabled them to change many processes as needed during COVID-19, using a cross-functional problem solving approach in (very) rapid improvement cycles.", "metadata": "Supporting adaptive healthcare"}
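If you would rather generate the file than paste it by hand, the following is a minimal sketch using Python’s standard json module. The helper names to_jsonl and write_jsonl are our own, not part of the OpenAI package, and the document texts are shortened placeholders. Letting json.dumps produce each line also guarantees straight double quotes, which JSON requires.

```python
import json


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def write_jsonl(records, path):
    """Write the records to `path` in .jsonl format (hypothetical helper)."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(to_jsonl(records))


# Shortened placeholder texts; use the full document texts in practice.
documents = [
    {"text": "Rebuilding economies after the COVID-19 crisis.", "metadata": "Economic reset"},
    {"text": "Healthcare professionals caring for COVID-19 patients.", "metadata": "Supporting adaptive healthcare"},
]

jsonl_text = to_jsonl(documents)
print(jsonl_text, end="")
```

Calling write_jsonl(documents, "sample_search.jsonl") would then create the file used in the next step.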

Now it’s time to upload this .jsonl file using your API key, setting purpose to search for semantic search. Create a file named upload_file.py, copy the code below into it, and provide your OpenAI API key.

import openai
openai.api_key = "YOUR-API-KEY"
response = openai.File.create(file=open("sample_search.jsonl"), purpose="search")
print(response)

When you run upload_file.py, it prints the response for the uploaded file. Copy the ID from that response; the search call needs it.
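Instead of copying the ID by hand, you can read it from the response in code. The sketch below assumes the response behaves like a dictionary with an "id" key; the sample response and its values are made up for illustration, and extract_file_id is our own helper name.

```python
def extract_file_id(response):
    """Return the uploaded file's ID, or raise if the response has none."""
    file_id = response.get("id")
    if not file_id:
        raise ValueError("upload response has no 'id' field")
    return file_id


# Hypothetical response for illustration only; a real ID will differ.
sample_response = {
    "id": "file-abc123",
    "object": "file",
    "filename": "sample_search.jsonl",
    "purpose": "search",
}

file_id = extract_file_id(sample_response)
print(file_id)
```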

Now let’s test it. To try GPT-3 semantic search, provide your query in the query parameter.

import openai
openai.api_key = "YOUR-API-KEY"

search_response = openai.Engine("davinci").search(
    search_model="davinci", 
    query="healthcare", 
    max_rerank=5,
    file="file-8ejPA5eM13J4J0dWy3bBbvTf",
    return_metadata=True
)

print(search_response)

Let’s understand the parameters of openai.Engine.search.

  • search_model:
    • OpenAI’s API lets us use different engines, such as Davinci, Curie, Babbage, and Ada.
    • Davinci is the most powerful engine and also the costliest.
  • query:
    • The text used for the semantic search.
  • max_rerank:
    • The matching documents are re-ranked by semantic search, and the response contains at most max_rerank documents.
  • file:
    • The file ID we got when uploading the documents.
  • return_metadata:
    • Set to True to include each document’s metadata in the response.

The response is returned as JSON. It contains the text of each document that matched the query, and “score” shows the relevance of each result. In our test we provided only one document; if we provide multiple documents, we get multiple results with different scores.
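To work with several results, it helps to sort them by score. The following is a rough sketch that assumes the response is a dictionary whose "data" list holds one entry per result with "score", "text", and "metadata" keys. The sample response and its scores are invented for illustration, and rank_results is our own helper name.

```python
def rank_results(search_response):
    """Return the search results ordered from most to least relevant."""
    results = search_response.get("data", [])
    return sorted(results, key=lambda result: result["score"], reverse=True)


# Hypothetical response; the scores and texts are made up for this example.
sample_search_response = {
    "object": "list",
    "data": [
        {"document": 0, "score": 12.5, "text": "Food system text", "metadata": "Economic reset"},
        {"document": 1, "score": 180.0, "text": "Healthcare text", "metadata": "Supporting adaptive healthcare"},
    ],
}

ranked = rank_results(sample_search_response)
for result in ranked:
    print(result["score"], result["metadata"])
```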

As we can see, it is simple to perform semantic search with GPT-3 for a given query, and the results are quite impressive.

Limitation

There is a limit on the size of the documents we upload. Each document must contain no more than 2048 tokens, and we can upload a maximum of 200 documents.
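You can check a set of documents against these limits before uploading. The sketch below is only a rough guide: it assumes a whitespace word count as a stand-in for the model’s tokenizer, which usually produces more tokens than words, so a document near the limit may still be rejected. The function names are our own, and you can pass a real tokenizer function as count_tokens.

```python
MAX_DOCUMENTS = 200              # limit stated above
MAX_TOKENS_PER_DOCUMENT = 2048   # limit stated above


def approx_token_count(text):
    """Assumption: word count as a rough proxy for the real token count."""
    return len(text.split())


def check_limits(documents, count_tokens=approx_token_count):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if len(documents) > MAX_DOCUMENTS:
        problems.append(
            f"{len(documents)} documents exceeds the limit of {MAX_DOCUMENTS}"
        )
    for index, document in enumerate(documents):
        tokens = count_tokens(document["text"])
        if tokens > MAX_TOKENS_PER_DOCUMENT:
            problems.append(
                f"document {index} has about {tokens} tokens, "
                f"over the limit of {MAX_TOKENS_PER_DOCUMENT}"
            )
    return problems


print(check_limits([{"text": "Hello OpenAI", "metadata": "sample data"}]))
```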


Do let us know in the comments if you have any questions about OpenAI Semantic Search.

