In today’s rapidly evolving digital landscape, the rise of Generative AI has been nothing short of remarkable. This branch of artificial intelligence has gained significant momentum, letting us harness AI-powered creativity at scale. At the forefront of this creative revolution stands Vertex AI, Google Cloud’s comprehensive AI platform, armed with an impressive suite of Generative AI tools.

What makes the story even more compelling is Vertex AI’s seamless integration with the Langchain framework. Langchain binds everything together, making it easier for us to blend the power of Generative AI with Vertex AI.

In this blog, we’re about to embark on an exciting journey. We’ll explore how to leverage Vertex AI’s Generative AI tools in combination with the Langchain framework, all while creating a dynamic Question and Answer system.

Initial steps to set up Vertex AI

Before we jump into the programming part, there are a few essential initialization steps you’ll want to take care of to ensure everything runs smoothly.

If you’re a new user of Google Cloud, you will need to sign up for a Google Cloud account. New accounts receive $300 of free credit as part of the free trial, which you can use to explore and try out various Google Cloud services and products. To sign up, follow the steps below:

  1. Visit the Google Cloud website.
  2. Click the “Get started for free” button; it will redirect you to the signup page.
  3. Provide some basic information and set up your Google Cloud account.
  4. During registration you may need to provide a credit card for account verification, but the free trial credit should cover your usage up to $300.

Once you have successfully set up your project, it’s time to dive into creating a Question and Answer (QNA) system using the powerful combination of Langchain and the VertexAI API.

Step 1:

To begin, we’ll need to install the Google Cloud SDK (gcloud) on our local system. The installation process can be completed by following the instructions provided in the link below:

Step 2:

After successfully installing the Google Cloud SDK, run the following command in your terminal:

gcloud auth application-default login

This command will assist you in authenticating your Google account with Google Cloud. When executed, it will open a web page where you should select the Google account associated with the project you created earlier for testing VertexAI API. After selecting the account, grant the necessary permissions. This step will generate the required credentials.

You can check the active account by using the below command:

gcloud auth list

Step 3:

Now, we will install all the Python libraries required for the code.

pip install chromadb==0.3.29
pip install langchain==0.0.319
pip install unstructured==0.10.25
pip install google-cloud-aiplatform==1.35.0

Step 4:

Now, let’s start things off by creating a Python script. In this stage, we’ll set up the Vertex AI project and initialize the Large Language Model (LLM) for response generation along with the embedding model. We will use “textembedding-gecko” as the embedding model and the “text-bison@001” model for generating responses.

To learn about the embedding model, you can visit the following link:

For information about the LLM model, please refer to this link:

import vertexai

from langchain.embeddings import VertexAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import VertexAI
from langchain.document_loaders import PyPDFLoader
# init the project which you want to use
project_name = "<Your_project_name>"
location = "<Your_project_location>"

vertexai.init(project=project_name, location=location)

# init the LLM model
llm = VertexAI(model_name="text-bison@001", max_output_tokens=200)

# init the embedding model
embeddings_model = VertexAIEmbeddings()

Make sure to replace the placeholder values of “project_name” and “location” with the name of the GCP project you are using for Vertex AI exploration and its region, respectively.
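Once the embedding model is initialized, a call like `embeddings_model.embed_query("some text")` returns a vector of floats. To give an intuition for how such vectors are compared during retrieval later on, here is a minimal cosine-similarity sketch (the hard-coded vectors below are stand-ins for real embedding outputs, which have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# In practice these would come from embeddings_model.embed_query("...")
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 1.0, 0.0]
print(round(cosine_similarity(v1, v2), 3))  # closer to 1.0 means more similar
```

The vector database performs essentially this comparison (at scale) to find the chunks nearest to the question.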

Step 5:

Looking ahead, we will proceed by reading the PDF data using the PyPDFLoader from Langchain. You need to provide the web link where your PDF is hosted or the local path of your system where the PDF is located.

For this blog, we have used a research paper in PDF format to perform Question and Answer (QnA) tasks. The research paper, titled “DEEP LEARNING APPLICATIONS AND CHALLENGES IN BIG DATA ANALYTICS,” is available at the link below. You can download the PDF, place it in your current working directory and give its path to the variable named “pdf_link”.

Once the PDF data is loaded, we will process it in chunks using the “RecursiveCharacterTextSplitter” from Langchain. This tool will take the data and divide it into manageable chunks for further processing.

# Ingest the PDF data file
pdf_link = "your_pdf_link"
loader = PyPDFLoader(pdf_link, extract_images=False)
data = loader.load_and_split()

# Initialize the text splitter and then split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,
    chunk_overlap=20,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(data)
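After splitting, it’s worth sanity-checking the chunk sizes before storing them in the database. Here is a small helper sketch; it is fed plain strings for illustration, whereas in practice you would pass `[c.page_content for c in chunks]`:

```python
def chunk_stats(texts):
    """Return (count, longest length, average length) for a list of chunk strings."""
    lengths = [len(t) for t in texts]
    return len(lengths), max(lengths), sum(lengths) / len(lengths)

# In practice: chunk_texts = [c.page_content for c in chunks]
chunk_texts = ["first chunk of text...", "second, slightly longer chunk of text..."]
count, longest, avg = chunk_stats(chunk_texts)
print(f"{count} chunks, longest {longest} chars, average {avg:.0f} chars")
```

If the longest chunk is far above your configured `chunk_size`, it usually means the splitter could not find a natural break point in that stretch of text.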

Step 6:

Once we’ve successfully ingested and transformed the data, the next step involves storing it in the Chroma database. For this, we will provide the data chunks we want to store in the database, along with the name of the embedding model. The system will then internally create embeddings of the text data from the chunks and store them in the database. The name of the database in this example is “test_database,” but feel free to change it according to your preferences.

db = Chroma.from_documents(chunks, embedding = embeddings_model, persist_directory="test_database")

Step 7:

Once the data is successfully stored in the database, there’s no need to repeat the previous steps each time. You can simply load the persisted database as outlined in the following lines of code.

Following that, we’ll initialize the retriever, which is responsible for fetching the most suitable chunk from the database that may contain the answer to the user’s question. In this context, “search_kwargs” with “k” set to 3 means it will retrieve the top 3 most relevant chunks from the database.

Next, we’ll load a QNA chain, which involves using the LLM model to generate a response, along with specifying the type of the chain.

vectordb = Chroma(persist_directory="test_database", embedding_function = embeddings_model)

retriever = vectordb.as_retriever(search_kwargs = {"k" : 3})

chain = load_qa_chain(llm, chain_type="stuff")
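Conceptually, the “stuff” chain type simply stuffs every retrieved chunk into a single prompt alongside the question and sends it to the LLM. The helper below is a hypothetical, simplified sketch of that assembly, not Langchain’s actual internal prompt:

```python
def build_stuff_prompt(doc_texts, question):
    """Concatenate retrieved chunk texts and append the question,
    roughly mirroring what a 'stuff'-type QA chain sends to the LLM."""
    context = "\n\n".join(doc_texts)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )

prompt = build_stuff_prompt(
    ["Deep learning extracts features automatically...", "Big Data has 4 Vs..."],
    "What are the 4 Vs of Big Data?",
)
print(prompt)
```

Because all chunks go into one prompt, “stuff” works well here with `k=3` and 4000-character chunks, but it can exceed the model’s context window if you retrieve many large chunks.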

Step 8:

In this next step, we will define a helper function designed to generate a response. This function will take the user’s question as input. Within this function, we will pass the user’s question to the retriever. The retriever, in turn, will internally match the embedding of the question with the stored documents in the database and fetch the most suitable chunk. We will then pass this chunk, along with the question, to the QNA chain, which will generate the answer.

def ask(question):
    # fetch the most suitable chunks for the question
    context = retriever.get_relevant_documents(question)
    # Generate response
    answer = (chain({"input_documents": context, "question": question}, return_only_outputs=True))['output_text']
    return answer

Step 9:

Now, with everything in place, we are ready to conduct Question and Answer (QnA) on the PDF data. To ask a question, simply add the following lines of code to your script:

user_question = input("User: ")
answer = ask(user_question)
print("Answer:", answer)
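If you’d rather ask several questions in one session, you can wrap the call in a small loop. This sketch injects the input/output functions so it is easy to test; `ask_fn` stands for the `ask` helper defined in Step 8:

```python
def qa_loop(ask_fn, read=input, write=print):
    """Keep answering questions until the user types 'exit' or 'quit'."""
    while True:
        question = read("User: ")
        if question.strip().lower() in {"exit", "quit"}:
            break
        write("Answer:", ask_fn(question))

# Usage with the ask() function from Step 8:
# qa_loop(ask)
```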

Test results:

Here are a few test examples of how our QnA system handles different questions and provides responses.

Q1: Which are the 2 high focuses of data science?

Q2: What is feature engineering?

Q3: What are the 2 main focuses of the paper?

Q4: List down the 4 Vs of Big Data characteristics.

Q5: What is the full form of SIFT?

We’ve been on an exciting journey, creating a Question and Answer (QnA) system with Vertex AI’s Generative AI services and the Langchain framework. We’ve walked through the setup steps, handled PDF data, and seen how well our system performs through tests.

Throughout this blog, we’ve harnessed the potential of AI to unlock answers within our PDF content, making information readily accessible. While our system has demonstrated its capabilities, there’s always room for improvement, and the world of AI is ever-evolving.



2 Replies to “Retrieval Augmented Generation (RAG) tutorial using VertexAI Gen AI and Langchain”

  1. Hi mate, I’m still a bit confused on how to run the python script. Can you tell me how to do it?

    1. Hi Anas,

To run the code, you need to copy and paste each block of code into a single Python file. Once you’ve done that, open the terminal, create a virtual environment, activate it, and install all the required libraries. Finally, run the file with the python command, e.g. “python your_script.py” (using whatever name you gave the file).

      Alternatively, if you find the above approach a bit complex, you can use Colab or Jupyter Labs to run the code. Simply copy and paste each block of code into a single cell and run them one by one.

      I hope this helps! Let me know if you have any further questions or concerns.

