Microsoft’s Phi-3 is a family of small language models that offers strong capabilities at a compact size. It delivers performance comparable to that of much larger models while using far fewer parameters. Microsoft’s decision to launch Phi-3 reflects its commitment to improving AI models’ contextual understanding and response accuracy.

Phi-3’s advanced capabilities are particularly beneficial for tasks such as document summarization, market research analysis, content generation, and leveraging the RAG (Retrieval Augmented Generation) framework for question answering. The RAG framework significantly improves AI-generated text quality by connecting the model to external knowledge sources. This integration enhances the model’s understanding and response accuracy by incorporating both internal and external data, making it a pivotal advancement in AI-driven question answering.
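
To make the retrieve-then-generate idea concrete, here is a minimal sketch of the flow in plain Python. It is only an illustration: the word-overlap scoring, the helper names (`tokenize`, `retrieve`, `build_prompt`), and the two-sentence knowledge base are assumptions made for this example, not part of Langchain or Phi-3.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    """Return the k documents that share the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base, for illustration only.
docs = [
    "Chroma is a vector database used to store embeddings.",
    "PyPDFLoader extracts text from PDF files.",
]
prompt = build_prompt("Which database stores embeddings?", docs)
print(prompt)
```

In a real RAG system, the word-overlap scoring is replaced by embedding similarity, and the augmented prompt is sent to the language model.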

In this blog, we focus on using Microsoft’s Phi-3 model as the language model in a RAG pipeline, so that its answers are grounded in retrieved documents. Before we dive in, let’s take a quick look at some key points about the Phi-3 model.

About Microsoft’s Phi-3

  • The Phi-3-mini model comes in two context-length versions: 4K and 128K tokens. It’s the first model in its class to support a context window of up to 128K tokens with little impact on quality.
  • Phi-3-mini is a 3.8-billion-parameter model trained on 3.3 trillion tokens of heavily filtered web data and synthetic data.
  • This model is instruction-tuned, which means it’s trained to understand and follow various types of instructions, making it easy to use right away.
  • Phi-3 models don’t perform as well on factual knowledge tests like TriviaQA because their smaller size limits their ability to remember large amounts of information.

In this blog, we’ll be using the “Phi-3-mini-4k-instruct” model from the family of Phi models, which you can find on Hugging Face. Let’s get started!

Steps to Implement RAG using Phi-3 and Langchain

Step 1: Install the required libraries

To begin, we need to install all the necessary libraries for our project. We can do this by using the following command:

!pip install git+
!pip install langchain chromadb pypdf openai sentence-transformers accelerate

Step 2: Initialize the Embedding and LLM model

Next, we’ll set up the default Hugging Face embedding model and load the Language Model (LLM) “microsoft/Phi-3-mini-4k-instruct” using the following code snippet:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate

model_kwargs = {'device': 'cuda'}
embeddings = HuggingFaceEmbeddings(model_kwargs=model_kwargs)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", device_map='auto', torch_dtype="auto", trust_remote_code=True,)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=300)
llm = HuggingFacePipeline(pipeline=pipe)

The above code snippet requires a GPU. A T4 GPU on Google Colab is sufficient to try it out.

Step 3: Load and process the PDF data

For this blog, we will use a PDF file to perform the QnA on it. We’ve selected a research paper titled “DEEP LEARNING APPLICATIONS AND CHALLENGES IN BIG DATA ANALYTICS,” which can be accessed at the following link:

Please download the PDF and place it in your working directory. Then, provide the path to the variable named “pdf_link” in the code. After extracting the data from the PDF, we’ll use Langchain’s RecursiveCharacterTextSplitter tool to divide the data into smaller chunks suitable for our LLM models.
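
Before the actual splitter code, the idea of chunking with overlap can be sketched in plain Python. This is a simplified stand-in, not Langchain’s implementation: it cuts fixed-size character windows, whereas RecursiveCharacterTextSplitter also prefers to break on separators such as paragraphs and lines. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.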

# Load the PDF file
pdf_link = "Link_to_the_PDF_file"
loader = PyPDFLoader(pdf_link, extract_images=False)
pages = loader.load_and_split()

# Split data into chunks
text_splitter = RecursiveCharacterTextSplitter(
   chunk_size = 4000,
   chunk_overlap  = 20,
   length_function = len,
   add_start_index = True,
)
chunks = text_splitter.split_documents(pages)

Step 4: Create embeddings and store them in the vector database

In the next step, we’ll create embeddings for the chunks of data extracted from the PDF and store them in the Chroma vector database. We’ll need to provide the chunk data, specify the embedding model used, and indicate the directory where we want to store the database for future use. You can use the below code snippet to replicate the same:
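
As a rough picture of what “creating embeddings and storing them” means, here is a toy sketch in plain Python. The word-count vectors stand in for a neural embedding model and the in-memory list stands in for Chroma; both are assumptions for illustration, as are the names `embed`, `cosine`, and `build_store`.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a word-count vector (a stand-in for a neural embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_store(chunks):
    """Keep each chunk next to its vector, as a vector database would."""
    return [(chunk, embed(chunk)) for chunk in chunks]

store = build_store(["deep learning for big data", "feature engineering basics"])
same = cosine(store[0][1], embed("deep learning for big data"))
different = cosine(store[0][1], store[1][1])
print(len(store), round(same, 3), different)
```

Identical texts get a similarity close to 1, while texts with no words in common get 0. A real embedding model also places paraphrases close together, which word counts cannot do.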

# Store data into database
vectordb = Chroma.from_documents(chunks, embedding=embeddings, persist_directory="test_index")

Step 5: Load the database and initialize the retriever

Once the data is securely stored in the database, you won’t need to repeat the previous steps each time. You can load the existing database using the provided code snippet.

After loading the database, we’ll initialize the retriever. This component is responsible for fetching the chunks from the database that are most likely to contain the answer to the user’s question. Here, the “search_kwargs” parameter, with “k” set to 3, tells the retriever to return the 3 most relevant chunks.
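
What “top 3 most relevant” means can be sketched as a ranking by cosine similarity. The 2-D vectors and chunk labels below are made up for illustration, and the function `top_k` is a hypothetical helper, not the retriever’s actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, indexed_chunks, k=3):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(indexed_chunks, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical 2-D embeddings, for illustration only.
index = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.8, 0.6]),
    ("chunk C", [0.0, 1.0]),
    ("chunk D", [-1.0, 0.0]),
]
results = top_k([1.0, 0.1], index, k=3)
print(results)  # ['chunk A', 'chunk B', 'chunk C']
```

Chunk D points in the opposite direction to the query, so it is ranked last and dropped when k is 3.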

# Load the database
vectordb = Chroma(persist_directory="test_index", embedding_function = embeddings)

# Load the retriever
retriever = vectordb.as_retriever(search_kwargs = {"k" : 3})

Step 6: Define the custom prompt and initialize the QnA chain

Next, we’ll define a custom prompt for the QnA task in a format that best suits the Phi-3 model. The prompt format for the Phi-3 model is as follows:

<|system|>
"Instructions to the system"<|end|>
<|user|>
"User's question or context if provided"<|end|>
<|assistant|>

We’ll create a prompt in the format suitable for the Phi-3 model as mentioned earlier. Then, we’ll pass this prompt to the model. Following that, we’ll load a QnA chain and use the LLM model to generate a response. To do this, use the below code snippet:
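
The “stuff” chain type used in this step places all retrieved chunks into a single prompt. A rough sketch of that filling step is shown here with plain string formatting. The shortened template, the `"\n\n"` separator, and the helper name `stuff_prompt` are assumptions for illustration; the chat tags follow the Phi-3 format described on the model card.

```python
# Shortened, hypothetical template in the Phi-3 chat format.
template = (
    "<|system|>\nAnswer using only the context.<|end|>\n"
    "<|user|>\nContext:\n{context}\n\nQuestion: {question}<|end|>\n"
    "<|assistant|>"
)

def stuff_prompt(template, chunks, question, separator="\n\n"):
    """Join all chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)

filled = stuff_prompt(template, ["First chunk.", "Second chunk."], "What is said?")
print(filled)
```

Because the prompt ends with the assistant tag, the model continues from there with its answer.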

# Define the custom prompt template suitable for the Phi-3 model
qna_prompt_template = """<|system|>
You have been provided with the context and a question, try to find out the answer to the question only using the context information. If the answer to the question is not found within the context, return "I dont know" as the response.<|end|>
<|user|>
Context:
{context}

Question: {question}<|end|>
<|assistant|>"""

PROMPT = PromptTemplate(
   template=qna_prompt_template, input_variables=["context", "question"]
)
# Define the QNA chain
chain = load_qa_chain(llm, chain_type="stuff", prompt=PROMPT)

Step 7: Define the utility function

Now, we’ll create a utility function to generate responses. This function takes the user’s question as input. Inside the function, the question goes to the retriever, which compares its embedding with the stored documents in the database and returns the most relevant chunks. These chunks and the original question are then passed to the QnA chain, which gives us the answer.
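
Since all retrieved chunks end up in one prompt, it can be worth checking that the prompt plus the generated answer fits in the model’s context window. The sketch below is a back-of-the-envelope check only: the ratio of 4 characters per token is a rough assumption (the real count depends on the tokenizer), “4K” is read as 4096 tokens, and the function names are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return -(-len(text) // chars_per_token)  # ceiling division

def fits_context(chunks, question, instructions, max_new_tokens=300, context_window=4096):
    """Check whether prompt tokens plus generated tokens fit in the context window."""
    prompt_tokens = sum(estimate_tokens(t) for t in chunks + [question, instructions])
    return prompt_tokens + max_new_tokens <= context_window, prompt_tokens

chunk = "x" * 4000          # one chunk at the configured chunk_size
question = "q" * 40         # placeholder question text
instructions = "i" * 200    # placeholder system instructions

fits_3, tokens_3 = fits_context([chunk] * 3, question, instructions)
fits_4, tokens_4 = fits_context([chunk] * 4, question, instructions)
print(fits_3, tokens_3)  # True 3060
print(fits_4, tokens_4)  # False 4060
```

Under these assumptions, three full-size chunks fit with room for 300 new tokens, while a fourth chunk would not. If the real tokenizer produces more tokens per character, a smaller chunk size or a smaller k may be needed.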

# A utility function for answer generation
def ask(question):
   context = retriever.get_relevant_documents(question)

   answer = (chain({"input_documents": context, "question": question}, return_only_outputs=True))['output_text']
   return answer

Step 8: Test the code

Now, we are all set to perform the QnA on the PDF. To do this, please use the below lines of code:

# Take the user input and call the function to generate output
user_question = input("User: ")
answer = ask(user_question)
answer = (answer.split("<|assistant|>")[-1]).strip()
print("Answer:", answer)
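
The split on the assistant tag is there because the generated text may include the prompt itself, depending on the pipeline and library version. A small sketch of that post-processing step, with a made-up raw output and a hypothetical helper name `extract_answer`, looks like this:

```python
def extract_answer(generated_text, marker="<|assistant|>"):
    """Keep only the text after the last assistant tag and trim whitespace."""
    return generated_text.split(marker)[-1].strip()

# Made-up raw output that still contains the prompt, for illustration only.
raw = "<|user|>\nQuestion: What is RAG?<|end|>\n<|assistant|>\n Retrieval Augmented Generation. "
cleaned = extract_answer(raw)
print(cleaned)  # Retrieval Augmented Generation.
```

If the tag is absent, the split returns the whole text unchanged, so the step is safe to apply either way.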

Test results

Here are some test examples to demonstrate how our QnA system handles different questions and generates responses.

Q1: Which are the 2 high focuses of data science?

A1: The two high focuses of data science are Velocity and Variety, which are characteristics of Big Data. Velocity refers to the increasing rate at which data is collected and obtained, while Variety refers to the different types and sources of data. These two aspects are crucial in today’s data-intensive technology era, as they impact the speed and complexity of data processing and analysis.

However, it’s important to note that the context provided also discusses other key aspects of data science, such as Veracity, which deals with the trustworthiness or usefulness of results obtained from data analysis, and the challenges faced in Big Data Analytics, including data quality, validation, and scalability of algorithms. Additionally, the context highlights the role of Deep Learning in extracting meaningful abstract representations from Big Data, which is an important focus in the field of data science.

In summary, while Velocity and Variety are two high focuses of data science, the context also emphasizes other critical aspects and challenges in the field.

Q2: What is feature engineering?

A2: Feature engineering is the process of manually creating new features or modifying existing ones to improve the performance of machine learning algorithms. It involves selecting, transforming, and combining data attributes to extract meaningful information that can be used for analysis and prediction. In the context provided, feature engineering is mentioned as a more automated and general approach that would allow practitioners to automatically extract features without direct human input. This would be a major breakthrough in machine learning, as it would enable the development of algorithms that can learn and represent data in a more efficient and effective manner.

Q3: What are the 2 main focuses of the paper?

A3: The two main focuses of the paper are:

1. Applications of deep learning in big data analytics: The paper discusses how deep learning algorithms can extract meaningful abstract representations of raw data through a hierarchical multi-level learning approach. It highlights the advantages of deep learning in learning from large amounts of unlabeled/unsupervised data, extracting non-local and global relationships and patterns, and enabling broad application to different data types.

2. Deep learning in data mining and machine learning: The paper emphasizes the concept of automating the extraction of representations (abstractions) from data using deep learning algorithms. It explains how deep learning architectures can generalize in non-local and global ways, leading to richer generalization and the ability to disentangle the factors of variation in data. The paper also describes the hierarchical nature of deep learning algorithms, where data is passed through multiple transformation layers to learn a complicated and abstract representation of the data.

Q4: List down the 4 Vs of Big Data characteristics.

A4: Volume, Variety, Velocity, and Veracity.

Q5: What is the full form of SIFT?

A5: The full form of SIFT is a Scale-Invariant Feature Transform.


Through this blog, we’ve gained a deeper understanding of how to effectively use the Phi-3 model within the RAG (Retrieval-Augmented Generation) framework. We’ve learned about Phi-3’s compact yet powerful capabilities, its support for a wide context window, and how it’s instruction-tuned for natural language understanding.

This blog has equipped us with the knowledge and skills to leverage Phi-3’s strengths in enhancing AI-driven tasks and generating accurate responses, making it a valuable tool in the realm of language processing and AI technologies.

