In the world of AI, large language models (LLMs) have changed the way computers understand human language. Their potential for business solutions, however, depends on strong information retrieval. Retrieval Augmented Generation (RAG) enhances LLMs by pulling in current information from external knowledge bases at query time, which leads to more accurate and up-to-date responses. The framework lets the model draw on sources beyond its training data and point users to them. The choice of retrieval system is therefore crucial: it shapes the input to the language model and determines the quality of the generated responses.

This is where Azure Cognitive Services comes in. In RAG, a robust information retrieval system is the basis for meaningful answers, and Azure Cognitive Search (since renamed Azure AI Search, the name used in the portal steps below) fills this role by making it easy and efficient to search large business datasets. Using it for the retrieval part of RAG gives us thorough, configurable search as the foundation for an intelligent, context-aware system.

For the models themselves, we use Azure OpenAI, with LangChain orchestrating the calls. The LLM handles the generative part of RAG, while Azure OpenAI’s embedding models turn text into vectors that capture its meaning, improving how well the system understands context.

Join us as we look at the specific roles Azure Cognitive Services play in RAG, bridging the gap between information retrieval and advanced language models to build a comprehensive, intelligent system.

Getting Started with Azure

To start working with RAG on Azure, first get an Azure subscription. You can create a free one from the Azure free account page (https://azure.microsoft.com/free/).

Upon reaching the webpage, locate the “Start free” button, click on it, and proceed to sign up for Azure at no cost.

After successfully obtaining your subscription and logging into your account, you’ll see a screen like the one below:

Setting Up Azure Cognitive Search Index

To implement Retrieval Augmented Generation (RAG) on our custom document, we build an index that lets us search for the chunk (part of our data) most likely to contain the answer to the user’s question. For this task, we’ll use the Azure Search service to create an index of our data.

The Azure Search service stores our data in a search index built for fast retrieval. Each entry in the index is a document made up of fields, and the attributes set on each field (for example, searchable or retrievable) determine how queries match the content and what they return.
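To make the idea of a search index concrete, here is a toy inverted index in plain Python: it maps each term to the chunks that contain it and ranks chunks by how many query terms they match. This is only an illustration under simplifying assumptions; Azure AI Search builds and scores its indexes itself (using BM25 ranking by default on current services), and the helper names below are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index, query, top=2):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:top]


docs = {
    "1": "Deep learning extracts features from raw data",
    "2": "Big data analytics at scale",
    "3": "Deep learning for big data analytics",
}
index = build_inverted_index(docs)
print(keyword_search(index, "deep learning big data"))  # ['3', '1']
```

A real search service adds term weighting, stemming, and much more, but the core lookup (term to matching documents) is the same idea.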

After successfully logging into the Azure portal, let’s proceed with the steps to create an Azure Search service:

Step 1

In the Azure portal, you will find the “Create a resource” button, as displayed below. Click on this button to initiate the process of creating a new resource.

Step 2

After clicking the “Create a resource” button, you’ll be directed to a screen displaying a list of Azure resources. In the search box, type and search for “Azure AI Search,” then press Enter. The search results will include “Azure AI Search.” Click on this resource to proceed with its creation.

Step 3

Once you’ve located the “Azure AI Search” resource, click on the “Create” button, as illustrated below:

Step 4

After clicking on the “Create” button, you’ll see a form similar to the one below. Fill in the details according to your account preferences; note that we selected the free pricing tier. Make sure all required information is correct, then click the “Review + create” button.

Step 5

After clicking on “Review + Create,” the system will start validating your request. Once the validation is complete, the screen will reflect the status as shown below. Proceed by clicking the “Create” button to advance to the next steps.

Step 6

Clicking the “Create” button initiates the deployment of your search service. Once the resource is successfully deployed, the screen will resemble the one below. To view the newly created resource, simply click on the “Go to resource” button.

Step 7

You can see the overview of the created search service resource as depicted below:

Step 8

Now that we have created a search service, we can create an index for our project to store the chunks of our custom dataset. In the free tier, you can create up to 3 indexes per search service.

To create an index, first, click on the “Indexes” button located on the left panel of the screen. Then, click on the “Add index” dropdown and select “Add index,” as illustrated below:

Step 9

The above steps will open a screen where we need to enter the name of the index. After entering the name, click the “Add field” button to enter the data fields that we want to store in the index, as outlined below:

Step 10

After clicking the “Add field” button, a form like the one below will appear. In this form, enter “data” as the field name; this field will store the chunks’ text. Select “Retrievable” and “Searchable” as the attributes, then click the “Save” button to create the field. Repeat the same process to create another field named “source.”

Step 11

After successfully creating both fields, you’ll see a screen similar to the one below. To finalize and create the index, click the “Create” button.
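For reference, the index built in the steps above can be written down as a definition. The sketch below is a plain Python dict in the shape of the Azure AI Search REST “Create Index” request body. It assumes the key field is a string field named ‘id’ (the portal pre-populates one like it), which the upload code later fills in. The two helper functions are hypothetical and only inspect the dict; nothing is sent to Azure.

```python
import json

# Hypothetical definition mirroring the index created in the portal, in the
# shape of the Azure AI Search REST "Create Index" request body.
INDEX_DEFINITION = {
    "name": "azure-rag-demo-index",
    "fields": [
        # Assumed key field (the portal pre-populates an "id" string field).
        {"name": "id", "type": "Edm.String", "key": True, "retrievable": True},
        # Chunk text: full-text searchable and returned with results.
        {"name": "data", "type": "Edm.String", "searchable": True, "retrievable": True},
        # Path of the source file, configured the same way as "data".
        {"name": "source", "type": "Edm.String", "searchable": True, "retrievable": True},
    ],
}


def key_field(definition):
    """Return the name of the single key field, failing if there is not exactly one."""
    keys = [f["name"] for f in definition["fields"] if f.get("key")]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one key field, found {keys}")
    return keys[0]


def searchable_fields(definition):
    """Return the names of fields that take part in full-text search."""
    return [f["name"] for f in definition["fields"] if f.get("searchable")]


print(json.dumps(INDEX_DEFINITION, indent=2))
print(key_field(INDEX_DEFINITION), searchable_fields(INDEX_DEFINITION))
```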

Step 12

Having successfully created the index, our next step is to collect the endpoint and key for later use in our Python SDK script to connect with the index.

To find the endpoint, navigate to the “Overview” section on the left panel. From this screen, locate the “URL” section; this is our “endpoint.” Copy and save this information for later use.

To obtain the key, navigate to the “Keys” section on the left panel. Copy the “primary admin key”; this serves as our key. Save this information for later use in our script.
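Rather than pasting the endpoint and key straight into the script, you may prefer to keep them in environment variables. A minimal sketch, assuming the hypothetical variable names AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_KEY and (optionally) AZURE_SEARCH_INDEX:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSettings:
    endpoint: str
    key: str
    index_name: str


def load_search_settings(env=None):
    """Read Azure AI Search settings from environment variables.

    The variable names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_KEY")
               if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return SearchSettings(
        endpoint=env["AZURE_SEARCH_ENDPOINT"],
        key=env["AZURE_SEARCH_KEY"],
        # Fall back to the index name used in this tutorial.
        index_name=env.get("AZURE_SEARCH_INDEX", "azure-rag-demo-index"),
    )
```

The resulting values can then be passed to SearchClient as its endpoint, index_name, and (wrapped in AzureKeyCredential) credential. Keeping the admin key out of source code also keeps it out of version control.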

Deploying Embedding and LLM Models on Azure OpenAI for RAG

The importance of the embedding model and the LLM in RAG cannot be overstated. The embedding model transforms text into numerical representations, known as embeddings, which a search index can store and compare to find passages with similar meaning.
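Embeddings are usually compared with cosine similarity: the closer two vectors point in the same direction, the closer their meanings. The toy example below uses made-up 3-dimensional vectors to show the idea. Note that the script in this post queries the index with full-text search over the ‘data’ field and does not store embeddings; this sketch only illustrates how a vector-based lookup would rank chunks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings"; real text-embedding-ada-002 vectors
# have 1,536 dimensions and come from the deployed embedding model.
query_vector = [0.9, 0.1, 0.0]
chunk_vectors = {
    "chunk about deep learning": [0.8, 0.2, 0.1],
    "chunk about cooking": [0.0, 0.1, 0.95],
}
best = max(chunk_vectors, key=lambda name: cosine_similarity(query_vector, chunk_vectors[name]))
print(best)  # chunk about deep learning
```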

The LLM, in turn, generates the response. Given the retrieved chunks along with the user’s query, it produces answers that are relevant to the question and grounded in that context.

To make these parts work together in our solution, we’ll use the Azure OpenAI service to host both the embedding model and the LLM, giving our Retrieval Augmented Generation system a reliable, efficient backend.

To utilize Azure’s OpenAI service, the initial step involves creating and deploying it. To proceed, follow the steps below:

Step 1

First, open the Azure portal, and click on the “Create a resource” button as depicted below:

Step 2

This opens a page listing various resources. In the search box, type “Azure OpenAI” and press Enter. The “Azure OpenAI” resource will appear in the results; click it to select it.

Step 3

After selecting the resource, the page will appear as shown below. Now, press the “Create” button to proceed.

Step 4

After clicking the “Create” button, a form will appear as described below. Please provide the necessary information according to your project and then click on the “Next” button.

Step 5

In the “Network” tab, choose the first option as illustrated below, and then click the “Next” button.

Step 6

After clicking the “Next” button, you will encounter the “Tags” section; skip it for now and proceed by clicking the “Next” button again. Following this, your request will undergo review, and you’ll be presented with a window as shown below. To finalize the process, click the “Create” button. It will start the deployment of resources.

Step 7

Once the deployment is complete, you will be directed to a page as shown below. To access the resource, click the “Go to resource” button.

Step 8

After successfully creating and deploying the Azure OpenAI service, the next step is to deploy the embedding and LLM models for our script. To do this, navigate to the “Model deployments” section in the left panel, as shown below. In that section, click the “Manage Deployments” button; it will redirect you to the “Deployments” section of Azure OpenAI Studio.

Step 9

In the “Deployments” section, click the “Create new deployment” button as illustrated below.

Step 10

Now, let’s start by deploying the embedding model. Choose the embedding model as “text-embedding-ada-002,” assign it a unique deployment name, and click the “Create” button to initiate the model deployment.

Step 11

Now, repeat the same steps to deploy the LLM. Click the “Create new deployment” button again. This time, choose “gpt-35-turbo-instruct” (Azure’s name for GPT-3.5 Turbo Instruct) as the LLM. Assign a unique name to the deployment and click the “Create” button to start the deployment.

After deploying both models, the deployment section of the Azure OpenAI Studio will appear as shown below. Please note down the deployment name of both models for future use.

Step 12

Now, let’s gather the endpoint and key, which will be used in the script to utilize the embedding and LLM models.

First, go to the Azure portal (https://portal.azure.com).

Next, select the Azure OpenAI service that we created earlier. This will take you to a page like the one shown below. On that page, find the “Endpoints” section and click on the URL. This will direct you to the “Keys and Endpoint” section.

In the “Keys and Endpoint” section, as depicted below, you will find the endpoint and key. Save this information for later use in the script.

Scripting with LangChain

Now that we’ve set up the Azure OpenAI service and models, we’re ready to write our script. Let’s build a script that performs Retrieval Augmented Generation (RAG) with Azure Cognitive Services and LangChain: Azure Search retrieves the relevant chunks, and the deployed LLM generates the answer.

Step 1

To install the necessary Python libraries for RAG, run the following commands in the terminal:

pip install azure-search-documents==11.4.0
pip install langchain==0.1.0
pip install openai==1.7.2
pip install pypdf==4.0.1
pip install tiktoken==0.5.2
pip install unstructured==0.12.3
pip install langchain-openai==0.0.2.post1
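Because the code below depends on these pinned versions (LangChain’s APIs in particular change quickly), it can help to confirm what is actually installed. A small sketch using the standard library’s importlib.metadata; the check_pins helper is hypothetical:

```python
from importlib import metadata

# The versions pinned by the pip commands above.
PINNED = {
    "azure-search-documents": "11.4.0",
    "langchain": "0.1.0",
    "openai": "1.7.2",
    "pypdf": "4.0.1",
    "tiktoken": "0.5.2",
    "unstructured": "0.12.3",
    "langchain-openai": "0.0.2.post1",
}


def check_pins(pins):
    """Return {package: (pinned, installed)}, with installed=None if absent."""
    report = {}
    for package, pinned in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (pinned, installed)
    return report


for package, (pinned, installed) in check_pins(PINNED).items():
    status = "ok" if installed == pinned else "check"
    print(f"{package}: pinned {pinned}, installed {installed} [{status}]")
```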

Step 2

Now that our environment is set up, we can begin creating a script. Let’s start by importing the necessary libraries. You can achieve this by using the following lines of code:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain_openai import AzureOpenAI

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

import os

Step 3

Continuing with our script, the next step is to connect to the Azure Search service using the Python client and initialize the index client with the following lines of code:

# Define Azure search service and index properties
endpoint = "YOUR_AZURE_SEARCH_ENDPOINT"
key = "YOUR_AZURE_SEARCH_KEY"
index_name = "azure-rag-demo-index"

# Init the search index client
credential = AzureKeyCredential(key)
client = SearchClient(endpoint=endpoint,
                      index_name=index_name,
                      credential=credential)

Make sure to put the name of your index in the variable ‘index_name’, the Azure Search service endpoint saved earlier in place of ‘YOUR_AZURE_SEARCH_ENDPOINT’, and the Azure Search service key in place of ‘YOUR_AZURE_SEARCH_KEY’.

Step 4

Next, we will define the LLM model that we deployed earlier on the Azure OpenAI service using the following lines of code:

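# Define Azure OpenAI properties
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["AZURE_OPENAI_ENDPOINT"] = "YOUR_AZURE_OPENAI_ENDPOINT"
os.environ["AZURE_OPENAI_API_KEY"] = "YOUR_AZURE_OPENAI_KEY"

# Init the Azure OpenAI model (deployment_name is the name you chose in Azure OpenAI Studio)
llm = AzureOpenAI(deployment_name="azure-blog-llm-model",
                  model="gpt-35-turbo-instruct")

Make sure to replace ‘YOUR_AZURE_OPENAI_ENDPOINT’ and ‘YOUR_AZURE_OPENAI_KEY’ with the endpoint and key saved earlier, and ‘azure-blog-llm-model’ with the name of your LLM deployment.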
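When the endpoint, key, and API version are not passed as arguments, langchain_openai’s AzureOpenAI falls back to environment variables (to our understanding AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION). A forgotten placeholder then only surfaces as an authentication error at the first call. The hypothetical preflight check below flags unset variables and unreplaced ‘YOUR_...’ placeholders before the model is created. It is stricter than the library, which can also fall back to OPENAI_API_KEY for the key.

```python
import os

# Environment variables that, to our understanding, langchain_openai's
# AzureOpenAI falls back to when the matching arguments are not passed.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "OPENAI_API_VERSION")


def missing_azure_openai_settings(env=None):
    """List required variables that are unset or still hold a YOUR_... placeholder."""
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_VARS
        if not env.get(name) or env[name].startswith("YOUR_")
    ]


missing = missing_azure_openai_settings()
if missing:
    print("Set these before creating the LLM:", ", ".join(missing))
```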

Step 5

Now, our next step is to read our custom data using LangChain and process it to store it in the Azure Search index. Here, we are using the PDF titled ‘Deep Learning Applications and Challenges in Big Data Analytics.’ You can find it at the following link:

Just download it and place it in your current working directory. We will read the PDF using the PyPDFLoader of LangChain and then create chunks of the data using the text splitter. You can replicate the same using the following lines of code:

# Read the PDF file using the LangChain loader (one Document per page)
pdf_link = "demo_paper.pdf"
loader = PyPDFLoader(pdf_link, extract_images=False)
data = loader.load()

# Split data into manageable chunks of at most 5000 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 5000,
    chunk_overlap = 20,
    length_function = len
)
chunks = text_splitter.split_documents(data)

Make sure to replace the path of your downloaded PDF file in place of “demo_paper.pdf”.
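To see what chunk_size and chunk_overlap mean, here is a deliberately simplified fixed-window chunker. It is a stand-in for illustration only, not LangChain’s RecursiveCharacterTextSplitter, which additionally tries to break on separators such as blank lines, newlines, and spaces before falling back to raw characters. The function name is hypothetical.

```python
def fixed_size_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    stand-in, not LangChain's RecursiveCharacterTextSplitter.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap repeats a few characters at each boundary so that a sentence cut between two chunks still appears partly in both; with the tutorial’s settings (5000 and 20) the overlap is small relative to the chunk.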

Step 6

With the data prepared, we will upload the chunks one by one to the Azure Search index ‘azure-rag-demo-index’ that we created in the Azure Search service. Each chunk’s text goes into the field named ‘data’, and the path of its source file goes into the field named ‘source’. Note that this code stores the raw text of each chunk rather than an embedding; the ‘data’ field is then matched with full-text search.

# Store the data into Azure search index
for i, chunk in enumerate(chunks):
    document = {
        "id": str(i + 1),
        "data": chunk.page_content,
        "source": chunk.metadata["source"]
    }

    result = client.upload_documents(documents=[document])
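Uploading one document per request is simple but slow for larger files. Since upload_documents accepts a list, chunks can also be sent in batches (the service limits a single indexing request, to our knowledge to 1,000 documents). The sketch below keeps the batching logic independent of Azure by taking the upload function as a parameter; both helper names are hypothetical.

```python
def build_documents(chunks):
    """Turn (text, metadata) pairs into documents matching the index fields."""
    return [
        {"id": str(i + 1), "data": text, "source": metadata["source"]}
        for i, (text, metadata) in enumerate(chunks)
    ]


def upload_in_batches(documents, upload, batch_size=100):
    """Pass documents to `upload` in lists of at most batch_size; return the call count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    calls = 0
    for start in range(0, len(documents), batch_size):
        upload(documents[start:start + batch_size])
        calls += 1
    return calls


# Offline demo: collect the batches in a list instead of sending them to Azure.
docs = build_documents([("first chunk", {"source": "demo_paper.pdf"}),
                        ("second chunk", {"source": "demo_paper.pdf"}),
                        ("third chunk", {"source": "demo_paper.pdf"})])
sent = []
print(upload_in_batches(docs, sent.append, batch_size=2))  # 2
```

With the real client, the call would look like upload_in_batches(build_documents([(c.page_content, c.metadata) for c in chunks]), lambda batch: client.upload_documents(documents=batch)).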

Step 7

In this step, we will define a utility function that takes ‘user_question’ as input, searches for the best chunk for the user’s question in our index, and returns the top 2 most suitable chunks. Then, we will extract the chunk data from the search results, and pass it to the LLM prompt along with the question to generate the response. Finally, we will call the LLM model and return the response. You can use the following lines of code to achieve this.

# A utility function to generate the response
def generate_response(user_question):

    # Fetch the top 2 matching chunks from the search index
    context = ""
    results = client.search(search_text=user_question, top=2)
    for doc in results:
        context += "\n" + doc["data"]

    # Append the chunks and the question into the prompt
    qna_prompt_template = f"""You will be provided with the question and a related context, you need to answer the question using the context.

Context:
{context}

Question:
{user_question}

Make sure to answer the question only using the context provided, if the context doesn't contain the answer then return "I don't have enough information to answer the question".

Answer:"""

    # Call the LLM to generate the response
    response = llm.invoke(qna_prompt_template)
    return response
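Because generate_response mixes retrieval, prompt building, and the LLM call, the prompt itself is hard to check without Azure. One option, sketched here with a hypothetical helper, is to build the prompt in a separate pure function that uses the same instructions as the prompt in generate_response:

```python
# Hypothetical helper: builds the Q&A prompt without calling Azure, so it
# can be inspected or unit-tested on its own.
PROMPT_TEMPLATE = """You will be provided with the question and a related context, you need to answer the question using the context.

Context:
{context}

Question:
{question}

Make sure to answer the question only using the context provided, if the context doesn't contain the answer then return "I don't have enough information to answer the question".

Answer:"""


def build_qna_prompt(question, chunk_texts):
    """Join the retrieved chunk texts and fill in the Q&A prompt template."""
    context = "\n".join(chunk_texts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_qna_prompt("What is the RBMs?", ["First retrieved chunk.", "Second retrieved chunk."])
print(prompt)
```

Inside generate_response, this would replace the f-string: prompt = build_qna_prompt(user_question, [doc["data"] for doc in results]), followed by llm.invoke(prompt).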

Step 8

Now, we are all set to perform Q&A using Azure Cognitive Services. You need to pass your question into the variable ‘user_question’ and call the utility function ‘generate_response’ with it. You can achieve this by using the following lines of code:

# Take the user input and call the utility function to generate the response
user_question = "YOUR_QUESTION"
response = generate_response(user_question)
print(response)

Test results

Q1: Which are the 2 high focuses of data science?

Q2: Give me the name of the transformation algorithms.

Q3: What is the RBMs?

Q4: Which part of the human brain is capable of extracting the features and abstractions from the data?

Q5: What is the core of Big Data Analytics?


We’ve successfully built a Retrieval Augmented Generation (RAG) system using Azure Cognitive Services. Our script, powered by LangChain, connects to Azure Search, indexes custom data, retrieves the most relevant chunks for each question, and passes them to an Azure OpenAI language model to generate context-aware responses. This guide gives a solid foundation for implementing RAG solutions that combine Azure’s search and language capabilities.

Stay tuned for more insights on advancing RAG systems!

