Mistral AI has emerged as a pioneering force in AI-driven language understanding and generation. It offers accessible models, API endpoints, and tools for a wide range of language-based tasks and applications. For a comparison of Mistral 7B with the LLaMa 2 models, see our second blog post, Exploring the Potential of Mistral 7B: A Comparative Analysis with LLaMa 2.
Welcome to the world of Mistral AI, where new developments in language model technology meet user-friendly design! Mistral AI presents Mistral 7B, a model designed to understand and generate language in a manner similar to humans. It is a powerful but simple model that learns from a large amount of data to help computers better produce and understand human language. You can do a variety of things with Mistral 7B, from writing and summarizing texts to assisting with coding tasks. What distinguishes Mistral 7B from other LLMs is that it is smaller in size yet performs remarkably well across a variety of tasks.
This blog delves into Mistral AI’s open-source model and API, offering a hands-on exploration through code snippets.
Mistral AI – Open-source models
Mistral AI has open-sourced pre-trained and fine-tuned models. If you wish to deploy the Mistral AI LLM on your infrastructure or system, they have published these models on the Hugging Face Models platform. Here is a comprehensive list of available models and the resources required to run each model (ref: https://docs.mistral.ai/models). The raw model weights are downloadable from the documentation and on GitHub.
1. Mistral-Tiny
This generative model is well-suited to large batch processing tasks that don’t require complex reasoning. To run these models on Google Colab or your own system, you need a GPU with at least 16 GB of memory (T4/V100). Running them on a CPU (without a GPU) is feasible, but it significantly increases execution time.
- Mistral-7B-v0.1: The Mistral-7B-v0.1 Large Language Model (LLM) is a pre-trained generative text model equipped with 7.3 billion parameters.
- Mistral-7B-Instruct-v0.1: The Mistral-7B-Instruct-v0.1 LLM is fine-tuned for conversation and question answering. This instruct version is derived from the Mistral-7B-v0.1 generative text model and has been refined using various publicly available conversation datasets.
- Mistral-7B-Instruct-v0.2: The Mistral-7B-Instruct-v0.2 LLM is an improved instruction-fine-tuned version of Mistral-7B-Instruct-v0.1.
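As a rough check on the 16 GB figure above, the memory needed just to hold the weights can be estimated from the parameter count. The sketch below is a back-of-the-envelope calculation, not an official sizing guide: the helper name `weight_memory_gb` is hypothetical, it assumes 1 GB = 10^9 bytes, and it ignores activations, the attention cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


MISTRAL_7B_PARAMS = 7.3e9  # parameter count quoted in the list above

# Compare full precision (4 bytes per weight) with half precision (2 bytes)
for label, bytes_per_param in [("32-bit", 4), ("16-bit", 2)]:
    size = weight_memory_gb(MISTRAL_7B_PARAMS, bytes_per_param)
    print(f"{label} weights: about {size:.1f} GB")
```

At 16-bit precision the weights alone come to roughly 14.6 GB, which is consistent with a 16 GB GPU being about the minimum for this model.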
2. Mistral-Small
Mistral AI’s Mixtral 8x7B is a strong open-weight model that, according to Mistral AI, matches or outperforms GPT-3.5 on most standard benchmarks. Mixtral’s sparse mixture-of-experts network uses only a fraction of its 46.7B total parameters for each token, which keeps inference cost and speed closer to those of a much smaller model. It also outperforms LLaMA 2 70B on most benchmarks and is a dependable, fast option for a wide range of AI agent applications. To run these models on Google Colab or your own system for inference, you need GPUs with at least 100 GB of memory in total (2xA100 or 2xH100).
- Mixtral-8x7B-v0.1: The Mixtral-8x7B-v0.1 model has expanded capabilities and enhanced reasoning abilities. It handles English, French, German, Italian, and Spanish, and it can generate and reason about code.
- Mixtral-8x7B-Instruct-v0.1: The Mixtral-8x7B-Instruct-v0.1 LLM is the instruction-fine-tuned version of the Mixtral-8x7B-v0.1 generative text model.
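To make the "sparse mixture-of-experts" idea more concrete, here is a toy sketch of top-k routing in plain Python. It is an illustration under stated assumptions, not Mixtral's actual implementation: the function name `route_top_k`, the example scores, and the scalar "experts" are all made up for this example. The choice of eight experts with two active per token follows Mistral AI's public description of Mixtral.

```python
import math


def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    exps = [math.exp(router_scores[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


# Hypothetical router scores for one token over 8 experts
scores = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.7]
weights = route_top_k(scores, k=2)

# Toy "experts": each one just scales its input by a different factor
experts = [lambda x, s=s: x * s for s in range(1, 9)]

# Only the selected experts are evaluated; their outputs are mixed by weight
token = 1.0
output = sum(w * experts[i](token) for i, w in weights.items())
print(weights, output)
```

Only two of the eight experts do any work for this token, which is how a model can have a large total parameter count while using only a fraction of it per token.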
3. Mistral-Medium
These models are under development.
Mistral AI - API
Mistral AI’s paid API platform, currently in beta, offers three chat endpoints (mistral-tiny, mistral-small, and mistral-medium) that trade off performance against cost. The endpoints range from low-cost to high-quality and are backed by models such as Mistral 7B Instruct v0.2 and Mixtral 8x7B, which support multiple languages and coding tasks. The API is designed to make it easy to integrate these models into applications, with a chat interface specification and Python and JavaScript client libraries. The platform also provides a system prompt feature for moderation, which is useful in sensitive situations, and a transparent pricing structure for different tiers of usage. Please refer to this source for API pricing and rate limits.
If you lack the resources or prefer to avoid manual setup for deploying Mistral AI, you can opt to use this paid API instead.
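As a rough idea of what calling a chat endpoint looks like, here is a minimal sketch that builds the HTTP request using only the Python standard library. Treat the details as assumptions to verify against Mistral AI's API documentation: the endpoint URL, the bearer-token header, and the response shape in the commented-out part are not taken from this post, and `build_chat_request` is a hypothetical helper name. The request is built but not sent, so the snippet runs without an API key.

```python
import json
import urllib.request

# Assumed endpoint URL; confirm it in the official API documentation
API_URL = "https://api.mistral.ai/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) an HTTP POST request for a chat completion."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="YOUR_API_KEY",  # placeholder, replace with a real key
    model="mistral-tiny",    # one of the three endpoints named above
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# Sending the request needs a valid key and network access (assumed response shape):
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

In practice the official Python or JavaScript client libraries mentioned above are likely the more convenient route; the raw request is shown only to make the moving parts visible.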
Code to Test Mistral 7B
Let’s write the code for running inference with the Mistral 7B model in Google Colab. We’ll use the free version with a single T4 GPU and load the model from Hugging Face.
This particular model has been fine-tuned to follow a specific prompting format, in which each instruction is wrapped in [INST] and [/INST] tags. The first instruction should begin with the beginning-of-sentence token, <s>, and each model response ends with the end-of-sentence token, </s>.
Prompt Format
<s>[INST] Instruction [/INST] Model answer</s>
[INST] Follow-up instruction [/INST]
Fortunately, the model’s creators have integrated a prompt template, known as a chat template, simplifying the process. By utilizing a Python list of dictionaries, this template automatically organizes the sections of our prompt and generates the necessary tags—[INST], [/INST], <s>, </s>—eliminating manual formatting complexities. This functionality can be accessed through the apply_chat_template() method.
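To see roughly what the chat template does for you, here is a minimal hand-written sketch that turns a list of role/content dictionaries into the tagged prompt string. It is an approximation for illustration only: `format_mistral_prompt` is a hypothetical helper, and the exact whitespace around the tags may differ from the official template, so prefer apply_chat_template() for real use.

```python
BOS, EOS = "<s>", "</s>"


def format_mistral_prompt(messages):
    """Approximate the instruct prompt layout from a list of role/content dicts."""
    prompt = BOS
    for index, message in enumerate(messages):
        # Turns must alternate: user, assistant, user, ...
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            prompt += f" {message['content']}{EOS}"
    return prompt


example = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
]
print(format_mistral_prompt(example))
```

The printed string has one <s> at the start, each user turn wrapped in [INST] ... [/INST], and each assistant turn closed with </s>, which matches the tag layout described above.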
Install required dependencies
!pip install -q -U git+https://github.com/huggingface/transformers.git
Code Block
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the instruction-tuned model and its tokenizer from Hugging Face
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Conversation history as a list of role/content dictionaries
messages = [
    {"role": "user", "content": "I am curious to know about chemistry..."},
    {"role": "assistant", "content": "You can ask anything about chemistry, I know very well"},
    {"role": "user", "content": "How many bonds are there in alkene?"}
]

# Apply the chat template, which adds the [INST] tags and special tokens
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)

# Generate a sampled continuation and decode it back to text
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
Once the model is loaded, it is time to test it. You can provide any prompt you like and adjust the “max_new_tokens” parameter to set the maximum number of tokens to generate.
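One thing to keep in mind when reading the output: for decoder-only models like this one, generate() returns the prompt tokens followed by the newly generated tokens, so the decoded text contains the conversation as well as the reply. Below is a minimal sketch of keeping only the reply, written on plain Python lists so it runs without a GPU. The helper name `strip_prompt` and the token ids are made up for illustration.

```python
def strip_prompt(prompt_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    prompt_length = len(prompt_ids)
    return [sequence[prompt_length:] for sequence in generated_ids]


prompt_ids = [1, 733, 16289]                 # hypothetical prompt token ids
generated_ids = [[1, 733, 16289, 42, 7, 2]]  # prompt followed by new tokens
print(strip_prompt(prompt_ids, generated_ids))
```

With the tensors from the code above, the equivalent slice would be generated_ids[:, model_inputs.shape[1]:], and passing skip_special_tokens=True to batch_decode() removes tokens such as </s> from the decoded text.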
Testing Results
Test 1
Prompt:
messages = [
{"role": "user", "content": "I am curious to know about chemistry..."},
{"role": "assistant", "content": "You can ask anything about chemistry, I know very well"},
{"role": "user", "content": "How many bonds are there in alkene?"}
]
Response:
An alkene is a hydrocarbon that contains carbon-carbon unsaturated bonds. An alkene with the molecular formula CnH(2n) can form a maximum of n C=C bonds. Each C=C bond consists of a sigma bond and a pi bond, making it a total of n x 2 = 2n bonds between the carbon atoms in the alkene molecule. So, an alkene with the molecular formula CnH(2n) forms 2n bonds in total. For example, an ethylene molecule (C2H4) has two carbons and forms two C=C bonds, resulting in a total of 2 x 2 = 4 bonds.
Test 2
Prompt:
{"role": "user", "content": "Translate this text from English to French: Mistral AI is a French company, that maintains a Discord community for discussing their models and interacting with engineers"}
Response:
Mistral AI est une entreprise française, elle entretient une communauté Discord pour discuter de ses modèles et d'interagir avec les ingénieurs.
Test 3
Prompt:
messages = [
{"role": "user", "content": "I need your help"},
{"role": "assistant", "content": "You are a helpful code assistant. Your task is to summarize text into two to three sentences"},
{"role": "user", "content": "The transformer (represented by the T in ChatGPT, GPT-2, GPT-3, GPT-3.5, etc.) is the key element that makes generative AI so, well, transformational. Transformer models are a type of neural network architecture designed to process sequential material, such as sentences or time-series data. The concept of a transformer, an attention-layer-based, sequence-to-sequence (“Seq2Seq”) encoder-decoder architecture, was conceived in a 2017 paper authored by pioneer in deep learning models Ashish Vaswani et al called “Attention Is All You Need”. Since then, in the realms of AI and machine learning, transformer models have emerged as a groundbreaking approach to various language-related tasks. Compared with traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), transformers differ in their ability to capture long-range dependencies and contextual information. The transformer “requires less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets,” notes Wikipedia. From machine translation to natural language processing (NLP) to computer vision, plus audio and multi-modal processing, transformers have revolutionized the field with their ability to capture long-range dependencies and efficiently process sequential data. They’re used widely in neural machine translation (NMT). They’re used to perform or improve AI and NLP business tasks, as well as streamline enterprise workflows. Transformer technology has also heralded generative pretrained transformers (GPTs) and Bidirectional Encoder Representations from Transformers (BERT)."}
]
Response:
The Transformer, a neural network architecture introduced in a 2017 paper by Ashish Vaswani et al., is a key component in generative AI and has revolutionized language-related tasks in AI and machine learning. Transformers, which use attention layers and a sequence-to-sequence encoder-decoder design, differ from traditional RNNs and CNNs in their ability to capture long-range dependencies and contextual information. Compared to RNNs, transformers require less training time and have been widely adopted for tasks such as machine translation, natural language processing, and generative models like GPTs and BERT. From business applications to computational fields like vision, audio, and multi-modal processing, transformers have significantly impacted various industries with their efficiency and ability to process sequential data.
Test 4
Prompt:
{"role": "user", "content": "Write code to drop the second column of dataframe using pandas"}
Response:
Here's some sample code to drop the second column of a dataframe using Pandas:
```
import pandas as pd
# create a sample dataframe
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
# drop the second column
df.drop('col2', axis=1, inplace=True)
# print the updated dataframe
print(df)
```
In the above code, we first import the Pandas library and create a sample dataframe with two columns `col1` and `col2`. We then use the `drop()` method of the dataframe to remove the `col2` column. The `axis` parameter is set to `1` to indicate that we want to drop a column. The `inplace=True` parameter is used to modify the original data frame in place instead of creating a new one. Finally, we print the updated data frame.