---
library_name: vllm
language:
- en
- fr
- es
- de
- it
- pt
- nl
- zh
- ja
- ko
- ar
license: apache-2.0
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our Privacy Policy.
base_model:
- mistralai/Mistral-Large-3-675B-Base-2512
tags:
- mistral-common
---

# Mistral Large 3 675B Instruct 2512 BF16

From our family of large models, **Mistral Large 3** is a state-of-the-art general-purpose **Multimodal granular Mixture-of-Experts** model with **41B active parameters** and **675B total parameters** trained from the ground up.

This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.

Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

This version corresponds to the **BF16** weights. Mistral Large 3 is also deployable on-premises in:
- [FP8](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-FP8) on a single node of B200s or H200s.
- [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.

## Key Features

Mistral Large 3 consists of two main architectural components:
- **A granular MoE language model with 673B parameters (39B active)**
- **A 2.5B vision encoder**

The Mistral Large 3 Instruct model offers the following capabilities:
- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **System Prompt**: Maintains strong adherence and support for system prompts.
- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.
- **Frontier**: Delivers best-in-class performance.
- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- **Large Context Window**: Supports a 256k context window.

## Use Cases

With powerful long-context performance and stable, consistent cross-domain behavior, Mistral Large 3 is perfect for:
- Long document understanding
- Powerful daily-driver AI assistants
- State-of-the-art agentic and tool-use capabilities
- Enterprise knowledge work
- General coding assistance

And enterprise-grade use cases requiring frontier capabilities.

## Recommended Settings

We recommend deploying Large 3 in a client-server configuration with the following best practices:

- **System Prompt**: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
- **Sampling Parameters**: Use a temperature below 0.1 for daily-driver and production environments. Higher temperatures may be explored for creative use cases; developers are encouraged to experiment with alternative settings.
- **Tools**: Keep the set of tools well-defined and limit their number to the minimum required for the use case. Avoid overloading the model with an excessive number of tools.
- **Vision**: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoid overly thin or wide images; crop them as needed to ensure optimal performance (see the sketch below).
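As an illustration of the cropping advice above, here is a minimal sketch that center-crops an image toward a 1:1 aspect ratio with Pillow. The `max_ratio` threshold of 2.0 is a hypothetical value for demonstration, not an official cutoff:

```python
from PIL import Image


def crop_towards_square(img: Image.Image, max_ratio: float = 2.0) -> Image.Image:
    """Center-crop the longer side so width/height stays within max_ratio."""
    w, h = img.size
    if w / h > max_ratio:  # too wide: crop the width
        new_w = int(h * max_ratio)
        left = (w - new_w) // 2
        return img.crop((left, 0, left + new_w, h))
    if h / w > max_ratio:  # too tall: crop the height
        new_h = int(w * max_ratio)
        top = (h - new_h) // 2
        return img.crop((0, top, w, top + new_h))
    return img  # already close enough to 1:1


image = crop_towards_square(Image.open("screenshot.png"))
```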
### Known Issues / Limitations

- **Not a dedicated reasoning model**: Dedicated reasoning models can outperform Mistral Large 3 in strict reasoning use cases.
- **Behind vision-first models in multimodal tasks**: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
- **Complex deployment**: Due to its large size and architecture, the model can be challenging to deploy efficiently with constrained resources or at scale.

## Benchmark Results

We compare Mistral Large 3 to similarly sized models.

### Text

### Vision

## Usage

The model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)

### vLLM

We recommend using this model with [vLLM](https://github.com/vllm-project/vllm) in FP8 or NVFP4.

#### Installation

Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):

```
pip install vllm --upgrade
```

Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6). To check:

```
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also use a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

#### Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend using this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

A simple launch command is:

```bash
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice --tool-call-parser mistral
```

Key parameter notes:

* `--enable-auto-tool-choice`: Required when enabling tool usage.
* `--tool-call-parser mistral`: Required when enabling tool usage.

Additional flags:

* You can set `--max-model-len` to preserve memory. By default it is set to `262144`, which is quite large but not necessary for most scenarios.
* You can set `--max-num-batched-tokens` to balance throughput and latency: higher values mean higher throughput but also higher latency.

#### Usage of the model

Here we assume that the model `mistralai/Mistral-Large-3-675B-Instruct-2512` is served and reachable at the domain `localhost` on port `8000`, the default for vLLM. A quick connectivity check is sketched below.
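Before running the examples, you can verify the server is up by listing the served models (a minimal sketch, assuming the default vLLM OpenAI-compatible endpoint):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the API key can be any placeholder.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# If the server is reachable, this prints the served model id(s).
for m in client.models.list().data:
    print(m.id)
```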
##### Vision Reasoning

Let's see if Mistral Large 3 knows when to pick a fight!

```python
from datetime import datetime, timedelta

from huggingface_hub import hf_hub_download
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
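Note that `MAX_TOK` here equals the full 262144-token context window. Since the prompt and the completion share that window, a very long prompt plus this `max_tokens` value may exceed the model's context length and be rejected by the server, so consider lowering `max_tokens` for long inputs.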
##### Function Calling

Let's solve some equations thanks to our simple Python calculator tool.

```python
import json

from huggingface_hub import hf_hub_download
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"


def my_calculator(expression: str) -> str:
    return str(eval(expression))


tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical equation and compute its results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
        results.append(result)

messages.append({"role": "assistant", "tool_calls": tool_calls})

for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
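Note that `my_calculator` uses Python's `eval` purely for brevity. Evaluating model-generated expressions this way is unsafe outside of a demo, so a production tool should use a restricted expression parser instead.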
##### Text-Only Request

Mistral Large 3 can follow your instructions down to the letter.

```python
from huggingface_hub import hf_hub_download
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)
```
## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*