---
library_name: transformers
tags:
- prime-rl
- verifiers
- prime-intellect
- reinforcement-learning
- reasoning
- agentic
- mixture-of-experts
license: mit
language:
- en
base_model: PrimeIntellect/INTELLECT-3
pipeline_tag: text-generation
---
# INTELLECT-3 AWQ - INT4
## Model Details
### Quantization Details
- **Quantization Method:** cyankiwi AWQ v1.0
- **Bits:** 4
- **Group Size:** 32
- **Calibration Dataset:** [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset)
- **Quantization Tool:** [llm-compressor](https://github.com/vllm-project/llm-compressor)
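For reference, a one-shot AWQ run with [llm-compressor](https://github.com/vllm-project/llm-compressor) along these lines might look as follows. This is a minimal sketch, not the exact script used for this release: the calibration split, sample count, column names, and scheme are assumptions, and the stock `W4A16` preset uses group size 128, so reproducing this card's group size of 32 would require a custom quantization config.
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "PrimeIntellect/INTELLECT-3"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Calibration data: split, sample count, and column names are assumptions;
# adapt the preprocessing to the dataset's actual schema.
ds = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset", split="train[:512]"
)

def preprocess(example):
    # Render each conversation with the chat template, then tokenize.
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(text, max_length=2048, truncation=True,
                     add_special_tokens=False)

ds = ds.map(preprocess, remove_columns=ds.column_names)

# 4-bit AWQ on all Linear layers, keeping the LM head in full precision.
# NOTE: W4A16 defaults to group size 128; this card uses group size 32,
# which needs a custom config_groups entry instead of the preset scheme.
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16", targets=["Linear"])]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("INTELLECT-3-AWQ-4bit", save_compressed=True)
tokenizer.save_pretrained("INTELLECT-3-AWQ-4bit")
```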
### Memory Usage
| **Type** | **INTELLECT-3** | **INTELLECT-3-AWQ-4bit** |
|:---------------:|:----------------:|:----------------:|
| **Model Weights** | 199.0 GB | 59.0 GB |
| **KV Cache per Token** | 61.3 kB | 15.3 kB |
| **KV Cache per Context** | 7.7 GB | 1.9 GB |
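These figures can be sanity-checked against the model config: for standard multi-head or grouped-query attention, the KV cache grows linearly in the number of layers, KV heads, head dimension, and bytes per element. A generic sketch of the arithmetic (the config values below are placeholders, not INTELLECT-3's actual architecture; plug in the real `config.json` values to verify):
```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    # Keys and values are each cached once per layer:
    # 2 (K and V) * layers * kv_heads * head_dim elements per token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

# Placeholder values for illustration only (not INTELLECT-3's real config).
per_token = kv_cache_bytes_per_token(
    num_layers=46, num_kv_heads=8, head_dim=128, bytes_per_elem=2
)
context_len = 131_072
print(f"{per_token / 1e3:.1f} kB/token, "
      f"{per_token * context_len / 1e9:.2f} GB per full context")
```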
## Inference
### Prerequisites
```bash
pip install -U vllm
```
### Basic Usage
```bash
vllm serve cyankiwi/INTELLECT-3-AWQ-4bit \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser deepseek_r1
```
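Once running, vLLM exposes an OpenAI-compatible API (port 8000 by default). A minimal query, assuming the `openai` Python client is installed; with the reasoning parser enabled, the chain of thought is returned in a separate `reasoning_content` field:
```python
from openai import OpenAI

# vLLM's OpenAI-compatible server; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="cyankiwi/INTELLECT-3-AWQ-4bit",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)

# --reasoning-parser splits the chain of thought from the final answer.
print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.content)
```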
## Additional Information
### Known Issues
- Serving with `--tensor-parallel-size` greater than 2 requires adding `--enable-expert-parallel`
- No MTP (multi-token prediction) implementation
### Changelog
- **v0.9.0** - Initial quantized release without MTP implementation
### Authors
- **Name:** Ton Cao
- **Contacts:** ton@cyan.kiwi
# INTELLECT-3
<div align="center">
<img src="banner.png" alt="Prime Intellect Logo" />
</div>
<p align="center">
<strong>INTELLECT-3: A 100B+ MoE trained with large-scale RL</strong>
<br><br>
Trained with <a href="https://github.com/PrimeIntellect-ai/prime-rl">prime-rl</a> and <a href="https://github.com/PrimeIntellect-ai/verifiers">verifiers</a>
<br>
Environments released on <a href="https://app.primeintellect.ai/dashboard/environments">Environments Hub</a>
<br>
Read the <a href="https://primeintellect.ai/blog/intellect-3">Blog</a> & <a href="https://storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf">Technical Report</a>
<br>
<a href="https://x.com/primeintellect">X</a> | <a href="https://discord.gg/RC5GvMbfDf">Discord</a> | <a href="https://app.primeintellect.ai/dashboard/create-cluster">Prime Intellect Platform</a>
</p>
## Introduction
**INTELLECT-3** is a 106B (A12B) parameter Mixture-of-Experts reasoning model post-trained from [GLM-4.5-Air-Base](https://huggingface.co/zai-org/GLM-4.5-Air-Base) using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL).

Training was performed with [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl) using environments built with the [verifiers](https://github.com/PrimeIntellect-ai/verifiers) library.
All training and evaluation environments are available on the [Environments Hub](https://app.primeintellect.ai/dashboard/environments).
The model, training frameworks, and environments are open-sourced under fully permissive licenses (MIT and Apache 2.0).
For more details, see the [technical report](https://storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf).
## Evaluation
INTELLECT-3 achieves best-in-class performance for its size on math, coding, and reasoning benchmarks:
| Model | MATH-500 | AIME24 | AIME25 | LCB | GPQA | HLE | MMLU-Pro |
|-----------|----------|---------|---------|--------|------|-----|----------|
| INTELLECT-3 | **98.1** | **90.8** | **88.0** | 69.3 | 74.4 | 14.6 | 81.9 |
| GLM-4.5-Air | 97.8 | 84.6 | 82.0 | 61.5 | 73.3 | 13.3 | 73.9 |
| GLM-4.5 | 97.0 | 85.8 | 83.3 | 64.5 | 77.0 | 14.8 | 83.5 |
| DeepSeek R1 0528 | 87.3 | 83.2 | 73.4 | 62.5 | 77.5 | 15.9 | 75.3 |
| DeepSeek v3.2 | 96.8 | 88.1 | 84.7 | **71.6** | **81.4** | **17.9** | **84.6** |
| gpt-oss-120b | 96.0 | 75.8 | 77.7 | 69.9 | 70.0 | 10.6 | 67.1 |
## Model Variants
| Model | HuggingFace |
|-------|-------------|
| INTELLECT-3 | [PrimeIntellect/INTELLECT-3](https://huggingface.co/PrimeIntellect/INTELLECT-3) |
| INTELLECT-3-FP8 | [PrimeIntellect/INTELLECT-3-FP8](https://huggingface.co/PrimeIntellect/INTELLECT-3-FP8) |
## Serving with vLLM
The BF16 version can be served on 2x H200s:
```bash
vllm serve PrimeIntellect/INTELLECT-3 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser deepseek_r1
```
The FP8 version can be served on a single H200:
```bash
vllm serve PrimeIntellect/INTELLECT-3-FP8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser deepseek_r1
```
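Both commands enable automatic tool choice, so the server also accepts standard OpenAI-style tool definitions. A minimal sketch using the `openai` client; the `get_weather` tool here is hypothetical:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool definition in the standard OpenAI function format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="PrimeIntellect/INTELLECT-3",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# With --enable-auto-tool-choice, parsed calls appear under tool_calls.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```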
## Citation
```bibtex
@misc{intellect3,
  title={INTELLECT-3: Technical Report},
  author={Prime Intellect Team},
  year={2025},
  url={https://huggingface.co/PrimeIntellect/INTELLECT-3}
}
```