---
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0 # Assuming Apache 2.0 license, adjust if different.
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-0.6B
tags:
- Light weight
- Agentic
- Conversational
---

# Qwen3 Quantized Models – Lexicons Edition
## Model Overview

**Qwen3** is the latest open-source LLM series developed by Alibaba Group. Released on **April 28, 2025**, the models were trained on **36 trillion tokens** across **119 languages and dialects**. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities. This model is described in [An Empirical Study of Qwen3 Quantization](https://arxiv.org/abs/2505.02214).

The quantized versions provided here use **4-bit Q4_K_M** precision, delivering high performance at a fraction of the memory and compute cost. These models are ideal for real-time inference, chatbots, and on-device applications.
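To make the memory claim concrete: Q4_K_M stores weights at roughly 4.5 bits each on average (an approximation; the exact figure varies with the tensor mix), versus 16 bits for fp16. A quick back-of-the-envelope sketch, excluding KV cache and runtime overhead:

```python
# Rough weight-memory estimate for fp16 vs. Q4_K_M quantization.
# The ~4.5 bits/weight average for Q4_K_M is an approximation, and
# KV cache / runtime overhead is not included.

def approx_weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = approx_weight_memory_gb(8e9, 16.0)   # Qwen3-8B in fp16
q4_gb = approx_weight_memory_gb(8e9, 4.5)      # same model at ~Q4_K_M

print(f"fp16: {fp16_gb:.1f} GB, Q4_K_M: ~{q4_gb:.1f} GB")  # fp16: 16.0 GB, Q4_K_M: ~4.5 GB
```

On this estimate the 8B model shrinks from roughly 16 GB of weights to about 4.5 GB, around a 3.5× reduction.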
## Key Features

- **Efficient Quantization**: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
- **Multilingual Mastery**: Trained on a massive, diverse corpus covering 119 languages and dialects.
- **Instruction-Tuned**: Fine-tuned to follow user instructions effectively.
- **Scalable Sizes**: Choose from 0.6B to 8B parameter models based on your use case.

---
| Model Name | Parameters | Quantization | Context Length | Recommended Use |
|------------------------|------------|--------------|----------------|--------------------------------------|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |

---
## Performance Insights

Quantized Qwen3 models at Q4_K_M retain impressive reasoning and comprehension capabilities while cutting memory and compute requirements. Based on the latest findings ([arXiv:2505.02214](https://arxiv.org/abs/2505.02214)), Qwen3 models remain robust even under lower-bit quantization when used appropriately.
---

## Code
The project is released on [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and [Hugging Face](https://huggingface.co/collections/Efficient-ML/qwen3-quantization-68164450decb1c868788cb2b).
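To try one of these GGUF files locally, a minimal sketch using `llama-cpp-python` is shown below. The choice of runtime is an assumption (any GGUF-compatible runtime such as llama.cpp works), and the model path, context size, and generation settings are illustrative:

```python
# Minimal sketch: run a Q4_K_M GGUF model with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a locally downloaded
# GGUF file; the path below is illustrative.

def format_chatml(user_msg: str) -> str:
    """Build a ChatML-style prompt, the format used by Qwen chat models."""
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

if __name__ == "__main__":
    from llama_cpp import Llama  # third-party; guarded so the helper stays importable

    llm = Llama(model_path="Qwen_Qwen3-0.6B-Q4_K_M.gguf", n_ctx=4096)
    out = llm(
        format_chatml("Summarize what Q4_K_M quantization does in one sentence."),
        max_tokens=128,
        stop=["<|im_end|>"],
    )
    print(out["choices"][0]["text"])
```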