SandLogicTechnologies committed
Commit e7176b5 · verified · 1 parent(s): 0263d8d

Update README.md

Files changed (1): README.md (+15 −18)
README.md CHANGED

```diff
@@ -1,4 +1,7 @@
 ---
+pipeline_tag: text-generation
+library_name: transformers
+license: apache-2.0 # Assuming Apache 2.0 license, adjust if different.
 base_model:
 - Qwen/Qwen3-8B
 - Qwen/Qwen3-0.6B
@@ -8,10 +11,6 @@ tags:
 - Light weight
 - Agentic
 - Conversational
-language:
-- en
-pipeline_tag: text-generation
-library_name: transformers
 ---
 # Qwen3 Quantized Models – Lexicons Edition
 
@@ -26,7 +25,7 @@ This repository provides quantized versions of the **Qwen3** language models, op
 
 ## Model Overview
 
-**Qwen3** is the latest open-source LLM series developed by Alibaba Group. Released on **April 28, 2025**, the models were trained on **36 trillion tokens** across **119 languages and dialects**. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities.
+**Qwen3** is the latest open-source LLM series developed by Alibaba Group. Released on **April 28, 2025**, the models were trained on **36 trillion tokens** across **119 languages and dialects**. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities. This model is described in [An Empirical Study of Qwen3 Quantization](https://arxiv.org/abs/2505.02214).
 
 The quantized versions provided here use **4-bit Q4_K_M** precision ensuring high performance at a fraction of the memory and compute cost. These models are ideal for real-time inference, chatbots, and on-device applications.
 
@@ -34,10 +33,10 @@ The quantized versions provided here use **4-bit Q4_K_M** precision ensuring hig
 
 ## Key Features
 
-- **Efficient Quantization**: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
-- **Multilingual Mastery**: Trained on a massive, diverse corpus covering 119+ languages.
-- **Instruction-Tuned**: Fine-tuned to follow user instructions effectively.
-- **Scalable Sizes**: Choose from 0.6B to 8B parameter models based on your use case.
+- **Efficient Quantization**: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
+- **Multilingual Mastery**: Trained on a massive, diverse corpus covering 119+ languages.
+- **Instruction-Tuned**: Fine-tuned to follow user instructions effectively.
+- **Scalable Sizes**: Choose from 0.6B to 8B parameter models based on your use case.
 
 
 ---
@@ -46,15 +45,13 @@ The quantized versions provided here use **4-bit Q4_K_M** precision ensuring hig
 
 | Model Name | Parameters | Quantization | Context Length | Recommended Use |
 |--------------------------|------------|--------------|----------------|--------------------------------------|
-| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
-| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
-| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
-| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
-
+| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
+| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
+| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
+| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
 ---
-
 ## Performance Insights
-
 Quantized Qwen3 models at Q4_K_M retain impressive reasoning and comprehension capabilities while cutting down the memory and compute needs. Based on the latest findings ([arXiv:2505.02214](https://arxiv.org/abs/2505.02214)), Qwen3 models are robust even under lower bit quantization when used appropriately.
-
----
+---
+## Code
+The project is released on [Github](https://github.com/Efficient-ML/Qwen3-Quantization) and [Hugging Face](https://huggingface.co/collections/Efficient-ML/qwen3-quantization-68164450decb1c868788cb2b).
```
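The README being edited claims Q4_K_M runs "at a fraction of the memory and compute cost" of full precision. A back-of-envelope sketch of what that fraction looks like for the four model sizes in the comparison table, assuming Q4_K_M averages roughly 4.5 bits per weight once K-quant block scales and metadata are included (a common estimate for llama.cpp K-quants — actual GGUF file sizes vary by model):

```python
# Rough weight-storage estimate: FP16 (16 bits/weight) vs. Q4_K_M
# (~4.5 bits/weight is an assumption, not a figure from the commit).

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for p in (0.6, 1.7, 4.0, 8.0):  # the four sizes in the model table
    fp16 = model_size_gb(p, 16)
    q4 = model_size_gb(p, 4.5)
    print(f"{p:>4}B  FP16 ≈ {fp16:5.1f} GB   Q4_K_M ≈ {q4:4.1f} GB   ({q4 / fp16:.0%} of FP16)")
```

Under these assumptions the 8B model drops from about 16 GB to about 4.5 GB of weights (~28% of FP16), which is what makes the 4B and 8B variants plausible for consumer GPUs and on-device use.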