SandLogicTechnologies committed
Commit e7176b5 · verified · 1 parent(s): 0263d8d

Update README.md

Files changed (1): README.md (+15 −18)
README.md CHANGED

```diff
@@ -1,4 +1,7 @@
 ---
+pipeline_tag: text-generation
+library_name: transformers
+license: apache-2.0 # Assuming Apache 2.0 license, adjust if different.
 base_model:
 - Qwen/Qwen3-8B
 - Qwen/Qwen3-0.6B
@@ -8,10 +11,6 @@ tags:
 - Light weight
 - Agentic
 - Conversational
-language:
-- en
-pipeline_tag: text-generation
-library_name: transformers
 ---
 # Qwen3 Quantized Models – Lexicons Edition
 
@@ -26,7 +25,7 @@ This repository provides quantized versions of the **Qwen3** language models, op
 
 ## Model Overview
 
-**Qwen3** is the latest open-source LLM series developed by Alibaba Group. Released on **April 28, 2025**, the models were trained on **36 trillion tokens** across **119 languages and dialects**. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities.
+**Qwen3** is the latest open-source LLM series developed by Alibaba Group. Released on **April 28, 2025**, the models were trained on **36 trillion tokens** across **119 languages and dialects**. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities. This model is described in [An Empirical Study of Qwen3 Quantization](https://arxiv.org/abs/2505.02214).
 
 The quantized versions provided here use **4-bit Q4_K_M** precision ensuring high performance at a fraction of the memory and compute cost. These models are ideal for real-time inference, chatbots, and on-device applications.
 
@@ -34,10 +33,10 @@ The quantized versions provided here use **4-bit Q4_K_M** precision ensuring hig
 
 ## Key Features
 
-- **Efficient Quantization**: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
-- **Multilingual Mastery**: Trained on a massive, diverse corpus covering 119+ languages.
-- **Instruction-Tuned**: Fine-tuned to follow user instructions effectively.
-- **Scalable Sizes**: Choose from 0.6B to 8B parameter models based on your use case.
+- **Efficient Quantization**: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
+- **Multilingual Mastery**: Trained on a massive, diverse corpus covering 119+ languages.
+- **Instruction-Tuned**: Fine-tuned to follow user instructions effectively.
+- **Scalable Sizes**: Choose from 0.6B to 8B parameter models based on your use case.
 
 
 ---
@@ -46,15 +45,13 @@ The quantized versions provided here use **4-bit Q4_K_M** precision ensuring hig
 
 | Model Name | Parameters | Quantization | Context Length | Recommended Use |
 |--------------------------|------------|--------------|----------------|--------------------------------------|
-| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
-| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
-| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
-| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
-
+| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
+| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
+| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
+| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
 ---
-
 ## Performance Insights
-
 Quantized Qwen3 models at Q4_K_M retain impressive reasoning and comprehension capabilities while cutting down the memory and compute needs. Based on the latest findings ([arXiv:2505.02214](https://arxiv.org/abs/2505.02214)), Qwen3 models are robust even under lower bit quantization when used appropriately.
-
----
+---
+## Code
+The project is released on [Github](https://github.com/Efficient-ML/Qwen3-Quantization) and [Hugging Face](https://huggingface.co/collections/Efficient-ML/qwen3-quantization-68164450decb1c868788cb2b).
```
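The README being edited claims Q4_K_M runs "at a fraction of the memory and compute cost" of full precision. A back-of-envelope sketch of what that fraction looks like for the four model sizes in the comparison table, assuming Q4_K_M averages roughly 4.5 bits per weight once K-quant block scales and metadata are included (a common estimate for llama.cpp K-quants — actual GGUF file sizes vary by model):

```python
# Rough weight-storage estimate: FP16 (16 bits/weight) vs. Q4_K_M
# (~4.5 bits/weight is an assumption, not a figure from the commit).

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for p in (0.6, 1.7, 4.0, 8.0):  # the four sizes in the model table
    fp16 = model_size_gb(p, 16)
    q4 = model_size_gb(p, 4.5)
    print(f"{p:>4}B  FP16 ≈ {fp16:5.1f} GB   Q4_K_M ≈ {q4:4.1f} GB   ({q4 / fp16:.0%} of FP16)")
```

Under these assumptions the 8B model drops from about 16 GB to about 4.5 GB of weights (~28% of FP16), which is what makes the 4B and 8B variants plausible for consumer GPUs and on-device use.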