gpt-oss-20b-pruned-GGUF
Basically GPT-OSS, but for low-end devices, in GGUF format.
This repository contains multiple quantized versions of the gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts model in GGUF format.
It is intended for efficient inference on consumer hardware, making large model deployment more accessible.
```python
from huggingface_hub import hf_hub_download

# Download the Q4_K_M quantized GGUF file from the Hugging Face Hub
model_path = hf_hub_download(
    "leeminwaan/gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF",
    "gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-q4_k_m.gguf",
)
print("Downloaded:", model_path)
```
Quantized versions available:
| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
|---|---|---|---|---|
| Q2_K | Smallest | Fastest | Low | Prototyping, minimal RAM/CPU |
| Q3_K_S | Very Small | Very Fast | Low-Med | Lightweight devices, testing |
| Q3_K_M | Small | Fast | Med | Lightweight, slightly better quality |
| Q3_K_L | Small-Med | Fast | Med | Faster inference, fair quality |
| Q4_0 | Medium | Fast | Good | General use, chats, low RAM |
| Q4_1 | Medium | Fast | Good+ | Recommended, slightly better quality |
| Q4_K_S | Medium | Fast | Good+ | Recommended, balanced |
| Q4_K_M | Medium | Fast | Good++ | Recommended, best Q4 option |
| Q5_0 | Larger | Moderate | Very Good | Chatbots, longer responses |
| Q5_1 | Larger | Moderate | Very Good+ | More demanding tasks |
| Q5_K_S | Larger | Moderate | Very Good+ | Advanced users, better accuracy |
| Q5_K_M | Larger | Moderate | Excellent | Demanding tasks, high quality |
| Q6_K | Large | Slower | Near FP16 | Power users, best quantized quality |
| Q8_0 | Largest | Slowest | FP16-like | Maximum quality, high RAM/CPU |
Note:
- Lower bit-width quantization = smaller model and faster inference, but lower output quality.
- Q4_K_M is ideal for most users; Q6_K and Q8_0 offer the highest quality and are best for advanced use.
- All quantizations are suitable for consumer hardware; select based on your quality/speed needs.
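As a rough illustration of the trade-off above, a small helper could map an approximate RAM budget to one of the quantization tiers. This is only a sketch: the GB thresholds below are hypothetical placeholders, not measured file sizes for this model.

```python
# Hypothetical helper: pick a quantization tier from the table above
# based on a rough RAM budget. The GB thresholds are illustrative
# placeholders, not measured sizes for this model.
def pick_quantization(ram_gb: float) -> str:
    tiers = [
        (4.0, "Q2_K"),    # minimal RAM, lowest quality
        (6.0, "Q4_K_M"),  # recommended balance for most users
        (8.0, "Q5_K_M"),  # higher quality, moderate speed
        (12.0, "Q6_K"),   # near-FP16 quality
    ]
    for max_gb, quant in tiers:
        if ram_gb <= max_gb:
            return quant
    return "Q8_0"  # plenty of RAM: maximum quality

print(pick_quantization(5.5))  # picks the balanced Q4 tier under these assumptions
```

The chosen filename can then be passed to `hf_hub_download` as in the snippet above (e.g. the `...-q4_k_m.gguf` file for `Q4_K_M`).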
BibTeX:
```bibtex
@misc{gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF,
  title={gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF Quantized Models},
  author={leeminwaan},
  year={2025},
  howpublished={\url{https://huggingface.co/leeminwaan/gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF}}
}
```
APA:
leeminwaan. (2025). gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF Quantized Models [Computer software]. Hugging Face. https://huggingface.co/leeminwaan/gpt-oss-6.0b-specialized-all-pruned-moe-only-7-experts-GGUF