GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Paper: arXiv:2210.17323
4-bit GPTQ-quantized version of FuseChat-Qwen-2.5-7B-Instruct for inference with the Private LLM app.
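To give a rough sense of what 4-bit quantization does to the weights, here is a simplified, self-contained sketch of symmetric round-to-nearest 4-bit quantization with per-group scales. This is only an illustration of the storage format; it is not the GPTQ algorithm itself, which additionally compensates quantization error across columns using second-order (Hessian) information. The group size and matrix shape below are arbitrary choices for the example.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Simplified symmetric round-to-nearest 4-bit quantization.

    Weights are grouped into blocks of `group_size`, each block sharing
    one float scale. NOT the actual GPTQ algorithm (no error compensation).
    """
    w = w.reshape(-1, group_size)
    # Map each group's max magnitude to the int4 symmetric range [-7, 7].
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from int4 codes and scales.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s).reshape(w.shape)
err = np.abs(w - w_hat).max()  # bounded by half a scale step per group
```

Stored this way, each weight occupies 4 bits plus a small per-group overhead for the scale, which is where the ~4x memory reduction over fp16 comes from.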
Base model: FuseAI/FuseChat-Qwen-2.5-7B-Instruct