Why?
Why should I use yours vs unsloth's?
Thanks a lot for the question!
As we wrote in the Acknowledgements, our work is deeply inspired by many open-source authors, and we see them as our heroes and pioneers rather than competitors. In particular, we are very grateful to the unsloth team: their BF16 GGUF releases and imatrix files made it much easier for others (including us) to imagine and share additional quantization variants.
Regarding our quantizations: we tend to keep higher precision for the attention part (since the experts are the really “chubby” part, lol), and we also try to keep the quantization pattern as consistent as possible across layers. We believe this may be helpful for some edge-deployment scenarios and for people experimenting with custom acceleration; a small sketch of the idea follows.
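To make that concrete, here is a minimal Python sketch of the idea. It assumes llama.cpp-style GGUF tensor names (`blk.N.attn_q.weight`, `blk.N.ffn_down_exps.weight`, ...), and the specific quant types (`Q8_0`, `Q4_K`, `Q6_K`) are illustrative assumptions, not our exact recipe:

```python
# Illustrative sketch (NOT our exact recipe): choosing a per-tensor quant type
# for a MoE model, keeping attention higher precision than the expert weights
# and applying the same rule on every layer.

ATTN_KEYS = ("attn_q", "attn_k", "attn_v", "attn_output")

def pick_quant_type(tensor_name: str) -> str:
    """Return a quant type for one tensor; the rule ignores the layer index,
    so the resulting pattern is identical across all layers."""
    if any(key in tensor_name for key in ATTN_KEYS):
        return "Q8_0"   # attention is small next to the experts: keep it high precision
    if "ffn" in tensor_name and "exps" in tensor_name:
        return "Q4_K"   # expert weights dominate model size: quantize them harder
    return "Q6_K"       # assumed default for remaining weight tensors

# The same rule fires for layer 0 and layer 47 alike:
for name in ("blk.0.attn_q.weight", "blk.0.ffn_down_exps.weight",
             "blk.47.attn_q.weight", "blk.47.ffn_down_exps.weight"):
    print(name, "->", pick_quant_type(name))
```

Because the choice depends only on the tensor's role and never on its layer index, the pattern stays uniform from the first block to the last, which is part of what makes it easier to reason about for custom kernels or edge deployments.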
In the end, which quant to use is entirely your choice. We are happy to answer any questions you have and equally happy if our work is just one more option on your shortlist. Thank you again for your interest and for taking the time to compare different projects!