"Not all quantized models perform well." Serving frameworks: Ollama uses an NVIDIA GPU; llama.cpp uses the CPU with AVX & AMX.
v1k
xbruce22
Recent Activity
liked a model 2 days ago: jdopensource/JoyAI-LLM-Flash
liked a model 2 days ago: MiniMaxAI/MiniMax-M2.5
liked a model 15 days ago: stepfun-ai/Step-3.5-Flash