Run with llama.cpp
This is now supported by llama.cpp, which is good news especially for Apple Silicon ("mps") users.
First download the GGUF files from https://huggingface.co/ggml-org/LightOnOCR-1B-1025-GGUF
and then run:
llama-server \
--host 0.0.0.0 \
--port 4183 \
-m "LightOnOCR-1B-1025-Q8_0.gguf" \
--mmproj "mmproj-LightOnOCR-1B-1025-Q8_0.gguf" \
-c 8192 --n-predict 8192 --temp 0.2 \
--top-p 0.9 \
--repeat-penalty 1.0 \
--cache-type-k q8_0 \
--threads 16 \
-ub 2048 -b 2048 --jinja \
-ngl -1
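
Once the server is up, you can send a page image through its OpenAI-compatible endpoint. A minimal sketch, assuming a recent llama-server build with multimodal support over /v1/chat/completions; page.png and the prompt text are placeholders:

# base64-encode the image inline (tr strips newlines so the data URI stays valid)
curl http://localhost:4183/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
        \"messages\": [{
          \"role\": \"user\",
          \"content\": [
            {\"type\": \"image_url\",
             \"image_url\": {\"url\": \"data:image/png;base64,$(base64 < page.png | tr -d '\n')\"}},
            {\"type\": \"text\", \"text\": \"Transcribe this page to markdown.\"}
          ]
        }],
        \"temperature\": 0.2,
        \"max_tokens\": 4096
      }"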
Made some more quants for it:
https://huggingface.co/noctrex/LightOnOCR-1B-1025-GGUF
https://huggingface.co/noctrex/LightOnOCR-1B-1025-i1-GGUF
Can you share the steps to create mmproj-bf16.gguf, please? The following does not create an mmproj file for me:
python3 "convert_hf_to_gguf.py" LightOnOCR-1B-1025 --mmproj --outfile LightOnOCR-1B-1025-bf16.gguf --outtype bf16
If you have the latest GitHub version of llama.cpp, that's the correct command; when --mmproj is passed, the --outfile name is applied to the projector, so you actually created the mmproj as "LightOnOCR-1B-1025-bf16.gguf".
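If you want to double-check which file is which, you can inspect the GGUF metadata. A quick sketch, assuming the gguf Python package (it ships a gguf-dump script; exact key names can vary between versions):

pip install gguf
# an mmproj file reports a projector/clip architecture instead of the LLM architecture
gguf-dump --no-tensors LightOnOCR-1B-1025-bf16.gguf | head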
Oh! Thanks, it works now:
python3 convert_hf_to_gguf.py . --mmproj --outtype bf16
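For completeness, a sketch of the full pipeline, run from the model directory with a llama.cpp checkout; the output file names here are my own choices:

# pass 1: the text model (no --mmproj)
python3 convert_hf_to_gguf.py . --outtype bf16 --outfile LightOnOCR-1B-1025-bf16.gguf
# pass 2: the vision projector
python3 convert_hf_to_gguf.py . --mmproj --outtype bf16 --outfile mmproj-LightOnOCR-1B-1025-bf16.gguf
# optional: quantize the text model, e.g. to Q8_0
./llama-quantize LightOnOCR-1B-1025-bf16.gguf LightOnOCR-1B-1025-Q8_0.gguf Q8_0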
I have downloaded your BF16 GGUF and it is working perfectly. Very nice results from the model. (y)
Can you please let me know how we can use this with Ollama?
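One possible route, offered as an untested sketch: Ollama can import a local GGUF through a Modelfile. Whether your Ollama version also picks up the separate mmproj file is something to verify against the Ollama import docs; the file and model names below are placeholders matching the quants above:

# Modelfile (vision projector handling depends on your Ollama version)
FROM ./LightOnOCR-1B-1025-Q8_0.gguf

# then create and run the model; image paths inside the prompt are picked up for vision models
ollama create lightonocr -f Modelfile
ollama run lightonocr "Transcribe this page: ./page.png"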
I've tried using the Q4_K_S quant with the F16 mmproj on an M1 MacBook, and it just spits out newlines over and over, regardless of input or instructions.
Update: the Q8 quant and F32 mmproj seem to work, more or less.
@c0bra Then you need higher precision: try the F32 mmproj with the Q8_0 model, or even BF16.
I tried using
python3 convert_hf_to_gguf.py ../lightonocr --mmproj --outtype f16 --dry-run
But I am getting the following error:
ValueError: Unsupported model type: lightonocr_vision
Any idea why I am facing this error? I have installed transformers from https://github.com/baptiste-aubertin/transformers.git.
Note: I have fine-tuned the original model.
Create a venv and install the official packages as the instructions say, and it will work.
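For reference, a minimal sketch of that setup, assuming the conversion script is run from a llama.cpp checkout (paths are placeholders):

python3 -m venv .venv
source .venv/bin/activate
# install the dependencies the conversion script expects
pip install -r llama.cpp/requirements.txt
python3 llama.cpp/convert_hf_to_gguf.py ../lightonocr --mmproj --outtype f16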