## ⚠️ Notice on Current Model Scope
Please note that Yuuki, in its current state, represents approximately 3.7% of the total training planned for version v0.1.
At this stage, Yuuki should be considered an early and incomplete snapshot of the model. The full v0.1 release, which will include the remaining training stages, additional refinements, and stabilization, will follow at a later date.
As such, performance, behavior, or capability assessments based on the current version of Yuuki do not reflect the final characteristics of the v0.1 model.
Further updates will be provided as development progresses.
# Yuuki v0.1 - The $0 Code LLM
⚠️ WORK IN PROGRESS - Currently training on mobile CPU (Day 3/42)
## The Mission
Prove that you DON'T need expensive GPUs to train LLMs.
Yuuki is a code generation model trained entirely on a $150 Android phone with:
- No cloud compute
- No GPU
- No data center
- Just determination and time
## The Setup
- Hardware: Snapdragon 685 (8-core ARM CPU)
- RAM: 6 GB
- Storage: 128 GB
- NPU: Hexagon 686 (1 TOPS)
- GPU: Adreno 610 (243 GFLOPS) - NOT USED for training
- Cost: $0 in compute
## Current Status
| Metric | Value |
|--------|-------|
| Progress | 1,417 / 37,500 steps (3.78%) |
| Epoch | 0.08 / 2.0 |
| Current loss | ~1.70 - 2.23 |
| Best loss | 1.7053 |
| Training time | ~3 days |
| ETA | ~39 days remaining |
| Speed | ~100 sec/step |
### Loss Progression
- Step 0: Loss 3.35 (baseline)
- Step 500: Loss 2.50 (-25%)
- Step 1,000: Loss 2.00 (-40%)
- Step 1,265: Loss 1.83 (-45%)
- Step 1,292: Loss 1.71 (-49%, record)
- Step 1,417: Loss 2.23 (current; oscillating between 1.7 and 2.3)
## What Yuuki Knows (So Far)
Because the dataset is ordered alphabetically by language:
| Language | Exposure | Quality | Status |
|----------|----------|---------|--------|
| Agda | High | 85/100 | Excellent |
| C | Starting | 30/100 | Learning |
| Assembly | Low | 5/100 | Minimal |
| Python | None | 0/100 | Not reached yet |
### Example Output (Step 1,300)
Agda prompt:

```agda
module Main where
```

Model output:

```agda
module Main where (x, f) in a
open import Cubical.Sigma
open import Cubical.Sigma.Core
open import Cubical.Foundations.H
```
Real Agda libraries! The model learned actual Cubical type theory modules.
## Training Configuration
- Model: DistilGPT-2 (82M parameters)
- Dataset: The Stack (75,000 examples)
- Batch size: 1
- Gradient accumulation: 4
- Effective batch: 4
- Learning rate: 5e-5
- Max length: 256 tokens
- Optimizer: AdamW
- Epochs: 2
- Total tokens: ~30M (2 epochs)
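For reference, here is a minimal sketch of what this configuration could look like with the Hugging Face Trainer. The dataset ID, split slice, and the "content" column are assumptions for illustration; the actual training script is not published here.

```python
# Hypothetical reconstruction of the configuration above; the dataset ID,
# split slice, and "content" column name are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token                     # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")    # ~82M parameters

# Assumed: a 75,000-example slice of The Stack with a "content" text column.
dataset = load_dataset("bigcode/the-stack-dedup", split="train[:75000]")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="yuuki-v0.1",
    per_device_train_batch_size=1,      # batch size 1
    gradient_accumulation_steps=4,      # effective batch of 4
    learning_rate=5e-5,
    num_train_epochs=2,
    optim="adamw_torch",                # AdamW optimizer
    logging_steps=25,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # runs on CPU when no GPU is available
```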
### Why so slow?
100 seconds/step × 37,500 steps = 3,750,000 seconds
= 1,042 hours
= 43.4 days
= ~6 weeks of continuous training
No GPU acceleration. Pure CPU grinding.
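As a quick sanity check, here is where the 37,500-step count and the ETA come from (rounded figures, using the numbers quoted above):

```python
# Step count: 75,000 examples x 2 epochs / effective batch of 4
examples, epochs, effective_batch = 75_000, 2, 4
total_steps = examples * epochs // effective_batch      # 37,500 steps

# Wall-clock time at ~100 seconds per step on the phone's CPU
seconds = total_steps * 100                             # 3,750,000 s
print(total_steps, seconds / 3600, seconds / 86_400)    # 37500, ~1042 h, ~43.4 days
```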
## Roadmap
### v0.1 (Current - Proof of Concept)
- [x] Setup training pipeline
- [x] Start training (Step 0)
- [x] Reach Step 1,000
- [x] Break loss 2.0 barrier
- [x] Break loss 1.8 barrier
- [ ] Checkpoint 2,500 (7%)
- [ ] Checkpoint 5,000 (13%)
- [ ] Checkpoint 10,000 (27%)
- [ ] Checkpoint 18,750 (50% - Epoch 1 complete)
- [ ] Checkpoint 37,500 (100% - DONE)
- [ ] Quantize to INT8
- [ ] Convert to ONNX (a sketch of both steps follows this list)
- [ ] Publish final model
ETA: Mid-March 2026
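The INT8 quantization and ONNX export items above could be done roughly as follows with PyTorch's dynamic quantization and `torch.onnx.export`. The checkpoint path, output file names, and opset version are placeholders, and the actual release may use different tooling (e.g. optimum).

```python
# Hypothetical post-training steps; the checkpoint path and output file
# names are placeholders, and the real pipeline may use optimum instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "yuuki-v0.1/checkpoint-37500"        # assumed local checkpoint path
model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# INT8: dynamic quantization of the Linear layers (CPU-friendly).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "yuuki-v0.1-int8.pt")

# ONNX: export the FP32 model with a dummy input.
model.config.use_cache = False              # skip past-key-value outputs
model.config.return_dict = False            # tuple outputs for the exporter
dummy = tokenizer("def main():", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"],),
    "yuuki-v0.1.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
    opset_version=17,
)
```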
### v0.2 (The Full Dataset)
- Dataset: 786,387 examples (full Stack)
- Duration: 418 days (~14 months)
- Epochs: 2.0
- Total tokens: ~314M
- Dataset fix: SHUFFLED (not alphabetical)
- Languages: all 80+ languages, balanced
- Start: March 2026
- End: May 2027
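A sketch of what the v0.2 shuffling fix could look like with the datasets library in streaming mode, which keeps memory use low on a 6 GB phone. The dataset ID and column names are assumptions.

```python
# Assumed dataset ID and columns; the point is the buffered shuffle, which
# interleaves languages instead of consuming them in alphabetical order.
from datasets import load_dataset

stream = load_dataset("bigcode/the-stack-dedup", split="train", streaming=True)
shuffled = stream.shuffle(seed=42, buffer_size=10_000)

for example in shuffled.take(3):
    print(example["lang"], len(example["content"]))
```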
### v0.3+ (PC Era)
- Hardware upgrade: RTX 4060/4070
- Larger models: 350M-1B parameters
- Faster training: ~30x speedup
- Advanced techniques: LoRA, QLoRA, etc.
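For the LoRA idea, a minimal sketch with the peft library; the base model choice, rank, and target modules are illustrative only, not the project's final plan.

```python
# Illustrative LoRA setup with peft; hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2-medium")   # ~350M parameters

lora_config = LoraConfig(
    r=16,                        # adapter rank
    lora_alpha=32,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # only a small fraction of weights train
```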
## Philosophy
"The barrier to AI isn't money. It's mindset."
This project demonstrates that:
- You CAN train LLMs without GPUs
- Patience > Hardware
- A $0 budget is enough to start
- Limited resources inspire creativity
- Anyone can contribute to AI
## Usage (After Training Completes)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("OpceanAI/Yuuki")
tokenizer = AutoTokenizer.from_pretrained("OpceanAI/Yuuki")

# Generate code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
code = tokenizer.decode(outputs[0])
print(code)
```
### Quantized (4x faster, 4x smaller)

Coming after training completes:

```python
model = AutoModelForCausalLM.from_pretrained(
    "OpceanAI/Yuuki",
    subfolder="yuuki-v0.1-int8"
)
```
## ⚠️ Known Limitations
- Dataset order: alphabetical (not shuffled), so the earliest languages are learned best
- Token count: only ~30M tokens (vs GPT-2's 40B)
- Training speed: very slow (~100 sec/step)
- Model size: small (82M parameters)
- Language coverage: incomplete due to alphabetical ordering

These will be addressed in v0.2 with a shuffled dataset.
## Technical Details
CPU training (~100 sec/step):
- Forward pass: ~40 sec
- Backward pass: ~40 sec
- Optimizer step: ~20 sec
- Total: ~100 sec
For comparison, GPU training (~0.5 sec/step):
- Roughly 200x faster per step
- But a rented cloud GPU costs $0.50-$2.00/hour; kept for the same 42-day window, that comes to $500-$2,000

Mobile: FREE but SLOW. GPU: FAST but EXPENSIVE. For a proof of concept, mobile wins.
## Benchmarks (Post-Training)
Coming soon after training completes (~March 2026).
Expected performance:
- Agda: 85-95/100 (primary language)
- C: 85-92/100 (secondary language)
- Assembly: 75-85/100 (tertiary)
- Python: 10-20/100 (barely seen due to alphabetical ordering)
## Acknowledgments
- HuggingFace: Infrastructure and the transformers library
- BigCode: The Stack dataset
- The ML community: For saying "you need GPUs" - best motivation
## License
Apache 2.0 - see the LICENSE file. You can use Yuuki commercially, modify it, and distribute it. Just give credit.
## Links
- GitHub: https://github.com/aguitauwu
- Discord: https://discord.gg/j8zV2u8k
- Progress updates: check this model card
## Updates
- 2026-01-29: Training started
- 2026-01-29: Step 1,000 reached - Loss 2.00
- 2026-01-29: Step 1,292 - NEW RECORD: Loss 1.7053
- 2026-01-29: Repository created on HuggingFace
Last updated: 2026-01-29
Follow the journey of training an LLM on a $0 budget. One step at a time.