# Multi-Model Support Testing Guide
This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.
## Prerequisites
- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)
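Before starting the backend, you can optionally confirm the environment with a short Python check. This is a minimal sketch: it assumes PyTorch with the MPS backend is among the installed dependencies (implied by the backend's "Apple Silicon GPU" log line, though not spelled out in this guide).

```python
# Optional environment check: Python version and Apple Silicon GPU support.
# Assumes PyTorch (with the MPS backend) is installed via requirements.txt.
import sys
import torch

print(f"Python: {sys.version.split()[0]}")                     # should be 3.8+
print(f"MPS available: {torch.backends.mps.is_available()}")   # True on Apple Silicon
```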
## Quick Start
### Step 1: Start the Backend
In one terminal:
```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```
**Expected output:**
```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```
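Once the backend is up, you can confirm it is responding with a quick request before moving on (this assumes the `requests` package is available; the curl commands further below do the same job):

```python
# Quick sanity check that the backend is up and serving the models endpoint.
import requests

resp = requests.get("http://localhost:8000/models", timeout=5)
resp.raise_for_status()
print(resp.json())  # should list CodeGen and Code-Llama
```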
### Step 2: Run the Test Script
In another terminal:
```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```
## What the Test Script Does
The test script runs 10 comprehensive tests:
1. ✅ **Health Check** - Verifies backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests CodeGen still works
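Under the hood, these tests exercise the same REST endpoints shown in the Manual API Testing section below. Here is a condensed, illustrative sketch of the switch-and-generate flow; the authoritative version lives in `test_multi_model.py`, and this sketch assumes the `requests` package is installed:

```python
# Illustrative sketch of the model-switch-and-generate flow tested above.
import requests

BASE = "http://localhost:8000"

print(requests.get(f"{BASE}/models").json())           # list available models
print(requests.get(f"{BASE}/models/current").json())   # currently loaded model

# Switch to Code-Llama 7B (downloads ~14GB on the first run).
requests.post(f"{BASE}/models/switch",
              json={"model_id": "code-llama-7b"}).raise_for_status()

# Generate with the newly loaded model.
out = requests.post(f"{BASE}/generate", json={
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": False,
})
print(out.json())
```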
## Expected Test Duration
- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes
**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)
## Manual API Testing
If you prefer to test manually, use these curl commands:
### List Available Models
```bash
curl http://localhost:8000/models | jq
```
### Get Current Model
```bash
curl http://localhost:8000/models/current | jq
```
### Switch to Code-Llama
```bash
curl -X POST http://localhost:8000/models/switch \
-H "Content-Type: application/json" \
-d '{"model_id": "code-llama-7b"}' | jq
```
### Generate Text
```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "def fibonacci(n):\n ",
"max_tokens": 50,
"temperature": 0.7,
"extract_traces": false
}' | jq
```
### Get Model Info
```bash
curl http://localhost:8000/model/info | jq
```
## Success Criteria
Before committing to GitHub, verify:
- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor)
## Expected Model Behavior
### CodeGen 350M
- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads
### Code-Llama 7B
- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads (GQA with 8 KV heads)
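If you want to double-check the Code-Llama architecture figures without loading the full model, the Hugging Face config can be inspected directly. This sketch uses standard `transformers` config attributes and only fetches the small config file, not the ~14GB of weights:

```python
# Inspect Code-Llama's architecture from its Hugging Face config (no weight download).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
print(cfg.num_hidden_layers)     # layer count
print(cfg.num_attention_heads)   # attention head count
print(cfg.num_key_value_heads)   # KV head count (checks the GQA claim above)
```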
## Troubleshooting
### Backend won't start
```bash
# Check if already running
lsof -i :8000
# Kill existing process
kill -9 <PID>
```
### Import errors
```bash
# Reinstall dependencies
pip install -r requirements.txt
```
### Code-Llama download fails
- Check internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:
```python
from transformers import AutoModelForCausalLM
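# First run downloads ~14GB of weights into the local Hugging Face cache.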
AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
```
### Out of memory
- Close other applications
- Use CodeGen only (skip Code-Llama tests)
- Check Activity Monitor for memory usage
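As an alternative to Activity Monitor, a quick check from Python is sketched below. It assumes `psutil` is installed, which is not listed as a prerequisite above:

```python
# Check free RAM before switching to Code-Llama 7B (needs roughly 14-16GB).
# Assumes psutil is installed: pip install psutil
import psutil

avail_gb = psutil.virtual_memory().available / 1024**3
print(f"Available RAM: {avail_gb:.1f} GB")
if avail_gb < 16:
    print("Consider closing other applications or sticking with CodeGen 350M.")
```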
## Next Steps After Testing
Once all tests pass:
1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to feature branch** (NOT main)
5. **Test frontend integration**
## Files Modified
This implementation created or modified the following files:
**Backend:**
- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)
**Status:** All changes are on the `feature/multi-model-support` branch
**Rollback:** check out the `pre-multimodel` tag (`git checkout pre-multimodel`) if you need to revert