# Multi-Model Support Testing Guide
This guide explains how to test the new multi-model infrastructure locally before committing to GitHub.
## Prerequisites
- Mac Studio M3 Ultra or MacBook Pro M4 Max
- Python 3.8+
- All dependencies installed (`pip install -r requirements.txt`)
- Internet connection (for downloading Code-Llama 7B)
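Before starting the backend, you can optionally confirm the environment with a short Python check. This is a minimal sketch: it assumes PyTorch with the MPS backend is among the installed dependencies (implied by the backend's "Apple Silicon GPU" log line, though not spelled out in this guide).

```python
# Optional environment check: Python version and Apple Silicon GPU support.
# Assumes PyTorch (with the MPS backend) is installed via requirements.txt.
import sys
import torch

print(f"Python: {sys.version.split()[0]}")                     # should be 3.8+
print(f"MPS available: {torch.backends.mps.is_available()}")   # True on Apple Silicon
```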
## Quick Start
### Step 1: Start the Backend
In one terminal:
```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python -m uvicorn backend.model_service:app --reload --port 8000
```
**Expected output:**
```
INFO: Loading CodeGen 350M on Apple Silicon GPU...
INFO: ✅ CodeGen 350M loaded successfully
INFO: Layers: 20, Heads: 16
INFO: Uvicorn running on http://127.0.0.1:8000
```
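Once the backend is up, you can confirm it is responding with a quick request before moving on (this assumes the `requests` package is available; the curl commands further below do the same job):

```python
# Quick sanity check that the backend is up and serving the models endpoint.
import requests

resp = requests.get("http://localhost:8000/models", timeout=5)
resp.raise_for_status()
print(resp.json())  # should list CodeGen and Code-Llama
```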
### Step 2: Run the Test Script
In another terminal:
```bash
cd /Users/garyboon/Development/VisualisableAI/visualisable-ai-backend
python test_multi_model.py
```
## What the Test Script Does
The test script runs 10 comprehensive tests:
1. ✅ **Health Check** - Verifies backend is running
2. ✅ **List Models** - Shows available models (CodeGen, Code-Llama)
3. ✅ **Current Model** - Gets info about loaded model
4. ✅ **Model Info** - Gets detailed architecture info
5. ✅ **Generate (CodeGen)** - Tests text generation with CodeGen
6. ✅ **Switch to Code-Llama** - Loads Code-Llama 7B
7. ✅ **Model Info (Code-Llama)** - Verifies Code-Llama loaded correctly
8. ✅ **Generate (Code-Llama)** - Tests generation with Code-Llama
9. ✅ **Switch Back to CodeGen** - Verifies model unloading works
10. ✅ **Generate (CodeGen again)** - Tests CodeGen still works
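Under the hood, these tests exercise the same REST endpoints shown in the Manual API Testing section below. Here is a condensed, illustrative sketch of the switch-and-generate flow; the authoritative version lives in `test_multi_model.py`, and this sketch assumes the `requests` package is installed:

```python
# Illustrative sketch of the model-switch-and-generate flow tested above.
import requests

BASE = "http://localhost:8000"

print(requests.get(f"{BASE}/models").json())           # list available models
print(requests.get(f"{BASE}/models/current").json())   # currently loaded model

# Switch to Code-Llama 7B (downloads ~14GB on the first run).
requests.post(f"{BASE}/models/switch",
              json={"model_id": "code-llama-7b"}).raise_for_status()

# Generate with the newly loaded model.
out = requests.post(f"{BASE}/generate", json={
    "prompt": "def fibonacci(n):\n    ",
    "max_tokens": 50,
    "temperature": 0.7,
    "extract_traces": False,
})
print(out.json())
```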
## Expected Test Duration
- Tests 1-5 (CodeGen only): ~2-3 minutes
- Test 6 (downloading Code-Llama): ~5-10 minutes (first time only)
- Tests 7-10: ~3-5 minutes
**Total first run:** ~15-20 minutes
**Subsequent runs:** ~5-10 minutes (no download)
## Manual API Testing
If you prefer to test manually, use these curl commands:
### List Available Models
```bash
curl http://localhost:8000/models | jq
```
### Get Current Model
```bash
curl http://localhost:8000/models/current | jq
```
### Switch to Code-Llama
```bash
curl -X POST http://localhost:8000/models/switch \
-H "Content-Type: application/json" \
-d '{"model_id": "code-llama-7b"}' | jq
```
### Generate Text
```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "def fibonacci(n):\n ",
"max_tokens": 50,
"temperature": 0.7,
"extract_traces": false
}' | jq
```
### Get Model Info
```bash
curl http://localhost:8000/model/info | jq
```
## Success Criteria
Before committing to GitHub, verify:
- ✅ All tests pass
- ✅ CodeGen generates reasonable code
- ✅ Code-Llama loads successfully
- ✅ Code-Llama generates reasonable code
- ✅ Can switch between models multiple times
- ✅ No Python errors in backend logs
- ✅ Memory usage is reasonable (check Activity Monitor)
## Expected Model Behavior
### CodeGen 350M
- Loads in ~5-10 seconds
- Uses ~2-3GB RAM
- Generates Python code (trained on Python only)
- 20 layers, 16 attention heads
### Code-Llama 7B
- First download: ~14GB, takes 5-10 minutes
- Loads in ~30-60 seconds
- Uses ~14-16GB RAM
- Generates multiple languages
- 32 layers, 32 attention heads (GQA with 8 KV heads)
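If you want to double-check the Code-Llama architecture figures without loading the full model, the Hugging Face config can be inspected directly. This sketch uses standard `transformers` config attributes and only fetches the small config file, not the ~14GB of weights:

```python
# Inspect Code-Llama's architecture from its Hugging Face config (no weight download).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")
print(cfg.num_hidden_layers)     # layer count
print(cfg.num_attention_heads)   # attention head count
print(cfg.num_key_value_heads)   # KV head count (checks the GQA claim above)
```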
## Troubleshooting
### Backend won't start
```bash
# Check if already running
lsof -i :8000
# Kill existing process
kill -9 <PID>
```
### Import errors
```bash
# Reinstall dependencies
pip install -r requirements.txt
```
### Code-Llama download fails
- Check internet connection
- Verify HuggingFace is accessible: `ping huggingface.co`
- Try downloading manually:
```python
from transformers import AutoModelForCausalLM
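# First run downloads ~14GB of weights into the local Hugging Face cache.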
AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
```
### Out of memory
- Close other applications
- Use CodeGen only (skip Code-Llama tests)
- Check Activity Monitor for memory usage
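As an alternative to Activity Monitor, a quick check from Python is sketched below. It assumes `psutil` is installed, which is not listed as a prerequisite above:

```python
# Check free RAM before switching to Code-Llama 7B (needs roughly 14-16GB).
# Assumes psutil is installed: pip install psutil
import psutil

avail_gb = psutil.virtual_memory().available / 1024**3
print(f"Available RAM: {avail_gb:.1f} GB")
if avail_gb < 16:
    print("Consider closing other applications or sticking with CodeGen 350M.")
```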
## Next Steps After Testing
Once all tests pass:
1. **Document any issues found**
2. **Take note of generation quality**
3. **Check if visualizations need updates** (next phase)
4. **Commit to feature branch** (NOT main)
5. **Test frontend integration**
## Files Modified
This implementation created or modified the following files:
**Backend:**
- `backend/model_config.py` (NEW)
- `backend/model_adapter.py` (NEW)
- `backend/model_service.py` (MODIFIED)
- `test_multi_model.py` (NEW)
**Status:** All changes are on the `feature/multi-model-support` branch
**Rollback:** check out the `pre-multimodel` tag (`git checkout pre-multimodel`) if you need to revert