Krishna Chaitanya Cheedella committed

Commit 5eb2461
Parent(s): 574af27

Fix: Update HuggingFace API to use router.huggingface.co (new endpoint)
Files changed:
- DEPLOYMENT_SUCCESS.md +183 -0
- backend/api_client.py +17 -56

DEPLOYMENT_SUCCESS.md ADDED
# 🎉 SUCCESS! Your LLM Council is Deployed!

## ✅ What Was Done

### 1. **Completely Refactored** to Use FREE Models
- ❌ Removed the dependency on OpenRouter (which you didn't have)
- ✅ Added **FREE HuggingFace Inference API** support
- ✅ Added **OpenAI API** support (using your key)

### 2. **Council Members** (Mix of FREE + Low Cost)
- **Meta Llama 3.3 70B** - FREE via HuggingFace
- **Qwen 2.5 72B** - FREE via HuggingFace
- **Mixtral 8x7B** - FREE via HuggingFace
- **OpenAI GPT-4o-mini** - Low cost (~$0.01/query)
- **OpenAI GPT-3.5-turbo** - Low cost (~$0.01/query)

### 3. **Cost**: ~$0.01-0.03 per query
- HuggingFace models: **100% FREE!**
- OpenAI models: very cheap, used for synthesis

### 4. **Pushed to Your HuggingFace Space**
✅ Code deployed to: https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council

## 🔐 FINAL STEP: Add Secrets to HuggingFace

Your space is deployed but **needs API keys** to work. Here's how to add them:

### Step 1: Go to Your Space Settings
Visit: https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council/settings

### Step 2: Add Repository Secrets
In the "Repository secrets" section, add these two secrets:

#### Secret 1: OPENAI_API_KEY
```
Name: OPENAI_API_KEY
Value: <your OpenAI API key from https://platform.openai.com/api-keys>
```

#### Secret 2: HUGGINGFACE_API_KEY
```
Name: HUGGINGFACE_API_KEY
Value: <your HuggingFace token from https://huggingface.co/settings/tokens>
```

### Step 3: Restart the Space
After adding the secrets:
1. Click "Factory reboot" (or just wait a moment)
2. The space will rebuild automatically
3. Your app will be live!
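Since a Space exposes repository secrets as environment variables, a quick startup check can fail fast when a key is missing. This is only a sketch — `check_secrets` is an illustrative helper name, not something in the repo:

```python
import os

def check_secrets(env) -> list:
    """Return the names of any required secrets missing from the environment."""
    required = ["OPENAI_API_KEY", "HUGGINGFACE_API_KEY"]
    return [name for name in required if not env.get(name)]

# On a HuggingFace Space, repository secrets show up as environment variables:
missing = check_secrets(os.environ)
if missing:
    print(f"⚠️ Missing secrets: {', '.join(missing)}")
```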
## 🚀 Using Your LLM Council

### Web Interface
Once the secrets are added, visit:
https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council

### How to Use
1. Type your question
2. Click "Submit"
3. Wait ~1-2 minutes for the 3-stage process:
   - Stage 1: 5 models answer independently
   - Stage 2: Models rank each other's answers
   - Stage 3: The chairman synthesizes the final answer
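The three stages can be sketched roughly like this — illustrative only; the real logic lives in `backend/council_free.py`, and both `run_council` and `ask` are hypothetical names standing in for the actual functions:

```python
import asyncio

async def run_council(question, models, chairman, ask):
    """Hypothetical sketch of the 3-stage council flow; `ask(model, prompt)`
    stands in for the real API-client call."""
    # Stage 1: every council member answers the question in parallel.
    answers = await asyncio.gather(*(ask(m, question) for m in models))
    # Stage 2: each member ranks the answers of the others.
    rankings = await asyncio.gather(
        *(ask(m, f"Rank these answers: {answers}") for m in models)
    )
    # Stage 3: the chairman synthesizes a final answer from everything.
    return await ask(chairman, f"Synthesize a final answer: {answers} {rankings}")
```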
### Example Questions
- "What is the best programming language to learn in 2025?"
- "Explain quantum computing in simple terms"
- "Compare React vs Vue.js for web development"

## 📁 New Files Created

### Core Functionality
- `backend/config_free.py` - FREE model configuration
- `backend/api_client.py` - HuggingFace + OpenAI API client
- `backend/council_free.py` - 3-stage council logic

### Documentation
- `README.md` - Updated with FREE model info
- `DEPLOYMENT_SUCCESS.md` - This file!
- `.gitignore` - Protects secrets

### Configuration
- `.env.example` - Template for local development
- `requirements.txt` - Updated with the openai package

## 💡 Cost Breakdown

### Per Query
- HuggingFace models (3): **$0.00** (FREE!)
- OpenAI GPT-4o-mini: ~$0.01
- OpenAI GPT-3.5-turbo: ~$0.01
- **Total**: ~$0.01-0.03 per query

### Monthly Estimates
- Light use (10 queries/day): ~$3-10/month
- Medium use (50 queries/day): ~$15-50/month
- Heavy use (200 queries/day): ~$60-200/month
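These estimates follow directly from the per-query range; a quick sanity check, assuming 30 days per month:

```python
# USD, from the per-query breakdown above
PER_QUERY_LOW, PER_QUERY_HIGH = 0.01, 0.03

def monthly_cost(queries_per_day, days=30):
    """Return the (low, high) monthly cost estimate in USD."""
    total_queries = queries_per_day * days
    return (round(total_queries * PER_QUERY_LOW, 2),
            round(total_queries * PER_QUERY_HIGH, 2))

print(monthly_cost(10))   # light use
print(monthly_cost(50))   # medium use
print(monthly_cost(200))  # heavy use
```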
## 🔧 Customization Options

### Use ALL FREE Models
Edit `backend/config_free.py` and uncomment the FREE config:

```python
COUNCIL_MODELS = [
    {"id": "meta-llama/Llama-3.3-70B-Instruct", "provider": "huggingface"},
    {"id": "Qwen/Qwen2.5-72B-Instruct", "provider": "huggingface"},
    {"id": "mistralai/Mixtral-8x7B-Instruct-v0.1", "provider": "huggingface"},
    {"id": "google/gemma-2-27b-it", "provider": "huggingface"},
    {"id": "microsoft/Phi-3.5-mini-instruct", "provider": "huggingface"},
]
CHAIRMAN_MODEL = {"id": "meta-llama/Llama-3.3-70B-Instruct", "provider": "huggingface"}
```

This would be **100% FREE** (no OpenAI costs)!

### Add More OpenAI Models
If you want higher quality:

```python
COUNCIL_MODELS = [
    {"id": "openai/gpt-4o", "provider": "openai", "model": "gpt-4o"},  # Premium
    {"id": "openai/gpt-4o-mini", "provider": "openai", "model": "gpt-4o-mini"},
    # ... keep the HF models too
]
```

## 🐛 Troubleshooting

### "Model is loading" Error
- HuggingFace models may need to warm up (20-30 seconds)
- The code automatically waits and retries
- This is normal on first use

### OpenAI Errors
- Check that the API key in your secrets is correct
- Verify you have OpenAI credits
- Check usage at https://platform.openai.com/usage

### HuggingFace Errors
- Make sure your token has "read" permission
- Some models may be rate-limited
- Try using different models

## 📊 What's Different from the Original

| Aspect | Original | Your Version |
|--------|----------|--------------|
| **API** | OpenRouter | HuggingFace + OpenAI |
| **Cost** | $0.05-0.15/query | $0.01-0.03/query |
| **Free Models** | None | 3 out of 5 |
| **Setup** | Needs an OpenRouter account | Uses existing keys |
| **Flexibility** | Fixed models | Can run 100% free |

## 🎯 Next Steps

### Immediate
1. ✅ Add the secrets to your HuggingFace Space (see above)
2. ✅ Test with a simple question
3. ✅ Monitor costs in the OpenAI dashboard

### Optional
1. Customize the model selection in `backend/config_free.py`
2. Add more FREE HuggingFace models
3. Share your space with others!

## 📚 Resources

- **Your Space**: https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council
- **HuggingFace Inference API**: https://huggingface.co/docs/api-inference/
- **OpenAI Pricing**: https://openai.com/api/pricing/
- **Original Project**: https://github.com/machine-theory/lm-council

## 🎉 You're All Set!

Your LLM Council is deployed and ready to use FREE HuggingFace models plus your OpenAI key!

**Just add those two secrets and you're live!** 🚀

---

Questions? Check the other documentation files or the HuggingFace Space logs.
backend/api_client.py CHANGED

```diff
@@ -89,7 +89,7 @@ async def query_huggingface_model(
     max_retries: int = MAX_RETRIES
 ) -> Optional[Dict[str, Any]]:
     """
-    Query a HuggingFace model via
+    Query a HuggingFace model via Router (FREE).
 
     Args:
         model: HuggingFace model ID (e.g., "meta-llama/Llama-3.3-70B-Instruct")
@@ -105,20 +105,17 @@ async def query_huggingface_model(
         "Content-Type": "application/json",
     }
 
-    prompt = format_messages_for_hf(messages)
-
+    # Use OpenAI-compatible format for HuggingFace Router
     payload = {
-            "do_sample": True,
-        }
+        "model": model,
+        "messages": messages,
+        "max_tokens": 2048,
+        "temperature": 0.7,
+        "top_p": 0.9,
     }
 
+    # Updated to use router.huggingface.co (new endpoint)
+    api_url = "https://router.huggingface.co/v1/chat/completions"
 
     for attempt in range(max_retries + 1):
         try:
@@ -128,20 +125,13 @@ async def query_huggingface_model(
 
             data = response.json()
 
-            if
-                content = data[0]
-                if content.startswith(prompt):
-                    content = content[len(prompt):].strip()
-            elif isinstance(data, dict):
-                content = data.get("generated_text", "")
-                if content.startswith(prompt):
-                    content = content[len(prompt):].strip()
+            # Parse OpenAI-compatible response format
+            if "choices" in data and len(data["choices"]) > 0:
+                content = data["choices"][0]["message"]["content"]
+                return {"content": content}
             else:
-                return {"content": content}
+                print(f"❌ Unexpected response format from HF {model}: {data}")
+                return None
 
         except httpx.TimeoutException as e:
             print(f"⏱️ Timeout querying HF {model} (attempt {attempt + 1}/{max_retries + 1})")
@@ -152,10 +142,10 @@ async def query_huggingface_model(
 
         except httpx.HTTPStatusError as e:
             error_msg = e.response.text
-            print(f"🚫 HTTP {e.response.status_code} querying HF {model}: {error_msg[:
+            print(f"🚫 HTTP {e.response.status_code} querying HF {model}: {error_msg[:200]}")
 
             # Model is loading - retry with longer delay
-            if "loading" in error_msg.lower():
+            if "loading" in error_msg.lower() or "warming up" in error_msg.lower():
                 print(f"⏳ Model is loading, waiting 20s...")
                 await asyncio.sleep(20)
                 if attempt < max_retries:
@@ -180,35 +170,6 @@ async def query_huggingface_model(
             return None
 
 
-def format_messages_for_hf(messages: List[Dict[str, str]]) -> str:
-    """
-    Format chat messages for HuggingFace models.
-
-    Args:
-        messages: List of message dicts with 'role' and 'content'
-
-    Returns:
-        Formatted prompt string
-    """
-    # Use common chat template format
-    prompt = ""
-    for msg in messages:
-        role = msg["role"]
-        content = msg["content"]
-
-        if role == "system":
-            prompt += f"<|system|>\n{content}\n"
-        elif role == "user":
-            prompt += f"<|user|>\n{content}\n"
-        elif role == "assistant":
-            prompt += f"<|assistant|>\n{content}\n"
-
-    # Add assistant prefix for response
-    prompt += "<|assistant|>\n"
-
-    return prompt
-
-
 async def query_model(
     model_config: Dict[str, str],
     messages: List[Dict[str, str]],
```