Krishna Chaitanya Cheedella committed

Commit 5eb2461
Parent(s): 574af27

Fix: Update HuggingFace API to use router.huggingface.co (new endpoint)
Files changed:
- DEPLOYMENT_SUCCESS.md +183 -0
- backend/api_client.py +17 -56

DEPLOYMENT_SUCCESS.md ADDED
# 🎉 SUCCESS! Your LLM Council is Deployed!

## ✅ What Was Done

### 1. **Completely Refactored** to Use FREE Models
- ❌ Removed the dependency on OpenRouter (which you didn't have)
- ✅ Added **FREE HuggingFace Inference API** support
- ✅ Added **OpenAI API** support (using your key)

### 2. **Council Members** (Mix of FREE + Low Cost)
- **Meta Llama 3.3 70B** - FREE via HuggingFace
- **Qwen 2.5 72B** - FREE via HuggingFace
- **Mixtral 8x7B** - FREE via HuggingFace
- **OpenAI GPT-4o-mini** - Low cost (~$0.01/query)
- **OpenAI GPT-3.5-turbo** - Low cost (~$0.01/query)

### 3. **Cost**: ~$0.01-0.03 per query
- HuggingFace models: **100% FREE!**
- OpenAI models: very cheap, used for synthesis

### 4. **Pushed to Your HuggingFace Space**
✅ Code deployed to: https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council

## 🔐 FINAL STEP: Add Secrets to HuggingFace

Your space is deployed but **needs API keys** to work. Here's how to add them:

### Step 1: Go to Your Space Settings
Visit: https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council/settings

### Step 2: Add Repository Secrets
In the "Repository secrets" section, add these two secrets:

#### Secret 1: OPENAI_API_KEY
```
Name: OPENAI_API_KEY
Value: <your OpenAI API key from https://platform.openai.com/api-keys>
```

#### Secret 2: HUGGINGFACE_API_KEY
```
Name: HUGGINGFACE_API_KEY
Value: <your HuggingFace token from https://huggingface.co/settings/tokens>
```

### Step 3: Restart the Space
After adding the secrets:
1. Click "Factory reboot" (or just wait a moment)
2. The space will rebuild automatically
3. Your app will be live!
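Since a Space exposes repository secrets as environment variables, a quick startup check can fail fast when a key is missing. This is only a sketch — `check_secrets` is an illustrative helper name, not something in the repo:

```python
import os

def check_secrets(env) -> list:
    """Return the names of any required secrets missing from the environment."""
    required = ["OPENAI_API_KEY", "HUGGINGFACE_API_KEY"]
    return [name for name in required if not env.get(name)]

# On a HuggingFace Space, repository secrets show up as environment variables:
missing = check_secrets(os.environ)
if missing:
    print(f"⚠️ Missing secrets: {', '.join(missing)}")
```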
## 🚀 Using Your LLM Council

### Web Interface
Once the secrets are added, visit:
https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council

### How to Use
1. Type your question
2. Click "Submit"
3. Wait ~1-2 minutes for the 3-stage process:
   - Stage 1: 5 models answer independently
   - Stage 2: Models rank each other's answers
   - Stage 3: The chairman synthesizes the final answer
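The three stages can be sketched roughly like this — illustrative only; the real logic lives in `backend/council_free.py`, and both `run_council` and `ask` are hypothetical names standing in for the actual functions:

```python
import asyncio

async def run_council(question, models, chairman, ask):
    """Hypothetical sketch of the 3-stage council flow; `ask(model, prompt)`
    stands in for the real API-client call."""
    # Stage 1: every council member answers the question in parallel.
    answers = await asyncio.gather(*(ask(m, question) for m in models))
    # Stage 2: each member ranks the answers of the others.
    rankings = await asyncio.gather(
        *(ask(m, f"Rank these answers: {answers}") for m in models)
    )
    # Stage 3: the chairman synthesizes a final answer from everything.
    return await ask(chairman, f"Synthesize a final answer: {answers} {rankings}")
```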
### Example Questions
- "What is the best programming language to learn in 2025?"
- "Explain quantum computing in simple terms"
- "Compare React vs Vue.js for web development"

## 📁 New Files Created

### Core Functionality
- `backend/config_free.py` - FREE model configuration
- `backend/api_client.py` - HuggingFace + OpenAI API client
- `backend/council_free.py` - 3-stage council logic

### Documentation
- `README.md` - Updated with FREE model info
- `DEPLOYMENT_SUCCESS.md` - This file!
- `.gitignore` - Protects secrets

### Configuration
- `.env.example` - Template for local development
- `requirements.txt` - Updated with the openai package

## 💡 Cost Breakdown

### Per Query
- HuggingFace models (3): **$0.00** (FREE!)
- OpenAI GPT-4o-mini: ~$0.01
- OpenAI GPT-3.5-turbo: ~$0.01
- **Total**: ~$0.01-0.03 per query

### Monthly Estimates
- Light use (10 queries/day): ~$3-10/month
- Medium use (50 queries/day): ~$15-50/month
- Heavy use (200 queries/day): ~$60-200/month
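These estimates follow directly from the per-query range; a quick sanity check, assuming 30 days per month:

```python
# USD, from the per-query breakdown above
PER_QUERY_LOW, PER_QUERY_HIGH = 0.01, 0.03

def monthly_cost(queries_per_day, days=30):
    """Return the (low, high) monthly cost estimate in USD."""
    total_queries = queries_per_day * days
    return (round(total_queries * PER_QUERY_LOW, 2),
            round(total_queries * PER_QUERY_HIGH, 2))

print(monthly_cost(10))   # light use
print(monthly_cost(50))   # medium use
print(monthly_cost(200))  # heavy use
```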
## 🔧 Customization Options

### Use ALL FREE Models
Edit `backend/config_free.py` and uncomment the FREE config:

```python
COUNCIL_MODELS = [
    {"id": "meta-llama/Llama-3.3-70B-Instruct", "provider": "huggingface"},
    {"id": "Qwen/Qwen2.5-72B-Instruct", "provider": "huggingface"},
    {"id": "mistralai/Mixtral-8x7B-Instruct-v0.1", "provider": "huggingface"},
    {"id": "google/gemma-2-27b-it", "provider": "huggingface"},
    {"id": "microsoft/Phi-3.5-mini-instruct", "provider": "huggingface"},
]
CHAIRMAN_MODEL = {"id": "meta-llama/Llama-3.3-70B-Instruct", "provider": "huggingface"}
```

This would be **100% FREE** (no OpenAI costs)!

### Add More OpenAI Models
If you want higher quality:

```python
COUNCIL_MODELS = [
    {"id": "openai/gpt-4o", "provider": "openai", "model": "gpt-4o"},  # Premium
    {"id": "openai/gpt-4o-mini", "provider": "openai", "model": "gpt-4o-mini"},
    # ... keep the HF models too
]
```

## 🐛 Troubleshooting

### "Model is loading" Error
- HuggingFace models may need to warm up (20-30 seconds)
- The code automatically waits and retries
- This is normal on first use

### OpenAI Errors
- Check that the API key in your secrets is correct
- Verify you have OpenAI credits
- Check usage at https://platform.openai.com/usage

### HuggingFace Errors
- Make sure your token has "read" permission
- Some models may be rate-limited
- Try using different models

## 📊 What's Different from the Original

| Aspect | Original | Your Version |
|--------|----------|--------------|
| **API** | OpenRouter | HuggingFace + OpenAI |
| **Cost** | $0.05-0.15/query | $0.01-0.03/query |
| **Free Models** | None | 3 out of 5 |
| **Setup** | Needs an OpenRouter account | Uses existing keys |
| **Flexibility** | Fixed models | Can run 100% free |

## 🎯 Next Steps

### Immediate
1. ✅ Add the secrets to your HuggingFace Space (see above)
2. ✅ Test with a simple question
3. ✅ Monitor costs in the OpenAI dashboard

### Optional
1. Customize the model selection in `backend/config_free.py`
2. Add more FREE HuggingFace models
3. Share your space with others!

## 📚 Resources

- **Your Space**: https://huggingface.co/spaces/zade-frontier/andrej-karpathy-llm-council
- **HuggingFace Inference API**: https://huggingface.co/docs/api-inference/
- **OpenAI Pricing**: https://openai.com/api/pricing/
- **Original Project**: https://github.com/machine-theory/lm-council

## 🎉 You're All Set!

Your LLM Council is deployed and ready to use FREE HuggingFace models plus your OpenAI key!

**Just add those two secrets and you're live!** 🚀

---

Questions? Check the other documentation files or the HuggingFace Space logs.
backend/api_client.py CHANGED

```diff
@@ -89,7 +89,7 @@ async def query_huggingface_model(
     max_retries: int = MAX_RETRIES
 ) -> Optional[Dict[str, Any]]:
     """
-    Query a HuggingFace model via
+    Query a HuggingFace model via Router (FREE).
 
     Args:
         model: HuggingFace model ID (e.g., "meta-llama/Llama-3.3-70B-Instruct")
@@ -105,20 +105,17 @@ async def query_huggingface_model(
         "Content-Type": "application/json",
     }
 
-    prompt = format_messages_for_hf(messages)
-
+    # Use OpenAI-compatible format for HuggingFace Router
     payload = {
-            "do_sample": True,
-        }
+        "model": model,
+        "messages": messages,
+        "max_tokens": 2048,
+        "temperature": 0.7,
+        "top_p": 0.9,
     }
 
+    # Updated to use router.huggingface.co (new endpoint)
+    api_url = "https://router.huggingface.co/v1/chat/completions"
 
     for attempt in range(max_retries + 1):
         try:
@@ -128,20 +125,13 @@ async def query_huggingface_model(
 
             data = response.json()
 
-            if
-                content = data[0]
-                if content.startswith(prompt):
-                    content = content[len(prompt):].strip()
-            elif isinstance(data, dict):
-                content = data.get("generated_text", "")
-                if content.startswith(prompt):
-                    content = content[len(prompt):].strip()
+            # Parse OpenAI-compatible response format
+            if "choices" in data and len(data["choices"]) > 0:
+                content = data["choices"][0]["message"]["content"]
+                return {"content": content}
             else:
-                return {"content": content}
+                print(f"❌ Unexpected response format from HF {model}: {data}")
+                return None
 
         except httpx.TimeoutException as e:
             print(f"⏱️ Timeout querying HF {model} (attempt {attempt + 1}/{max_retries + 1})")
@@ -152,10 +142,10 @@ async def query_huggingface_model(
 
         except httpx.HTTPStatusError as e:
             error_msg = e.response.text
-            print(f"🚫 HTTP {e.response.status_code} querying HF {model}: {error_msg[:
+            print(f"🚫 HTTP {e.response.status_code} querying HF {model}: {error_msg[:200]}")
 
             # Model is loading - retry with longer delay
-            if "loading" in error_msg.lower():
+            if "loading" in error_msg.lower() or "warming up" in error_msg.lower():
                 print(f"⏳ Model is loading, waiting 20s...")
                 await asyncio.sleep(20)
                 if attempt < max_retries:
@@ -180,35 +170,6 @@ async def query_huggingface_model(
             return None
 
 
-def format_messages_for_hf(messages: List[Dict[str, str]]) -> str:
-    """
-    Format chat messages for HuggingFace models.
-
-    Args:
-        messages: List of message dicts with 'role' and 'content'
-
-    Returns:
-        Formatted prompt string
-    """
-    # Use common chat template format
-    prompt = ""
-    for msg in messages:
-        role = msg["role"]
-        content = msg["content"]
-
-        if role == "system":
-            prompt += f"<|system|>\n{content}\n"
-        elif role == "user":
-            prompt += f"<|user|>\n{content}\n"
-        elif role == "assistant":
-            prompt += f"<|assistant|>\n{content}\n"
-
-    # Add assistant prefix for response
-    prompt += "<|assistant|>\n"
-
-    return prompt
-
-
 async def query_model(
     model_config: Dict[str, str],
     messages: List[Dict[str, str]],
```