---
title: Api Embedding
emoji: 🐠
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---
# 🧠 Unified Embedding API
> 🧩 Unified API for all your Embedding, Sparse & Reranking Models: plug and play with any model from Hugging Face or your own fine-tuned versions.
---
## 🚀 Overview
**Unified Embedding API** is a modular and open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, **sparse**, and **reranking** models.
It's designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines**, all controlled from a single configuration file, `src/config/models.yaml`.
⚠️ **Note:** This is a development API.
For production deployment, use a dedicated serving stack such as **Hugging Face TEI** (Text Embeddings Inference), or host it on **AWS**, **GCP**, or another cloud provider of your choice.
---
## 🧩 Features
- 🧠 **Unified Interface**: One API to handle dense, sparse, and reranking models
- ⚡ **Batch Processing**: Automatic single/batch detection
- 🔧 **Flexible Parameters**: Full control via kwargs and options
- 🔌 **OpenAI Compatible**: Works with OpenAI client libraries
- 📈 **RAG Support**: A solid base for Retrieval-Augmented Generation systems
- ⚡ **Fast & Lightweight**: Powered by FastAPI and optimized with async processing
- 🧰 **Extendable**: Switch models via `src/config/models.yaml` and add your own models
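
The "automatic single/batch detection" feature can be pictured with a small sketch. This is not the project's actual implementation; `normalize_input` is a hypothetical helper that shows one way a string or a list of strings could be normalized into a batch while remembering which form the caller used.

```python
from typing import List, Tuple, Union


def normalize_input(value: Union[str, List[str]]) -> Tuple[List[str], bool]:
    """Return (texts, was_single) for a string or a list of strings.

    Hypothetical helper: the real API may validate input differently.
    """
    if isinstance(value, str):
        return [value], True
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        if not value:
            raise ValueError("input list must not be empty")
        return list(value), False
    raise TypeError("input must be a string or a list of strings")


texts, was_single = normalize_input("What is artificial intelligence?")
print(texts, was_single)

texts, was_single = normalize_input(["First document", "Second document"])
print(texts, was_single)
```

Tracking `was_single` lets a server run one batched code path internally and still shape the response to match what the caller sent.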
---
## 📁 Project Structure
```
unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py    # endpoint sparse & dense
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py        # endpoint reranking
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py         # dense model
│   │   │   ├── sparse.py        # sparse model
│   │   │   └── rank.py          # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml          # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
│
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md
```
---
## 🧩 Model Selection
The default configuration is tuned for a CPU-only machine with **2 vCPU / 16 GB RAM**. See the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory usage reference.
**Add More Models:** Edit `src/config/models.yaml`
```yaml
models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```
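
Before pushing a configuration change, it can help to sanity-check entries against the three `type` values listed above. The sketch below is an illustration only: `validate_models` is a hypothetical helper, not part of this repository, and it takes an already-parsed dictionary so that it needs no YAML library.

```python
ALLOWED_TYPES = {"embeddings", "sparse-embeddings", "rerank"}


def validate_models(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    models = config.get("models")
    if not isinstance(models, dict) or not models:
        return ["top-level 'models' mapping is missing or empty"]

    problems = []
    for model_id, entry in models.items():
        if not isinstance(entry, dict):
            problems.append(f"{model_id}: entry must be a mapping")
            continue
        if not entry.get("name"):
            problems.append(f"{model_id}: missing 'name'")
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"{model_id}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
    return problems


# Same shape as the YAML entry above, written as a dict for this sketch.
example = {"models": {"your-model-name": {"name": "org/model-name", "type": "embeddings"}}}
print(validate_models(example))  # []
```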
⚠️ If you plan to use larger models such as `Qwen3-Embedding-8B`, upgrade your Space hardware first.
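
As a rough sizing aid for the warning above, the sketch below estimates the memory needed just to hold a model's weights: parameter count times bytes per parameter. The function name `estimate_weight_gb` and the nominal parameter counts are illustrative assumptions, and the figure ignores activations, tokenizer, and framework overhead, so real usage will be higher.

```python
def estimate_weight_gb(num_parameters: float, bytes_per_parameter: int = 4) -> float:
    """Weights-only estimate in GB (1 GB = 1e9 bytes); 4 bytes per parameter is float32."""
    return num_parameters * bytes_per_parameter / 1e9


# Nominal parameter counts, used only for illustration.
for label, params in [("0.6B model", 0.6e9), ("8B model", 8e9)]:
    fp32 = estimate_weight_gb(params, 4)
    fp16 = estimate_weight_gb(params, 2)
    print(f"{label}: ~{fp32:.1f} GB in float32, ~{fp16:.1f} GB in float16")
```

Under these assumptions, an 8B-parameter model in float32 needs about 32 GB for weights alone, which is more than the 16 GB the default configuration targets.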
---
## ☁️ How to Deploy (Free 🚀)
Deploy your **Custom Embedding API** on **Hugging Face Spaces**: free, fast, and serverless.
### **1️⃣ Deploy on Hugging Face Spaces (Free!)**
1. **Duplicate this Space:**
👉 [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
Click **⋯** (three dots) → **Duplicate this Space**
2. **Add an `HF_TOKEN` environment variable** in your Space settings, and make sure the Space is public.
3. **Clone your Space locally:**
Click **⋯** → **Clone repository**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
cd api-embedding
```
4. **Edit `src/config/models.yaml`** to customize models:
```yaml
models:
  your-model:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```
5. **Commit and push changes:**
```bash
git add src/config/models.yaml
git commit -m "Update models configuration"
git push
```
6. **Access your API:**
Click **⋯** → **Embed this Space** → copy **Direct URL**
```
https://YOUR_USERNAME-api-embedding.hf.space
https://YOUR_USERNAME-api-embedding.hf.space/docs # Interactive docs
```
That's it! You now have a live embedding API endpoint powered by your models.
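
To confirm a freshly deployed Space is responding, a quick check against the `/health` endpoint can be scripted. This is a minimal sketch: `space_base_url` and `check_health` are hypothetical helpers, the URL simply follows the Direct URL pattern shown above, and the code assumes `/health` returns a JSON body without assuming any particular fields.

```python
import json
import urllib.request


def space_base_url(username: str, space_name: str = "api-embedding") -> str:
    """Build the Direct URL following the pattern shown above."""
    return f"https://{username}-{space_name}.hf.space"


def check_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET /health and return the parsed JSON body (assumed to be JSON)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


base_url = space_base_url("YOUR_USERNAME")
print(base_url)
# Once the Space has finished building, uncomment to query it:
# print(check_health(base_url))
```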
### **2️⃣ Run Locally (NOT RECOMMENDED)**
```bash
# Clone repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run server
python app.py
```
API available at: `http://localhost:7860`
### **3️⃣ Run with Docker**
```bash
# Build and run
docker-compose up --build
# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```
---
## πŸ“– Usage Examples
### **Python with Native API**
```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Single embedding
response = requests.post(f"{base_url}/embeddings", json={
    "input": "What is artificial intelligence?",
    "model": "qwen3-0.6b",
})
response.raise_for_status()  # fail loudly on HTTP errors
embeddings = response.json()["data"]

# Batch embeddings with options
response = requests.post(f"{base_url}/embeddings", json={
    "input": ["First document", "Second document", "Third document"],
    "model": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True,
    },
})
response.raise_for_status()
batch_embeddings = response.json()["data"]
```
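
Once embeddings are returned, a common next step for vector search is ranking documents by cosine similarity to a query. The sketch below uses toy vectors so it runs offline; it assumes each item in `data` carries its vector under an `embedding` key, as in the OpenAI response format, and `rank_by_similarity` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors stand in for the `embedding` field of each item in `data`.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(rank_by_similarity(query, docs))
```

With real responses, the vectors could be pulled out with something like `[item["embedding"] for item in batch_embeddings]`, provided the response follows that assumed shape.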
### **cURL**
```bash
# Dense embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world"],
    "model": "qwen3-0.6b"
  }'

# Sparse embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embed_sparse" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["First doc", "Second doc", "Third doc"],
    "model": "splade-pp-v2"
  }'

# Reranking
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Python for data science",
    "documents": [
      "Python is great for data science",
      "Java is used for enterprise apps",
      "R is for statistical analysis"
    ],
    "model": "bge-v2-m3",
    "top_k": 2
  }'
```
### **JavaScript/TypeScript**
```typescript
const baseUrl = "https://fahmiaziz-api-embedding.hf.space/api/v1";

// Using fetch; field names match the Python and cURL examples above
const response = await fetch(`${baseUrl}/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    input: ["Hello world"],
    model: "qwen3-0.6b",
  }),
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const { data } = await response.json();
console.log(data);
```
---
## 📊 API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/embeddings` | POST | Generate embeddings (OpenAI compatible) |
| `/api/v1/embed_sparse` | POST | Generate sparse embeddings |
| `/api/v1/rerank` | POST | Rerank documents by relevance |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
| `/docs` | GET | Interactive API documentation |
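
The endpoints in the table can be wrapped in a small standard-library client. This is a sketch under stated assumptions: the `EmbeddingAPI` class and its method names are hypothetical, the request fields are taken from the cURL examples in this README, and response bodies are returned as parsed JSON without assuming their schema.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


class EmbeddingAPI:
    """Thin, hypothetical wrapper around the endpoints listed in the table."""

    def __init__(self, base_url: str, timeout: float = 30.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def build_request(
        self, path: str, payload: Optional[Dict[str, Any]] = None
    ) -> urllib.request.Request:
        """Build a POST with a JSON body when a payload is given, otherwise a GET."""
        url = f"{self.base_url}{path}"
        if payload is None:
            return urllib.request.Request(url, method="GET")
        body = json.dumps(payload).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        return urllib.request.Request(url, data=body, headers=headers, method="POST")

    def _send(self, request: urllib.request.Request) -> Any:
        # Raises urllib.error.HTTPError on non-2xx responses.
        with urllib.request.urlopen(request, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def embed(self, texts: List[str], model: str) -> Any:
        payload = {"input": texts, "model": model}
        return self._send(self.build_request("/api/v1/embeddings", payload))

    def embed_sparse(self, texts: List[str], model: str) -> Any:
        payload = {"input": texts, "model": model}
        return self._send(self.build_request("/api/v1/embed_sparse", payload))

    def rerank(self, query: str, documents: List[str], model: str, top_k: int) -> Any:
        payload = {"query": query, "documents": documents, "model": model, "top_k": top_k}
        return self._send(self.build_request("/api/v1/rerank", payload))

    def list_models(self) -> Any:
        return self._send(self.build_request("/api/v1/models"))


# Building a request does not touch the network, so this part runs offline.
api = EmbeddingAPI("https://YOUR_USERNAME-api-embedding.hf.space")
request = api.build_request(
    "/api/v1/rerank",
    {"query": "Python for data science", "documents": ["doc a", "doc b"],
     "model": "bge-v2-m3", "top_k": 2},
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the wrapper easy to test without a live Space.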
---
## 🔌 OpenAI Client Compatibility
This API is compatible with OpenAI's client libraries for the embeddings endpoint, so it can serve as a drop-in replacement for OpenAI's embedding API in most cases (see the supported features table below for the exceptions).
### **Why use OpenAI client?**
- ✅ **Familiar API**: Same interface as OpenAI
- ✅ **Type Safety**: Full type hints and IDE support
- ✅ **Error Handling**: Built-in retry logic and error handling
- ✅ **Async Support**: Native async/await support
- ✅ **Easy Migration**: Switch between OpenAI and self-hosted seamlessly
### **Supported Features**
| Feature | Supported | Notes |
|---------|-----------|-------|
| `embeddings.create()` | ✅ Yes | Single and batch inputs |
| `input` as string | ✅ Yes | Auto-converted to list |
| `input` as list | ✅ Yes | Batch processing |
| `model` parameter | ✅ Yes | Use your model IDs |
| `encoding_format` | ⚠️ Partial | Always returns `float` |
### **Example with OpenAI Client (Compatible!)**
```python
from openai import OpenAI

# Initialize client with your API endpoint
client = OpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-",  # API key not required, but must be present
)

# Generate embeddings
embedding = client.embeddings.create(
    input="Hello",
    model="qwen3-0.6b",
)

# Access results
for item in embedding.data:
    print(f"Embedding: {item.embedding[:5]}...")  # First 5 dimensions
    print(f"Index: {item.index}")
```
### **Async OpenAI Client**
```python
import asyncio

from openai import AsyncOpenAI

# Initialize async client
client = AsyncOpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-",
)


# Generate embeddings asynchronously
async def get_embeddings():
    try:
        return await client.embeddings.create(
            input=["Hello", "World", "AI"],
            model="qwen3-0.6b",
        )
    except Exception as e:
        print(f"Error: {e}")
        raise  # re-raise so callers do not silently receive None


# Run from synchronous code; inside an async function, use `await get_embeddings()`
embeddings = asyncio.run(get_embeddings())
```
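
For large inputs, the async client pairs naturally with client-side batching. The sketch below splits texts into fixed-size batches and embeds them concurrently with a cap on in-flight requests. `embed_in_batches`, `chunked`, and the `embed_batch` callback are hypothetical names, and the default batch size and concurrency are arbitrary; a stand-in embedder is used so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List

Vector = List[float]


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


async def embed_in_batches(
    texts: List[str],
    embed_batch: Callable[[List[str]], Awaitable[List[Vector]]],
    batch_size: int = 32,
    max_concurrency: int = 4,
) -> List[Vector]:
    """Embed texts batch by batch, keeping results in the original order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: List[str]) -> List[Vector]:
        async with semaphore:  # limit the number of concurrent requests
            return await embed_batch(batch)

    # asyncio.gather returns results in the order the awaitables were passed.
    results = await asyncio.gather(*(run(b) for b in chunked(texts, batch_size)))
    return [vector for batch_vectors in results for vector in batch_vectors]


async def fake_embed(batch: List[str]) -> List[Vector]:
    """Stand-in embedder: one-dimensional 'vector' holding the text length."""
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]


vectors = asyncio.run(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
print(vectors)
```

With the real client, `embed_batch` could await `client.embeddings.create(input=batch, model="qwen3-0.6b")` and return `[item.embedding for item in response.data]`.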
---
## 🀝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
---
## 📚 Resources
- [API Documentation](API.md)
- [Sentence Transformers](https://www.sbert.net/)
- [FastAPI Docs](https://fastapi.tiangolo.com/)
- [OpenAI Python Client](https://github.com/openai/openai-python)
- [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
- [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)
- [Deploy Applications on Hugging Face Spaces](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
- [Sync HF Spaces with GitHub](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository)
- [Duplicate & Clone Spaces](https://huggingface.co/docs/hub/spaces-overview#duplicating-a-space)
---
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
- **Sentence Transformers** for the embedding models
- **FastAPI** for the excellent web framework
- **Hugging Face** for model hosting and Spaces
- **OpenAI** for the client library design
- **Open Source Community** for inspiration and support
---
## 📞 Support
- **Issues:** [GitHub Issues](https://github.com/fahmiaziz98/unified-embedding-api/issues)
- **Discussions:** [GitHub Discussions](https://github.com/fahmiaziz98/unified-embedding-api/discussions)
- **Hugging Face Space:** [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
---
<div align="center">
Made with ❤️ by the Open-Source Community
> ✨ "Unify your embeddings. Simplify your AI stack."
</div>