---
title: Api Embedding
emoji: π
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---
# 🧠 Unified Embedding API

🧩 Unified API for all your embedding, sparse, and reranking models – plug and play with any model from Hugging Face or your own fine-tuned versions.
## 🚀 Overview

Unified Embedding API is a modular, open-source, RAG-ready API built for developers who want a simple, unified way to access dense, sparse, and reranking models. It is designed for vector search, semantic retrieval, and AI-powered pipelines – all controlled from a single `config.yaml` file.

> ⚠️ **Note:** This is a development API. For production deployment, host it on a cloud platform such as Hugging Face TEI, AWS, GCP, or any provider of your choice.
## 🧩 Features

- 🧠 **Unified Interface** – One API to handle dense, sparse, and reranking models
- ⚡ **Batch Processing** – Automatic single/batch detection
- 🔧 **Flexible Parameters** – Full control via kwargs and options
- 🔌 **OpenAI Compatible** – Works with OpenAI client libraries
- 🔍 **RAG Support** – Perfect base for Retrieval-Augmented Generation systems
- ⚡ **Fast & Lightweight** – Powered by FastAPI and optimized with async processing
- 🧰 **Extendable** – Switch models instantly via `config.yaml` and add your own models effortlessly
## 📂 Project Structure

```
unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py   # dense & sparse endpoints
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py       # reranking endpoint
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py        # dense model
│   │   │   ├── sparse.py       # sparse model
│   │   │   └── rank.py         # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml         # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md
```
## 🧩 Model Selection

The default configuration is optimized for a CPU Space (2 vCPU / 16 GB RAM). See the MTEB Leaderboard for model recommendations and a memory-usage reference.

**Add more models:** edit `src/config/models.yaml`:

```yaml
models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```

> ⚠️ If you plan to use larger models such as Qwen2-embedding-8B, upgrade your Space first.
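Before committing a config change, it can help to sanity-check the shape locally. A minimal sketch in pure Python (the function name is illustrative, not part of the project; the config dict mirrors the YAML above, and the allowed `type` values come from the comment in it):

```python
# Allowed model types, per the comment in models.yaml above.
ALLOWED_TYPES = {"embeddings", "sparse-embeddings", "rerank"}

def validate_models(config: dict) -> list[str]:
    """Return a list of problems found in a models.yaml-style mapping."""
    problems = []
    for model_id, spec in config.get("models", {}).items():
        if not spec.get("name"):
            problems.append(f"{model_id}: missing 'name' (org/model-name)")
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append(f"{model_id}: invalid 'type' {spec.get('type')!r}")
    return problems

# Mirrors the YAML snippet above as a Python dict.
config = {
    "models": {
        "your-model-name": {"name": "org/model-name", "type": "embeddings"},
        "broken-model": {"name": "org/other", "type": "dense"},  # invalid type
    }
}
print(validate_models(config))
```

An empty list means the config shape looks fine; anything else tells you which entry to fix before pushing.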
## ⚙️ How to Deploy (Free 🚀)

Deploy your custom embedding API on Hugging Face Spaces – free, fast, and serverless.

### 1️⃣ Deploy on Hugging Face Spaces (Free!)

1. **Duplicate this Space:** 🔗 fahmiaziz/api-embedding – click ⋯ (three dots) → **Duplicate this Space**, add the `HF_TOKEN` environment variable, and make sure your Space is public.
2. **Clone your Space locally:** click ⋯ → **Clone repository**:

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
   cd api-embedding
   ```

3. **Edit `src/config/models.yaml` to customize models:**

   ```yaml
   models:
     your-model:
       name: "org/model-name"
       type: "embeddings"  # or "sparse-embeddings" or "rerank"
   ```

4. **Commit and push your changes:**

   ```bash
   git add src/config/models.yaml
   git commit -m "Update models configuration"
   git push
   ```

5. **Access your API:** click ⋯ → **Embed this Space** → copy the Direct URL:

   ```
   https://YOUR_USERNAME-api-embedding.hf.space
   https://YOUR_USERNAME-api-embedding.hf.space/docs  # Interactive docs
   ```

That's it! You now have a live embedding API endpoint powered by your models.
### 2️⃣ Run Locally (NOT RECOMMENDED)

```bash
# Clone the repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api

# Create a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py
```

The API is available at http://localhost:7860.
### 3️⃣ Run with Docker

```bash
# Build and run with Docker Compose
docker-compose up --build

# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```
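After the container starts, the `/health` endpoint reports readiness, so a small polling helper can gate scripts or tests on it. A minimal sketch (the helper name is illustrative; `get` is injected so the helper stays transport-agnostic and testable; pass `requests.get` in real use):

```python
import time

def wait_until_healthy(url, get, timeout=60.0, interval=2.0):
    """Poll `url` with `get` until it returns HTTP 200.

    `get` is any callable taking a URL and returning an object with a
    `status_code` attribute (e.g. requests.get). Returns True once the
    server answers 200, False if `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if get(url).status_code == 200:
                return True
        except Exception:
            pass  # server not accepting connections yet
        time.sleep(interval)
    return False

# Real use (requires `requests` and a running container):
# import requests
# assert wait_until_healthy("http://localhost:7860/health", requests.get)
```

Injecting `get` keeps the helper free of a hard dependency on `requests` and lets you stub the transport when testing it.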
## 📚 Usage Examples

### Python with the Native API

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Single embedding
response = requests.post(f"{base_url}/embeddings", json={
    "input": "What is artificial intelligence?",
    "model": "qwen3-0.6b"
})
embeddings = response.json()["data"]

# Batch embeddings with options
response = requests.post(f"{base_url}/embeddings", json={
    "input": ["First document", "Second document", "Third document"],
    "model": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
})
batch_embeddings = response.json()["data"]
```
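With vectors back from `/embeddings`, a common next step is ranking documents by cosine similarity against a query vector. A minimal sketch in pure Python (the vectors are illustrative stand-ins for `response.json()["data"][i]["embedding"]`, not real model output; note that with `normalize_embeddings: true` the dot product alone already equals cosine similarity):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Illustrative stand-ins for embedding vectors returned by the API.
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {
    "First document":  [0.1, 0.8, 0.3],
    "Second document": [0.9, 0.1, 0.0],
    "Third document":  [0.2, 0.7, 0.1],
}

ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
print(ranked)  # most similar first
```

For large corpora you would typically hand this off to a vector database instead, but the math is the same.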
### cURL

```bash
# Dense embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world"],
    "model": "qwen3-0.6b"
  }'

# Sparse embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embed_sparse" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["First doc", "Second doc", "Third doc"],
    "model": "splade-pp-v2"
  }'

# Reranking
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Python for data science",
    "documents": [
      "Python is great for data science",
      "Java is used for enterprise apps",
      "R is for statistical analysis"
    ],
    "model": "bge-v2-m3",
    "top_k": 2
  }'
```
### JavaScript/TypeScript

```javascript
const baseUrl = "https://fahmiaziz-api-embedding.hf.space/api/v1";

// Using fetch
const response = await fetch(`${baseUrl}/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    input: ["Hello world"],
    model: "qwen3-0.6b",
  }),
});

const { data } = await response.json();
console.log(data);
```
## 📡 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/embeddings` | POST | Generate embeddings (OpenAI compatible) |
| `/api/v1/embed_sparse` | POST | Generate sparse embeddings |
| `/api/v1/rerank` | POST | Rerank documents by relevance |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
| `/docs` | GET | Interactive API documentation |
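Call sites stay tidier if the endpoint paths above live in one place. A minimal stdlib-only client sketch (the class name and method set are illustrative, not part of the project; the paths and payload fields come from the table and examples in this README):

```python
import json
import urllib.request

class EmbeddingAPIClient:
    """Tiny illustrative wrapper over the endpoints in the table above."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def url(self, path):
        """Join the base URL with an endpoint path."""
        return f"{self.base_url}{path}"

    def _post(self, path, payload):
        req = urllib.request.Request(
            self.url(path),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def embed(self, texts, model):
        return self._post("/api/v1/embeddings", {"input": texts, "model": model})

    def rerank(self, query, documents, model, top_k=None):
        payload = {"query": query, "documents": documents, "model": model}
        if top_k is not None:
            payload["top_k"] = top_k
        return self._post("/api/v1/rerank", payload)

client = EmbeddingAPIClient("https://fahmiaziz-api-embedding.hf.space")
print(client.url("/api/v1/models"))
```

In practice you may prefer `requests` or the OpenAI client below over raw `urllib`; the point is centralizing the paths.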
## 🔌 OpenAI Client Compatibility

This API is fully compatible with OpenAI's client libraries, making it a drop-in replacement for OpenAI's embedding API.

**Why use the OpenAI client?**

- ✅ **Familiar API** – Same interface as OpenAI
- ✅ **Type Safety** – Full type hints and IDE support
- ✅ **Error Handling** – Built-in retry logic and error handling
- ✅ **Async Support** – Native async/await support
- ✅ **Easy Migration** – Switch between OpenAI and self-hosted seamlessly

### Supported Features

| Feature | Supported | Notes |
|---|---|---|
| `embeddings.create()` | ✅ Yes | Single and batch inputs |
| `input` as string | ✅ Yes | Auto-converted to a list |
| `input` as list | ✅ Yes | Batch processing |
| `model` parameter | ✅ Yes | Use your model IDs |
| `encoding_format` | ⚠️ Partial | Always returns float |
### Example with the OpenAI Client

```python
from openai import OpenAI

# Initialize the client with your API endpoint
client = OpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"  # No API key required, but the field must be non-empty
)

# Generate embeddings
embedding = client.embeddings.create(
    input="Hello",
    model="qwen3-0.6b"
)

# Access the results
for item in embedding.data:
    print(f"Embedding: {item.embedding[:5]}...")  # First 5 dimensions
    print(f"Index: {item.index}")
```
### Async OpenAI Client

```python
import asyncio

from openai import AsyncOpenAI

# Initialize the async client
client = AsyncOpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"
)

# Generate embeddings asynchronously
async def get_embeddings():
    try:
        embedding = await client.embeddings.create(
            input=["Hello", "World", "AI"],
            model="qwen3-0.6b"
        )
        return embedding
    except Exception as e:
        print(f"Error: {e}")

# From synchronous code; inside an async context, use `await get_embeddings()`
embeddings = asyncio.run(get_embeddings())
```
## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📚 Resources

- API Documentation
- Sentence Transformers
- FastAPI Docs
- OpenAI Python Client
- MTEB Leaderboard
- Hugging Face Spaces
- Deploy Applications on Hugging Face Spaces
- Sync HF Spaces with GitHub
- Duplicate & Clone Spaces
## 📄 License

This project is licensed under the MIT License – see the LICENSE file for details.
## 🙏 Acknowledgments

- Sentence Transformers for the embedding models
- FastAPI for the excellent web framework
- Hugging Face for model hosting and Spaces
- OpenAI for the client library design
- The open-source community for inspiration and support
## 💬 Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Hugging Face Space: fahmiaziz/api-embedding

Made with ❤️ by the open-source community

✨ *"Unify your embeddings. Simplify your AI stack."*