---
title: Api Embedding
emoji: 🧠
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---

# 🧠 Unified Embedding API

> 🧩 Unified API for all your Embedding, Sparse & Reranking Models – plug and play with any model from Hugging Face or your own fine-tuned versions.

---
## Overview

**Unified Embedding API** is a modular, open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, **sparse**, and **reranking** models.
It's designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines**, all controlled from a single `models.yaml` file.

⚠️ **Note:** This is a development API.
For production deployment, host it on cloud platforms such as **Hugging Face TEI**, **AWS**, **GCP**, or any cloud provider of your choice.

---
## 🧩 Features

- **Unified Interface** – One API to handle dense, sparse, and reranking models
- ⚡ **Batch Processing** – Automatic single/batch detection
- **Flexible Parameters** – Full control via kwargs and options
- **OpenAI Compatible** – Works with OpenAI client libraries
- **RAG Support** – Perfect base for Retrieval-Augmented Generation systems
- ⚡ **Fast & Lightweight** – Powered by FastAPI and optimized with async processing
- 🧰 **Extendable** – Switch models instantly via `models.yaml` and add your own models effortlessly
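The **RAG Support** bullet above comes down to ranking documents by vector similarity once embeddings come back from the API. A minimal sketch in plain Python, with toy 2-D vectors standing in for real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [i for i, _ in scores[:k]]

# Toy vectors standing in for embeddings returned by the API
docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([1.0, 0.1], docs, k=2))  # [0, 1]
```

With real responses, the vectors would come from the `/embeddings` endpoint; a vector database does the same ranking at scale.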
---

## Project Structure

```
unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py   # dense & sparse endpoints
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py       # reranking endpoint
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py        # dense model
│   │   │   ├── sparse.py       # sparse model
│   │   │   └── rank.py         # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml         # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
│
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md
```

---
## 🧩 Model Selection

The default configuration is tuned for a CPU Space with **2 vCPUs / 16 GB RAM**. See the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory-usage reference.

**Add more models:** edit `src/config/models.yaml`:

```yaml
models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```

⚠️ If you plan to use larger models such as `Qwen2-embedding-8B`, please upgrade your Space first.

---
## How to Deploy (Free)

Deploy your **Custom Embedding API** on **Hugging Face Spaces** – free, fast, and serverless.

### **1️⃣ Deploy on Hugging Face Spaces (Free!)**

1. **Duplicate this Space:**
   [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
   Click **⋮** (three dots) → **Duplicate this Space**
2. **Add the `HF_TOKEN` environment variable.** Make sure your Space is public.
3. **Clone your Space locally:**
   Click **⋮** → **Clone repository**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
   cd api-embedding
   ```
4. **Edit `src/config/models.yaml`** to customize models:
   ```yaml
   models:
     your-model:
       name: "org/model-name"
       type: "embeddings"  # or "sparse-embeddings" or "rerank"
   ```
5. **Commit and push changes:**
   ```bash
   git add src/config/models.yaml
   git commit -m "Update models configuration"
   git push
   ```
6. **Access your API:**
   Click **⋮** → **Embed this Space** → copy the **Direct URL**
   ```
   https://YOUR_USERNAME-api-embedding.hf.space
   https://YOUR_USERNAME-api-embedding.hf.space/docs   # interactive docs
   ```

That's it! You now have a live embedding API endpoint powered by your models.
### **2️⃣ Run Locally (NOT RECOMMENDED)**

```bash
# Clone the repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py
```

API available at: `http://localhost:7860`
### **3️⃣ Run with Docker**

```bash
# Build and run
docker-compose up --build

# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```

---
## Usage Examples

### **Python with Native API**

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Single embedding
response = requests.post(f"{base_url}/embeddings", json={
    "input": "What is artificial intelligence?",
    "model": "qwen3-0.6b"
})
embeddings = response.json()["data"]

# Batch embeddings with options
response = requests.post(f"{base_url}/embeddings", json={
    "input": ["First document", "Second document", "Third document"],
    "model": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
})
batch_embeddings = response.json()["data"]
```
### **cURL**

```bash
# Dense embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world"],
    "model": "qwen3-0.6b"
  }'

# Sparse embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embed_sparse" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["First doc", "Second doc", "Third doc"],
    "model": "splade-pp-v2"
  }'

# Reranking
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Python for data science",
    "documents": [
      "Python is great for data science",
      "Java is used for enterprise apps",
      "R is for statistical analysis"
    ],
    "model": "bge-v2-m3",
    "top_k": 2
  }'
```
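A rerank response can then be mapped back onto the original documents. A sketch of that post-processing step: the `results`/`index`/`score` field names used here are an assumption for illustration (the actual schema is defined in `src/models/schemas/responses.py`).

```python
# Hypothetical rerank response; field names are assumed, not confirmed by the API
response_json = {
    "results": [
        {"index": 1, "score": 0.12},
        {"index": 0, "score": 0.93},
        {"index": 2, "score": 0.41},
    ]
}

documents = [
    "Python is great for data science",
    "Java is used for enterprise apps",
    "R is for statistical analysis",
]

# Sort hits by relevance score and map indices back to the original texts
ranked = sorted(response_json["results"], key=lambda r: r["score"], reverse=True)
top = [documents[r["index"]] for r in ranked[:2]]
print(top)  # ['Python is great for data science', 'R is for statistical analysis']
```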
### **JavaScript/TypeScript**

```typescript
const baseUrl = "https://fahmiaziz-api-embedding.hf.space/api/v1";

// Using fetch: the request body matches the /embeddings schema above
const response = await fetch(`${baseUrl}/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    input: ["Hello world"],
    model: "qwen3-0.6b",
  }),
});
const { data } = await response.json();
console.log(data);
```
---

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/embeddings` | POST | Generate embeddings (OpenAI compatible) |
| `/api/v1/embed_sparse` | POST | Generate sparse embeddings |
| `/api/v1/rerank` | POST | Rerank documents by relevance |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
| `/docs` | GET | Interactive API documentation |

---
## OpenAI Client Compatibility

This API is **fully compatible** with OpenAI's client libraries, making it a drop-in replacement for OpenAI's embedding API.

### **Why use the OpenAI client?**

- ✅ **Familiar API** – Same interface as OpenAI
- ✅ **Type Safety** – Full type hints and IDE support
- ✅ **Error Handling** – Built-in retry logic and error handling
- ✅ **Async Support** – Native async/await support
- ✅ **Easy Migration** – Switch between OpenAI and self-hosted seamlessly

### **Supported Features**

| Feature | Supported | Notes |
|---------|-----------|-------|
| `embeddings.create()` | ✅ Yes | Single and batch inputs |
| `input` as string | ✅ Yes | Auto-converted to list |
| `input` as list | ✅ Yes | Batch processing |
| `model` parameter | ✅ Yes | Use your model IDs |
| `encoding_format` | ⚠️ Partial | Always returns `float` |
### **Example with OpenAI Client**

```python
from openai import OpenAI

# Initialize the client with your API endpoint
client = OpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"  # API key not required, but must be present
)

# Generate embeddings
embedding = client.embeddings.create(
    input="Hello",
    model="qwen3-0.6b"
)

# Access results
for item in embedding.data:
    print(f"Embedding: {item.embedding[:5]}...")  # First 5 dimensions
    print(f"Index: {item.index}")
```
### **Async OpenAI Client**

```python
import asyncio

from openai import AsyncOpenAI

# Initialize the async client
client = AsyncOpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"
)

# Generate embeddings asynchronously
async def get_embeddings():
    try:
        embedding = await client.embeddings.create(
            input=["Hello", "World", "AI"],
            model="qwen3-0.6b"
        )
        return embedding
    except Exception as e:
        print(f"Error: {e}")

# Run from synchronous code; inside a running event loop, use `await get_embeddings()` instead
embeddings = asyncio.run(get_embeddings())
```
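For large corpora, the async client pairs naturally with `asyncio.gather` to embed many batches concurrently. A sketch with the network call stubbed out; `embed_batch` is a hypothetical helper whose body you would replace with a real `client.embeddings.create(...)` call:

```python
import asyncio

async def embed_batch(batch: list[str]) -> list[list[float]]:
    # Stand-in for an AsyncOpenAI call; returns one dummy vector per input
    # (the vector is just the text length, so the sketch runs offline)
    await asyncio.sleep(0)
    return [[float(len(text))] for text in batch]

async def embed_all(texts: list[str], batch_size: int = 2) -> list[list[float]]:
    # Split into batches, embed them concurrently, then flatten the results
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    return [vec for batch in results for vec in batch]

vectors = asyncio.run(embed_all(["a", "bb", "ccc", "dddd", "eeeee"]))
print(len(vectors))  # 5
```

Batch size and concurrency limits should be tuned to what your Space's hardware can absorb.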
---

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

---
## Resources

- [API Documentation](API.md)
- [Sentence Transformers](https://www.sbert.net/)
- [FastAPI Docs](https://fastapi.tiangolo.com/)
- [OpenAI Python Client](https://github.com/openai/openai-python)
- [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
- [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)
- [Deploy Applications on Hugging Face Spaces](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
- [Sync HF Spaces with GitHub](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository)
- [Duplicate & Clone Spaces](https://huggingface.co/docs/hub/spaces-overview#duplicating-a-space)

---
## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

- **Sentence Transformers** for the embedding models
- **FastAPI** for the excellent web framework
- **Hugging Face** for model hosting and Spaces
- **OpenAI** for the client library design
- **Open Source Community** for inspiration and support

---
## Support

- **Issues:** [GitHub Issues](https://github.com/fahmiaziz98/unified-embedding-api/issues)
- **Discussions:** [GitHub Discussions](https://github.com/fahmiaziz98/unified-embedding-api/discussions)
- **Hugging Face Space:** [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)

---

<div align="center">

Made with ❤️ by the Open-Source Community

> ✨ "Unify your embeddings. Simplify your AI stack."

</div>