---
title: Api Embedding
emoji: 🐠
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---

# 🧠 Unified Embedding API

🧩 Unified API for all your embedding, sparse, and reranking models: plug and play with any model from Hugging Face or your own fine-tuned versions.


## 🚀 Overview

**Unified Embedding API** is a modular, open-source, RAG-ready API built for developers who want a simple, unified way to access dense, sparse, and reranking models.

It's designed for vector search, semantic retrieval, and AI-powered pipelines, all controlled from a single `models.yaml` file.

> ⚠️ **Note:** This is a development API. For production, deploy it on a platform such as Hugging Face TEI, AWS, GCP, or any cloud provider of your choice.


## 🧩 Features

- 🧠 **Unified Interface**: one API to handle dense, sparse, and reranking models
- ⚡ **Batch Processing**: automatic single/batch detection
- 🔧 **Flexible Parameters**: full control via kwargs and options
- 🔌 **OpenAI Compatible**: works with OpenAI client libraries
- 📈 **RAG Support**: a solid base for Retrieval-Augmented Generation systems
- ⚡ **Fast & Lightweight**: powered by FastAPI and optimized with async processing
- 🧰 **Extendable**: switch models instantly via `models.yaml` and add your own models effortlessly

πŸ“ Project Structure

```
unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py  # dense & sparse endpoints
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py      # reranking endpoint
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py       # dense model
│   │   │   ├── sparse.py      # sparse model
│   │   │   └── rank.py        # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml        # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
│
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md
```

## 🧩 Model Selection

The default configuration is sized for a CPU Space with 2 vCPUs and 16 GB of RAM. See the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory-usage reference.

**Add more models:** edit `src/config/models.yaml`:

```yaml
models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```

> ⚠️ If you plan to use larger models such as Qwen3-Embedding-8B, upgrade your Space first.
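
For reference, here is a sketch of a fuller `models.yaml` matching the model IDs used in the examples below (`qwen3-0.6b`, `splade-pp-v2`, `bge-v2-m3`). The Hugging Face repo names are illustrative assumptions, so substitute the checkpoints you actually want:

```yaml
models:
  qwen3-0.6b:
    name: "Qwen/Qwen3-Embedding-0.6B"   # dense embeddings (assumed repo)
    type: "embeddings"
  splade-pp-v2:
    name: "prithivida/Splade_PP_en_v2"  # sparse/SPLADE embeddings (assumed repo)
    type: "sparse-embeddings"
  bge-v2-m3:
    name: "BAAI/bge-reranker-v2-m3"     # cross-encoder reranker (assumed repo)
    type: "rerank"
```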


## ☁️ How to Deploy (Free 🚀)

Deploy your custom embedding API on Hugging Face Spaces: free, fast, and serverless.

### 1️⃣ Deploy on Hugging Face Spaces (Free!)

1. **Duplicate this Space:**
   👉 [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
   Click ⋯ (three dots) → **Duplicate this Space**

2. **Add the `HF_TOKEN` environment variable.** Make sure your Space is public.

3. **Clone your Space locally** (click ⋯ → **Clone repository**):

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
   cd api-embedding
   ```

4. **Edit `src/config/models.yaml` to customize models:**

   ```yaml
   models:
     your-model:
       name: "org/model-name"
       type: "embeddings"  # or "sparse-embeddings" or "rerank"
   ```

5. **Commit and push changes:**

   ```bash
   git add src/config/models.yaml
   git commit -m "Update models configuration"
   git push
   ```

6. **Access your API** (click ⋯ → **Embed this Space** → copy the Direct URL):

   ```
   https://YOUR_USERNAME-api-embedding.hf.space
   https://YOUR_USERNAME-api-embedding.hf.space/docs  # Interactive docs
   ```
    

That's it! You now have a live embedding API endpoint powered by your models.
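
To verify the deployment, you can hit the health endpoint (listed in the endpoints table below); replace the URL with your own Space's Direct URL:

```python
import requests

base = "https://YOUR_USERNAME-api-embedding.hf.space"

# Health check: should return 200 once the Space has finished building
print(requests.get(f"{base}/health").status_code)
```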

### 2️⃣ Run Locally (Not Recommended)

```bash
# Clone repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run server
python app.py
```

The API is then available at `http://localhost:7860`.

### 3️⃣ Run with Docker

```bash
# Build and run
docker-compose up --build

# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```

## 📖 Usage Examples

### Python (native API)

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Single embedding
response = requests.post(f"{base_url}/embeddings", json={
    "input": "What is artificial intelligence?",
    "model": "qwen3-0.6b"
})
embeddings = response.json()["data"]

# Batch embeddings with options
response = requests.post(f"{base_url}/embeddings", json={
    "input": ["First document", "Second document", "Third document"],
    "model": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
})
batch_embeddings = response.json()["data"]
```
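
Reranking works the same way from Python. Here is a sketch mirroring the cURL call below; the exact response schema isn't documented here, so it simply prints the raw JSON:

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Rerank candidate documents against a query
response = requests.post(f"{base_url}/rerank", json={
    "query": "Python for data science",
    "documents": [
        "Python is great for data science",
        "Java is used for enterprise apps",
        "R is for statistical analysis",
    ],
    "model": "bge-v2-m3",
    "top_k": 2,
})
print(response.json())  # inspect the ranked results
```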

### cURL

```bash
# Dense embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world"],
    "model": "qwen3-0.6b"
  }'

# Sparse embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embed_sparse" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["First doc", "Second doc", "Third doc"],
    "model": "splade-pp-v2"
  }'

# Reranking
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Python for data science",
    "documents": [
      "Python is great for data science",
      "Java is used for enterprise apps",
      "R is for statistical analysis"
    ],
    "model": "bge-v2-m3",
    "top_k": 2
  }'
```

### JavaScript/TypeScript

```javascript
const baseUrl = "https://fahmiaziz-api-embedding.hf.space/api/v1";

// Using fetch (the request body matches the Python and cURL examples above)
const response = await fetch(`${baseUrl}/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    input: ["Hello world"],
    model: "qwen3-0.6b",
  }),
});

// The response follows the OpenAI format: embeddings live under `data`
const { data } = await response.json();
console.log(data);
```

## 📊 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/embeddings` | POST | Generate embeddings (OpenAI compatible) |
| `/api/v1/embed_sparse` | POST | Generate sparse embeddings |
| `/api/v1/rerank` | POST | Rerank documents by relevance |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
| `/docs` | GET | Interactive API documentation |
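
The two GET endpoints are useful for discovering what is loaded. A quick sketch (the response schemas aren't pinned down here, so it prints raw JSON):

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# List every model loaded from models.yaml
print(requests.get(f"{base_url}/models").json())

# Inspect a single model by its ID
print(requests.get(f"{base_url}/models/qwen3-0.6b").json())
```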

## 🔌 OpenAI Client Compatibility

This API is fully compatible with OpenAI's client libraries, making it a drop-in replacement for OpenAI's embedding API.

**Why use the OpenAI client?**

- ✅ **Familiar API**: same interface as OpenAI
- ✅ **Type Safety**: full type hints and IDE support
- ✅ **Error Handling**: built-in retry logic and error handling
- ✅ **Async Support**: native async/await support
- ✅ **Easy Migration**: switch between OpenAI and self-hosted seamlessly (see the sketch below)
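
To make the migration point concrete, a minimal sketch: only the base URL and API key change between OpenAI and this API. The `EMBEDDINGS_BASE_URL` environment variable is a hypothetical name, not part of this project:

```python
import os
from openai import OpenAI

# Hypothetical env var: point the same code at OpenAI or at your Space
client = OpenAI(
    base_url=os.getenv("EMBEDDINGS_BASE_URL", "https://fahmiaziz-api-embedding.hf.space/api/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "-"),  # any non-empty string works for the self-hosted API
)
```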

### Supported Features

| Feature | Supported | Notes |
|---|---|---|
| `embeddings.create()` | ✅ Yes | Single and batch inputs |
| `input` as string | ✅ Yes | Auto-converted to a list |
| `input` as list | ✅ Yes | Batch processing |
| `model` parameter | ✅ Yes | Use your model IDs |
| `encoding_format` | ⚠️ Partial | Always returns float |

### Example with the OpenAI Client

```python
from openai import OpenAI

# Initialize client with your API endpoint
client = OpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"  # API key not required, but must be present
)

# Generate embeddings
embedding = client.embeddings.create(
    input="Hello",
    model="qwen3-0.6b"
)

# Access results
for item in embedding.data:
    print(f"Embedding: {item.embedding[:5]}...")  # First 5 dimensions
    print(f"Index: {item.index}")
```

### Async OpenAI Client

```python
import asyncio

from openai import AsyncOpenAI

# Initialize async client
client = AsyncOpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"
)

# Generate embeddings asynchronously
async def get_embeddings():
    try:
        embedding = await client.embeddings.create(
            input=["Hello", "World", "AI"],
            model="qwen3-0.6b"
        )
        return embedding
    except Exception as e:
        print(f"Error: {e}")

# In a script, drive the coroutine with asyncio.run();
# inside an async context, simply `await get_embeddings()`
embeddings = asyncio.run(get_embeddings())
```
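
Since responses follow the OpenAI format, wiring the API into a small vector-search loop is straightforward. A minimal sketch using cosine similarity over the returned vectors (numpy is an assumed extra dependency, not required by the API itself):

```python
import numpy as np
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

def embed(texts):
    """Return one vector per input text from the /embeddings endpoint."""
    r = requests.post(f"{base_url}/embeddings",
                      json={"input": texts, "model": "qwen3-0.6b"})
    r.raise_for_status()
    return np.array([item["embedding"] for item in r.json()["data"]])

docs = ["Python is great for data science",
        "Java is used for enterprise apps",
        "R is for statistical analysis"]
doc_vecs = embed(docs)
query_vec = embed(["Python for data science"])[0]

# Cosine similarity: dot product of L2-normalized vectors
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(doc_vecs) @ normalize(query_vec)
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```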

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

- [Sentence Transformers](https://www.sbert.net) for the embedding models
- [FastAPI](https://fastapi.tiangolo.com) for the excellent web framework
- [Hugging Face](https://huggingface.co) for model hosting and Spaces
- OpenAI for the client library design
- The open-source community for inspiration and support

Made with ❤️ by the Open-Source Community

✨ "Unify your embeddings. Simplify your AI stack."