---
title: Api Embedding
emoji: 🐠
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---

# 🧠 Unified Embedding API

🧩 Unified API for all your embedding, sparse, and reranking models: plug and play with any model from Hugging Face or your own fine-tuned versions.


## 🚀 Overview

**Unified Embedding API** is a modular, open-source, RAG-ready API built for developers who want a simple, unified way to access dense, sparse, and reranking models.

It's designed for vector search, semantic retrieval, and AI-powered pipelines, all controlled from a single `models.yaml` file.

> ⚠️ **Note:** This is a development API. For production, deploy it on a platform such as Hugging Face TEI, AWS, GCP, or any cloud provider of your choice.


## 🧩 Features

- 🧠 **Unified Interface**: one API to handle dense, sparse, and reranking models
- ⚡ **Batch Processing**: automatic single/batch detection
- 🔧 **Flexible Parameters**: full control via kwargs and options
- 🔌 **OpenAI Compatible**: works with OpenAI client libraries
- 📈 **RAG Support**: a solid base for Retrieval-Augmented Generation systems
- ⚡ **Fast & Lightweight**: powered by FastAPI and optimized with async processing
- 🧰 **Extendable**: switch models instantly via `models.yaml` and add your own models effortlessly

πŸ“ Project Structure

```
unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py  # dense & sparse endpoints
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py      # reranking endpoint
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py       # dense model
│   │   │   ├── sparse.py      # sparse model
│   │   │   └── rank.py        # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml        # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
│
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md
```

## 🧩 Model Selection

The default configuration is sized for a CPU Space with 2 vCPUs and 16 GB of RAM. See the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory-usage reference.

**Add more models:** edit `src/config/models.yaml`:

```yaml
models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```

> ⚠️ If you plan to use larger models such as Qwen3-Embedding-8B, upgrade your Space first.
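
For reference, here is a sketch of a fuller `models.yaml` matching the model IDs used in the examples below (`qwen3-0.6b`, `splade-pp-v2`, `bge-v2-m3`). The Hugging Face repo names are illustrative assumptions, so substitute the checkpoints you actually want:

```yaml
models:
  qwen3-0.6b:
    name: "Qwen/Qwen3-Embedding-0.6B"   # dense embeddings (assumed repo)
    type: "embeddings"
  splade-pp-v2:
    name: "prithivida/Splade_PP_en_v2"  # sparse/SPLADE embeddings (assumed repo)
    type: "sparse-embeddings"
  bge-v2-m3:
    name: "BAAI/bge-reranker-v2-m3"     # cross-encoder reranker (assumed repo)
    type: "rerank"
```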


## ☁️ How to Deploy (Free 🚀)

Deploy your custom embedding API on Hugging Face Spaces: free, fast, and serverless.

### 1️⃣ Deploy on Hugging Face Spaces (Free!)

1. **Duplicate this Space:**
   👉 [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
   Click ⋯ (three dots) → **Duplicate this Space**

2. **Add the `HF_TOKEN` environment variable.** Make sure your Space is public.

3. **Clone your Space locally** (click ⋯ → **Clone repository**):

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
   cd api-embedding
   ```

4. **Edit `src/config/models.yaml` to customize models:**

   ```yaml
   models:
     your-model:
       name: "org/model-name"
       type: "embeddings"  # or "sparse-embeddings" or "rerank"
   ```

5. **Commit and push changes:**

   ```bash
   git add src/config/models.yaml
   git commit -m "Update models configuration"
   git push
   ```

6. **Access your API** (click ⋯ → **Embed this Space** → copy the Direct URL):

   ```
   https://YOUR_USERNAME-api-embedding.hf.space
   https://YOUR_USERNAME-api-embedding.hf.space/docs  # Interactive docs
   ```
    

That's it! You now have a live embedding API endpoint powered by your models.
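
To verify the deployment, you can hit the health endpoint (listed in the endpoints table below); replace the URL with your own Space's Direct URL:

```python
import requests

base = "https://YOUR_USERNAME-api-embedding.hf.space"

# Health check: should return 200 once the Space has finished building
print(requests.get(f"{base}/health").status_code)
```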

### 2️⃣ Run Locally (Not Recommended)

```bash
# Clone repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run server
python app.py
```

The API is then available at `http://localhost:7860`.

### 3️⃣ Run with Docker

```bash
# Build and run
docker-compose up --build

# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```

## 📖 Usage Examples

### Python (native API)

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Single embedding
response = requests.post(f"{base_url}/embeddings", json={
    "input": "What is artificial intelligence?",
    "model": "qwen3-0.6b"
})
embeddings = response.json()["data"]

# Batch embeddings with options
response = requests.post(f"{base_url}/embeddings", json={
    "input": ["First document", "Second document", "Third document"],
    "model": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
})
batch_embeddings = response.json()["data"]
```
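
Reranking works the same way from Python. Here is a sketch mirroring the cURL call below; the exact response schema isn't documented here, so it simply prints the raw JSON:

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# Rerank candidate documents against a query
response = requests.post(f"{base_url}/rerank", json={
    "query": "Python for data science",
    "documents": [
        "Python is great for data science",
        "Java is used for enterprise apps",
        "R is for statistical analysis",
    ],
    "model": "bge-v2-m3",
    "top_k": 2,
})
print(response.json())  # inspect the ranked results
```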

### cURL

```bash
# Dense embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world"],
    "model": "qwen3-0.6b"
  }'

# Sparse embeddings
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/embed_sparse" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["First doc", "Second doc", "Third doc"],
    "model": "splade-pp-v2"
  }'

# Reranking
curl -X POST "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Python for data science",
    "documents": [
      "Python is great for data science",
      "Java is used for enterprise apps",
      "R is for statistical analysis"
    ],
    "model": "bge-v2-m3",
    "top_k": 2
  }'
```

### JavaScript/TypeScript

```javascript
const baseUrl = "https://fahmiaziz-api-embedding.hf.space/api/v1";

// Using fetch (the request body matches the Python and cURL examples above)
const response = await fetch(`${baseUrl}/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    input: ["Hello world"],
    model: "qwen3-0.6b",
  }),
});

// The response follows the OpenAI format: embeddings live under `data`
const { data } = await response.json();
console.log(data);
```

## 📊 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/embeddings` | POST | Generate embeddings (OpenAI compatible) |
| `/api/v1/embed_sparse` | POST | Generate sparse embeddings |
| `/api/v1/rerank` | POST | Rerank documents by relevance |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
| `/docs` | GET | Interactive API documentation |
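
The two GET endpoints are useful for discovering what is loaded. A quick sketch (the response schemas aren't pinned down here, so it prints raw JSON):

```python
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

# List every model loaded from models.yaml
print(requests.get(f"{base_url}/models").json())

# Inspect a single model by its ID
print(requests.get(f"{base_url}/models/qwen3-0.6b").json())
```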

## 🔌 OpenAI Client Compatibility

This API is fully compatible with OpenAI's client libraries, making it a drop-in replacement for OpenAI's embedding API.

**Why use the OpenAI client?**

- ✅ **Familiar API**: same interface as OpenAI
- ✅ **Type Safety**: full type hints and IDE support
- ✅ **Error Handling**: built-in retry logic and error handling
- ✅ **Async Support**: native async/await support
- ✅ **Easy Migration**: switch between OpenAI and self-hosted seamlessly (see the sketch below)
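
To make the migration point concrete, a minimal sketch: only the base URL and API key change between OpenAI and this API. The `EMBEDDINGS_BASE_URL` environment variable is a hypothetical name, not part of this project:

```python
import os
from openai import OpenAI

# Hypothetical env var: point the same code at OpenAI or at your Space
client = OpenAI(
    base_url=os.getenv("EMBEDDINGS_BASE_URL", "https://fahmiaziz-api-embedding.hf.space/api/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "-"),  # any non-empty string works for the self-hosted API
)
```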

### Supported Features

| Feature | Supported | Notes |
|---|---|---|
| `embeddings.create()` | ✅ Yes | Single and batch inputs |
| `input` as string | ✅ Yes | Auto-converted to a list |
| `input` as list | ✅ Yes | Batch processing |
| `model` parameter | ✅ Yes | Use your model IDs |
| `encoding_format` | ⚠️ Partial | Always returns float |

### Example with the OpenAI Client

```python
from openai import OpenAI

# Initialize client with your API endpoint
client = OpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"  # API key not required, but must be present
)

# Generate embeddings
embedding = client.embeddings.create(
    input="Hello",
    model="qwen3-0.6b"
)

# Access results
for item in embedding.data:
    print(f"Embedding: {item.embedding[:5]}...")  # First 5 dimensions
    print(f"Index: {item.index}")
```

### Async OpenAI Client

```python
import asyncio

from openai import AsyncOpenAI

# Initialize async client
client = AsyncOpenAI(
    base_url="https://fahmiaziz-api-embedding.hf.space/api/v1",
    api_key="-"
)

# Generate embeddings asynchronously
async def get_embeddings():
    try:
        embedding = await client.embeddings.create(
            input=["Hello", "World", "AI"],
            model="qwen3-0.6b"
        )
        return embedding
    except Exception as e:
        print(f"Error: {e}")

# In a script, drive the coroutine with asyncio.run();
# inside an async context, simply `await get_embeddings()`
embeddings = asyncio.run(get_embeddings())
```
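
Since responses follow the OpenAI format, wiring the API into a small vector-search loop is straightforward. A minimal sketch using cosine similarity over the returned vectors (numpy is an assumed extra dependency, not required by the API itself):

```python
import numpy as np
import requests

base_url = "https://fahmiaziz-api-embedding.hf.space/api/v1"

def embed(texts):
    """Return one vector per input text from the /embeddings endpoint."""
    r = requests.post(f"{base_url}/embeddings",
                      json={"input": texts, "model": "qwen3-0.6b"})
    r.raise_for_status()
    return np.array([item["embedding"] for item in r.json()["data"]])

docs = ["Python is great for data science",
        "Java is used for enterprise apps",
        "R is for statistical analysis"]
doc_vecs = embed(docs)
query_vec = embed(["Python for data science"])[0]

# Cosine similarity: dot product of L2-normalized vectors
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(doc_vecs) @ normalize(query_vec)
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```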

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

- [Sentence Transformers](https://www.sbert.net) for the embedding models
- [FastAPI](https://fastapi.tiangolo.com) for the excellent web framework
- [Hugging Face](https://huggingface.co) for model hosting and Spaces
- OpenAI for the client library design
- The open-source community for inspiration and support

Made with ❤️ by the Open-Source Community

✨ "Unify your embeddings. Simplify your AI stack."