Datasourceforcryptocurrency / hf-data-engine /REAL_DATA_IMPLEMENTATION.md
Really-amin's picture
Upload 317 files
eebf5c4 verified

Real Data Implementation Guide

Overview

The crypto monitoring API has been upgraded from mock data to real provider-backed data. This document explains the changes and how to use the new functionality.

What Changed

Files Modified

  1. api_server_extended.py - Main API server

    • Added imports for ProviderFetchHelper, CryptoDatabase, and os
    • Added fetch_helper and db global instances
    • Added USE_MOCK_DATA environment flag
    • Replaced 5 mock endpoints with real implementations:
      • GET /api/market - Now fetches from CoinGecko
      • GET /api/sentiment - Now fetches from Alternative.me
      • GET /api/trending - Now fetches from CoinGecko
      • GET /api/defi - Returns 503 (requires DeFi provider)
      • POST /api/hf/run-sentiment - Returns 501 (requires ML models)
    • Added new endpoint: GET /api/market/history - Historical data from SQLite
  2. provider_fetch_helper.py - New file

    • Implements ProviderFetchHelper class
    • Provides fetch_from_pool() method for pool-based fetching
    • Provides fetch_from_provider() method for direct provider access
    • Integrates with existing ProviderManager, circuit breakers, and logging
    • Handles automatic failover and retry logic
  3. test_real_data.py - New file

    • Test script to verify real data endpoints
    • Tests all modified endpoints
    • Provides clear pass/fail results

Architecture

Data Flow

Client Request
    ↓
FastAPI Endpoint (api_server_extended.py)
    ↓
ProviderFetchHelper.fetch_from_provider()
    ↓
ProviderManager → Get Provider Config
    ↓
aiohttp → HTTP Request to External API
    ↓
Response Processing & Normalization
    ↓
Database Storage (SQLite)
    ↓
JSON Response to Client

Provider Integration

The implementation uses the existing provider management system:

  • Provider Configs: Loaded from JSON files (providers_config_extended.json, etc.)
  • Circuit Breakers: Automatic failure detection and recovery
  • Metrics: Success rate, response time, request counts
  • Logging: All requests logged with provider_id and details
  • Health Checks: Existing health check system continues to work

API Endpoints

1. GET /api/market

Real Data Mode (default):

curl http://localhost:8000/api/market

Response:

{
  "mode": "real",
  "cryptocurrencies": [
    {
      "rank": 1,
      "name": "Bitcoin",
      "symbol": "BTC",
      "price": 43250.50,
      "change_24h": 2.35,
      "market_cap": 845000000000,
      "volume_24h": 28500000000
    }
  ],
  "source": "CoinGecko",
  "timestamp": "2025-01-15T10:30:00Z",
  "response_time_ms": 245
}

Mock Mode:

USE_MOCK_DATA=true python main.py
curl http://localhost:8000/api/market

2. GET /api/market/history

New endpoint for historical price data from database:

curl "http://localhost:8000/api/market/history?symbol=BTC&limit=10"

Response:

{
  "symbol": "BTC",
  "count": 10,
  "history": [
    {
      "symbol": "BTC",
      "name": "Bitcoin",
      "price_usd": 43250.50,
      "volume_24h": 28500000000,
      "market_cap": 845000000000,
      "percent_change_24h": 2.35,
      "rank": 1,
      "timestamp": "2025-01-15 10:30:00"
    }
  ]
}

3. GET /api/sentiment

Real Data Mode:

curl http://localhost:8000/api/sentiment

Response:

{
  "mode": "real",
  "fear_greed_index": {
    "value": 62,
    "classification": "Greed",
    "timestamp": "1705315800",
    "time_until_update": "43200"
  },
  "source": "alternative.me"
}

4. GET /api/trending

Real Data Mode:

curl http://localhost:8000/api/trending

Response:

{
  "mode": "real",
  "trending": [
    {
      "name": "Solana",
      "symbol": "SOL",
      "thumb": "https://...",
      "market_cap_rank": 5,
      "score": 0
    }
  ],
  "source": "CoinGecko",
  "timestamp": "2025-01-15T10:30:00Z"
}

5. GET /api/defi

Status: Not implemented (requires DeFi provider)

curl http://localhost:8000/api/defi

Response:

{
  "detail": "DeFi TVL data provider not configured. Add DefiLlama or similar provider to enable this endpoint."
}

Status Code: 503 Service Unavailable

6. POST /api/hf/run-sentiment

Status: Not implemented (requires ML models)

curl -X POST http://localhost:8000/api/hf/run-sentiment \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Bitcoin is bullish"]}'

Response:

{
  "detail": "Real ML-based sentiment analysis is not yet implemented. This endpoint is reserved for future integration with HuggingFace transformer models. Set USE_MOCK_DATA=true for demo mode with keyword-based sentiment."
}

Status Code: 501 Not Implemented

Environment Variables

USE_MOCK_DATA

Controls whether endpoints return real or mock data.

Default: false (real data)

Usage:

# Real data (default)
python main.py

# Mock data (for demos)
USE_MOCK_DATA=true python main.py

# Docker
docker run -e USE_MOCK_DATA=false -p 8000:8000 crypto-monitor

Behavior:

  • false or unset: All endpoints fetch real data from providers
  • true: Endpoints return mock data (for testing/demos)

Provider Configuration

Required Providers

The following providers must be configured in providers_config_extended.json:

  1. coingecko - For market data and trending

    • Endpoints: simple_price, trending
    • No API key required (free tier)
    • Rate limit: 50 req/min
  2. alternative.me - For sentiment (Fear & Greed Index)

    • Direct HTTP call (not in provider config)
    • No API key required
    • Public API

Optional Providers

  1. DefiLlama - For DeFi TVL data
    • Not currently configured
    • Would enable /api/defi endpoint

Adding New Providers

To add a new provider:

  1. Edit providers_config_extended.json:
{
  "providers": {
    "your_provider": {
      "name": "Your Provider",
      "category": "market_data",
      "base_url": "https://api.example.com",
      "endpoints": {
        "prices": "/v1/prices"
      },
      "rate_limit": {
        "requests_per_minute": 60
      },
      "requires_auth": false,
      "priority": 8,
      "weight": 80
    }
  }
}
  1. Use in endpoint:
result = await fetch_helper.fetch_from_provider(
    "your_provider",
    "prices",
    params={"symbols": "BTC,ETH"}
)

Database Integration

Schema

The SQLite database (data/crypto_aggregator.db) stores:

prices table:

  • symbol, name, price_usd, volume_24h, market_cap
  • percent_change_1h, percent_change_24h, percent_change_7d
  • rank, timestamp

Automatic Storage

When /api/market is called:

  1. Real data is fetched from CoinGecko
  2. Each asset is automatically saved to the database
  3. Historical data accumulates over time
  4. Query with /api/market/history

Manual Queries

from database import CryptoDatabase

db = CryptoDatabase()

# Get recent prices
with db.get_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("""
        SELECT * FROM prices 
        WHERE symbol = 'BTC' 
        ORDER BY timestamp DESC 
        LIMIT 100
    """)
    rows = cursor.fetchall()

Testing

Automated Tests

# Start server
python main.py

# In another terminal, run tests
python test_real_data.py

Manual Testing

# Test market data
curl http://localhost:8000/api/market

# Test with parameters
curl "http://localhost:8000/api/market/history?symbol=ETH&limit=5"

# Test sentiment
curl http://localhost:8000/api/sentiment

# Test trending
curl http://localhost:8000/api/trending

# Check health
curl http://localhost:8000/health

# View API docs
open http://localhost:8000/docs

Error Handling

Provider Unavailable

If a provider is down:

{
  "detail": "All providers in pool 'market_primary' failed. Last error: Connection timeout"
}

Status Code: 503

Provider Not Configured

If required provider missing:

{
  "detail": "Market data provider (CoinGecko) not configured"
}

Status Code: 503

Database Error

If database operation fails:

{
  "detail": "Database error: unable to open database file"
}

Status Code: 500

Monitoring

Logs

All requests are logged to logs/ directory:

INFO - Successfully fetched from CoinGecko
  provider_id: coingecko
  endpoint: simple_price
  response_time_ms: 245
  pool: market_primary

Metrics

Provider metrics are updated automatically:

  • total_requests
  • successful_requests
  • failed_requests
  • avg_response_time
  • success_rate
  • consecutive_failures

View metrics:

curl http://localhost:8000/api/providers/coingecko

Health Checks

Existing health check system continues to work:

curl http://localhost:8000/api/providers/coingecko/health-check

Deployment

Docker

# Build
docker build -t crypto-monitor .

# Run with real data (default)
docker run -p 8000:8000 crypto-monitor

# Run with mock data
docker run -e USE_MOCK_DATA=true -p 8000:8000 crypto-monitor

Hugging Face Spaces

The service is ready for HF Spaces deployment:

  1. Push to HF Space repository
  2. Set Space SDK to "Docker"
  3. Optionally set USE_MOCK_DATA in Space secrets
  4. Service will start automatically

Future Enhancements

Planned

  1. Pool-based fetching: Use provider pools instead of direct provider access
  2. ML sentiment analysis: Load HuggingFace models for real sentiment
  3. DeFi integration: Add DefiLlama provider
  4. Caching layer: Redis for frequently accessed data
  5. Rate limiting: Per-client rate limits
  6. Authentication: API key management

Contributing

To add real data for a new endpoint:

  1. Identify the provider and endpoint
  2. Add provider to config if needed
  3. Use fetch_helper.fetch_from_provider() in endpoint
  4. Normalize response to consistent schema
  5. Add database storage if applicable
  6. Update tests and documentation

Troubleshooting

"Provider not configured"

Solution: Check providers_config_extended.json has the required provider

"All providers failed"

Solution:

  • Check internet connectivity
  • Verify provider URLs are correct
  • Check rate limits haven't been exceeded
  • View logs for detailed error messages

"Database error"

Solution:

  • Ensure data/ directory exists and is writable
  • Check disk space
  • Verify SQLite is installed

Mock data still showing

Solution:

  • Ensure USE_MOCK_DATA is not set or is set to false
  • Restart the server
  • Check environment variables: env | grep USE_MOCK_DATA

Summary

Real data is now the default for all crypto endpoints ✅ Database integration stores historical prices ✅ Provider management uses existing sophisticated system ✅ Graceful degradation with clear error messages ✅ Mock mode available for demos via environment flag ✅ Production-ready for deployment

The API is now a fully functional crypto data service, not just a monitoring platform!