# Real Data Implementation Guide
## Overview
The crypto monitoring API has been upgraded from mock data to **real provider-backed data**. This document explains the changes and how to use the new functionality.
## What Changed
### Files Modified
1. **`api_server_extended.py`** - Main API server
   - Added imports for `ProviderFetchHelper`, `CryptoDatabase`, and `os`
   - Added `fetch_helper` and `db` global instances
   - Added `USE_MOCK_DATA` environment flag
   - Replaced 5 mock endpoints: three now return real data, and two return explicit error responses until a backend is available:
     - `GET /api/market` - Now fetches from CoinGecko
     - `GET /api/sentiment` - Now fetches from Alternative.me
     - `GET /api/trending` - Now fetches from CoinGecko
     - `GET /api/defi` - Returns 503 (requires DeFi provider)
     - `POST /api/hf/run-sentiment` - Returns 501 (requires ML models)
   - Added new endpoint: `GET /api/market/history` - Historical data from SQLite
2. **`provider_fetch_helper.py`** - New file
   - Implements `ProviderFetchHelper` class
   - Provides `fetch_from_pool()` method for pool-based fetching
   - Provides `fetch_from_provider()` method for direct provider access
   - Integrates with existing ProviderManager, circuit breakers, and logging
   - Handles automatic failover and retry logic
3. **`test_real_data.py`** - New file
   - Test script to verify real data endpoints
   - Tests all modified endpoints
   - Provides clear pass/fail results
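The failover and retry behavior attributed to `ProviderFetchHelper` can be illustrated with a minimal sketch. This is not the helper's actual code: `fetch_with_failover`, its parameters, and the two stand-in providers are hypothetical names used only to show the idea of retrying one provider a few times before moving to the next.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Sequence, Tuple

# A fetcher is any zero-argument coroutine function that returns provider data.
Fetcher = Callable[[], Awaitable[Any]]


async def fetch_with_failover(
    fetchers: Sequence[Tuple[str, Fetcher]],
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.0,
) -> dict:
    """Try each provider in order, retrying each before failing over."""
    last_error: Optional[Exception] = None
    for provider_id, fetch in fetchers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                data = await fetch()
                return {"provider_id": provider_id, "attempt": attempt, "data": data}
            except Exception as exc:  # a real helper would catch narrower errors
                last_error = exc
                # Linear backoff between attempts (0 by default in this sketch).
                await asyncio.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"All providers failed. Last error: {last_error}")


# Hypothetical stand-ins for real provider calls.
async def _flaky() -> dict:
    raise ConnectionError("Connection timeout")


async def _healthy() -> dict:
    return {"bitcoin": {"usd": 43250.5}}


result = asyncio.run(
    fetch_with_failover([("primary", _flaky), ("backup", _healthy)])
)
```

In this run the first provider fails twice, so the result comes from the second one. A production helper would also update metrics and consult circuit breakers between attempts.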
## Architecture
### Data Flow
```
Client Request
    ↓
FastAPI Endpoint (api_server_extended.py)
    ↓
ProviderFetchHelper.fetch_from_provider()
    ↓
ProviderManager → Get Provider Config
    ↓
aiohttp → HTTP Request to External API
    ↓
Response Processing & Normalization
    ↓
Database Storage (SQLite)
    ↓
JSON Response to Client
```
### Provider Integration
The implementation uses the **existing provider management system**:
- **Provider Configs**: Loaded from JSON files (providers_config_extended.json, etc.)
- **Circuit Breakers**: Automatic failure detection and recovery
- **Metrics**: Success rate, response time, request counts
- **Logging**: All requests logged with provider_id and details
- **Health Checks**: Existing health check system continues to work
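For readers unfamiliar with circuit breakers, the following is a minimal sketch of the pattern, not the project's implementation. The class name, the thresholds, and the injectable `clock` argument are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self.consecutive_failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown has elapsed.
        return self._clock() - self._opened_at >= self.recovery_seconds

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # (Re)open the circuit and restart the cooldown.
            self._opened_at = self._clock()


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, recovery_seconds=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
blocked = not breaker.allow_request()   # circuit is open, request is rejected
now[0] = 10.0
trial_allowed = breaker.allow_request()  # cooldown elapsed, trial request allowed
```

Injecting the clock keeps the breaker easy to test. The real system may use different thresholds or track state per provider.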
## API Endpoints
### 1. GET /api/market
**Real Data Mode** (default):
```bash
curl http://localhost:8000/api/market
```
Response:
```json
{
  "mode": "real",
  "cryptocurrencies": [
    {
      "rank": 1,
      "name": "Bitcoin",
      "symbol": "BTC",
      "price": 43250.50,
      "change_24h": 2.35,
      "market_cap": 845000000000,
      "volume_24h": 28500000000
    }
  ],
  "source": "CoinGecko",
  "timestamp": "2025-01-15T10:30:00Z",
  "response_time_ms": 245
}
```
**Mock Mode**:
```bash
# Start the server in mock mode
USE_MOCK_DATA=true python main.py
# In another terminal
curl http://localhost:8000/api/market
```
### 2. GET /api/market/history
**New endpoint** for historical price data read from the SQLite database:
```bash
curl "http://localhost:8000/api/market/history?symbol=BTC&limit=10"
```
Response:
```json
{
  "symbol": "BTC",
  "count": 10,
  "history": [
    {
      "symbol": "BTC",
      "name": "Bitcoin",
      "price_usd": 43250.50,
      "volume_24h": 28500000000,
      "market_cap": 845000000000,
      "percent_change_24h": 2.35,
      "rank": 1,
      "timestamp": "2025-01-15 10:30:00"
    }
  ]
}
```
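A handler for this endpoint could be built around a query like the one sketched below. This is an assumption about how such a lookup might look, not the server's code: `get_price_history`, the limit cap of 1000, and the reduced four-column table are hypothetical and use an in-memory SQLite database so the example is self-contained.

```python
import sqlite3


def get_price_history(conn: sqlite3.Connection, symbol: str, limit: int = 10) -> dict:
    """Return the most recent rows for a symbol in the response shape shown above."""
    limit = max(1, min(int(limit), 1000))  # clamp to a sane range (assumed cap)
    cursor = conn.execute(
        "SELECT symbol, name, price_usd, timestamp FROM prices "
        "WHERE symbol = ? ORDER BY timestamp DESC LIMIT ?",
        (symbol.upper(), limit),
    )
    columns = [col[0] for col in cursor.description]
    history = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return {"symbol": symbol.upper(), "count": len(history), "history": history}


# Self-contained demo data; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, name TEXT, price_usd REAL, timestamp TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        ("BTC", "Bitcoin", 43000.0, "2025-01-15 10:00:00"),
        ("BTC", "Bitcoin", 43250.5, "2025-01-15 10:30:00"),
        ("ETH", "Ethereum", 2500.0, "2025-01-15 10:30:00"),
    ],
)
payload = get_price_history(conn, "btc", limit=1)
```

Binding `symbol` and `limit` as parameters avoids building SQL from query-string input.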
### 3. GET /api/sentiment
**Real Data Mode**:
```bash
curl http://localhost:8000/api/sentiment
```
Response:
```json
{
  "mode": "real",
  "fear_greed_index": {
    "value": 62,
    "classification": "Greed",
    "timestamp": "1705315800",
    "time_until_update": "43200"
  },
  "source": "alternative.me"
}
```
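The normalization step for this endpoint might look like the sketch below. It assumes the upstream payload wraps its entries in a `data` list with string fields `value` and `value_classification`; that shape and the function name `normalize_fear_greed` are assumptions to verify against the live API.

```python
def normalize_fear_greed(raw: dict) -> dict:
    """Map an assumed Alternative.me payload onto the response shape shown above."""
    entry = raw["data"][0]  # most recent reading (assumed to come first)
    return {
        "mode": "real",
        "fear_greed_index": {
            "value": int(entry["value"]),  # upstream value assumed to be a string
            "classification": entry["value_classification"],
            "timestamp": entry["timestamp"],
            "time_until_update": entry.get("time_until_update"),
        },
        "source": "alternative.me",
    }


# Hypothetical upstream payload used only for illustration.
sample = {
    "data": [
        {
            "value": "62",
            "value_classification": "Greed",
            "timestamp": "1705315800",
            "time_until_update": "43200",
        }
    ]
}
normalized = normalize_fear_greed(sample)
```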
### 4. GET /api/trending
**Real Data Mode**:
```bash
curl http://localhost:8000/api/trending
```
Response:
```json
{
  "mode": "real",
  "trending": [
    {
      "name": "Solana",
      "symbol": "SOL",
      "thumb": "https://...",
      "market_cap_rank": 5,
      "score": 0
    }
  ],
  "source": "CoinGecko",
  "timestamp": "2025-01-15T10:30:00Z"
}
```
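As a rough illustration of how a trending payload could be flattened into the list above, the sketch below assumes the upstream response nests each coin under `coins[].item`. The function name, the `limit` parameter, and the sample payload are hypothetical.

```python
def normalize_trending(raw: dict, limit: int = 10) -> list:
    """Flatten an assumed CoinGecko trending payload into the list shown above."""
    trending = []
    for wrapper in raw.get("coins", [])[:limit]:
        item = wrapper.get("item", {})
        trending.append(
            {
                "name": item.get("name"),
                "symbol": (item.get("symbol") or "").upper(),
                "thumb": item.get("thumb"),
                "market_cap_rank": item.get("market_cap_rank"),
                "score": item.get("score", 0),
            }
        )
    return trending


# Hypothetical upstream payload used only for illustration.
sample = {
    "coins": [
        {"item": {"name": "Solana", "symbol": "sol", "thumb": "https://...",
                  "market_cap_rank": 5, "score": 0}},
        {"item": {"name": "NoSymbol"}},
    ]
}
trending = normalize_trending(sample)
```

Using `.get()` with defaults keeps a partially filled upstream entry from raising a `KeyError` and failing the whole response.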
### 5. GET /api/defi
**Status**: Not implemented (requires DeFi provider)
```bash
curl http://localhost:8000/api/defi
```
Response:
```json
{
  "detail": "DeFi TVL data provider not configured. Add DefiLlama or similar provider to enable this endpoint."
}
```
**Status Code**: 503 Service Unavailable
### 6. POST /api/hf/run-sentiment
**Status**: Not implemented (requires ML models)
```bash
curl -X POST http://localhost:8000/api/hf/run-sentiment \
-H "Content-Type: application/json" \
-d '{"texts": ["Bitcoin is bullish"]}'
```
Response:
```json
{
  "detail": "Real ML-based sentiment analysis is not yet implemented. This endpoint is reserved for future integration with HuggingFace transformer models. Set USE_MOCK_DATA=true for demo mode with keyword-based sentiment."
}
```
**Status Code**: 501 Not Implemented
## Environment Variables
### USE_MOCK_DATA
Controls whether endpoints return real or mock data.
**Default**: `false` (real data)
**Usage**:
```bash
# Real data (default)
python main.py
# Mock data (for demos)
USE_MOCK_DATA=true python main.py
# Docker
docker run -e USE_MOCK_DATA=false -p 8000:8000 crypto-monitor
```
**Behavior**:
- `false` or unset: All endpoints fetch real data from providers
- `true`: Endpoints return mock data (for testing/demos)
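One way to read the flag consistently with the behavior listed above is sketched here. Whether the server also accepts values such as `1` or `yes` is not stated in this guide, so the sketch treats only `true` (case-insensitive) as mock mode; the function name is hypothetical.

```python
import os


def use_mock_data(environ=os.environ) -> bool:
    """Return True only when USE_MOCK_DATA is set to 'true' (case-insensitive)."""
    return environ.get("USE_MOCK_DATA", "false").strip().lower() == "true"


# Passing a plain dict makes the behavior easy to check without touching the real environment.
unset_is_real = use_mock_data({})
explicit_mock = use_mock_data({"USE_MOCK_DATA": "TRUE"})
explicit_real = use_mock_data({"USE_MOCK_DATA": "false"})
```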
## Provider Configuration
### Required Providers
The following providers are required for the real-data endpoints. Only CoinGecko is read from `providers_config_extended.json`; Alternative.me is called directly:
1. **coingecko** - For market data and trending
   - Endpoints: `simple_price`, `trending`
   - No API key required (free tier)
   - Rate limit: 50 req/min
2. **alternative.me** - For sentiment (Fear & Greed Index)
   - Direct HTTP call (not in provider config)
   - No API key required
   - Public API
### Optional Providers
3. **DefiLlama** - For DeFi TVL data
   - Not currently configured
   - Would enable `/api/defi` endpoint
### Adding New Providers
To add a new provider:
1. Edit `providers_config_extended.json`:
```json
{
  "providers": {
    "your_provider": {
      "name": "Your Provider",
      "category": "market_data",
      "base_url": "https://api.example.com",
      "endpoints": {
        "prices": "/v1/prices"
      },
      "rate_limit": {
        "requests_per_minute": 60
      },
      "requires_auth": false,
      "priority": 8,
      "weight": 80
    }
  }
}
```
2. Use in endpoint:
```python
result = await fetch_helper.fetch_from_provider(
    "your_provider",
    "prices",
    params={"symbols": "BTC,ETH"}
)
```
## Database Integration
### Schema
The SQLite database (`data/crypto_aggregator.db`) stores:
**prices table**:
- symbol, name, price_usd, volume_24h, market_cap
- percent_change_1h, percent_change_24h, percent_change_7d
- rank, timestamp
### Automatic Storage
When `/api/market` is called:
1. Real data is fetched from CoinGecko
2. Each asset is automatically saved to the database
3. Historical data accumulates over time
4. Query with `/api/market/history`
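The storage step could be sketched as follows. This is an illustration under assumptions, not the `CryptoDatabase` implementation: the `save_prices` helper, the reduced column set, and the timestamp format are hypothetical, and an in-memory database stands in for `data/crypto_aggregator.db`.

```python
import sqlite3
from datetime import datetime, timezone

# Reduced version of the prices table described above (assumed column types).
SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    symbol TEXT, name TEXT, price_usd REAL, volume_24h REAL,
    market_cap REAL, percent_change_24h REAL, "rank" INTEGER, timestamp TEXT
)
"""


def save_prices(conn: sqlite3.Connection, assets: list) -> int:
    """Insert one row per normalized asset and return the number of rows written."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    rows = [
        (a["symbol"], a["name"], a["price"], a["volume_24h"],
         a["market_cap"], a["change_24h"], a["rank"], now)
        for a in assets
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
written = save_prices(conn, [{
    "rank": 1, "name": "Bitcoin", "symbol": "BTC", "price": 43250.5,
    "change_24h": 2.35, "market_cap": 845000000000, "volume_24h": 28500000000,
}])
stored = conn.execute("SELECT symbol, price_usd FROM prices").fetchall()
```

Because each call appends rows rather than updating them, repeated calls to `/api/market` build up the history that `/api/market/history` reads.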
### Manual Queries
```python
from database import CryptoDatabase

db = CryptoDatabase()

# Get the 100 most recent BTC prices
with db.get_connection() as conn:
    cursor = conn.cursor()
    # Bind the symbol as a parameter instead of embedding it in the SQL
    cursor.execute(
        """
        SELECT * FROM prices
        WHERE symbol = ?
        ORDER BY timestamp DESC
        LIMIT 100
        """,
        ("BTC",),
    )
    rows = cursor.fetchall()
```
## Testing
### Automated Tests
```bash
# Start server
python main.py
# In another terminal, run tests
python test_real_data.py
```
### Manual Testing
```bash
# Test market data
curl http://localhost:8000/api/market
# Test with parameters
curl "http://localhost:8000/api/market/history?symbol=ETH&limit=5"
# Test sentiment
curl http://localhost:8000/api/sentiment
# Test trending
curl http://localhost:8000/api/trending
# Check health
curl http://localhost:8000/health
# View API docs (`open` is macOS-specific; use `xdg-open` on Linux)
open http://localhost:8000/docs
```
## Error Handling
### Provider Unavailable
If a provider is down:
```json
{
  "detail": "All providers in pool 'market_primary' failed. Last error: Connection timeout"
}
```
**Status Code**: 503
### Provider Not Configured
If a required provider is missing:
```json
{
  "detail": "Market data provider (CoinGecko) not configured"
}
```
**Status Code**: 503
### Database Error
If a database operation fails:
```json
{
  "detail": "Database error: unable to open database file"
}
```
**Status Code**: 500
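The mapping from failure type to status code described in this section can be summarized in a small sketch. The exception classes and the `to_error_response` helper are hypothetical; the actual server presumably raises framework-level HTTP errors directly.

```python
import sqlite3


class ProviderNotConfigured(Exception):
    """Hypothetical error: a required provider is missing from the config."""


class AllProvidersFailed(Exception):
    """Hypothetical error: every provider attempt failed."""


def to_error_response(exc: Exception):
    """Translate a failure into (status_code, body) following the cases above."""
    if isinstance(exc, (ProviderNotConfigured, AllProvidersFailed)):
        return 503, {"detail": str(exc)}
    if isinstance(exc, sqlite3.Error):
        return 500, {"detail": f"Database error: {exc}"}
    raise exc  # unknown errors are not masked


status, body = to_error_response(sqlite3.OperationalError("unable to open database file"))
```

Keeping provider failures at 503 and database failures at 500 lets clients tell a temporary upstream outage apart from a server-side fault.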
## Monitoring
### Logs
All requests are logged to the `logs/` directory:
```
INFO - Successfully fetched from CoinGecko
provider_id: coingecko
endpoint: simple_price
response_time_ms: 245
pool: market_primary
```
### Metrics
Provider metrics are updated automatically:
- `total_requests`
- `successful_requests`
- `failed_requests`
- `avg_response_time`
- `success_rate`
- `consecutive_failures`
View metrics:
```bash
curl http://localhost:8000/api/providers/coingecko
```
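To make the metric names concrete, here is a minimal sketch of how such counters could be maintained. It is an assumption about the bookkeeping, not the ProviderManager's code; in particular, averaging response time over successful requests only and expressing `success_rate` as a percentage are choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ProviderMetrics:
    total_requests: int = 0
    successful_requests: int = 0
    failed_requests: int = 0
    avg_response_time: float = 0.0  # milliseconds, successful requests only
    consecutive_failures: int = 0

    @property
    def success_rate(self) -> float:
        """Percentage of successful requests (0.0 when nothing was recorded)."""
        if self.total_requests == 0:
            return 0.0
        return 100.0 * self.successful_requests / self.total_requests

    def record(self, success: bool, response_time_ms: float = 0.0) -> None:
        self.total_requests += 1
        if success:
            self.successful_requests += 1
            self.consecutive_failures = 0
            # Incremental mean avoids storing every sample.
            n = self.successful_requests
            self.avg_response_time += (response_time_ms - self.avg_response_time) / n
        else:
            self.failed_requests += 1
            self.consecutive_failures += 1


metrics = ProviderMetrics()
metrics.record(True, 200.0)
metrics.record(True, 300.0)
metrics.record(False)
```

The `consecutive_failures` counter is the natural input for a circuit breaker, since it resets on any success.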
### Health Checks
The existing health check system continues to work:
```bash
curl http://localhost:8000/api/providers/coingecko/health-check
```
## Deployment
### Docker
```bash
# Build
docker build -t crypto-monitor .
# Run with real data (default)
docker run -p 8000:8000 crypto-monitor
# Run with mock data
docker run -e USE_MOCK_DATA=true -p 8000:8000 crypto-monitor
```
### Hugging Face Spaces
The service is ready for HF Spaces deployment:
1. Push to HF Space repository
2. Set Space SDK to "Docker"
3. Optionally set `USE_MOCK_DATA` in Space secrets
4. Service will start automatically
## Future Enhancements
### Planned
1. **Pool-based fetching**: Use provider pools instead of direct provider access
2. **ML sentiment analysis**: Load HuggingFace models for real sentiment
3. **DeFi integration**: Add DefiLlama provider
4. **Caching layer**: Redis for frequently accessed data
5. **Rate limiting**: Per-client rate limits
6. **Authentication**: API key management
### Contributing
To add real data for a new endpoint:
1. Identify the provider and endpoint
2. Add provider to config if needed
3. Use `fetch_helper.fetch_from_provider()` in endpoint
4. Normalize response to consistent schema
5. Add database storage if applicable
6. Update tests and documentation
## Troubleshooting
### "Provider not configured"
**Solution**: Check that `providers_config_extended.json` contains the required provider
### "All providers failed"
**Solution**:
- Check internet connectivity
- Verify provider URLs are correct
- Check rate limits haven't been exceeded
- View logs for detailed error messages
### "Database error"
**Solution**:
- Ensure `data/` directory exists and is writable
- Check disk space
- Verify SQLite is installed
### Mock data still showing
**Solution**:
- Ensure `USE_MOCK_DATA` is not set or is set to `false`
- Restart the server
- Check environment variables: `env | grep USE_MOCK_DATA`
## Summary
- **Real data** is now the default for the market, sentiment, and trending endpoints
- **Database integration** stores historical prices
- **Provider management** reuses the existing provider system (circuit breakers, metrics, logging)
- **Graceful degradation** with clear error messages (503/501) where no provider or model is available
- **Mock mode** available for demos via the `USE_MOCK_DATA` environment flag
- **Production-ready** for deployment

The API is now a fully functional crypto data service, not just a monitoring platform!