# Cryptocurrency Data Aggregator - Complete Rewrite
A production-ready cryptocurrency data aggregation application with AI-powered analysis, real-time data collection, and an interactive Gradio dashboard.
## Features
### Core Capabilities
- **Real-time Price Tracking**: Monitor top 100 cryptocurrencies with live updates
- **AI-Powered Sentiment Analysis**: Using HuggingFace models for news sentiment
- **Market Analysis**: Technical indicators (MA, RSI), trend detection, predictions
- **News Aggregation**: RSS feeds from CoinDesk, Cointelegraph, Bitcoin.com, and Reddit
- **Interactive Dashboard**: 6-tab Gradio interface with auto-refresh
- **SQLite Database**: Persistent storage with full CRUD operations
- **No API Keys Required**: Uses only free data sources
### Data Sources (All Free, No Authentication)
- **CoinGecko API**: Market data, prices, rankings
- **CoinCap API**: Backup price data source
- **Binance Public API**: Real-time trading data
- **Alternative.me**: Fear & Greed Index
- **RSS Feeds**: CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, Bitcoinist
- **Reddit**: r/cryptocurrency, r/bitcoin, r/ethtrader, r/cryptomarkets
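As an illustration, the Alternative.me Fear & Greed Index can be queried with no authentication. A minimal sketch using the stdlib `urllib` (the app's own collectors use `aiohttp`; function names here are illustrative):

```python
import json
import urllib.request

FNG_URL = "https://api.alternative.me/fng/"  # public Fear & Greed endpoint

def fetch_fear_greed(timeout=10):
    """Fetch the latest Fear & Greed Index payload."""
    with urllib.request.urlopen(FNG_URL, timeout=timeout) as resp:
        return json.load(resp)

def parse_fear_greed(payload):
    """Extract the index value (0-100) and its classification from the payload."""
    latest = payload["data"][0]
    return int(latest["value"]), latest["value_classification"]
```

The payload shape assumed here (`{"data": [{"value": "...", "value_classification": "..."}]}`) matches the endpoint's documented format.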
### AI Models (HuggingFace - Local Inference)
- **cardiffnlp/twitter-roberta-base-sentiment-latest**: Social media sentiment
- **ProsusAI/finbert**: Financial news sentiment
- **facebook/bart-large-cnn**: News summarization
## Project Structure
```
crypto-dt-source/
├── config.py               # Configuration constants
├── database.py             # SQLite database with CRUD operations
├── collectors.py           # Data collection from all sources
├── ai_models.py            # HuggingFace model integration
├── utils.py                # Helper functions and utilities
├── app.py                  # Main Gradio application
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── data/
│   ├── database/           # SQLite database files
│   └── backups/            # Database backups
└── logs/
    └── crypto_aggregator.log   # Application logs
```
## Installation
### Prerequisites
- Python 3.8 or higher
- 4GB+ RAM (for AI models)
- Internet connection
### Step 1: Clone Repository
```bash
git clone <repository-url>
cd crypto-dt-source
```
### Step 2: Install Dependencies
```bash
pip install -r requirements.txt
```
This will install:
- Gradio (web interface)
- Pandas, NumPy (data processing)
- Transformers, PyTorch (AI models)
- Plotly (charts)
- BeautifulSoup4, Feedparser (web scraping)
- And more...
### Step 3: Run Application
```bash
python app.py
```
The application will:
1. Initialize the SQLite database
2. Load AI models (first run may take 2-3 minutes)
3. Start background data collection
4. Launch Gradio interface
Access the dashboard at: **http://localhost:7860**
## Gradio Dashboard
### Tab 1: Live Dashboard 📊
- Top 100 cryptocurrencies with real-time prices
- Columns: Rank, Name, Symbol, Price, 24h Change, Volume, Market Cap
- Auto-refresh every 30 seconds
- Search and filter functionality
- Color-coded price changes (green/red)
### Tab 2: Historical Charts 📈
- Select any cryptocurrency
- Choose timeframe: 1d, 7d, 30d, 90d, 1y, All
- Interactive Plotly charts with:
- Price line chart
- Volume bars
- MA(7) and MA(30) overlays
- RSI indicator
- Export charts as PNG
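The MA(7), MA(30), and RSI overlays can be computed from stored price history with pandas. A minimal sketch (illustrative, not the exact implementation in `utils.py`):

```python
import pandas as pd

def add_indicators(prices):
    """Compute MA(7), MA(30), and a simple 14-period RSI for a price series."""
    df = pd.DataFrame({"price": pd.Series(prices, dtype="float64")})
    df["ma7"] = df["price"].rolling(7).mean()
    df["ma30"] = df["price"].rolling(30).mean()
    delta = df["price"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()     # average gains
    loss = (-delta.clip(upper=0)).rolling(14).mean()  # average losses
    df["rsi"] = 100 - 100 / (1 + gain / loss)         # RSI is 100 when there are no losses
    return df
```

This is the classic simple-moving-average RSI; implementations often use Wilder's smoothing instead, which gives slightly different values.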
### Tab 3: News & Sentiment 📰
- Latest cryptocurrency news from 9+ sources
- Filter by sentiment: All, Positive, Neutral, Negative
- Filter by coin: BTC, ETH, etc.
- Each article shows:
- Title (clickable link)
- Source and date
- AI-generated sentiment score
- Summary
- Related coins
- Market sentiment gauge (0-100 scale)
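The gauge maps per-article sentiment scores, stored in [-1, 1] per the `news` schema, onto the 0-100 scale. One plausible linear mapping, shown as a sketch:

```python
def sentiment_to_gauge(score):
    """Map a sentiment score in [-1, 1] onto the 0-100 gauge scale."""
    clamped = max(-1.0, min(1.0, float(score)))  # guard against out-of-range scores
    return (clamped + 1.0) * 50.0
```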
### Tab 4: AI Analysis 🤖
- Select cryptocurrency
- Generate AI-powered analysis:
- Current trend (Bullish/Bearish/Neutral)
- Support/Resistance levels
- Technical indicators (RSI, MA7, MA30)
- 24-72h prediction
- Confidence score
- Analysis saved to database for history
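Support/resistance levels and trend labels can be derived from recent price history with simple heuristics. A sketch using a rolling min/max and a moving-average comparison (window sizes and the 2% thresholds are illustrative assumptions):

```python
def support_resistance(prices, window=30):
    """Estimate support/resistance as the min/max of the most recent window."""
    recent = list(prices)[-window:]
    return min(recent), max(recent)

def classify_trend(prices, short=7, long=30):
    """Label the trend by comparing short- and long-window average prices."""
    series = list(prices)
    short_ma = sum(series[-short:]) / short
    long_ma = sum(series[-long:]) / long
    if short_ma > long_ma * 1.02:   # short average clearly above long
        return "Bullish"
    if short_ma < long_ma * 0.98:   # short average clearly below long
        return "Bearish"
    return "Neutral"
```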
### Tab 5: Database Explorer 🗄️
- Pre-built SQL queries:
- Top 10 gainers in last 24h
- All positive sentiment news
- Price history for any coin
- Database statistics
- Custom SQL query support (read-only for security)
- Export results to CSV
### Tab 6: Data Sources Status 🔍
- Real-time status monitoring:
- CoinGecko API ✓
- CoinCap API ✓
- Binance API ✓
- RSS feeds (5 sources) ✓
- Reddit endpoints (4 subreddits) ✓
- Database connection ✓
- Shows: Status (🟢/🔴), Last Update, Error Count
- Manual refresh and data collection controls
- Error log viewer
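The status table can be produced by running a health check per source. A testable sketch with an injected `checker` callable (the structure and field names are illustrative):

```python
import time

def check_sources(sources, checker):
    """Run checker(url) per source; return name -> status dict for the table."""
    status = {}
    for name, url in sources.items():
        try:
            ok = checker(url)
            status[name] = {"icon": "🟢" if ok else "🔴",
                            "checked_at": time.time(), "error": None}
        except Exception as exc:
            status[name] = {"icon": "🔴",
                            "checked_at": time.time(), "error": str(exc)}
    return status
```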
## Database Schema
### `prices` Table
- `id`: Primary key
- `symbol`: Coin symbol (e.g., "bitcoin")
- `name`: Full name (e.g., "Bitcoin")
- `price_usd`: Current price in USD
- `volume_24h`: 24-hour trading volume
- `market_cap`: Market capitalization
- `percent_change_1h`, `percent_change_24h`, `percent_change_7d`: Price changes
- `rank`: Market cap rank
- `timestamp`: Record timestamp
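In SQLite, this table (with the symbol/timestamp index mentioned under Performance Optimization) might look like the following sketch; column types are assumptions based on the field descriptions above:

```python
import sqlite3

PRICES_DDL = """
CREATE TABLE IF NOT EXISTS prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    symbol TEXT NOT NULL,
    name TEXT,
    price_usd REAL,
    volume_24h REAL,
    market_cap REAL,
    percent_change_1h REAL,
    percent_change_24h REAL,
    percent_change_7d REAL,
    rank INTEGER,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_prices_symbol_ts ON prices (symbol, timestamp);
"""

conn = sqlite3.connect(":memory:")  # in-memory db for illustration
conn.executescript(PRICES_DDL)
conn.execute("INSERT INTO prices (symbol, name, price_usd) VALUES (?, ?, ?)",
             ("bitcoin", "Bitcoin", 97000.0))
row = conn.execute("SELECT symbol, price_usd FROM prices").fetchone()
```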
### `news` Table
- `id`: Primary key
- `title`: News article title
- `summary`: AI-generated summary
- `url`: Article URL (unique)
- `source`: Source name (e.g., "CoinDesk")
- `sentiment_score`: Float (-1 to 1)
- `sentiment_label`: Label (positive/negative/neutral)
- `related_coins`: JSON array of coin symbols
- `published_date`: Original publication date
- `timestamp`: Record timestamp
### `market_analysis` Table
- `id`: Primary key
- `symbol`: Coin symbol
- `timeframe`: Analysis period
- `trend`: Trend direction (Bullish/Bearish/Neutral)
- `support_level`, `resistance_level`: Price levels
- `prediction`: Text prediction
- `confidence`: Confidence score (0-1)
- `timestamp`: Analysis timestamp
### `user_queries` Table
- `id`: Primary key
- `query`: SQL query or search term
- `result_count`: Number of results
- `timestamp`: Query timestamp
## Configuration
Edit `config.py` to customize:
```python
# Data collection intervals (seconds)
COLLECTION_INTERVALS = {
    "price_data": 300,       # 5 minutes
    "news_data": 1800,       # 30 minutes
    "sentiment_data": 1800,  # 30 minutes
}

# Number of coins to track
TOP_COINS_LIMIT = 100

# Gradio settings
GRADIO_SERVER_PORT = 7860
AUTO_REFRESH_INTERVAL = 30  # seconds

# Cache settings
CACHE_TTL = 300  # 5 minutes
CACHE_MAX_SIZE = 1000

# Logging
LOG_LEVEL = "INFO"
LOG_FILE = "logs/crypto_aggregator.log"
```
## API Usage Examples
### Collect Data Manually
```python
from collectors import collect_price_data, collect_news_data
# Collect latest prices
success, count = collect_price_data()
print(f"Collected {count} prices")
# Collect news
count = collect_news_data()
print(f"Collected {count} articles")
```
### Query Database
```python
from database import get_database
db = get_database()
# Get latest prices
prices = db.get_latest_prices(limit=10)
# Get news by coin
news = db.get_news_by_coin("bitcoin", limit=5)
# Get top gainers
gainers = db.get_top_gainers(limit=10)
```
### AI Analysis
```python
from ai_models import analyze_sentiment, analyze_market_trend
from database import get_database
# Analyze sentiment
result = analyze_sentiment("Bitcoin hits new all-time high!")
print(result) # {'label': 'positive', 'score': 0.95, 'confidence': 0.92}
# Analyze market trend
db = get_database()
history = db.get_price_history("bitcoin", hours=168)
analysis = analyze_market_trend(history)
print(analysis) # {'trend': 'Bullish', 'support_level': 50000, ...}
```
## Error Handling & Resilience
### Fallback Mechanisms
- If CoinGecko fails → CoinCap is used
- If both APIs fail → cached database data is used
- If AI models fail to load → keyword-based sentiment analysis
- All network requests have timeout and retry logic
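The CoinGecko → CoinCap → cached-data chain can be expressed as a generic fallback loop. A sketch; the fetcher names in the usage line are hypothetical:

```python
import logging

logger = logging.getLogger("crypto_aggregator")

def fetch_with_fallback(fetchers):
    """Try each (name, fetcher) pair in order; return the first success."""
    for name, fetcher in fetchers:
        try:
            return name, fetcher()
        except Exception as exc:
            logger.warning("%s failed: %s", name, exc)  # logged, then fall through
    raise RuntimeError("all data sources failed")

# Usage (hypothetical fetcher names):
# fetch_with_fallback([("coingecko", fetch_coingecko),
#                      ("coincap", fetch_coincap),
#                      ("cache", read_cached_prices)])
```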
### Data Validation
- Price bounds checking (MIN_PRICE to MAX_PRICE)
- Volume and market cap validation
- Duplicate prevention (unique URLs for news)
- SQL injection prevention (read-only queries only)
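Read-only enforcement for the custom-query feature can be sketched as a simple allowlist check (a sketch, not the exact guard in the codebase):

```python
import re

# Keywords that indicate a write or schema change
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma|replace)\b",
    re.IGNORECASE,
)

def is_read_only(sql):
    """Allow a single SELECT statement; reject anything that could write."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # reject multi-statement input
        return False
    if not statement.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(statement)
```

A stronger guarantee is to run user queries on a separate connection opened read-only via SQLite's URI syntax, `sqlite3.connect("file:path/to.db?mode=ro", uri=True)`.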
### Logging
All operations are logged to `logs/crypto_aggregator.log`:
- Info: Successful operations, data collection
- Warning: API failures, retries
- Error: Database errors, critical failures
## Performance Optimization
- **Async/Await**: All network requests use aiohttp
- **Connection Pooling**: Reused HTTP connections
- **Caching**: In-memory cache with 5-minute TTL
- **Batch Inserts**: Records are written in batches of 100+ per insert instead of row-by-row
- **Indexed Queries**: Database indexes on symbol, timestamp, sentiment
- **Lazy Loading**: AI models load only when first used
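The TTL cache can be sketched as a small dict that stamps each entry with its insertion time; an injectable `clock` makes expiry testable without sleeping (a sketch, not the project's cache class):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry (the 5-minute TTL above)."""

    def __init__(self, ttl=300, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def set(self, key, value):
        self._store[key] = (self.clock(), value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:  # expired: evict and miss
            del self._store[key]
            return default
        return value
```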
## Troubleshooting
### Issue: Models won't load
**Solution**: Ensure you have 4GB+ RAM. Models download on first run (2-3 min).
### Issue: No data appearing
**Solution**: Wait 5 minutes for initial data collection, or click "Refresh" buttons.
### Issue: Port 7860 already in use
**Solution**: Change `GRADIO_SERVER_PORT` in `config.py` or kill existing process.
### Issue: Database locked
**Solution**: Only one process can write at a time. Close other instances.
### Issue: RSS feeds failing
**Solution**: Some feeds may be temporarily down. Check Tab 6 for status.
## Development
### Running Tests
```bash
# Test data collection
python collectors.py
# Test AI models
python ai_models.py
# Test utilities
python utils.py
# Test database
python database.py
```
### Adding New Data Sources
Edit `collectors.py`:
```python
def collect_new_source():
    try:
        response = safe_api_call("https://api.example.com/data")
        # Parse and save data
        return True
    except Exception as e:
        logger.error(f"Error: {e}")
        return False
```
Add it to the scheduler in `collectors.py`. Note that `threading.Timer` fires only once, so the callback must re-arm itself to run periodically:
```python
# In schedule_data_collection(); Timer fires once, so re-arm on each run
def _run_new_source():
    collect_new_source()
    threading.Timer(interval, _run_new_source).start()

threading.Timer(interval, _run_new_source).start()
```
## Validation Checklist
- [x] All 8 files complete
- [x] No TODO or FIXME comments
- [x] No placeholder functions
- [x] All imports in requirements.txt
- [x] Database schema matches specification
- [x] All 6 Gradio tabs implemented
- [x] All 3 AI models integrated
- [x] All 5+ data sources configured
- [x] Error handling in every network call
- [x] Logging for all major operations
- [x] No API keys in code
- [x] Comments in English
- [x] PEP 8 compliant
## License
MIT License - Free to use, modify, and distribute.
## Support
For issues or questions:
- Check logs: `logs/crypto_aggregator.log`
- Review error messages in Tab 6
- Ensure all dependencies installed: `pip install -r requirements.txt`
## Credits
- **Data Sources**: CoinGecko, CoinCap, Binance, Alternative.me, CoinDesk, Cointelegraph, Reddit
- **AI Models**: HuggingFace (Cardiff NLP, ProsusAI, Facebook)
- **Framework**: Gradio
---
**Made with ❤️ for the Crypto Community**