# Cryptocurrency Data Aggregator - Complete Rewrite

A production-ready cryptocurrency data aggregation application with AI-powered analysis, real-time data collection, and an interactive Gradio dashboard.
## Features

### Core Capabilities

- **Real-time Price Tracking**: Monitor the top 100 cryptocurrencies with live updates
- **AI-Powered Sentiment Analysis**: Scores news sentiment with HuggingFace models
- **Market Analysis**: Technical indicators (MA, RSI), trend detection, predictions
- **News Aggregation**: RSS feeds from CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, and Bitcoinist, plus Reddit
- **Interactive Dashboard**: 6-tab Gradio interface with auto-refresh
- **SQLite Database**: Persistent storage with full CRUD operations
- **No API Keys Required**: Uses only free data sources
### Data Sources (All Free, No Authentication)

- **CoinGecko API**: Market data, prices, rankings
- **CoinCap API**: Backup price data source
- **Binance Public API**: Real-time trading data
- **Alternative.me**: Fear & Greed Index
- **RSS Feeds**: CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, Bitcoinist
- **Reddit**: r/cryptocurrency, r/bitcoin, r/ethtrader, r/cryptomarkets
### AI Models (HuggingFace - Local Inference)

- **cardiffnlp/twitter-roberta-base-sentiment-latest**: Social media sentiment
- **ProsusAI/finbert**: Financial news sentiment
- **facebook/bart-large-cnn**: News summarization
## Project Structure

```
crypto-dt-source/
├── config.py              # Configuration constants
├── database.py            # SQLite database with CRUD operations
├── collectors.py          # Data collection from all sources
├── ai_models.py           # HuggingFace model integration
├── utils.py               # Helper functions and utilities
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── data/
│   ├── database/          # SQLite database files
│   └── backups/           # Database backups
└── logs/
    └── crypto_aggregator.log  # Application logs
```
## Installation

### Prerequisites

- Python 3.8 or higher
- 4GB+ RAM (for AI models)
- Internet connection

### Step 1: Clone Repository

```bash
git clone <repository-url>
cd crypto-dt-source
```
### Step 2: Install Dependencies

```bash
pip install -r requirements.txt
```

This will install:

- Gradio (web interface)
- Pandas, NumPy (data processing)
- Transformers, PyTorch (AI models)
- Plotly (charts)
- BeautifulSoup4, Feedparser (web scraping)
- And more...
### Step 3: Run Application

```bash
python app.py
```

The application will:

1. Initialize the SQLite database
2. Load AI models (first run may take 2-3 minutes)
3. Start background data collection
4. Launch the Gradio interface

Access the dashboard at: **http://localhost:7860**
## Gradio Dashboard

### Tab 1: Live Dashboard 📊

- Top 100 cryptocurrencies with real-time prices
- Columns: Rank, Name, Symbol, Price, 24h Change, Volume, Market Cap
- Auto-refresh every 30 seconds
- Search and filter functionality
- Color-coded price changes (green/red)
### Tab 2: Historical Charts 📈

- Select any cryptocurrency
- Choose timeframe: 1d, 7d, 30d, 90d, 1y, All
- Interactive Plotly charts with:
  - Price line chart
  - Volume bars
  - MA(7) and MA(30) overlays
  - RSI indicator
- Export charts as PNG
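
The MA(7), MA(30), and RSI overlays listed above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the function names are hypothetical, the 14-period RSI default is an assumption, and the RSI here uses a simple average of gains and losses rather than Wilder's smoothing.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Return the SMA for each position; None until `window` prices are available."""
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for index, price in enumerate(prices):
        running_sum += price
        if index >= window:
            # Drop the price that just left the window
            running_sum -= prices[index - window]
        averages.append(running_sum / window if index >= window - 1 else None)
    return averages


def relative_strength_index(prices: List[float], period: int = 14) -> Optional[float]:
    """RSI over the last `period` price changes (simple-average variant)."""
    if len(prices) < period + 1:
        return None  # not enough history
    deltas = [later - earlier for earlier, later in zip(prices, prices[1:])]
    recent = deltas[-period:]
    average_gain = sum(delta for delta in recent if delta > 0) / period
    average_loss = sum(-delta for delta in recent if delta < 0) / period
    if average_loss == 0:
        return 100.0  # no losses in the window
    relative_strength = average_gain / average_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)
```

Calling `simple_moving_average(prices, 7)` and `simple_moving_average(prices, 30)` on the same price series would give the two overlay lines; the leading `None` values mark positions where the window is not yet full.
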
### Tab 3: News & Sentiment 📰

- Latest cryptocurrency news from 9+ sources
- Filter by sentiment: All, Positive, Neutral, Negative
- Filter by coin: BTC, ETH, etc.
- Each article shows:
  - Title (clickable link)
  - Source and date
  - AI-generated sentiment score
  - Summary
  - Related coins
- Market sentiment gauge (0-100 scale)
### Tab 4: AI Analysis 🤖

- Select cryptocurrency
- Generate AI-powered analysis:
  - Current trend (Bullish/Bearish/Neutral)
  - Support/Resistance levels
  - Technical indicators (RSI, MA7, MA30)
  - 24-72h prediction
  - Confidence score
- Analysis saved to database for history
### Tab 5: Database Explorer 🗄️

- Pre-built SQL queries:
  - Top 10 gainers in last 24h
  - All positive sentiment news
  - Price history for any coin
  - Database statistics
- Custom SQL query support (read-only for security)
- Export results to CSV
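
As an illustration of a pre-built query combined with CSV export, the sketch below selects the latest row per coin from a `prices` table and orders it by 24-hour change. The SQL text, the `top_gainers_csv` helper, and the choice of exported columns are assumptions for this example; only the column names come from the schema described in this README.

```python
import csv
import io
import sqlite3

# Hypothetical query: latest snapshot per coin, best 24h performers first.
TOP_GAINERS_SQL = """
SELECT p.symbol AS symbol,
       p.name AS name,
       p.price_usd AS price_usd,
       p.percent_change_24h AS percent_change_24h
FROM prices AS p
JOIN (
    SELECT symbol, MAX(timestamp) AS latest_ts
    FROM prices
    GROUP BY symbol
) AS latest
  ON latest.symbol = p.symbol AND latest.latest_ts = p.timestamp
ORDER BY p.percent_change_24h DESC
LIMIT ?
"""


def top_gainers_csv(conn: sqlite3.Connection, limit: int = 10) -> str:
    """Run the top-gainers query and return the result as CSV text."""
    cursor = conn.execute(TOP_GAINERS_SQL, (limit,))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row comes from the column names reported by the cursor
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()
```

The `LIMIT` value is passed as a bound parameter rather than formatted into the SQL string, which keeps the query text constant.
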
### Tab 6: Data Sources Status 🔍

- Real-time status monitoring:
  - CoinGecko API ✓
  - CoinCap API ✓
  - Binance API ✓
  - RSS feeds (5 sources) ✓
  - Reddit endpoints (4 subreddits) ✓
  - Database connection ✓
- Shows: Status (🟢/🔴), Last Update, Error Count
- Manual refresh and data collection controls
- Error log viewer
## Database Schema

### `prices` Table

- `id`: Primary key
- `symbol`: Coin identifier (e.g., "bitcoin")
- `name`: Full name (e.g., "Bitcoin")
- `price_usd`: Current price in USD
- `volume_24h`: 24-hour trading volume
- `market_cap`: Market capitalization
- `percent_change_1h`, `percent_change_24h`, `percent_change_7d`: Price changes
- `rank`: Market cap rank
- `timestamp`: Record timestamp
### `news` Table

- `id`: Primary key
- `title`: News article title
- `summary`: AI-generated summary
- `url`: Article URL (unique)
- `source`: Source name (e.g., "CoinDesk")
- `sentiment_score`: Float (-1 to 1)
- `sentiment_label`: Label (positive/negative/neutral)
- `related_coins`: JSON array of coin symbols
- `published_date`: Original publication date
- `timestamp`: Record timestamp
### `market_analysis` Table

- `id`: Primary key
- `symbol`: Coin symbol
- `timeframe`: Analysis period
- `trend`: Trend direction (Bullish/Bearish/Neutral)
- `support_level`, `resistance_level`: Price levels
- `prediction`: Text prediction
- `confidence`: Confidence score (0-1)
- `timestamp`: Analysis timestamp

### `user_queries` Table

- `id`: Primary key
- `query`: SQL query or search term
- `result_count`: Number of results
- `timestamp`: Query timestamp
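
The column lists above could translate into SQLite DDL roughly as follows. This sketch covers only `prices` and `news`; the column types, the index name, and the `create_schema` and `insert_news` helpers are assumptions, since the README names the columns but not their declarations. The `UNIQUE` constraint on `url` is what lets `INSERT OR IGNORE` skip duplicate articles.

```python
import sqlite3

# Assumed column types; only the column names come from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    symbol TEXT NOT NULL,
    name TEXT,
    price_usd REAL,
    volume_24h REAL,
    market_cap REAL,
    percent_change_1h REAL,
    percent_change_24h REAL,
    percent_change_7d REAL,
    "rank" INTEGER,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_prices_symbol_ts ON prices (symbol, timestamp);

CREATE TABLE IF NOT EXISTS news (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    summary TEXT,
    url TEXT UNIQUE,
    source TEXT,
    sentiment_score REAL,
    sentiment_label TEXT,
    related_coins TEXT,  -- JSON array stored as text
    published_date TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the tables and index if they do not exist yet."""
    conn.executescript(SCHEMA)


def insert_news(conn: sqlite3.Connection, title: str, url: str, source: str) -> bool:
    """Insert an article; return False when the URL is already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO news (title, url, source) VALUES (?, ?, ?)",
        (title, url, source),
    )
    return cursor.rowcount == 1
```
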
## Configuration

Edit `config.py` to customize:

```python
# Data collection intervals (seconds)
COLLECTION_INTERVALS = {
    "price_data": 300,       # 5 minutes
    "news_data": 1800,       # 30 minutes
    "sentiment_data": 1800,  # 30 minutes
}

# Number of coins to track
TOP_COINS_LIMIT = 100

# Gradio settings
GRADIO_SERVER_PORT = 7860
AUTO_REFRESH_INTERVAL = 30  # seconds

# Cache settings
CACHE_TTL = 300  # 5 minutes
CACHE_MAX_SIZE = 1000

# Logging
LOG_LEVEL = "INFO"
LOG_FILE = "logs/crypto_aggregator.log"
```
## API Usage Examples

### Collect Data Manually

```python
from collectors import collect_price_data, collect_news_data

# Collect latest prices
success, count = collect_price_data()
print(f"Collected {count} prices")

# Collect news
count = collect_news_data()
print(f"Collected {count} articles")
```
### Query Database

```python
from database import get_database

db = get_database()

# Get latest prices
prices = db.get_latest_prices(limit=10)

# Get news by coin
news = db.get_news_by_coin("bitcoin", limit=5)

# Get top gainers
gainers = db.get_top_gainers(limit=10)
```
### AI Analysis

```python
from ai_models import analyze_sentiment, analyze_market_trend
from database import get_database

# Analyze sentiment
result = analyze_sentiment("Bitcoin hits new all-time high!")
print(result)  # {'label': 'positive', 'score': 0.95, 'confidence': 0.92}

# Analyze market trend
db = get_database()
history = db.get_price_history("bitcoin", hours=168)
analysis = analyze_market_trend(history)
print(analysis)  # {'trend': 'Bullish', 'support_level': 50000, ...}
```
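
When the HuggingFace models cannot be loaded, this README says sentiment falls back to keyword matching. The sketch below shows one way such a fallback could return the same `label`/`score`/`confidence` shape as the example above. The word lists, the scoring formula, and the `keyword_sentiment` name are all assumptions for illustration, and the confidence value is a crude share of matched words, not a calibrated probability.

```python
import re

# Hypothetical keyword lists; a real deployment would tune these.
POSITIVE_WORDS = {"surge", "rally", "gain", "gains", "bullish", "high", "record", "soars", "adoption"}
NEGATIVE_WORDS = {"crash", "drop", "loss", "losses", "bearish", "hack", "ban", "plunges", "scam"}


def keyword_sentiment(text: str) -> dict:
    """Score text in [-1, 1] from counts of positive and negative keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    total_hits = positive_hits + negative_hits
    if total_hits == 0:
        return {"label": "neutral", "score": 0.0, "confidence": 0.0}
    score = (positive_hits - negative_hits) / total_hits
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    # Share of words that matched a keyword, used as a rough confidence proxy
    confidence = min(1.0, total_hits / len(words))
    return {"label": label, "score": score, "confidence": confidence}
```
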
## Error Handling & Resilience

### Fallback Mechanisms

- If CoinGecko fails → CoinCap is used
- If both APIs fail → cached database data is used
- If AI models fail to load → keyword-based sentiment analysis
- All network requests have timeout and retry logic
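
The fallback order above (CoinGecko, then CoinCap, then cached database data) can be sketched as a loop over named fetch functions. This is a minimal illustration under assumptions: `fetch_with_fallback` and the logger name are hypothetical, and the fetchers are passed in as callables so the example stays independent of the real collector code.

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("crypto_aggregator")  # assumed logger name

Fetcher = Callable[[], List[dict]]


def fetch_with_fallback(sources: Sequence[Tuple[str, Fetcher]]) -> Tuple[Optional[str], List[dict]]:
    """Try each source in order; return (source_name, rows) for the first that yields data."""
    for name, fetch in sources:
        try:
            rows = fetch()
        except Exception as exc:
            # A failing source is logged and skipped, not fatal
            logger.warning("%s failed: %s", name, exc)
            continue
        if rows:
            return name, rows
        logger.warning("%s returned no data", name)
    return None, []
```

A caller would pass something like `[("coingecko", fetch_coingecko), ("coincap", fetch_coincap), ("cache", load_cached_prices)]`, where the three function names are placeholders for whatever the collectors actually expose.
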
### Data Validation

- Price bounds checking (`MIN_PRICE` to `MAX_PRICE`)
- Volume and market cap validation
- Duplicate prevention (unique URLs for news)
- SQL injection prevention (custom queries are restricted to read-only statements)
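
Two of the checks above, price bounds and read-only queries, are simple enough to sketch. The bound values below are placeholders (the README names `MIN_PRICE` and `MAX_PRICE` but not their values), and both function names are hypothetical. The query check is deliberately conservative: it accepts only a single `SELECT` statement and so also rejects some harmless queries, such as ones containing a semicolon inside a string literal.

```python
import math

MIN_PRICE = 1e-8          # assumed placeholder value
MAX_PRICE = 10_000_000.0  # assumed placeholder value


def is_valid_price(value) -> bool:
    """Accept finite numeric values inside [MIN_PRICE, MAX_PRICE]."""
    try:
        price = float(value)
    except (TypeError, ValueError):
        return False
    return math.isfinite(price) and MIN_PRICE <= price <= MAX_PRICE


def is_read_only_query(sql: str) -> bool:
    """Allow only a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    first_word = statement.split(None, 1)[0]
    return first_word.upper() == "SELECT"
```
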
### Logging

All operations are logged to `logs/crypto_aggregator.log`:

- Info: Successful operations, data collection
- Warning: API failures, retries
- Error: Database errors, critical failures
## Performance Optimization

- **Async/Await**: All network requests use aiohttp
- **Connection Pooling**: Reused HTTP connections
- **Caching**: In-memory cache with 5-minute TTL
- **Batch Inserts**: Minimum 100 records per database insert
- **Indexed Queries**: Database indexes on symbol, timestamp, sentiment
- **Lazy Loading**: AI models load only when first used
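
The in-memory cache with a 5-minute TTL could look roughly like the class below. It is a sketch, not the project's cache: the `TTLCache` name and the oldest-first eviction policy are assumptions, while the defaults mirror `CACHE_TTL = 300` and `CACHE_MAX_SIZE = 1000` from the configuration section. The clock is injectable so expiry can be exercised without waiting.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Tuple


class TTLCache:
    """Small in-memory cache with per-entry expiry and a size cap."""

    def __init__(
        self,
        ttl: float = 300.0,
        max_size: int = 1000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock
        self._items: "OrderedDict[Hashable, Tuple[float, Any]]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._items)

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._items.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss
            del self._items[key]
            return default
        return value

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            del self._items[key]  # re-insert so the key counts as newest
        elif len(self._items) >= self._max_size:
            self._items.popitem(last=False)  # evict the oldest inserted entry
        self._items[key] = (self._clock() + self._ttl, value)
```
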
## Troubleshooting

### Issue: Models won't load

**Solution**: Ensure you have 4GB+ RAM. Models download on first run (2-3 min).

### Issue: No data appearing

**Solution**: Wait 5 minutes for initial data collection, or click the "Refresh" buttons.

### Issue: Port 7860 already in use

**Solution**: Change `GRADIO_SERVER_PORT` in `config.py`, or stop the process that is already using the port.

### Issue: Database locked

**Solution**: SQLite allows only one writer at a time. Close other running instances of the application.

### Issue: RSS feeds failing

**Solution**: Some feeds may be temporarily down. Check Tab 6 for status.
## Development

### Running Tests

```bash
# Test data collection
python collectors.py

# Test AI models
python ai_models.py

# Test utilities
python utils.py

# Test database
python database.py
```
### Adding New Data Sources

Edit `collectors.py`:

```python
def collect_new_source():
    """Collect data from a new source; return True on success."""
    # safe_api_call and logger are assumed to already be available in collectors.py
    try:
        response = safe_api_call("https://api.example.com/data")
        # Parse and save data
        return True
    except Exception as e:
        logger.error(f"collect_new_source failed: {e}")
        return False
```

Add to scheduler in `collectors.py`:

```python
# In schedule_data_collection()
# Note: threading.Timer fires only once; re-arm it after each run for periodic collection
threading.Timer(interval, collect_new_source).start()
```
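
Because `threading.Timer` runs its callback a single time, periodic collection needs the timer to be re-armed after each run. The class below is one possible sketch of that pattern; `RepeatingTask` is a hypothetical helper, not something the README confirms exists in `collectors.py`.

```python
import threading
from typing import Callable, Optional


class RepeatingTask:
    """Run `func` every `interval` seconds until stop() is called."""

    def __init__(self, interval: float, func: Callable[[], object]) -> None:
        self._interval = interval
        self._func = func
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._timer: Optional[threading.Timer] = None

    def _schedule(self) -> None:
        with self._lock:
            if self._stopped.is_set():
                return
            self._timer = threading.Timer(self._interval, self._run)
            self._timer.daemon = True  # do not block interpreter shutdown
            self._timer.start()

    def _run(self) -> None:
        try:
            self._func()
        finally:
            # Re-arm even if func raised, so one failure does not end collection
            self._schedule()

    def start(self) -> None:
        self._stopped.clear()
        self._schedule()

    def stop(self) -> None:
        with self._lock:
            self._stopped.set()
            if self._timer is not None:
                self._timer.cancel()
```

With a helper like this, `RepeatingTask(interval, collect_new_source).start()` would keep collecting at the configured interval, and `stop()` would cancel the pending run.
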
## Validation Checklist

- [x] All 8 files complete
- [x] No TODO or FIXME comments
- [x] No placeholder functions
- [x] All imports in requirements.txt
- [x] Database schema matches specification
- [x] All 6 Gradio tabs implemented
- [x] All 3 AI models integrated
- [x] All 5+ data sources configured
- [x] Error handling in every network call
- [x] Logging for all major operations
- [x] No API keys in code
- [x] Comments in English
- [x] PEP 8 compliant
## License

MIT License - Free to use, modify, and distribute.

## Support

For issues or questions:

- Check logs: `logs/crypto_aggregator.log`
- Review error messages in Tab 6
- Ensure all dependencies are installed: `pip install -r requirements.txt`

## Credits

- **Data Sources**: CoinGecko, CoinCap, Binance, Alternative.me, CoinDesk, Cointelegraph, Reddit
- **AI Models**: HuggingFace (Cardiff NLP, ProsusAI, Facebook)
- **Framework**: Gradio

---

**Made with ❤️ for the Crypto Community**