Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published 10 days ago • 4
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15 • 8
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 Jul 29 • 202
Comma v0.1 Artifacts Collection A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated Jun 6 • 4
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6 • 20
Common Pile v0.1 Raw Data Collection 8TB of public domain and openly licensed text • 30 items • Updated Aug 14 • 21
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6 • 37
BitNet Collection 🔥BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1 • 53
Community Projects Collection Datasets, models, and spaces created by the community • 20 items • Updated Nov 1 • 1
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 23
Positional Datasets Collection Datasets where each row is a chess position • 5 items • Updated 24 days ago • 8