Following the strong response to the Google Code Archive dataset, nyuuzyou/google-code-archive (thanks!), this release preserves another major historical repository: the Microsoft CodePlex Archive.
CodePlex served as Microsoft’s primary open-source hosting platform from 2006 to 2017. This dataset captures the distinct .NET and Windows-centric development ecosystem that flourished before the industry standardized on GitHub.
Key Stats:
- 5,043,730 files from 38,087 repositories
- 3.6 GB compressed Parquet
- 91 programming languages (heavily featuring C#, ASP.NET, and C++)
- Cleaned of binaries, build artifacts, and vendor directories (node_modules, packages)
- Includes platform-specific license metadata (Ms-PL, Ms-RL)
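
For readers curious what the cleaning step listed above might look like, here is a minimal sketch of a path-based filter. It is not the actual pipeline used for this dataset: the function name `keep_file`, the build directory names (`bin`, `obj`), and the binary suffix list are assumptions made for illustration. Only `node_modules` and `packages` come from the stats above.

```python
from pathlib import PurePosixPath

# Vendor directories named in the post.
VENDOR_DIRS = {"node_modules", "packages"}
# Assumed build-output directories typical of .NET projects (not stated in the post).
BUILD_DIRS = {"bin", "obj"}
# Assumed binary file suffixes (not stated in the post).
BINARY_SUFFIXES = {".dll", ".exe", ".pdb", ".zip"}


def keep_file(path: str) -> bool:
    """Return True if a repository-relative path passes the sketch filter."""
    # Normalize Windows-style separators so both path styles are handled.
    p = PurePosixPath(path.replace("\\", "/"))
    # Compare directory components case-insensitively; the file name is excluded.
    dirs = {part.lower() for part in p.parts[:-1]}
    if dirs & (VENDOR_DIRS | BUILD_DIRS):
        return False
    return p.suffix.lower() not in BINARY_SUFFIXES


paths = [
    "src/App/Program.cs",
    "web/node_modules/jquery/jquery.js",
    "packages/NUnit.2.6/lib/nunit.framework.dll",
    "src\\App\\bin\\Debug\\App.exe",
]
kept = [p for p in paths if keep_file(p)]
```

In this sketch only `src/App/Program.cs` survives. A real pipeline would likely also inspect file contents to detect binaries rather than rely on suffixes alone.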