GitHub Data

GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects on GitHub since 2008.

This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions.
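As an illustration of the regular-expression search mentioned above, here is a minimal sketch in Python. It makes several assumptions that are not stated on this page: that the data is queried through BigQuery standard SQL, that the `sample_contents` table in the `github_repos` dataset exposes `id`, `size` and `content` columns, and that the optional `google-cloud-bigquery` client library and valid credentials are available. The helper names `build_regex_query` and `run_regex_query` are hypothetical.

```python
# Assumption: a smaller sample table is used by default to limit the bytes scanned.
DEFAULT_TABLE = "bigquery-public-data.github_repos.sample_contents"


def build_regex_query(table: str = DEFAULT_TABLE, limit: int = 10) -> str:
    """Build a standard SQL query that filters file contents by a regex.

    The regex itself is passed separately as the named query parameter
    @pattern, so it is never spliced into the SQL text.
    """
    if type(limit) is not int or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT id, size\n"
        f"FROM `{table}`\n"
        "WHERE REGEXP_CONTAINS(content, @pattern)\n"
        f"LIMIT {limit}"
    )


def run_regex_query(pattern: str, table: str = DEFAULT_TABLE, limit: int = 10):
    """Run the query with the google-cloud-bigquery client (needs credentials)."""
    # Imported here so the query builder works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("pattern", "STRING", pattern)
        ]
    )
    job = client.query(build_regex_query(table, limit), job_config=job_config)
    return list(job.result())


if __name__ == "__main__":
    # Print the SQL only; running it requires a Google Cloud project.
    print(build_regex_query(limit=5))
```

Queries against the full contents table scan a large volume of data and may incur cost, which is why this sketch defaults to a sample table.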

Additional Info

Field                 Value
External Description  https://cloud.google.com/bigquery/public-data/github
Source                https://bigquery.cloud.google.com/dataset/bigquery-public-data:github_repos?pli=1
Version               1.0
Contact               Timofey Ermilov
Contact Email         ermilov@informatik.uni-leipzig.de
Benchmark Generation and Acquisition