Curated list of Publicly available Big Data datasets. Uncompressed size in brackets. No Blockchains.

niderhoff/big-data-datasets

Big Data Datasets

A curated list of publicly available big data datasets. Where two sizes are given, the uncompressed size is in brackets. No blockchains.

Structured

Text

  • CommonCrawl (AWS) - A corpus of web crawl data composed of over 25 billion web pages.
    • Semi-Structured (includes Metadata): 250 TB
  • DBpedia - Curated Wikipedia data
  • Freebase
    • Freebase: 22 GB (250 GB)
    • Freebase Deleted Triples: 2 GB (8 GB)
    • Freebase/Wikidata Mappings: 22 MB (243 MB)
  • StackOverflow Data (BigQuery) - 182 GB
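
As a rough illustration of the size notation used above, the sketch below parses an entry such as `22 GB (250 GB)` and reports how much the data grows when unpacked. It assumes the first figure is the compressed download size (the list only states that the bracketed figure is the uncompressed size) and that units are decimal. The helper names `parse_sizes` and `expansion_factor` are hypothetical and not part of this repository.

```python
import re

# Sizes copied from the Freebase entries above.
# Assumption: first figure = compressed download, bracketed figure = uncompressed.
ENTRIES = {
    "Freebase": "22 GB (250 GB)",
    "Freebase Deleted Triples": "2 GB (8 GB)",
    "Freebase/Wikidata Mappings": "22 MB (243 MB)",
}

# Assumption: decimal (SI) units, since the list does not say otherwise.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12}
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(MB|GB|TB)")


def parse_sizes(text):
    """Return (compressed_bytes, uncompressed_bytes) for an 'X (Y)' size string."""
    sizes = [float(number) * UNITS[unit] for number, unit in SIZE_RE.findall(text)]
    if len(sizes) != 2:
        raise ValueError(f"expected two sizes in {text!r}, found {len(sizes)}")
    return sizes[0], sizes[1]


def expansion_factor(text):
    """Ratio of uncompressed size to compressed size."""
    compressed, uncompressed = parse_sizes(text)
    return uncompressed / compressed


if __name__ == "__main__":
    for name, size in ENTRIES.items():
        print(f"{name}: grows about {expansion_factor(size):.1f}x when unpacked")
```

A figure like this can help when estimating how much disk space to reserve before downloading. Entries with a single size, such as the CommonCrawl and StackOverflow ones, would raise `ValueError` here by design.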

Image

Audio

Bonus: API / Streamdata / "Self-Service"

Bonus: Opendata / Census / Government data

Meta / Lists / Sources

These pages might link to datasets that are already in the list.
