A data set containing Google Books n-gram corpora. This data set is freely available on Amazon S3 in a Hadoop-friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original data set is available from http://books.google.com/ngrams/....
Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. Reuse and redistribution: the data must be provided under terms that per...
World Continents, Countries, and Cities: a database with all the continents, countries, states, and cities of the world. This directory contains all 7 continents, 250 countries, 4k subdivisions (provinces, states, etc.), and more than 127k cities. All data can be retrieved and managed via APIs. Us...
A collection of public data sets for testing out visualization methods. These data sets are at various stages of preparation: some are just raw data, some are CSV files, and some are exposed as AMD modules. This collection is messy, but with some digging you may find hidden gems....