Friday, August 31, 2012
If anyone is looking for a comprehensive places dataset, try the open-sourced SimpleGeo database dump (see link below).
From SimpleGeo: "We’re very excited to announce that the SimpleGeo’s CC0 Places data set is now available for download at no cost. If you’d like to get your hands on 21M+ POIs that cover 63 countries, we’re ready to hand that over to you in one file. The file is about 2GB in .ZIP format, and remember, with the CC0 license, this data becomes yours – free and clear – to do whatever you want."
It holds 21 million places! Unfortunately it comes in a horrible GeoJSON format, probably pumped straight out of a NoSQL database. I have therefore written a couple of Java classes that export it to a friendlier CSV file. First run the Clean_Geojson.java class to produce a more easily parsed file, then run ConvertToCSV.java on the file output by the cleaning step. As the files are huge, both steps avoid holding everything in memory (which could make your JVM unhappy with its heap size).
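For anyone curious about the "avoid storing everything in memory" part, here is a minimal sketch of the idea. It is not the actual ConvertToCSV.java code. It assumes the cleaned file has one record per line, and the class name StreamingCsvSketch and the extractFields parameter are hypothetical placeholders of mine: extractFields stands in for whatever turns one line of JSON into a list of column values (in the real classes that job is done with Gson).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not the original ConvertToCSV.java.
class StreamingCsvSketch {

    // Quotes a value if it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes.
    static String escapeCsv(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Reads one record per line and writes one CSV row per record.
    // Only the current line is held in memory, so heap use does not grow
    // with the size of the input file. Returns the number of rows written.
    // Assumption: extractFields maps one input line to its column values.
    static long convert(BufferedReader in, Writer out,
                        Function<String, List<String>> extractFields) throws IOException {
        long rows = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            List<String> fields = extractFields.apply(line);
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    row.append(',');
                }
                row.append(escapeCsv(fields.get(i)));
            }
            out.write(row.append('\n').toString());
            rows++;
        }
        out.flush();
        return rows;
    }
}
```

In practice you would wrap a FileReader in the BufferedReader and a FileWriter in a BufferedWriter, and pass a function that parses each line and pulls out the fields you care about (name, coordinates, category and so on). The point is simply that nothing accumulates between lines.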
At first look, it is more complete than any other places dataset that I have come across. It even beats OSM here in Zürich, which up until now was perhaps the best open dataset of places.
NOTE: requires Gson, Google's JSON parser for Java, available here: http://code.google.com/p/google-gson/downloads/list