Categories
gsoc

Google Map Maker and OpenStreetMap – My five cents

Update 1-7 : New evidence blows my best-case scenario out of the water, but no worries (see below)
Update 1-7, an hour later : Or perhaps it does not; just keep the popcorn close

For those of you who have been sleeping for the past couple of hours, Google just released a map-making application: now everyone can add streets to Google Maps. It is currently only available for a couple of countries where Google has very little map data, but I’m sure they will scale it up in the future. Many people, including me, will probably wonder what that means for the OpenStreetMap project.

There is a blog post on BlinkGeo about this:

Will Google play nice and make the crowd-sourced data available for use in applications other than Google Maps and in some common format (hint hint, KML)? Who ultimately should own the data?

The OpenGeoData blog is not very positive about the development. Steve writes:

The fundamental reasons for OpenStreetMap remain intact and if anything are now stronger. At first glance it sounds like OpenStreetMap, until you realise that Google own that data you give them, there’s no community and you are unlikely to see use of the data in ‘creative, productive, or unexpected ways’.

I am personally a bit more positive about the situation. My guess is that because Google set this up in a hurry to help out with the Myanmar disaster relief, they did not have time to think about copyright issues or to communicate with the OpenStreetMap community on how to prevent duplicate effort. They probably had their legal department come up with a standard data licence in a hurry; as Steve noticed:

The terms of use are hilarious – the bottom line seems to be that Google ends up owning exclusively the entire aggregate work, but it is your fault if anything goes wrong.

As a side note on duplicate effort, I’ve heard several people claim that the time it took to create Wikipedia equals the time Americans spend watching ads every weekend. In other words, duplicate effort in massive online collaboration projects is really not an issue yet. Even if there were a dozen projects like OpenStreetMap and Google Map Maker, each of them could still create a complete map of the world in a few days if enough people joined the effort.

Personally, I love it when other people take care of work that I was supposed to do. That gives me time to do other things, and planning is the most interesting part of most projects anyway. In my opinion, Google is doing a great job here for the OpenStreetMap community, with just a few as-long-as-they’s.

I like the intuitive interface and the moderation system. I managed to trace a couple of streets in the Netherlands Antilles without reading a manual. Moderation is built in: you can select a neighborhood that you know and get a notification when someone edits it. My edits got feedback and were rejected (for good reasons) within the hour. The system also distinguishes between new and experienced users and even automatically detects ‘suspicious’ edits.

I would love to see cooperation between OpenStreetMap and Google, and I think that would make a lot of sense. Google can provide the OpenStreetMap community with the massive, scalable infrastructure it needs, and OpenStreetMap can provide Google with a community, map-making experience and great tools.

Here’s a possible future scenario:

  1. Google opens up the user-contributed data with an API and a weekly database dump, providing read and write access to the building blocks of the street data, i.e. the nodes and ways (see the sketch after this list).
  2. Google switches to a Creative Commons license or something similar. As I mentioned above, I think and hope their current data policy was made in a hurry.
  3. Google imports all existing OpenStreetMap data into their system, starting with the parts that are easy: roads and railways.
  4. Google incorporates the Map Maker API into their App Engine. Google mentioned during the Developer Day in Sydney that they strive to make the App Engine comply with a future standard to prevent platform lock-in.
  5. People start building their own editing software or porting existing tools like Potlatch and JOSM. They can use the Google App Engine or something similar, so they do not have to worry about scalability.
  6. People start building their own map viewing software or porting existing ones like Osmarender.
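To make the first item concrete: the nodes-and-ways model is exactly how OpenStreetMap structures its own data. A street broken down into its building blocks looks roughly like this (this is OpenStreetMap’s XML format; whatever Google might expose is pure speculation on my part, but presumably similar in spirit):

<osm>
  <node id="1" lat="-37.8136" lon="144.9631" />
  <node id="2" lat="-37.8140" lon="144.9650" />
  <way id="10">
    <nd ref="1" />
    <nd ref="2" />
    <tag k="highway" v="residential" />
  </way>
</osm>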

So in short:

  • Worst case: Google does not collaborate and we’ll have duplicate effort, which is not a problem because there is an astounding amount of disposable manpower available on the planet.
  • Best case: Google does collaborate and the OpenStreetMap project will be finished in a year or so.

Update 1-7 : Google Signs Five Year Map Agreement with Tele Atlas

That definitely blows my ‘done in a hurry’ argument out of the water, but they haven’t taken over the world yet. We’re back at what I call the worst-case scenario, in which Google does not seem to be planning to share their users’ work.

As I wrote, that should not be a problem for OpenStreetMap, because there is an astounding amount of disposable manpower available on the planet.

Apart from that, Google is completely dependent on income from advertisements: Google cannot offer any service on a massive scale that does not generate ad revenue. This provides several opportunities to the OpenStreetMap community:

  1. it can give anyone access to all data, in any format
  2. it can distribute the hosting and serving of this data (Tiles@home is a good example)
  3. it can collaborate with as many partners as it wants (I don’t think Google can easily expand its Tele Atlas deal to other map providers)

The rise of Google Map Maker is probably a good thing for OpenStreetMap, because it provides something real to compete against. That makes it easier to identify weak spots and increases the motivation to work on them.

Update 1-7, an hour later:

The deal does not apply to Google Map Maker contributions.

You may also be interested in Ed Parsons’ post about this subject. He does not represent the official Google point of view, but he provides very useful insight nonetheless.

Categories
gsoc

Google App Engine – On uploading serious bulk data

“The Google App Engine enables you to build web applications on the same scalable systems that power Google applications”, says Google. Sounds like that could come in handy for my route altitude profile. The NASA SRTM dataset has 1.5 billion data points just for Australia and I have no idea how many people will use my application.

In the last couple of days, I’ve worked through the tutorial and some of the documentation and have taken the first steps (on a separate git branch) in making my application run on the App Engine.

The first challenge was – and still is – to insert the data into the data store. The data consists of 1060 tiles of 1 degree longitude by 1 degree latitude. Each tile consists of 1200 x 1200 data points representing the altitude roughly every 100 meters. I wanted to insert just 1 tile, which took about a minute in Postgres using the COPY command.
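For reference, here is roughly what that Postgres route looks like; a minimal sketch, where the table layout, column names and connection string are my own inventions for illustration:

import io
import psycopg2

# Hypothetical table: a packed coordinate (8 bytes) plus the altitude (2 bytes).
conn = psycopg2.connect("dbname=altitude")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS altitude (pos BIGINT PRIMARY KEY, height SMALLINT)")

# Build one 1200 x 1200 tile in memory as CSV, one row per data point.
buf = io.StringIO()
for row in range(1200):
    for col in range(1200):
        pos = row * 1200 + col  # stand-in for the real packed coordinate
        height = 0              # the real value would come from the SRTM file
        buf.write("%d,%d\n" % (pos, height))
buf.seek(0)

# COPY streams all 1.44 million rows to the server in a single command.
cur.copy_expert("COPY altitude (pos, height) FROM STDIN WITH CSV", buf)
conn.commit()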

Google provides a bulk uploader that can be used to upload data. However, I think the author of that script had a different idea of ‘bulk’ than I do.

It is possible to test your application offline before uploading it; the SDK comes with an offline data store. Of course, the offline data store is not the real thing. After 10 hours, it had inserted only about 10% of 1 tile. I also noticed that each insert took longer than the one before.

So I skipped ahead and tried the real thing online. The good news is that the slowdown effect disappeared, as I expected. The bad news is that I could only insert about 100 points at a time; any more and the server would kick me. This means I would need to send an HTTP POST request 15,000 times to upload 1 tile, which would take roughly 3 days with my current connection (remember: 1 minute with Postgres). In addition, even though I used small chunks, the connection got broken by the server after a while. That means starting over, because the bulk upload script does not have a resume function.

Clearly we need a different approach here.

One way would be to improve the bulk upload script so that it can resume a failed upload, but that would not speed things up.
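A resume function itself would not be hard to add, though. A rough sketch of the idea, where the upload URL and payload format are made up and only the checkpointing logic matters:

import os
import time
import urllib.request

CHUNK = 100               # the server kicked me above roughly 100 points
STATE = "upload.resume"   # remembers the last successfully uploaded point

def upload_points(points, url):
    start = 0
    if os.path.exists(STATE):
        start = int(open(STATE).read())          # resume after a failure
    for i in range(start, len(points), CHUNK):
        body = "\n".join("%d,%d" % p for p in points[i:i + CHUNK]).encode()
        for attempt in range(5):                 # retry with backoff
            try:
                urllib.request.urlopen(url, data=body)
                break
            except IOError:
                time.sleep(2 ** attempt)
        else:
            raise RuntimeError("giving up at point %d" % i)
        with open(STATE, "w") as f:
            f.write(str(i + CHUNK))              # checkpoint progress to disk
    if os.path.exists(STATE):
        os.remove(STATE)                         # clean finish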

Therefore, I think the best solution would be if Google makes it possible to upload a CSV file, so their servers can perform their equivalent of a SQL COPY command.

The next challenge will probably be storage consumption. In Postgres, I store the coordinates in a ‘special way’ as one long integer (8 bytes) and the altitude as a small integer (2 bytes), so in theory that requires 10 bytes per data point. In practice we need to index the coordinate and there is probably some other overhead; I estimate (it’s a bit difficult in Postgres) that Postgres uses anywhere between 40 and 87 bytes per data point. It’s even harder to estimate for the Google App Engine data store, but it looks like it uses about 100 bytes per data point. I have no idea if that is good or bad, but at that rate I would need about 150 GB of storage space, and the test version only allows 500 MB. Imagine if I wanted to upload the whole world.
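To give you an idea of such a packing (this is an illustration, not necessarily my exact scheme): scale the coordinates to ten-thousandths of a degree and store them in the high and low halves of a single 64-bit integer:

def pack(lat, lon):
    """Encode a coordinate pair as one 64-bit integer (hypothetical scheme)."""
    lat_i = int(round((lat + 90) * 10000))   # 0 .. 1,800,000 fits in 32 bits
    lon_i = int(round((lon + 180) * 10000))  # 0 .. 3,600,000 fits in 32 bits
    return (lat_i << 32) | lon_i

def unpack(key):
    """Invert pack()."""
    lat = (key >> 32) / 10000.0 - 90
    lon = (key & 0xFFFFFFFF) / 10000.0 - 180
    return lat, lon

lat, lon = unpack(pack(-37.8136, 144.9631))
assert abs(lat + 37.8136) < 1e-6 and abs(lon - 144.9631) < 1e-6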

I have to run to the airport now (or actually to the bus to the airport); tomorrow, during the Google Developer Day in Sydney, I will hopefully have the opportunity to discuss these issues with experts. I’ll keep you posted: to be continued.

Categories
gsoc

Demonstration

[Image: example altitude profile]

I built a simple website that displays the altitude profile for four different example routes. But you can already get the altitude profile for any route in Australia through an XML-RPC request to:

http://bak.sprovoost.nl:8000/

And then call one of the following two functions:

altitude_profile(route) : returns an XML document

altitude_profile_gchart(route) : returns a Google Chart like the one shown above.

The route argument has to look like this:

<route>
  <point id="1" lat="61.8083953857422" lon="10.8497076034546" />
  <point id="2" lat="61.9000000000000" lon="10.8600000000000" />
  <point id="3" lat="61.9000000000000" lon="10.8800000000000" />
</route>
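A minimal Python client, assuming the route argument is passed as that XML string:

import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://bak.sprovoost.nl:8000/")

route = """<route>
  <point id="1" lat="61.8083953857422" lon="10.8497076034546" />
  <point id="2" lat="61.9000000000000" lon="10.8600000000000" />
  <point id="3" lat="61.9000000000000" lon="10.8800000000000" />
</route>"""

print(server.altitude_profile(route))         # the XML document
print(server.altitude_profile_gchart(route))  # the Google Chart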

Please be nice to my home computer and if you find any security issues, please tell me.

So what shall I focus on next?

There’s the actual profile:

  • interpolation : when two points in a route are far apart, add extra points between them
  • better altitude estimation : currently I take the altitude from the nearest SRTM data point; I will combine information from multiple points (see the sketch below)

I hope that with these changes the profiles will look a bit smoother.
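For the better altitude estimation, the obvious candidate for combining multiple points is bilinear interpolation between the four SRTM grid points surrounding the query location. A sketch, where get_height(row, col) is a hypothetical lookup into one 1200 x 1200 tile:

def interpolated_altitude(lat, lon, get_height, resolution=1.0 / 1200):
    """Blend the four surrounding grid points, weighted by distance."""
    row_f = (lat % 1) / resolution           # fractional grid position
    col_f = (lon % 1) / resolution
    row, col = int(row_f), int(col_f)
    dr, dc = row_f - row, col_f - col        # offsets in [0, 1)

    h00 = get_height(row, col)               # the four surrounding points
    h01 = get_height(row, col + 1)
    h10 = get_height(row + 1, col)
    h11 = get_height(row + 1, col + 1)

    top = h00 * (1 - dc) + h01 * dc          # interpolate along one axis...
    bottom = h10 * (1 - dc) + h11 * dc
    return top * (1 - dr) + bottom * dr      # ...then along the other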

There’s performance:

  • generate lots of example routes; I created these four routes manually, so I would need a way to automatically extract a whole bunch of routes from the Google route planner or something similar.
  • stress test with these routes; I want to get some idea of the hardware & bandwidth requirements involved.
  • I would really like to get my hands on the Google App Engine, as that would outsource part of the work to Google. Two problems there: I don’t have an API key yet, and I would need more storage space than they currently allow; it would be great if I could store the whole planet, but Australia already needs 123 GB.

Security:

  • I don’t know a lot about that, except that it is very important.

Looks:

  • Something a little prettier than the Google Chart?
  • Add some ‘landmarks’, e.g. place names and road types.

Code:

  • I’m new to Python, so I want to go over my code and make things a bit nicer: refactoring.

Last but not least, how can this tool best be integrated with other websites? One scenario I have in mind is a third-party website that uses both the OpenRouteService and my altitude profile application. A user enters an origin and a destination. The website sends these to the OpenRouteService (through XML-RPC?), which sends the route back, both as a map image and as an XML document. Next, the website sends this route to the altitude profile service (through XML-RPC), which returns the Google Chart (or an XML document). The website then displays the map and the chart for the user to enjoy.
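In code, that flow might look something like this; get_route() is a stand-in, since I have not looked at the actual OpenRouteService interface:

import xmlrpc.client

def get_route(origin, destination):
    """Stand-in for the OpenRouteService call; its real API is not shown here."""
    raise NotImplementedError

def route_with_profile(origin, destination):
    # 1. Ask the route planner for the route as an XML document.
    route_xml = get_route(origin, destination)

    # 2. Send that route to the altitude profile service.
    profile = xmlrpc.client.ServerProxy("http://bak.sprovoost.nl:8000/")
    chart = profile.altitude_profile_gchart(route_xml)

    # 3. The website can now display the map and the chart together.
    return route_xml, chart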

There are many ways to go from here, and although I have plenty of time left for the project, I probably can’t do everything. So what would you like to see next?

Update: More information about my project can be found here.

Categories
Australië gsoc livejournal

Internet Dependency and the 15-Minute People

This is the second time that I am living in another country for a while, and once again I have stumbled into a huge problem that I believe is massively underappreciated by many: I need the Internet, but it’s not as ubiquitous as you’d think.

Has anyone seen the South Park episode about this? The entire United States flees to refugee camps on the west coast, where there is still a bit of Internet, and people wait in a queue all day for only 40 seconds of it. Does that sound a bit ridiculous? Well, if you are a homeless person without a laptop (and, except perhaps in Japan, most homeless people do not have one) and want to use the Internet, you’ll have to queue up at the State Library in Melbourne for 15 minutes of Internet (it takes about 1 minute to load Gmail, so do the math…). And that is the best deal in town as far as I know.

When I see these people (probably not all homeless) I get the same emotions that most people probably get when they see hungry people in Africa. Of course, that is not fair or rational; it is just easier for me to empathize with. I completely depend on the Internet for almost everything I do and it allows me to live a fascinating life. But those 15-Minute People, as I call them, have to do everything the old-fashioned way: more time consuming, more expensive, fewer opportunities. For example, they have to make a phone call to book a flight; that adds at least 15 dollars, plus you can’t easily compare fares. They have to find a place to live through newspaper ads, while most room shares are advertised online, so they miss those. And they don’t have access to all sorts of job search websites.

I can probably find the best price and book a flight in under 15 minutes, but most people can’t; there are many important tasks for which people need to sit down and focus a lot longer. 15-Minute People have a serious disadvantage here. How about adding two hours of Internet per day, for everyone, to the list of government responsibilities? Or shall we just wait and see what happens if we don’t?

Anyway, I’ve been one of those 15-Minute People a couple of times during my travels. I take the Internet for granted, but I have been without it quite a few times, at great cost to productivity. It is even worse if the situation is unpredictable: if you do not know how long you will stay connected, or how long it will take before you are connected again, it becomes impossible to plan anything.

I spent six months in Slovakia, where I would have some weeks with and some weeks without the Internet. At the office! And with no way of predicting what would happen next. Not very practical if your work is Internet based. But I guess people kind of expect that sort of thing in Slovakia (although I think it does not have to be that way).

The biggest surprise in this respect is Australia. I was completely taken aback by the terrible state of the Internet over here. I was expecting more or less 99% household Internet penetration, free wifi at hostels, the usual, but I found the opposite. Hostels will gladly charge you 4 dollars an hour, and it won’t be wireless either. Many homes, even those of young people, do not have an Internet connection. And in general the Internet is very slow, has download limits(!) and is expensive.

Just a comparison: at home in The Netherlands I pay 30 dollars a month for my Internet connection, plus about 15 for the phone line you are required to have with it. For that money I can download at 20 Mbit, with no limit on how much I download. The best deal I could find here is 50 dollars for the Internet plus 20 for the phone. That gets me 0.5 Mbit download and a 25 GB limit per month. So that is 40 times slower for about one and a half times the price, with a download limit as a bonus.

So what is causing this? Well, let’s just say Telstra (the former state company) blames crushing government regulations and most others blame Telstra for acting like a monopolist. I leave the choice to you.

The good news is that people are not happy with this, and there is a lot of work going on to upgrade Australia’s physical and, equally important, organizational Internet infrastructure. The other good news is that Australia has a pretty cool system for quickly switching between providers, and the business of connecting and disconnecting people seems to be a *lot* faster than at home (it took only 3 hours to disconnect from our old provider; these things take weeks in The Netherlands and months in Slovakia).

I hope I will have a working 24/7 Internet connection at home by the time I get back from the Google Developer Day, my Sydney excursion and the Burning Couch festival next week. That would be great for productivity and peace of mind!

Categories
Australië livejournal

Apartment

It’s been a while since my last travel-related post. To make a long story short: I spent a week in lovely Canberra and another week in warm Brisbane. I then discovered that the combination of programming and traveling was a bit too much of a good thing, so I decided to fly back to Melbourne and settle down for a while.

Canberra, for those who do not know, is Australia’s capital city. It has a lot in common with Washington. It is one of the more successful examples of a Garden City and, contrary to Overvecht, it is not about to be bulldozed.

[Image: home in Preston]

I managed to find an apartment in Melbourne in less than a week (and four attempts) through Gumtree advertisements. As you can see in the picture, it is a converted warehouse: a nice old (for Australia) look on the outside, all modern on the inside. I will take pictures of the interior later, but this is what it looks like when new. It’s only $470,000 (€300,000) to buy, or, if you’re like me, you rent it with two other people and a cat.

The Melbourne Couch Surfing Group is awesome! It has 917 members and organizes lots of activities. This Queen’s Birthday weekend, about 25 others and I stayed at Lena‘s parents’ vacation house on the Mornington Peninsula. The most amazing thing is that many of us, including me, had never met her before. It was a lot of fun and I’ll post some pictures later.

My plan for the next couple of weeks is to work hard on my project, so that I have plenty of free time during my three-day (carbon-compensated) visit to Sydney next week. I will also find some nice second-hand furniture for my room and ‘work’ on my social life.