It’s a bit of a boring title, but it actually has been an interesting week. Although I found myself highly distracted by some unrelated but fascinating things, I still managed to get quite a bit done.
The script that I used to download the SRTM data set and import it into a Postgres database can now deal with all continents and supports uploading a subset of a continent by means of a bounding box. I also put the MD5 checksum of every tile in the source code.
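Just to illustrate the checksum part: verifying a downloaded tile could look roughly like the sketch below. The helper name and the idea of keeping the checksums in a dictionary are mine; the actual script may do it differently.

import hashlib

# Hypothetical example: map each tile name to the MD5 checksum stored in
# the source code, and accept a download only if its digest matches.
checksums = {"N45E001.hgt.zip": "0123456789abcdef0123456789abcdef"}  # placeholder value

def verify_tile(path, tile_name):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest() == checksums[tile_name]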
Since the App Engine still has some issues, I have revived the Postgres version of my application. It is located at http://altitude-pg.sprovoost.nl and contains most of Europe, as far east as Moscow and as far south as Cyprus. It runs on my home computer in The Netherlands, so please be nice to it. I use Apache with mod_python for the formal demonstration website and Apache with mod_wsgi combined with web.py for the altitude profile server. To make this as painless as possible, I have moved all App Engine-specific and Postgres-specific code to their own files and kept as much common functionality as possible in the main file. I can now run the development servers for both Apache and the App Engine from the same source code folder, at the same time.
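The layout is roughly as sketched below; the file names are illustrative rather than the exact ones in my repository. The main point is that the shared code only talks to a small backend interface and each storage flavour implements it in its own module.

# altitude.py            - shared logic: request handling, profile calculation
# database_postgres.py   - Postgres-specific Database class
# database_appengine.py  - App Engine-specific Database class

class Database:
    """The small interface the shared code relies on."""
    def fetchAltitude(self, position):
        raise NotImplementedError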
I have requested more storage space on the App Engine and I am also considering a more efficient storage method. Instead of storing one altitude per record, I could store 100 altitudes per record and zip them. That would drastically reduce the total storage requirement, but at the cost of performance, because I often need only about 2 out of these 100 altitudes.
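A rough sketch of that idea, using only standard library calls (the record layout itself is hypothetical):

import struct, zlib

altitudes = range(100, 200)  # 100 consecutive sample values for one record
blob = zlib.compress(struct.pack('>100h', *altitudes))  # what would be stored

# Reading back even two values means decompressing the whole blob first,
# which is where the performance cost comes from.
values = struct.unpack('>100h', zlib.decompress(blob))
print(values[42], values[43])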
I have also been a bit more active on their mailing list; it feels good to be able to answer people's questions and at the same time it allows me to verify my own code and design. There are also some interesting, albeit more philosophical, discussions on the list.
I have signed and fulfilled a pledge to “spend an hour OpenStreetMapping features on Caribbean islands from Yahoo! aerial imagery and […] donate £10 to the OpenStreetMap Foundation but only if 60 other people will do the same.” I felt like I could really use another jet-lag. The pledge is full, but who knows, if they can rally another 60 people there might be a second ticket?
Those of you who laboriously follow every commit to the OpenStreetMap subversion repository may have noticed that I am still struggling with git-svn. I got really tired of fixing conflicts, so I unleashed the power of git-svn set-tree:
git-svn set-tree -i trunk 3cb585dca1d7fe10791312ca26125168506b61c1
git-svn set-tree -i trunk 07c9024f5ea4ce60f481b8089b61d4988e7588fa
Even the manual recommends against doing this, and you should make sure nobody else (like your mentor) has committed anything to subversion before you do this.
I find git-svn to be harder to use than it should be. I think it is trying too hard to properly translate between The Git Way and The Subversion Way. I just want the subversion repository to ‘sort of’ track my git repository. I don’t care if it has to represent the history a bit differently. Just keep the code up to date. I am looking forward to this command:
git-svn just-do-it
I really think Git would benefit the OpenStreetMap community, because it reflects the decentralized nature of OpenStreetMap. With Git, there is no such thing as a central repository. People can write any code they like without having to live in constant fear of breaking the trunk with their next commit. Instead, when they build something cool or useful, they will tell their friends to pull it in and experiment with it. The person who operates a production website will only pull pieces of code that he or she considers safe and useful enough.
But the reality is that many organizations rely on subversion at the moment and have excellent reasons for not risking their operations by making an instant jump to Git. So people are not going to adopt Git very quickly as long as it is so hard to sync with subversion. But let’s wait for a while and see…
I am getting better and better at keeping my git repository synchronized with the osm subversion, but I would not recommend this strategy to others.
I created a Google Code Hosting project for the altitude profile. Not to host the code, not even for the wiki, but just to keep a list of issues. I realize I could have applied for a place on the OpenStreetMap Trac, but I want to use Google Code Hosting for my new project: Jobtorrent. This is also the reason most of the issues point to the Git source (I do point to subversion on the main page, and the only reason I do not always point to both is that I am lazy). I will write more about Jobtorrent later; first I need to work on my Summer of Code project you know…
This list of issues should be good for continuity. Because my project does not interact with any OpenStreetMap code at the moment, I am probably the only one in the community who knows how the code works and what needs to be improved. That is a very low bus factor! (“tram factor” would be a better term in Melbourne) Now I really like the OpenStreetMap effort and I will certainly find ways to stay involved in the future, but it might be in a completely different project. Depending on circumstances, I should at least prepare for the possibility that the altitude profile project will be orphaned within a few months.
I use a personal organizing method inspired by the book Getting Things Done (David Allen) and that makes it very easy to transfer everything I am working on or thinking about to the Internet. So that is what I did.
The more difficult part is keeping it synchronized. David recommends that you never share your projects. That is, you should always keep your own lists and let nobody else touch them. Your lists must reflect what you want, or you will start to rebel against them and as a result mess up your system.
So in practice you will end up with a central list (e.g. the list of issues on Google Code) and your local copy of it. They will not be the same. There are a couple of things on my personal list that are not online (nothing ground-breaking, don’t worry) and my own priorities are not identical to the ones online. The online version reflects what is important for The Project, the offline version reflects what is important for me. At least in theory; as long as I am the only one working on it, it probably reflects my opinion a lot better than it ideally should.
Now I am pretty sure the average recruiter looking for a “true team player” does not like what I just said in the last paragraph.
Sjors,
I am glad that I found this project you are working on – I was just going to sit down and write something very similar myself for a little route planning tool I have been working on.
I had been wondering how best to handle the SRTM data – whether to use a database as you have or just make a big binary file to use as a huge array to try to speed it up – I was worried that trying to do lots of queries (“give me the nearest measured point to position X”) as I traverse the route would thrash the database with such a huge dataset. I thought I could make the big binary file idea work as the data is effectively a big grid of heights.
Do you find that PostgreSQL can cope alright with such a lot of data (and are you running it on a very serious computer? – I am trying to use a minimalist low power one – a 1GHz Via C7).
Cheers
Graham.
Hi Graham,
Glad to hear that!
In retrospect, a big binary file (or several) might have been a better approach for this project. I was surprised by the per-record overhead that using a database creates [0].
The good news is that I recently abstracted the database part. There is a class Database that has only one method: fetchAltitude(position). So if you are in the mood, you could write a class that accesses the data directly from a file and it should work fine with the rest of my code.
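For example, a file-backed version could look roughly like this. It is only a sketch under my own assumptions: here ‘position’ is taken to be a (row, column) offset within a single tile of big-endian 16-bit samples, whereas my real code uses the global bigint index explained later in this thread.

import struct

class FileDatabase:
    # Sketch of a backend that reads altitudes straight from an .hgt file.
    def __init__(self, filename, columns=1201):
        self.f = open(filename, 'rb')
        self.columns = columns  # SRTM3 .hgt tiles are 1201 x 1201 samples

    def fetchAltitude(self, position):
        row, col = position  # assumed to be an offset within this tile
        self.f.seek((row * self.columns + col) * 2)  # 2 bytes per sample
        return struct.unpack('>h', self.f.read(2))[0]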
Michael Kosowsky from http://www.heywhatsthat.com/ wrote C++ code to read the .hgt files directly rather than through a database.
[0] http://code.google.com/p/route-altitude-profile/issues/detail?id=1
Sjors,
Thank you for the pointer – I will have a look at reading the data files directly, but I think I will get your database method working first, so that I only break one thing at a time!
One thing I am curious about is what the ‘position’ is in your database – it seems to be a bigint which is a combination of the lat and lon – is this effectively the position of the data point in the raster file? I am struggling to understand what
def posFromLatLon(lat,lon):
    return (lat * 360 + lon) * 1200 * 1200
does.
Could you give me a pointer please?
Cheers
Graham.
Hi Graham,
I have tried to explain it somewhat in the comments of the function posFromLatLon(lat,lon), but it could use a bit more clarification.
There are several things going on in that function. First it finds the correct tile. Then it figures out where your coordinates are within that tile. This will probably not be directly on top of a data point, so instead it will give you the four closest points and the distances to these points.
The application then fetches the altitude of these four points and performs bilinear interpolation.
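The interpolation step itself is just a weighted average of the four corners, something along these lines (a minimal sketch; the variable names are mine):

def bilinear(sw, se, nw, ne, dx, dy):
    # dx and dy are the fractional distances (0..1) of the requested point
    # from the south-west corner of the enclosing grid cell.
    south = sw * (1 - dx) + se * dx       # interpolate along the southern edge
    north = nw * (1 - dx) + ne * dx       # interpolate along the northern edge
    return south * (1 - dy) + north * dy  # interpolate between the two edges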
A bit more background about the index:
As you guessed correctly, I use a bigint to create a unique index for every SRTM data point. The original data set consists of lots of tiles, each of which is exactly 1 degree of longitude by 1 degree of latitude.
Now here comes the more difficult part. Each tile has been given a name (e.g. “N45E001.hgt.zip”) that represents its bottom left (south west) corner. However, the file itself reads like a book: it starts in the top left corner. So the first column of the last row is located at 45° north and 1° east, and the first column of the first row is located at 46° north and 1° east.
I have created an index such that the tile “N00E000” has index 0. Tiles are 1200 rows by 1200 columns, so the tile directly east of it (“N00E001”) starts at position 1200 * 1200 = 1440000. The tile directly west of it starts at position -1440000. Since there are 360 degrees of longitude, the next latitude starts at 1200 * 1200 * 360.
Once you know where a tile starts, you just need to know that the tile content starts at the top left corner and reads like a book.
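To make that concrete, here are the start indices of a few tiles, computed with the same formula as posFromLatLon (the offset within a tile is then presumably row * 1200 + column, counting from the top left):

def tile_start(lat, lon):
    # Same formula as posFromLatLon: each tile occupies 1200 * 1200 indices.
    return (lat * 360 + lon) * 1200 * 1200

print(tile_start(0, 0))    # N00E000 -> 0
print(tile_start(0, 1))    # N00E001 -> 1440000
print(tile_start(45, 1))   # N45E001 -> 23329440000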
Sjors,
Thank you for the advice. I had a bit of trouble downloading the SRTM data, so I ended up getting it from a different source as 5deg GeoTIFF files. To confuse me, the origin of these files is top left, not bottom left as the HGT files.
I have had a quick go at putting together a simple server to return the height of a particular point using this data – I read all of the files into arrays in memory (only the UK is covered, so you need to use positions like (54,-1)) – it is then a simple matter of checking which file contains the required data point, and grabbing it from an array (I do not do interpolation yet, so it is not precise).
It should be visible to the outside world at http://maps.webhop.net:1281/ if you are interested. The obscure port number is because it is a pure python server rather than running via apache, so I didn’t want to use port 80.
My real intention is to accept a GPX file describing a route, and return an elevation profile of it – a job for tomorrow evening….
Graham.
There seems to be some sort of weird issue with their FTP lately. Try the following link:
ftp://e0srp01u.ecs.nasa.gov/../srtm/version2/SRTM3/
You would have to change the “.gov/srtm” part in my download script to “.gov/../srtm”:
http://github.com/Sjors/srtm2postgis/