Thoughts on how to optimize geographical queries for a better user experience in location based services and applications.
Usually photos will be taken cluttered within the same geographical area due to one of the following reasons:
- A single user uploads a lot of photos (which is fine!)
- Something "big" is happening and many users are taking photos of it
- There is a huge event (say a rock festival with 100.000+ people for example) and tens or hundrets of people are taking photos at the same time
- Densly populated regions produce more photos, views and votes than the countryside
|James T Kirk has trouble with the number of Tribbles|
The result is that when you look at the 'recent 100 photos' list, the photos are likely to be from the same event or users. Needless to say, they will all appear cluttered in one or just a few heaps just like Captain Kirks tribbles.
When you look at recent photos from the world on a map... which option would you find more interesting?
a) Showing 100 photos from the same mass-event (all within 1 minute)
b) Show the 20 latest images from each of the 5 continents (all within 15 minutes)
Most people would find option b) more interestesting even though technically option a) could be correct if it gets the newest 100 photos. Option a) is also simple to implement from a database perspective:
set row_count 100
select * from photo where sw_lat>-90 and sw_lon>-180 and ne_lat<90 and ne_lon<180 order by created_time desc
Note: The lat/lon criteria is unnecessary when fetching at the world-level but for any other zoomlevel it is required to define the queried map-region.
Fetching a) is easy but not really what the user wants to see.
When you look at a list of recent photos on a world-map you will want to see a nice mix of photos from different continents, countries, cities and districts. In other words: you will definately not want to see a real latest-list but a list with 'recent enough' photos from different areas around the globe. This is the approach Ibify uses.
|Option b): Map with evenly distributed items|
From algorithm-perspective this does present a problem. How do you get "recent enough" photos that are evenly scattered around the map?
If it was just about fetching photos at the whole-world zoom level you could add the continent to the criteria. That however won't work at other zoomlevels. A solution is required that works at all zoomlevels and area sizes - be it world, continent, country, city or a football field sized area.
They say that if you have a big problem then break it up into smaller problems and solve them first. Thats what Ibify does in it's algorithm when fetching photos from any area: Break the area up into a number of smaller rectangles (15 in the below example image) and perform the same photo-query for each rectangle. This way the latest images for each rectangle will be fetched for each of the (15) rectangles. This would result in 15 queries for the below example.
|Map is split up into 15 blocks|
15 queries? Sounds silly? It is.. but only if you use a traditional central database with blocking queries. When using a distributed database like the one used by Ibify this technique will become extremely efficient. When you execute 15 asynchronous none-blocking queries in parallel the operation will be just as fast as doing a single query.
Another great boone when using this technique is caching. If you use a method where the rectangles bounds are stable (like when using geohashing which is a very elegant solution for this) you will have a great opportunity to cache the content of the rectangles - which is usually difficult to do when querying location based data using the latitude and longitude coordinates.
Scotty managed to locate all the tribbles and beam them down to the Klingons - he must have been using the "break down & execute in parallel" technique too.
NOTE: In the coming days there is some experimenting with ibifys geographical query algorithms so don't worry if you encounter tribbles on the www.ibify.com map or the mobile app.