Rearranging The Deckchairs

Frank O'Dwyer's blog

Make GeoKit Use the Spatial Features of the Underlying Database

GeoKit is a great rails plugin and ruby gem for dealing with geodata for mapping applications. I use it on the backend for the Word on the Street app) (link is to iTunes). However, with millions of mappable objects, geokit’s performance takes a nose dive. For example, this query finds all places within 10 miles of a given point: Place.find(:all,:origin=>latlng,:within=>10, :order=>'distance asc') On a database of about a million rows, that takes about 14s on a mac mini. And, according to the explain plan, the right indexes are being used, yet mysql is nonetheless examining 87554 rows - even though the number of places in the result is 1135 (and the number of places actually within the bounding box that geokit applies is just 1323). I spent a little time finding out why this is. GeoKit adds a bounding box to speed up queries together with a combined index on latitude and longitude. This helps, and the index is used, however it doesn’t help as much as it should. It turns out that the database (mysql in my case) is trying to search using two ranges (latitude and longitude), using a BTree index which is not suitable for this kind of query. This is not specific to geo based queries but actually applies to any two range query (e.g. age and height in a DB of the general population. From an answer to a question I asked on stackoverflow, by Quassnoi:

Plain B-Tree indexes are not too good for the queries like this. For your query, the range access method is used on the following condition: places.lat > 51.3373601471464 AND places.lat < 51.6264998528536 , this doesn’t even take lon into account. If you want to use spatial abilities, you should keep your places as Points, create a SPATIAL index of them and use MBRContains to filter the bounding box

Following this suggestion does indeed massively improve performance (the same query executes in less than a second if spatial indexing is used). However, getting the spatial stuff to play nice with rails is a bit of a pain. After a lot of trial and error, I came up with the code below so I thought I’d share it in case it is useful for anyone else. Note: this code requires GeoRuby, the spatial_adapter plugin, MySql 5.0/5.1, and rails 2.3.2. It should be possible to adapt to other databases and versions however I find that spatial_adapter doesn’t work with rails 2.3.3.

Desired interface

Rather than patch geokit I decided to implement a named scope called bounded that would allow manual addition of a second bounding box to queries, such that the DB would correctly identify the spatial index as the one to use. Given a mappable object (lat and lng), and a radius, the named scope works out a bounding box to apply to the query (much the same as geokit does internally, except this one will hit the spatial index). So, for example, to redo the above query: Place.bounded(latlng,10).find(:all, :origin=>latlng, :within=>10, :order=>'distance asc') The :origin and :within parameters are actually a little redundant in the above query, if all you wanted was a bounding box search, but this is just to show that all the usual geokit options are still available. In fact, geokit also adds its own bounding box but this doesn’t hurt. Plus in the case above, Geokit also adds the trig calculations to make it a true radial search. It would also be possible to patch geokit to apply the spatial query, and then the api would be more elegant, but then this would need to be redone for every new release of geokit.

Migrating the database to use spatial

To add the necessary DB bits to get at the spatial features, add this migration for your mappable model. This is for mysql and uses direct SQL on the database, however you could instead use the spatial adapter features to make the migration more readable and database independent. I did it this way because I found that the geometry/spatial stuff is very sensitive to the mysql version - the method below is tested to work with mysql 5.0 and 5.1 class ConvertPlacesToSpatial < ActiveRecord::Migration def self.table_engine(table, engine='InnoDB') execute "ALTER TABLE `#{table}` ENGINE = #{engine}" end def self.add_spatial_to_mappable(table) execute "ALTER TABLE `#{table}` ADD geom GEOMETRY not null" execute "UPDATE `#{table}` set geom=POINTFROMTEXT(CONCAT('POINT(',lat,' ',lng,')'))" execute "CREATE SPATIAL INDEX index_#{table}_on_geom on #{table}(geom)" end def self.up table_engine :places, 'MyISAM' add_spatial_to_mappable(:places) end def self.down remove_index :places,:geom remove_column :places,:geom table_engine :places, 'InnoDB' end end The above migration changes the table storage engine to MyISAM, which is needed for true spatial indexing (but MyISAM also has some drawbacks - e.g. no transactions). The migration also sets up a geometry column and initialises it from the lat and lng columns of existing rows in the DB, then creates a spatial index for fast bounding box queries.

Adding the named scope to your ‘acts_as_mappable’ rails models

Finally, at the rails layer you need some ‘before_save’ code to keep the geometry column in sync with changes to lat and lon. This is also where you implement the named scope for adding the bounding box (again, you could instead hack geokit to do it - I did it this way so that I could update geokit later without having to patch it again). Below is code that I added to my mappable models to make this work. You’ll need the GeoRuby gem and the spatial adapter plugin for this. I also find that at the time of writing the spatial adapter plugin only works well with Rails 2.3.2, and not at all with 2.3.3, so this may not work with earlier versions of rails. require 'rubygems' require 'geo_ruby/simple_features/point' include GeoRuby::SimpleFeatures def Place ... named_scope :bounded, lambda { |latlng, radius| { :conditions=> "MBRContains(#{boundsLineString(latlng,radius)},geom)" }} before_save :spatialize def self.boundsLineString(latlng,radius) bounds=GeoKit::Bounds.from_point_and_radius(latlng,radius) return "GeomFromText('LineString(#{bounds.sw.lat} #{bounds.sw.lng},#{bounds.ne.lat} #{bounds.ne.lng})')" end def spatialize mygeom=GeoRuby::SimpleFeatures::Point.from_x_y(lat,lng) self.geom=mygeom end end (The boundsLineString helper method could be moved somewhere else to avoid repeating it for every mappable model) And that’s it! Just use the ‘bounded’ named scope on any queries that are running slowly, so that the search is limited to a particular centre and radius, and you should see dramatic speed gains.