Friday, 6 January 2012

Geocoding: Google vs. Yahoo

The project I'm working on at the moment requires the ability to get geographic coordinates for a location, otherwise known as "geocoding". Previously you had to pay for the ability to do that in the UK (since the Ordnance Survey owned the copyright to the postcode information, I believe). Of late, however, they've released that information as open data (you can even download their map databases for free!), which means that there are a couple of free tools around to help you out.

I was after a solution that would let me pass in a pretty unstructured string (I don't want to ask my users for a full address - a postcode like "SW1A 2AA" should be just as valid as "London") and get a result. I also wanted to use a REST service (which basically means you can get your result from a single web URL) for simplicity.
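To make the "single web URL" idea concrete, here is a minimal sketch in Python (not the C# used on the project). It assumes the JSON endpoint and `address` parameter of Google's Geocoding API and the `results[0].geometry.location` response shape; the sample response is a hand-written stand-in with placeholder coordinates, not real output from the service.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for Google's Geocoding API (JSON output).
GOOGLE_GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(query, api_key=None):
    """Build one GET URL for a free-form string such as a postcode or a city."""
    params = {"address": query}
    if api_key:
        params["key"] = api_key
    return GOOGLE_GEOCODE_URL + "?" + urlencode(params)


def first_coordinates(response_text):
    """Return (lat, lng) of the first match, or None when nothing matched."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written stand-in for a response body; the coordinates are placeholders.
SAMPLE_RESPONSE = (
    '{"status": "OK", "results": '
    '[{"geometry": {"location": {"lat": 51.5, "lng": -0.12}}}]}'
)

print(build_geocode_url("SW1A 2AA"))
print(first_coordinates(SAMPLE_RESPONSE))
```

The same `build_geocode_url` call works for "SW1A 2AA" and for "London", which is the point: the user never has to supply a structured address.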

The three main contenders would be Google's Geocoding API, Yahoo's Placefinder API and Microsoft's Bing Maps API. We decided against Bing because of the potential costs involved, so I decided to do a little compare and contrast between Google's and Yahoo's geocoding offerings.

Usage / license

The Google API's usage limits allow 2,500 requests a day for free (or you can get 100,000 a day by signing up to a business API model for $10k USD a year) and the result of your lookup must be displayed on a Google map. Yahoo's service offers 50,000 lookups a day and has no such restriction on what you do with the data.

2,500 lookups a day is one every 35 seconds (or so), whereas 50,000 is one lookup every 1.7 seconds! More lookups for free per day from Yahoo ... plus one to Placefinder.
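The arithmetic behind those two figures is just the number of seconds in a day divided by the daily limit. A quick sketch in Python, using only the limits quoted above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_between_lookups(daily_limit):
    """Average gap between lookups if the daily quota is spread evenly."""
    return SECONDS_PER_DAY / daily_limit


for name, limit in [("Google", 2500), ("Yahoo", 50000)]:
    gap = seconds_between_lookups(limit)
    print(f"{name}: one lookup every {gap:.1f} seconds")
```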

Features

Both APIs offer pretty similar base functionality: results are returned in JSON or XML formats and reverse geocoding is supported by both (finding an address from coordinates rather than the other way 'round). Yahoo's request format gives you a lot more options for what you're searching for. You can specify individual parts of the address if you want to, and you can use flags to return only the coordinates if you don't want the street address.

So being able to specify exactly what I want back from Yahoo - that'd help reduce the bandwidth used ... that's two up for Yahoo.
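As a rough illustration of what "flags" and "parts of the address" look like in a request, here is a Python sketch that only builds the URL. The endpoint, the `appid`, `q` and `flags` parameter names, the `J` (JSON) and `C` (coordinates only) flag letters, and the `postal` / `country` address parts are all assumptions recalled from the Placefinder documentation and should be checked against it before use.

```python
from urllib.parse import urlencode

# Assumed Placefinder endpoint; verify against Yahoo's documentation.
PLACEFINDER_URL = "http://where.yahooapis.com/geocode"


def build_placefinder_url(app_id, query=None, coordinates_only=False, **address_parts):
    """Build a Placefinder request URL.

    query            free-form text, e.g. "SW1A 2AA" (assumed parameter: q)
    coordinates_only skip the street address in the response (assumed flag: C)
    address_parts    structured fields, e.g. postal="SW1A 2AA" (assumed names)
    """
    flags = "J"  # assumed: J asks for JSON instead of the default XML
    if coordinates_only:
        flags += "C"  # assumed: C returns coordinates without address data
    params = {"appid": app_id, "flags": flags}
    if query:
        params["q"] = query
    params.update(address_parts)
    return PLACEFINDER_URL + "?" + urlencode(params)


# "APP_ID" is a placeholder, not a real application ID.
print(build_placefinder_url("APP_ID", query="SW1A 2AA", coordinates_only=True))
print(build_placefinder_url("APP_ID", postal="SW1A 2AA", country="UK"))
```

Asking for coordinates only is where the bandwidth saving would come from: the response carries the latitude and longitude without the address breakdown.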

Accuracy

Then, however, I thought I'd check the accuracy of the results returned - and I may have found a potential "gotcha". Searching on the same postcode, Google and Yahoo are returning different locations. Time to look deeper (look for the green pins in the links below):

They were all locations chosen relatively randomly. Although there's not too much in it, Yahoo is consistently off the mark, which I have to admit is a little disappointing. It's true I haven't tested any locations / postcodes in more rural areas or tested more general searches (such as "Westminster", "Canary Wharf" or even "London"), but I've run out of time for today.

Besides, the decision on which to go for isn't mine - I'll leave that in the capable hands of the project manager. ;)

Tuesday, 3 January 2012

"Finding nearest" with Umbraco and MS SQL Server 2008

MS SQL Server 2008 includes a .Net assembly for working with geographic coordinates.  It means that you don't have to write custom functions to find the distance between points (have a look at how complicated the maths can be).
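For a sense of the maths you would otherwise be writing by hand, here is the haversine formula as a short Python sketch. It treats the Earth as a sphere with a mean radius of 6,371 km, which is a simplification; SQL Server's geography type works on an ellipsoid, so its distances will differ slightly from these.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator, as a sanity check.
print(round(haversine_km(0, 0, 0, 1), 1))
```

With the geography type, the equivalent is a single call to its `STDistance` method, which is the appeal of letting the database do it.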

So, needing to be able to work with geo-coded items in my DB, I did a little looking around and found a really helpful article on SQL Server Helper.  What I needed, then, was a table with a column of type "geography" that I could query against.

However, this is an Umbraco site, which means that if I use the (really helpful) GMaps data type, my coordinates are stored as a comma-separated string rather than as numbers I can actually convert into a geography data type (the format of the output is "[latitude],[longitude],[zoom]").  I hate doing string manipulation in SQL Server, so I didn't want to write a trigger to update my geography values.

So - in the end I decided not to bother with SQL Server triggers to update the record.  Instead I created a new class that inherits from Umbraco's ApplicationBase class.  Here I added a method that runs on page publish and checks whether I need to add the geography value.  If I do, it runs a little stored procedure that I added to the DB.  .Net can parse the GMap coordinates into decimal values, then pass those to the stored procedure.
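The parsing step is small enough to sketch. This is Python rather than the C# that runs in the publish handler, and the function names are hypothetical; it only shows splitting the "[latitude],[longitude],[zoom]" string into decimals and validating them before they go anywhere near the database.

```python
from decimal import Decimal, InvalidOperation


def parse_gmaps_value(raw):
    """Split "[latitude],[longitude],[zoom]" into (latitude, longitude) decimals."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) < 2:
        raise ValueError(f"expected 'lat,lng[,zoom]', got {raw!r}")
    try:
        latitude, longitude = Decimal(parts[0]), Decimal(parts[1])
    except InvalidOperation:
        raise ValueError(f"non-numeric coordinates in {raw!r}") from None
    if not (latitude.is_finite() and longitude.is_finite()):
        raise ValueError(f"non-finite coordinates in {raw!r}")
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError(f"coordinates out of range in {raw!r}")
    return latitude, longitude


def to_wkt_point(latitude, longitude):
    """Format a WKT point; note that WKT puts longitude first."""
    return f"POINT({longitude} {latitude})"


# Placeholder value in the GMaps format, not a real page's data.
lat, lng = parse_gmaps_value("51.5,-0.12,13")
print(lat, lng, to_wkt_point(lat, lng))
```

Whether the stored procedure takes two decimals or a WKT string is a design choice; passing decimals keeps the string handling out of SQL Server entirely, which was the goal.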

To store the geography values, I created a *really* simple table:


CREATE TABLE [dbo].[PageLocation](
   [pageID] [int] NOT NULL,
   [location] [geography] NOT NULL
)

All I want to do is store the ID of the Umbraco page and the coordinates.  I'll need to do some other filtering but I plan on using Lucene to do that for me since it's nice and fast and doesn't rely on proper relationships being created in the database (since the relationships are all stored as Umbraco fields, this is tricky).  Then it's just a matter of using the lovely uQuery to fetch the details of the pages that are returned.

I've yet to actually try this all out, but I think the theory is sound  :)

Monday, 2 January 2012

Fetching user location using their IP address thanks to the IPInfoDB API

So, I needed to find a way of fetching a website user's location using their IP address.  A quick Google search found this Stack Overflow thread that mentioned a few options.  I decided to try out the IPInfoDB API because it's free and seems to do exactly what I wanted.

The code (bear in mind I'm using C#) was pretty simple to write:

I then stored the information in the session (so that I wouldn't need to make lots of requests to the service).
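The lookup-then-cache pattern can be sketched in a few lines. This is Python rather than C#, a plain dict stands in for the web session, and the endpoint plus the `key`, `ip` and `format` parameters are assumptions based on IPInfoDB's v3 city-level API, so check them against the current documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for IPInfoDB's city-level lookup.
IPINFODB_URL = "http://api.ipinfodb.com/v3/ip-city/"


def build_lookup_url(api_key, ip_address):
    """Build the request URL for one IP address, asking for JSON."""
    params = {"key": api_key, "ip": ip_address, "format": "json"}
    return IPINFODB_URL + "?" + urlencode(params)


def fetch_location(api_key, ip_address):
    """Call the service and return the decoded JSON as a dict."""
    with urlopen(build_lookup_url(api_key, ip_address), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def get_location(session, api_key, ip_address, fetch=fetch_location):
    """Return the visitor's location, hitting the service once per session."""
    if "geo_location" not in session:
        session["geo_location"] = fetch(api_key, ip_address)
    return session["geo_location"]


# Usage with a stub in place of the real service; the values are placeholders.
def stub_fetch(api_key, ip_address):
    return {"ipAddress": ip_address, "cityName": "EXAMPLE CITY"}


session = {}
print(get_location(session, "API_KEY", "203.0.113.7", fetch=stub_fetch))
print(get_location(session, "API_KEY", "203.0.113.7", fetch=stub_fetch))  # served from the session
```

Passing the fetch function in as a parameter is only there so the caching can be exercised without a network call or an API key.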

And there you go.  Geo-location information from the IP address.  The API goes down to city level, which is handy, although as expected the location isn't 100% accurate.  My home IP returns my location as being several miles away, but at least it's in the right city  :)

A new year, a new blog

I've been a very bad boy.  Well, maybe not bad as much as lazy and frankly unhelpful.  I've been a web developer for going on a decade and I've not yet started writing down useful bits of information - this is something that really has to change.

So - this blog is where I'm going to keep bits and pieces of useful information that I've found.  I have a feeling it's only going to be of use to me, but hopefully other developers who find this might find something helpful.  :)