Sunday, November 25, 2007

A Review of Geocoding Providers

We do a lot with geolocated information and maps, and I asked John to provide his review of the various geolocation services that he has had experience with.

===============================

Geocoding providers come in various flavors, from highly value-added services to bare-bones geolocation. Thank goodness for the US Census Bureau and their effort to geolocate the census data and offer it to the public for free. This has resulted in a number of vendors offering nearly-for-free TIGER/Line (Census) geolocation capabilities. That fact has brought prices down a LOT. A good example is http://geocoder.us/.

Having thanked the TIGER/Line vendors for bringing prices down, I have to say that the TIGER data is not particularly accurate (perhaps 80-85%; I'm not sure). But if you have no other choice, you can still get 80+% of the homes, businesses, or other information geocoded with this data so that you can display them on maps. That's good.

When investigating geocoding services, you find that there are really only a few underlying data sources. The TIGER data is the baseline. There are also several private companies that create their own mapping data; the two largest, I think, are NavTEQ and Teleatlas. You've probably seen both of these names on various online maps around the Internet (Google, Yahoo, etc.). Having worked with both datasets, I prefer the Teleatlas data and believe it is of slightly higher quality (that's a personal opinion, but one grounded in experience), but both will usually give you results in excess of 95%. I cannot imagine how much effort it must take to send out cars with geocoders to continuously drive all the new communities in the US so that we can get quality mapping. The third source is Microsoft's MapPoint. I'm not sure whether they are developing their own content or acquiring it from a vendor. What I can say is that, in my experience, the MapPoint geocoding is inferior to Teleatlas and NavTEQ.

So, the next question should be: Where can I buy this data? Here is a short list of vendors that I have used and can review. I tend to go only with very high-quality providers because this is really important data for me.

ESRI - This is the granddaddy of mapping technology. They create mapping interface software (ArcView), though they don't create the lat/lon data for US roads and instead license it from vendors like NavTEQ and Teleatlas. These are the folks who brought us the Shapefile for GIS. They have a SOAP web service that you can use to geocode addresses with either the NavTEQ or Teleatlas data (your pick). The cost is around $1200 for 100,000 addresses, and it has to be renewed annually. If you're creating data for a large metropolitan area, 100,000 addresses isn't that many, but if you're in a rural area, then this is probably significant overkill. They run a first-rate service, and at a cost of 1.2 cents per record it is quite reasonable, especially given the quality of the company.

Melissa Data - This company seems to specialize in data validation and enhancement. I used their SOAP interface for geocoding addresses for several years, and they do a great job of breaking down addresses into their component parts and returning standardized data. They even have calls to do reverse lookups on addresses, phone numbers, etc. They provide tools that allow you to improve the quality of your data, and that's why I took an interest in them. I don't use them any more because I ran into a number of customer service issues, and I also thought they had not adjusted their prices in light of ESRI and Teleatlas getting into the geocoding business. Customer service issue: I got calls from the company saying that they were about to cut off my service due to overuse when, in fact, their system for counting usage was messed up. I did not like the effect of such calls on my blood pressure. Cost: For 50,000 addresses, they used to charge $1500 (as I recall). Now, if you need absolutely top-notch scrubbed data with enhanced address standardization, it might be worth spending roughly 2 to 2.5 times as much for your geocoding (3 cents per record versus 1.2 to 1.5 cents). But I just wanted accurate plotting of my addresses with Teleatlas data (which they can provide). On the plus side, we write our code in ColdFusion, and they had easy-to-implement examples for interfacing our ColdFusion programs with their SOAP services. It was a piece of cake. I'd probably still use them if their prices had adjusted to the market.

Teleatlas - In the 2005-2006 timeframe, these folks added access via SOAP. However, as indicated above, I was using Melissa Data's ColdFusion interface at the time. Over the past couple of years, they've expanded their documentation and now have examples in Perl, Python, Java, etc., though still none in ColdFusion. So the time had come for me to write my own ColdFusion implementation of their interface using their Java classes (actually, I hired a guy in Romania who wrote the code inexpensively, and then I just adjusted it to create a custom interface). I just turned it on this past week, and I must say that it is REALLY fast. Their SOAP interface now provides Census data pointers as well as standardized address information and lat/lon data for the address sent to them. Cost: about 1.5 cents per record, BUT you don't have to commit to large quantities if you don't want to. You'll pay a little more per record if you only want to charge your account with 5,000 address calls, but this is good data and the expense is quite reasonable. For more information, go to WWW.GEOCODE.COM

Google - Recently Google made its geocoding interface accessible. For years, you could not geocode addresses directly and get the lat/lon returned. THIS IS FREE! And it appears to be of very, very good quality, and it has an exceptional address parser. They do limit you to 50,000 pulls per month on your account. That sounds like a lot, but if your web site is also displaying their maps (each of which constitutes a pull), then you have to be careful.

Yahoo - Yahoo blundered a few years ago by not making their mapping stuff available to businesses for commercial sites that were exploring how to use mapping (i.e., you could use it if you were an engineer developing your personal web page, but if you were the IT director of the company, you couldn't tell your engineer to use it on the corporate web site). This resulted in thousands of folks (like us) turning to Google's mapping technologies. I saw the recent presentation at the NAR meeting where the Yahoo guy demonstrated how you could integrate their local data in their maps, and it was impressive (though not for the technical initiate). I mention them only because they now offer their geocoding service for free, but I cannot review it because I have never used it.

Microsoft - I checked into using the MapPoint servers a few years ago, but at the time I thought they were expensive and they worked only through value-added resellers (i.e., cost-increasing vendors). I was not overwhelmed by the MapPoint dataset, so I did not investigate this further. They do offer some cool mapping technologies now (a few years later), so this might be something to check out.
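
The per-record costs quoted in the vendor reviews above are easier to compare side by side. Here is a minimal Python sketch of that arithmetic. The dollar figures are the ones recalled in this review (2007, not current list prices), and the names QUOTES, cents_per_record, and cost_table are made up for the illustration.

```python
# Prices as quoted in this review (recalled 2007 figures, not current list prices).
# Each entry is (total dollars, number of addresses covered).
QUOTES = {
    "ESRI": (1200.00, 100_000),
    "Melissa Data": (1500.00, 50_000),
}

# Teleatlas was quoted directly as "about 1.5 cents per record".
TELEATLAS_CENTS = 1.5


def cents_per_record(total_dollars, records):
    """Convert a bulk price into cents per geocoded address."""
    return total_dollars * 100 / records


def cost_table():
    """Return a dict of vendor name -> cents per record."""
    table = {name: cents_per_record(d, n) for name, (d, n) in QUOTES.items()}
    table["Teleatlas"] = TELEATLAS_CENTS
    return table


if __name__ == "__main__":
    # Print vendors from cheapest to most expensive per record.
    for vendor, cents in sorted(cost_table().items(), key=lambda kv: kv[1]):
        print(f"{vendor}: {cents:.1f} cents per record")
```

Keep in mind that the ESRI figure assumes you actually use all 100,000 addresses in the year; if you use fewer, your effective per-record cost goes up.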

RECOMMENDATIONS:

If you have no budget, use: 1) Google; 2) Yahoo; or 3) one of the low-cost providers. If you have bulk files, go with one of the low-cost TIGER data vendors, and that will get you 80% of the way there.

If you have a modest budget with modest needs and want very high-quality commercial data, then you could go with Google (if you are not using their maps, or if your combined usage is under their limit), or you can go with Teleatlas because of its low up-front cost.

If you can spare over $1000 per year, then Teleatlas is a good choice. I can also recommend ESRI because they have all kinds of bells and whistles and are a tad cheaper than Teleatlas (even though they sell the Teleatlas data), but I don't recall them providing the Census block info the way Teleatlas does, so use care if that's important. They have nice clean examples for most development environments. If you need some of the data cleanup and validation tools and are willing to spend at least double the price, then MelissaData has good servers and some good tools; but you're going to pay for what you get.
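
If you go the Google route, it is worth checking your combined usage against the monthly limit before you commit. The sketch below is only a back-of-the-envelope budget in Python. It assumes, as described in the Google review above, that every map display and every geocode counts as one pull against the same 50,000-per-month limit; the function names are hypothetical and nothing here calls Google's service.

```python
MONTHLY_LIMIT = 50_000  # pulls per month, the figure quoted in this review


def geocodes_left(map_views, geocodes_done, limit=MONTHLY_LIMIT):
    """Pulls still available this month, assuming map views and geocodes
    each count as one pull against the same limit (an assumption)."""
    used = map_views + geocodes_done
    return max(limit - used, 0)


def projected_monthly_pulls(pulls_so_far, day_of_month, days_in_month=30):
    """Naive straight-line projection of total pulls by the end of the month."""
    return pulls_so_far * days_in_month / day_of_month


if __name__ == "__main__":
    # Example: 30,000 map displays and 15,000 geocodes so far this month.
    print(geocodes_left(30_000, 15_000))
    # Example: 20,000 pulls used by day 10 of a 30-day month.
    print(projected_monthly_pulls(20_000, 10))
```

If the projection comes out over the limit, that is the point where a paid provider with a low up-front cost starts to look attractive.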

I hope this is useful for you. It's based on five years of working with these various vendors. Write us if you have questions.

John Hokkanen

Encinitas and Carlsbad Real Estate and San Diego County Real Estate

