GeoRSS in the wild

I've been working today on getting Drupal's GeoRSS module to work with more than just the deprecated Aggregator2 module when extracting geographic locations from aggregated feed items.

The Feedparser module is next on the list to support and, as far as I'm aware, is the only one of Drupal's aggregation modules to use an external parsing library, SimplePie. Using an external library means that we don't need to deal with the sometimes complex task of parsing different types of feeds on the Drupal side, which is a bonus because efforts can be concentrated elsewhere whilst keeping the code nice and simple.

SimplePie is still in development but appears to have a good community around it as well as a couple of active developers. They're gearing up for their 1.0 release, which includes functions to extract geodata from feeds using the W3C Geo and GeoRSS Simple encodings, the former being the most widespread method at present and the latter the one we should be moving towards.

SimplePie's code is very much based around namespaces (e.g., xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"), which a lot of other aggregator systems will often disregard in favour of the simpler method of parsing out just the individual element names from that vocabulary (e.g., geo:lat or geo:long) to identify the tags. Now that namespaces have suddenly become important (at least for SimplePie's code to work), it's interesting to see how easily overlooked they have been in the past.
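To make the difference concrete, here's a rough sketch in Python (not SimplePie's actual PHP code, just an illustration using the standard library's ElementTree). The sample item and the function name are made up for the example. The feed deliberately binds the W3C Geo namespace to the prefix "pos" rather than "geo", which is perfectly legal XML: a namespace-aware lookup still finds the coordinates, whereas a search for the literal tag name geo:lat would not.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the W3C Basic Geo vocabulary (quoted in the post above).
W3C_GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical feed item: the prefix is "pos", not "geo", but the URI is right.
ITEM = """<item xmlns:pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <title>Example</title>
  <pos:lat>51.5</pos:lat>
  <pos:long>-0.12</pos:long>
</item>"""


def w3c_geo_point(item):
    """Return (lat, lon) by matching on namespace URI + local name."""
    # ElementTree expands every tag to "{namespace-uri}localname",
    # so the prefix chosen by the feed author is irrelevant here.
    lat = item.findtext(f"{{{W3C_GEO}}}lat")
    lon = item.findtext(f"{{{W3C_GEO}}}long")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)


item = ET.fromstring(ITEM)
print(w3c_geo_point(item))   # namespace-aware lookup finds the point
print("geo:lat" in ITEM)     # a naive prefix-based text match does not
```

The trade-off is the one described above: matching on the URI is robust to whatever prefix a feed picks, but it is completely unforgiving if the URI itself is even slightly off.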

Take, for example, the Geograph GeoRSS feed of their latest photos: they had a trailing slash after the GeoRSS namespace URI (http://www.georss.org/georss/ instead of http://www.georss.org/georss as it's defined in the spec). It was there because many namespaces do have the trailing slash, and it was simply left in by mistake, but because that's not what SimplePie was expecting, it didn't pick up the geodata in the feed. It's been fixed now (thanks for the quick fix, Barry!). There is also the case of the Flickr GeoRSS feeds that use the wrong namespace URI (using the one for W3C Geo instead of the GeoRSS one). Hopefully Rev Dan Catt or someone else at Yahoo will be able to fix that one up.
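The trailing-slash problem is easy to reproduce with any namespace-aware parser. The sketch below is Python with ElementTree rather than SimplePie, and the two sample items and their coordinates are invented for illustration; it only assumes that a GeoRSS Simple point is written as "latitude longitude" separated by whitespace. The two items differ by a single character in the namespace URI, yet only one of them yields a location.

```python
import xml.etree.ElementTree as ET

# GeoRSS namespace URI as defined in the spec (no trailing slash).
GEORSS = "http://www.georss.org/georss"

# Two hypothetical items, identical apart from the trailing slash in the URI.
GOOD = ('<item xmlns:georss="http://www.georss.org/georss">'
        '<georss:point>52.1 -1.3</georss:point></item>')
BAD = ('<item xmlns:georss="http://www.georss.org/georss/">'
       '<georss:point>52.1 -1.3</georss:point></item>')


def georss_point(item_xml):
    """Return (lat, lon) from a georss:point element, or None if absent."""
    item = ET.fromstring(item_xml)
    # Namespace URIs are compared as exact strings, so ".../georss/"
    # is a different namespace from ".../georss".
    text = item.findtext(f"{{{GEORSS}}}point")
    if text is None:
        return None
    lat, lon = text.split()
    return float(lat), float(lon)


print(georss_point(GOOD))  # (52.1, -1.3)
print(georss_point(BAD))   # None
```

Nothing is reported as an error in the second case; the point element is simply treated as belonging to some other, unknown vocabulary, which is why this kind of mistake goes unnoticed for so long.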

Even besides namespaces, some elements are often misused, and possibly the most widespread of those is geo:lon which should in fact be geo:long according to the spec. SimplePie doesn't understand the non-standard one and so can't pull the location information out of the feed. In this case, because it is so widespread, the parsing code should probably be extended to look for the non-standard element if it can't find the standard one.
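A fallback along those lines might look something like the following. Again, this is a Python sketch of the idea rather than a patch for SimplePie, and the function name and sample items are hypothetical. It prefers the standard geo:long element and only looks for the non-standard geo:lon when the standard one is missing.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the W3C Basic Geo vocabulary.
W3C_GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical items: one follows the spec, one uses the common geo:lon typo.
STANDARD = ('<item xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">'
            '<geo:lat>51.5</geo:lat><geo:long>-0.12</geo:long></item>')
NONSTANDARD = ('<item xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">'
               '<geo:lat>51.5</geo:lat><geo:lon>-0.12</geo:lon></item>')


def w3c_geo_point_lenient(item_xml):
    """Return (lat, lon), accepting geo:lon when geo:long is absent."""
    item = ET.fromstring(item_xml)
    lat = item.findtext(f"{{{W3C_GEO}}}lat")
    lon = item.findtext(f"{{{W3C_GEO}}}long")
    if lon is None:
        # Fall back to the widespread but non-standard element name.
        lon = item.findtext(f"{{{W3C_GEO}}}lon")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)


print(w3c_geo_point_lenient(STANDARD))     # (51.5, -0.12)
print(w3c_geo_point_lenient(NONSTANDARD))  # (51.5, -0.12)
```

Checking the standard name first means a correct feed is never affected by the workaround, and the fallback only comes into play for feeds that would otherwise lose their location entirely.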

Anyway, just some random observations of GeoRSS in the wild and how what seem like the smallest of differences can mean that the embedded location information will simply be missed by feed consumers.

If you've got a GeoRSS feed from your site, please do the right thing and make sure it's sending out the right information :)