WKT Itch

Originally in GeoSpeil.

Eric Raymond says that open source software comes from developers “scratching their itch” (an unpleasant bodily metaphor for “solving their own problems”), with the side effect that the resulting software gets released and is usable by everyone.

My current itch is attacking the “well-known text” (WKT) representation of spatial reference systems. In theory, an Open Geospatial Consortium standard spatial database like PostGIS stores spatial reference information in the SPATIAL_REF_SYS table, and the actual information about reference system parameters is serialized in a “well-known text” string, stored in the SRTEXT column of that table. In practice, what PostGIS really uses for coordinate transformations is a PROJ4TEXT string in a spare column of the table, and the SRTEXT is just window dressing — we carry it, but we don’t use it.

Using the WKT representation directly is attractive, because it removes needless duplication of information and allows more direct interoperation with things like ESRI “prj” files, which are themselves just WKT serializations. Unfortunately, WKT is not as “well-known” as the name would have us believe. Every vendor has used slightly different naming for things like projection operations, parameters, datum names, and so on.
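To make the naming problem concrete, here is a minimal sketch of the kind of lookup a standardizer might use. It assumes two well-known ESRI habits: datum names carry a "D_" prefix, and parameter names are capitalized where OGC-style WKT uses lower case. The tables and the function names (normalize_datum, normalize_parameter) are hypothetical and deliberately incomplete. They are not taken from PostGIS or any other library.

```python
# Illustrative, incomplete mapping from ESRI-style datum names to the
# OGC/EPSG-style names. A real table would be far longer.
ESRI_TO_OGC_DATUM = {
    "D_WGS_1984": "WGS_1984",
    "D_North_American_1983": "North_American_Datum_1983",
    "D_North_American_1927": "North_American_Datum_1927",
}


def normalize_datum(name):
    """Return the OGC-style datum name, or the input unchanged if unknown."""
    return ESRI_TO_OGC_DATUM.get(name, name)


def normalize_parameter(name):
    """ESRI writes 'False_Easting'; OGC-style WKT writes 'false_easting'."""
    return name.lower()


if __name__ == "__main__":
    print(normalize_datum("D_North_American_1983"))
    print(normalize_parameter("Central_Meridian"))
```

Unknown names pass through untouched, so the sketch never makes a string worse. It just fails to improve the ones it has not been taught.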

So my itch is multi-fold: I want to be able to parse WKT; I want to learn the technologies necessary to parse WKT (bison and flex); I want to be able to standardize WKT (to strip out the vendor-specific bits); and I want to be able to turn my parsed form into PROJ4 projection objects, because I’ll still be using the PROJ4 engine for transformations at the end of the day. I’m an itchy guy.
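For a feel of what "parsing WKT" involves, here is a rough sketch of a hand-written recursive-descent parser in Python. It is not the bison and flex grammar described above, just a stand-in that shows the shape of the problem: keywords, brackets, quoted strings, numbers, and nesting. The Node class and the tokenize and parse functions are names invented for this illustration.

```python
import re
from dataclasses import dataclass, field

# One token per match: quoted string, number, keyword, bracket, or comma.
TOKEN_RE = re.compile(r'''
    \s*(?:
        "(?P<string>[^"]*)"                                    |
        (?P<number>[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)  |
        (?P<keyword>[A-Za-z_][A-Za-z_0-9]*)                    |
        (?P<open>[\[(])                                        |
        (?P<close>[\])])                                       |
        (?P<comma>,)
    )''', re.VERBOSE)


@dataclass
class Node:
    keyword: str                       # e.g. "GEOGCS", "DATUM"
    values: list = field(default_factory=list)  # str, float, or Node


def tokenize(text):
    text = text.strip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)


def _parse_node(tokens, pos):
    kind, value = tokens[pos]
    if kind != "keyword":
        raise ValueError("expected a keyword")
    if pos + 1 >= len(tokens) or tokens[pos + 1][0] != "open":
        raise ValueError(f"expected '[' after {value}")
    node = Node(value.upper())
    pos += 2
    while True:
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        kind, value = tokens[pos]
        if kind == "string":
            node.values.append(value)
            pos += 1
        elif kind == "number":
            node.values.append(float(value))
            pos += 1
        elif kind == "keyword":
            if pos + 1 < len(tokens) and tokens[pos + 1][0] == "open":
                child, pos = _parse_node(tokens, pos)  # nested KEYWORD[...]
                node.values.append(child)
            else:
                node.values.append(value)  # bare word, e.g. NORTH in AXIS
                pos += 1
        else:
            raise ValueError(f"unexpected token {value!r}")
        if pos >= len(tokens):
            raise ValueError("missing closing bracket")
        kind, _ = tokens[pos]
        pos += 1
        if kind == "close":
            return node, pos
        if kind != "comma":
            raise ValueError("expected ',' or ']'")


def parse(text):
    tokens = list(tokenize(text))
    if not tokens:
        raise ValueError("empty input")
    node, pos = _parse_node(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing input after top-level node")
    return node
```

Calling parse on a string like GEOGCS["WGS 84",DATUM[...],...] gives back a tree of Node objects. The sketch stores bare words and quoted strings alike as plain Python strings, and it does not check that open and close brackets are the same style. A real grammar would be stricter on both counts.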

So far, I have achieved the parsing and learning-how-to-parse goals, and placed my results in a spike in the PostGIS SVN repository. Next up is standardization, and finally creating PROJ4 objects. Then I’ll try to hook the whole thing into PostGIS.
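As a taste of that last step, here is a minimal sketch of turning an already-standardized projection name and parameter set into a PROJ4-style string. The function name projection_to_proj4 and both tables are hypothetical, and the tables cover only two projections. The sketch also ignores the ellipsoid, datum, and units entirely, and assumes the false easting and northing are already in metres.

```python
# Illustrative, incomplete mappings from OGC-style WKT names to PROJ4 keys.
PROJECTION_TO_PROJ4 = {
    "transverse_mercator": "tmerc",
    "lambert_conformal_conic_2sp": "lcc",
}

# Insertion order here fixes the order of keys in the output string.
PARAMETER_TO_PROJ4 = {
    "latitude_of_origin": "lat_0",
    "central_meridian": "lon_0",
    "standard_parallel_1": "lat_1",
    "standard_parallel_2": "lat_2",
    "scale_factor": "k",
    "false_easting": "x_0",
    "false_northing": "y_0",
}


def projection_to_proj4(projection, parameters):
    """Build the projection part of a PROJ4 string from WKT-style names."""
    key = projection.lower()
    if key not in PROJECTION_TO_PROJ4:
        raise ValueError(f"unsupported projection: {projection}")
    values = {name.lower(): value for name, value in parameters.items()}
    unknown = sorted(set(values) - set(PARAMETER_TO_PROJ4))
    if unknown:
        raise ValueError(f"unsupported parameters: {unknown}")
    parts = ["+proj=" + PROJECTION_TO_PROJ4[key]]
    for wkt_name, proj_name in PARAMETER_TO_PROJ4.items():
        if wkt_name in values:
            # .15g keeps full precision without trailing zeros.
            parts.append(f"+{proj_name}={float(values[wkt_name]):.15g}")
    return " ".join(parts)


if __name__ == "__main__":
    # Parameters of UTM zone 10 north.
    print(projection_to_proj4("Transverse_Mercator", {
        "latitude_of_origin": 0,
        "central_meridian": -123,
        "scale_factor": 0.9996,
        "false_easting": 500000,
        "false_northing": 0,
    }))
```

Refusing unknown names, rather than silently dropping them, is a deliberate choice in this sketch: a missing parameter produces a wrong transformation that still looks plausible.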