OGR FDW Spatial Filtering
18 Nov 2019The OGR FDW now pushes spatial filters down to remote data sources!
Whuuuut?!?!?
The Basics
OK, first, “OGR” is a subcomponent of the GDAL toolkit that allows generic access to dozens of different geospatial file formats. The OGR part handles the “vector” data (points, lines and polygons) and the GDAL part handles the “raster” data (imagery, elevation grids).
Second, “FDW” is a “foreign data wrapper”, an extension API for PostgreSQL that allows developers to connect non-database information to the database and present it in the form of a table.
The simplest FDWs, like the Oracle FDW, just make remote database tables in foreign systems look like local ones. Connecting two databases is “easy” because they share the same data model: tables of typed columns and rows of data.
The OGR data model is pleasantly similar to the database data model. Every OGR “datasource” (database) has “layers” (tables) made of “fields” (columns) with data types like “string” (varchar) and “number” (integer, real).
Now, combine the two ideas of “OGR” and “FDW”!
The “OGR FDW” uses the OGR library to present geospatial data sources as tables inside a PostgreSQL database. The FDW abstraction layer lets us make tables, and OGR abstraction layer lets those tables be sourced from almost any geospatial file format or server.
It’s an abstraction layer over an abstraction layer… the best kind!
Setup the FDW
Here’s an example that connects to a “web feature service” (WFS) from Belgium (we all speak Flemish, right?) and makes a table of it.
CREATE EXTENSION postgis;
CREATE EXTENSION ogr_fdw;
CREATE SERVER wfsserver
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource 'WFS:http://geoservices.informatievlaanderen.be/overdrachtdiensten/Haltes/wfs',
format 'WFS',
config_options 'CPL_DEBUG=ON'
);
CREATE FOREIGN TABLE haltes (
fid bigint,
shape Geometry(Point,31370),
gml_id varchar,
uidn double precision,
oidn double precision,
stopid double precision,
naamhalte varchar,
typehalte integer,
lbltypehal varchar,
codegem varchar,
naamgem varchar
)
SERVER wfsserver
OPTIONS (
layer 'Haltes:Halte'
);
Pushdown from FDW
Let’s run a query on the haltes
table, and peak into what the OGR FDW is doing, by setting the debug level to DEBUG1
.
SET client_min_messages = DEBUG1;
SELECT gml_id, ST_AsText(shape) AS shape, naamhalte, lbltypehal
FROM haltes
WHERE lbltypehal = 'Niet-belbus'
AND shape && ST_MakeEnvelope(207950, 186590, 207960, 186600, 31370);
We get back one record, and two debug entries:
DEBUG: OGR SQL: (LBLTYPEHAL = 'Niet-belbus')
DEBUG: OGR spatial filter (207950 186590, 207960 186600)
-[ RECORD 1 ]-----------------------
gml_id | Halte.10328
shape | POINT(207956 186596)
naamhalte | Lummen Frederickxstraat
lbltypehal | Niet-belbus
The debug entries are generated by the OGR FDW code, when it recognizes there are parts of the SQL query that can be passed to OGR:
- OGR understands some limited SQL syntax, and OGR FDW passes those parts of any PostgreSQL query down to OGR.
- OGR can handle simple bounding box spatial filters, and when OGR FDW sees the use of the
&&
PostGIS operator, it passes the filter constant down to OGR.
So OGR FDW is passing the attribute and spatial filters from the SQL down to the OGR layer. But are they then being passed on to the remote datasource?
Pushdown from OGR
Every OGR “driver” is capable of pushing different amounts of logic down to the source data.
- A driver that reads a file format cannot push anything down: there is no logic in a file.
- A driver that reads from a database can push a lot down: databases are rich and powerful execution engines in their own right.
Our example data source, the Belgian “web feature server” actually supports both attribute and spatial filters, and the OGR driver will pass them down.
We can see OGR passing the filters down because when we created the server, we set config_options 'CPL_DEBUG=ON'
, to expose the GDAL logging information to our PostgreSQL server.
The GDAL debug entries are visible when we set the logging level to DEBUG2
SET client_min_messages = DEBUG2;
SELECT gml_id, ST_AsText(shape) AS shape, naamhalte, lbltypehal
FROM haltes
WHERE lbltypehal = 'Niet-belbus'
AND shape && ST_MakeEnvelope(207950, 186590, 207960, 186600, 31370);
Now we get a whole slew of logging, but I’m only going to pull out one line, the line that shows the WFS query that OGR sends to the remote server:
DEBUG: GDAL None [0] WFS: http://geoservices.informatievlaanderen.be/overdrachtdiensten/Haltes/wfs?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetFeature&TYPENAME=Haltes:Halte&FILTER=%3CFilter%20xmlns%3D%22http:%2F%2Fwww.opengis.net%2Fogc%22%20xmlns:Haltes%3D%22informatievlaanderen.be%2FHaltes%22%20xmlns:gml%3D%22http:%2F%2Fwww.opengis.net%2Fgml%22%3E%3CAnd%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3ELBLTYPEHAL%3C%2FPropertyName%3E%3CLiteral%3ENiet%2Dbelbus%3C%2FLiteral%3E%3C%2FPropertyIsEqualTo%3E%3CBBOX%3E%3CPropertyName%3EHaltes:SHAPE%3C%2FPropertyName%3E%3Cgml:Box%3E%3Cgml:coordinates%3E207950.0000000000000000,186590.0000000000000000%20207960.0000000000000000,186600.0000000000000000%3C%2Fgml:coordinates%3E%3C%2Fgml:Box%3E%3C%2FBBOX%3E%3C%2FAnd%3E%3C%2FFilter%3E
Awesome, right?
That’s pretty much un-readable, but if I copy out the value in the FILTER
request variable, and reverse the URL encoding, I get this:
<Filter
xmlns="http://www.opengis.net/ogc"
xmlns:Haltes="informatievlaanderen.be/Haltes"
xmlns:gml="http://www.opengis.net/gml">
<And>
<PropertyIsEqualTo>
<PropertyName>LBLTYPEHAL</PropertyName>
<Literal>Niet-belbus</Literal>
</PropertyIsEqualTo>
<BBOX>
<PropertyName>Haltes:SHAPE</PropertyName>
<gml:Box>
<gml:coordinates>
207950.0000000000000000,186590.0000000000000000
207960.0000000000000000,186600.0000000000000000
</gml:coordinates>
</gml:Box>
</BBOX>
</And>
</Filter>
I know, who ever thought that jamming an XML encoded version of a SQL filter into an HTTP GET request was a good idea? (Some very very nice people.)
Anyways, as you can see, both the attribute and spatial portions of our original SQL query have been re-encoded as a WFS XML filter, and sent to the remote server.
OGR FDW correctly pushed the attribute and spatial portions of the WHERE
clause into OGR, and OGR correctly pushed those filters into the dialect of the driver we were using, in this case the WFS driver.
The End
The really really cool part is that if we had been using, for example, the Oracle driver, OGR would have instead generated Oracle-compatible SQL and pushed that down!
It’s an abstraction layer over an abstraction layer… the best kind!