I had a raster weather data set in NetCDF format, that I wanted to manipulate in PostgreSQL. Specifically, I wanted to take the temperature and relative humidity data from raster and convert it into a point dataset, in order to do some vector to vector containment queries. My goals were:
- Extract a point from each cell of the raster, using the centroid (resulting in over 51,000 points per band).
- Get Relative Humidity and Temperature data for each point.
- Create a final table that had one point geometry and a column for RH and T.
In order to extract the raster data from NetCDF, you have to know some basic information about the “bands” that you are attempting to access. You can access this information using gdalinfo. In my case, I had a file named “rtma2p5.2015120411.2dvaranl_ndfd.nc”, so I used “gdalinfo rtma2p5.2015120411.2dvaranl_ndfd.nc” to get:
Driver: netCDF/Network Common Data Format Files: rtma2p5.2015120411.2dvaranl_ndfd.nc Size is 512, 512 Coordinate System is `' Metadata: NC_GLOBAL#Conventions=CF-1.0 NC_GLOBAL#History=created by wgrib2 Subdatasets: SUBDATASET_1_NAME=NETCDF:"rtma2p5.2015120411.2dvaranl_ndfd.nc":latitude SUBDATASET_1_DESC=[165x312] latitude (64-bit floating-point) SUBDATASET_2_NAME=NETCDF:"rtma2p5.2015120411.2dvaranl_ndfd.nc":longitude SUBDATASET_2_DESC=[165x312] longitude (64-bit floating-point) SUBDATASET_3_NAME=NETCDF:"rtma2p5.2015120411.2dvaranl_ndfd.nc":TMP_2maboveground SUBDATASET_3_DESC=[1x165x312] TMP_2maboveground (32-bit floating-point) SUBDATASET_4_NAME=NETCDF:"rtma2p5.2015120411.2dvaranl_ndfd.nc":RELH_2maboveground SUBDATASET_4_DESC=[1x165x312] RELH_2maboveground (32-bit floating-point) Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 512.0) Upper Right ( 512.0, 0.0) Lower Right ( 512.0, 512.0) Center ( 256.0, 256.0)
Once I knew the names of the bands that I wanted, I exported each raster band into a separate temporary table, extracted points and values from each of these tables into 2 separate tables with a postgis vector geometry column, and then merged them into a final table with vector geometry column and columns for temp and rh. This multi-table process may seem a bit overkill, but by extracting each raster into it’s own point table, I was able to make use of spatial indexing to speed the merging of these separate tables into a single final point table (note: the intermediate point tables had to be permanent in order to create the spatial index). Below is a full log of my command line and PSQL actions (note, it takes less than a minute for all processing the final insert query to run on a dataset that spans the state of Virginia, 42,000 square miles):
From a bash shell prompt:
# convert temperature raster2pgsql NETCDF:"rtma2p5.2015120411.2dvaranl_ndfd.nc":TMP_2maboveground temp_tmp > temp_tmp.sql
# convert rh raster2pgsql NETCDF:"rtma2p5.2015120411.2dvaranl_ndfd.nc":RELH_2maboveground rh_tmp > rh_tmp.sql
cat temp_tmp.sql | psql rastertest cat rh_tmp.sql | psql rastertest psql rastertest
From PostgreSQL command line:
create table rh_pts_tmp (geom geometry, rh float8); insert into rh_pts_tmp(geom, rh) select a.geom,st_value( b.rast, 1, a.x, a.y) from (SELECT (ST_PixelAsCentroids(rast, 1)).* FROM rh_tmp WHERE rid = 1) as a, rh_tmp as b ; create index tmp_rh_gix on rh_pts_tmp using GIST (geom); create table temp_pts_tmp (geom geometry, temp float8); insert into temp_pts_tmp(geom, temp) select a.geom,st_value( b.rast, 1, a.x, a.y) from (SELECT (ST_PixelAsCentroids(rast, 1)).* FROM temp_tmp WHERE rid = 1) as a, temp_tmp as b ; create index tmp_temp_gix on temp_pts_tmp using GIST (geom);
create table weather_points (fid serial, temp float8, rh float8, the_geom geometry); insert into weather_points (temp, rh, the_geom) select foo.temp, bar.rh, foo.geom from temp_pts_tmp as foo left outer join rh_pts_tmp as bar on (foo.geom && bar.geom and foo.geom = bar.geom);
# Clean up drop table temp_tmp; drop table rh_tmp;