This algorithm reduces the number of records in a point data set by picking the record with the median height in each cell of a user-defined grid. The picked record keeps its original location, so no coordinates are interpolated or moved to cell centers. The algorithm was created for ICESat data processing, but it can be applied to any elevation data set.
Compiler: CodeBlocks - Windows
Code and example data: data_decimation.zip
The input file must be in column format, with the following fields:
- record id (integer)
- x (float)
- y (float)
- z (float)
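The selection step can be illustrated with a minimal Python sketch. It is not the distributed program, only an outline of the idea under stated assumptions: the function name `decimate` is hypothetical, grid cells are assumed to be aligned to the coordinate origin, and for a cell with an even number of records the lower of the two middle values is taken so that the result is always an actual input record.

```python
import math
from collections import defaultdict


def decimate(records, cell_size):
    """Keep one record per grid cell: the one holding the median z.

    records: iterable of (record_id, x, y, z) tuples.
    """
    cells = defaultdict(list)
    for rec in records:
        _id, x, y, _z = rec
        # Assumption: grid cells are aligned to the coordinate origin (0, 0).
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(rec)

    kept = []
    for key in sorted(cells):
        group = sorted(cells[key], key=lambda r: r[3])
        # Assumption: with an even count, take the lower middle record,
        # so the output is always an original record at its own location.
        kept.append(group[(len(group) - 1) // 2])
    return kept


records = [
    (1, 10.0, 10.0, 100.0),
    (2, 20.0, 30.0, 105.0),
    (3, 40.0, 15.0, 250.0),   # spike falling in the same 1 km cell
    (4, 1500.0, 200.0, 98.0),
]
for rec in decimate(records, 1000.0):
    print(rec)
```

In this toy input the first three records share one cell, and the record with z = 105.0 is kept with its own x and y; the spike at 250.0 is discarded.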
The algorithm creates two text outputs. The first is a grid in ESRI ASCII format that stores the number of records in each grid cell. Open-source GIS packages such as SAGA or Quantum GIS can read this format. The second is a table (same format as the input) that stores the median elevation of each cell together with the original spatial location of the record that holds it.
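As a rough illustration of the first output, the following Python sketch builds a record-count grid in ESRI ASCII format. The function name `count_grid_ascii` is hypothetical, and two choices are assumptions rather than documented behavior of the program: cells are aligned to the coordinate origin, and empty cells are written as 0 instead of the NODATA value.

```python
import math


def count_grid_ascii(points, cell_size):
    """Return an ESRI ASCII grid (as text) counting the points in each cell.

    points: list of (x, y) pairs.
    """
    cols = [math.floor(x / cell_size) for x, _ in points]
    rows = [math.floor(y / cell_size) for _, y in points]
    col0, row0 = min(cols), min(rows)
    ncols = max(cols) - col0 + 1
    nrows = max(rows) - row0 + 1

    # Assumption: empty cells hold a count of 0 rather than NODATA.
    counts = [[0] * ncols for _ in range(nrows)]
    for c, r in zip(cols, rows):
        counts[r - row0][c - col0] += 1

    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {col0 * cell_size}",
        f"yllcorner {row0 * cell_size}",
        f"cellsize {cell_size}",
        "NODATA_value -9999",
    ]
    # The format lists rows from north to south, so reverse the row order.
    body = [" ".join(str(v) for v in row) for row in reversed(counts)]
    return "\n".join(header + body) + "\n"


points = [(10.0, 10.0), (20.0, 30.0), (1500.0, 200.0), (1200.0, 1800.0)]
print(count_grid_ascii(points, 1000.0), end="")
```

Saving the returned text with an `.asc` extension should give a file that the GIS packages mentioned above can load as a raster.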
(Originally posted as "Too many elevations" on September 14, 2010)
As a practical example, we consider a data set of elevations for a sector of Antarctica, derived from ICESat satellite data measured from 2003 to 2008. It consists of nearly 2,800,000 records, shown in the map below. Data density along the satellite tracks is high (one measurement every 172 meters), and several different tracks nearly overlap each other. The spacing across tracks is much larger: in the top-left of the map, for instance, it is between five and ten kilometers. In many subareas of this zone, the local topography presents periodic height variations (not visible in this map) due to the presence of glacial megadune fields. Megadunes have wavelengths of several kilometers, amplitudes of several meters and lateral continuity of tens of kilometers. Clearly, DEMs derived from these data are much more strongly affected by local structures along the tracks than between them.
Running this program on the Antarctic data set described above with a 1 km cell size takes about 80 seconds on a normal laptop (32-bit Windows, 4 GB RAM). The filtered data set contains 70,988 records out of the original 2,783,551, i.e. a 97.4% reduction. The original spatial structure of the data is essentially unaltered, as the map below shows.
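The reduction figure can be checked with a few lines of Python; the variable names are only illustrative.

```python
original = 2_783_551   # records in the input data set
kept = 70_988          # records left after filtering with 1 km cells

reduction = 100.0 * (1 - kept / original)
ratio = original / kept

print(f"{reduction:.1f}% reduction")
print(f"about {ratio:.0f} original records per kept record")
```

On average, each retained median therefore stands in for roughly 39 original measurements.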
The filter additionally eliminates many spikes in the frequency histogram (see figure below; blue line: original data, orange line: filtered data). These spikes probably correspond to artifacts caused by the oversampling of local areas along the tracks.