6/00 draft -- not for quotation
Lower Christina TMDL Project
John Mackenzie


The lower Christina Watershed is comprised of three principal basins drained by the White Clay Creek, the Red Clay Creek and the Christina Creek. These creeks converge near Churchman's Marsh, becoming the Christina River. The upper Christina Watershed is drained by the Brandywine Creek, which merges with the Christina River in Wilmington, DE just before the Christina empties into the Delaware River. Detailed background information on the watershed is available from the Delaware Water Resources Agency.

Surface water flows from the Christina Watershed provide most of the drinking water for Northern New Castle County, DE, and surface water quality is a continuing concern. This study uses a digital elevation model (DEM) to analyze drainage patterns within the lower Christina Watershed. It simulates runoff accumulation from various land-uses, and indexes the relative runoff contributions from each land-use category.

The objective of this study is to develop a statistically robust method for determining empirical pollutant loadings from various land-use category. In theory at least, it should be possible to regress pollutant concentrations recorded from a network of N sampling stations against relative acreages in each of M up-gradient land-uses. Let Yi represent the pollutant concentration recorded at sampling station i, and Let Xi1 ... XiM represent the percents of total land area draining to sampling station i that are in each land-use category 1...M. Then the regression

Y = B1X1 + ... + BMXM + e
will yield empirical estimates of average loading coefficients B1...BM for each land-use category.

The main sections of this report discuss various statistical and modeling issues that arise in an empirical implementation of this model. The first section illustrates the computation of a series of grid-based drainage accumulation maps, one for each land-use category, so that the vector X, the proportions of upstream land uses draining to a monitoring station, is defined for every grid cell in the watershed. This approach allows for monitoring station locations to be located anywhere in the watershed.

The next section discusses optimal strategies for locating a network of water quality monitoring stations within the watershed. To maximize the statistical efficiency of the regression model, the relative proportions of upstream land uses should vary widely and independently across the monitoring stations. A cluster analysis of the set of drainage accumulation maps assigns each pixel to one of an arbitrary number of statistically distinct categories. A Chi-square criterion is used to identify the cells that are assigned to each category with a high degree of certainty.

Finally, data collected across a stream network will typically exhibit spatial dependence patterns: downstream pollutant concentrations are logically correlated with concentrations in up-stream tributaries. The efficiency and (in some cases) unbiasedness of the classical ordinary least-squares (OLS) regression model depend on assumptions of data independence. Violation of this assumption implies OLS is inefficient and possibly unbiased. This section discusses methods for analyzing stream topology to formulate an appropriate spatial weight matrix to account for the spatial lags inherent in the data.


The base data for this project include:

All data are in UTM Zone 18 (NAD 1983) projection. The analysis was conducted at 30-meter cell resolution within the geographic region 425506W, 456206E, 4417675N and 4380925S (1,225 rows x 1,090 columns).

The GIS used in this analysis is GRASS 4.3, obtained from Baylor University's Center for Applied Geographic and Spatial Research. This analysis was conducted on a Pentium PC running Red Hat Linux 6.1.

Computation of drainage accumulations

The GRASS r.slope.aspect module was used to create an aspect map used to provide the hill-shading effect in a few maps incorporated in this report.

The GRASS r.watershed program calculates basin delineation and flow accumulation maps from a DEM, as well as other derivative maps. A preliminary low-resolution watershed analysis was conducted to determine the appropriate geographic region for the high-resolution analysis and mask out cells more than 200 meters outside the watershed to speed subsequent processing.

Since a DEM is simply a model of the terrain interpolated from hypsography (elevation contour) lines, it is often helpful to edit the DEM for maximum consistency with other data sources. Since the source hypsography data did not include zero-elevation (shoreline) contours, these were extracted as vector perimeters of open water features extraced from a mid-IR satellite image band file. (Open water features are readily identified by the absence of IR reflectance.) The shoreline data were incorporated with the other hypsography data prior to interpolation of the DEM's.

The DEM was then edited to insure consistency of the drainage analysis with actual stream data derived from USGS 1:24K-scale hypsography data. Vector hypsography for the West Grove, Kennett Square, Newark West, Newark East and Wilmington South quads were patched together and rasterized as a 30-meter resolution binary map. The principal stream network was isolated via a mask generated by pixel expansion (r.grow) and assignment of unique ID's to each pixel clump (r.clump). I then used r.buffer to create a map of concentric buffers at 30, 60, 90, 120 and 150 meters distance from the principal stream network cells, then "burned" the stream channels into the DEM with the r.mapcalc expression

w.demedit = 'if(w.streambuf, w.dem+w.streambuf, w.dem+6)'
This modified DEM basically forces the r.watershed program to follow existing stream channels.

A final correction involved filling in all localized depressions in the DEM with the GRASS r.basin.fill module, to insure all cells in the map drain cleanly off the DEM.

The r.watershed program includes an option to input a flow map of quantities to be drained off the DEM surface and accumulated in the flow accumulation map. If a flow map is not specified, r.watershed simply flows a map of 1's off the map, simulating a uniformly-distributed rainfall event on an impervious surface. The logarithm of the default accumulation map plus one shows the overall drainage system quite clearly.

For this analysis I extracted a series of ten binary maps from the 1995 Land-Use/Land Cover map representing the principal categories except water, and analyzed accumulated drainages from each in successive iterations of r.watershed. The following table includes links to log transforms of these maps:

111 single-family residential37,588
112 multi-family residentiala4,632
12 commercial 7,374
13 industrial 2,167
14 transport 5,715
18 institutional 2,438
19 parkland 9,703
21 agriculture 43,506
40 forest 35,964
50 water 1,333
70 vacant 871
75 mining 19
99 unclassified 40

Each of the above maps shows the logarithm of the number of up-gradient cells of a particular land use category that drain through each cell in the watershed. the default color scheme is yellow-green-cyan-blue-magenta-red.

The percent of up-gradient area in each land use is then calculated by dividing accumulation from landuse i by total accumulation:

CODE Percent of Drainage in Land Use i
111 single-family residential
112 multi-family residentiala
12 commercial
13 industrial
14 transport
18 institutional
19 parkland
21 agriculture
40 forest
70 vacant

Determination of a statistically efficient spatial distribution of monitoring points

In practice, the locations of monitoring stations are often determined by convenience of access. Since monitoring can be expensive, there is also a natural bias favoring downstream locations that effectively monitor runoff from larger drainage areas. Unfortunately, larger drainage areas tend to have more homogeneous land-use proportions than small drainage areas, so that a statistical decomposition of aggregate pollutant loads by land-use category is likely to be confounded by high levels of multicollinearity.

The drainage accumulation maps calculated in the previous section can be analyzed to identify monitoring points within the stream network that will maximize the statistical efficiency of the regression model. To derive a statistically efficient sampling frame, I masked out all cells with log(accumulation+1)<5 and performed a cluster analysis on a sample of remaining cells in the stream network to derive statistically distinct drainage "signatures" for an arbitrary number of cell categories (6).

I then uses a maximum likelihood classifier to assign all stream cells to one of these distinct categories. The stream network classification map shows the full classification of stream reaches.

The maximum-likelihood classifier also generates a rejection threshold map from a Chi-square test of each cell's category assignment. A final sampling frame map shows category assignments (colors yellow, green, cyan, blue and red) for stream network cells with probabilities of incorrect classification less than 0.10. Cells with probabilities of mis-classification are shown in black.

Spatial dependence in the stream network model