A Procedure to Evaluate the Effectiveness of Two Watch Warning Systems
The Subcommittee for the Comparative Evaluation of Existing
Heat-Related Watch/Warning Systems and Predictive Models
NOAA Heat Wave Workshop Working Group on Research Requirements to
Better Forecast and Mitigate the Effects of Heat Waves

Bob Livezey
NOAA/Climate Prediction Center
Larry Kalkstein
University of Delaware/Center for Climatic Research
Chris Barnes
Los Alamos National Laboratory
Gib Parrish
Centers for Disease Control

A meeting of the Subcommittee was held on November 6, 1996 at the University of Delaware to develop statistical procedures to evaluate the effectiveness of at least 2 heat/health watch-warning system approaches. This would be accomplished through a comparative evaluation of heat-related mortality prediction. Participants at the meeting, other than the Subcommittee members listed above, were: Emilio Esteban, CDC; Jerry Libby, Philadelphia Department of Public Health; and Steven Yoon, CDC. An agreed-upon plan of action was developed and is reported below.
An objective evaluation is to be performed on 2 specific procedures which have potential use in heat/health watch-warning systems. The first is an apparent temperature-based system which has its roots in NOAA procedures for heat warnings. The second is an air mass-based classification developed at the University of Delaware which is presently in use by the Philadelphia Department of Public Health, and will be used by the DC Office of Emergency Preparedness beginning summer 1997.
For this objective evaluation, a mandatory requirement is a "level playing field" for both procedures. This includes 3 key components. First, identical data sets will be used to evaluate both procedures. Second, minimum values for apparent temperatures that must be exceeded for mortality to be impacted will be established (such "thresholds" are already internally determined in the air mass-based procedure). This maximizes the chance of finding significant linear relationships in the next step. Third, regression equations with appropriate independent variables that explain mortality variability will be constructed for both procedures for cases in which thresholds are exceeded. The independent variables will be selected parsimoniously from a pool of no more than 6, with the expectation that final regression equations will have approximately 3 terms.
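The second and third components can be illustrated with a minimal sketch: keep only the days on which apparent temperature exceeds a threshold, then fit a linear regression with about 3 terms to mortality on those days. Everything here is an assumption made for illustration: the function name `fit_above_threshold`, the 35-degree threshold, the two placeholder predictors, and the synthetic data are hypothetical, and ordinary least squares stands in for whatever regression procedure the study finally adopts.

```python
import numpy as np

def fit_above_threshold(apparent_temp, predictors, mortality, threshold):
    """Fit ordinary least squares using only days whose apparent
    temperature exceeds `threshold`. Returns (coefficients, day mask)."""
    apparent_temp = np.asarray(apparent_temp, dtype=float)
    predictors = np.asarray(predictors, dtype=float)
    mortality = np.asarray(mortality, dtype=float)

    exceeds = apparent_temp > threshold
    # Design matrix: intercept column followed by the chosen predictors.
    design = np.column_stack([np.ones(exceeds.sum()), predictors[exceeds]])
    coef, *_ = np.linalg.lstsq(design, mortality[exceeds], rcond=None)
    return coef, exceeds

# Synthetic data, for illustration only (not real mortality records).
rng = np.random.default_rng(0)
n_days = 200
at = rng.uniform(25.0, 45.0, n_days)      # apparent temperature (assumed deg C)
other = rng.normal(size=(n_days, 2))      # two placeholder predictors
# Mortality is flat below 35 and rises by 2 deaths per degree above it.
deaths = (50.0 + 2.0 * np.clip(at - 35.0, 0.0, None)
          + rng.normal(scale=0.1, size=n_days))

coef, exceeds = fit_above_threshold(
    at, np.column_stack([at, other]), deaths, threshold=35.0)
print("days above threshold:", int(exceeds.sum()))
print("intercept and slopes:", np.round(coef, 2))
```

Restricting the fit to days above the threshold is what makes a linear model reasonable in this sketch: below the threshold the synthetic mortality is flat, so including those days would bias the slope toward zero.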
We propose that a sufficient number of cities be evaluated so that all of the climate regions of the country are represented. No fewer than 20 cities will be examined, all with large populations to decrease the amount of noise in mortality variation. Data will be standardized using procedures recommended by the Centers for Disease Control (CDC). Besides an evaluation of total deaths, several mortality subsets will be developed to exclude those causes which are clearly not related to heat. CDC will assist in the selection of mortality causes. Finally, we will avoid using data from post-mitigation periods, when rigorous attempts have been made to decrease heat-related mortality.
To estimate skill levels that will be realizable in practice, the 2 mortality prediction schemes will be tested on independent data through the use of a full cross validation. This is a technique whereby data for individual summers are sequentially withheld from the analytical procedures and the resulting test models are then evaluated on the withheld data. For example, assuming a 20-year data set, relationships will be developed on 19 years and the 20th year will be withheld for evaluation. This will be repeated 20 times, reserving a different test year each time. Every step in the modeling procedure is redone in this cross validation, each time excluding the evaluated year. Thus, no step in the procedure (threshold determination, regression fitting, etc.) will have knowledge of what occurs within the evaluated year. Forecasts will be made in both categorical and continuous quantitative forms and will be thoroughly evaluated with modern verification techniques.
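The cross validation described above can be sketched as a loop over withheld years in which both the threshold and the regression are re-derived from the remaining years only. This is an illustration under stated assumptions, not the committee's procedure: the names `leave_one_year_out` and `upper_quartile` are hypothetical, the upper-quartile rule is only a placeholder for the real threshold determination, a single predictor is used for brevity, and the data are synthetic.

```python
import numpy as np

def upper_quartile(train_at, train_deaths):
    """Placeholder threshold rule (an assumption): the 75th percentile of
    the training apparent temperatures. The mortality argument is unused
    here but kept so a mortality-based rule could be swapped in."""
    return float(np.percentile(train_at, 75))

def leave_one_year_out(years, apparent_temp, mortality, choose_threshold):
    """Withhold each year in turn, redo every modeling step on the other
    years, and forecast the withheld year's above-threshold days."""
    years = np.asarray(years)
    at = np.asarray(apparent_temp, dtype=float)
    y = np.asarray(mortality, dtype=float)
    forecasts = np.full(y.shape, np.nan)   # NaN marks days with no forecast
    thresholds = {}
    for test_year in np.unique(years):
        train = years != test_year
        # Step 1: threshold determined from training years only.
        threshold = choose_threshold(at[train], y[train])
        thresholds[int(test_year)] = threshold
        # Step 2: regression fitted on training days above the threshold.
        hot_train = train & (at > threshold)
        design = np.column_stack([np.ones(hot_train.sum()), at[hot_train]])
        coef, *_ = np.linalg.lstsq(design, y[hot_train], rcond=None)
        # Step 3: forecast the withheld year's above-threshold days.
        hot_test = ~train & (at > threshold)
        forecasts[hot_test] = coef[0] + coef[1] * at[hot_test]
    return forecasts, thresholds

# Synthetic 20-summer data set, for illustration only.
rng = np.random.default_rng(1)
years = np.repeat(np.arange(1975, 1995), 90)   # 20 summers of 90 days
at = rng.uniform(25.0, 45.0, years.size)
deaths = (50.0 + 2.0 * np.clip(at - 35.0, 0.0, None)
          + rng.normal(scale=0.5, size=years.size))

forecasts, thresholds = leave_one_year_out(years, at, deaths, upper_quartile)
scored = ~np.isnan(forecasts)
rmse = float(np.sqrt(np.mean((forecasts[scored] - deaths[scored]) ** 2)))
print("folds:", len(thresholds), "days scored:", int(scored.sum()))
print("cross-validated RMSE:", round(rmse, 2))
```

Because the threshold is recomputed inside the loop, it differs slightly from fold to fold. That is the point of the design: a threshold chosen once from all 20 years would carry information from the evaluated year into its own forecast.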
It is the committee's hope that the results of this test will be used to alter present procedures for evaluating heat/health problems. Further, we believe that this evaluation will demonstrate the necessity for regionally varying criteria for heat-related health alerts. The Working Group on Current Heat Wave Forecasts and Warning Practices at the NOAA Heat Wave Workshop recommended that local National Weather Service offices have the flexibility to apply these regionally varying criteria, which few currently do. The evaluation should also demonstrate the power of regression techniques (or other empirical models) to quantify the level of danger. Finally, this study should provide guidance on which weather-related factors to focus on.

Copyright © University of Delaware, 2003 December.
Synoptic Climatology Lab