## 1. Introduction

The Integrated Urban Air Toxics Study carried out by the U.S. Environmental Protection Agency (EPA 2000) presented a framework for addressing air toxics in urban areas using census tract population centroids for hypothetical receptors. A test example for the Houston, Texas, area was described by EPA (2002), using a straight-line Gaussian plume model, Industrial Source Complex Short Term, version 3 (ISCST3) (EPA 1995). This work addresses the uncertainties in air quality model outputs caused by uncertainties in ISCST3 and American Meteorological Society (AMS)–Environmental Protection Agency (EPA) Model (AERMOD) inputs and parameters for the Houston ship channel area, which is an industrial area on the east side of Houston. AERMOD is intended to incorporate state-of-the-art advances (Cimorelli et al. 2005) and has been proposed as a replacement for ISCST3.

The total uncertainty associated with an application of an air quality model depends on the following several contributions: 1) errors in model inputs, 2) errors in observations of concentration, 3) errors in internal model parameters (such as proportionality constants or power laws), 4) errors and omissions in model physics, and 5) random (or stochastic or turbulent) variations. This paper is concerned only with contributions 1 and 3 (errors in model inputs and in internal model parameters). A Monte Carlo (MC) probabilistic uncertainty approach is used to assess the model uncertainties instead of other possible approaches (e.g., sensitivity analysis) because the MC approach allows the combined influences of the uncertainties in many model inputs and parameters to be simultaneously assessed. The resulting uncertainty in the model outputs can be determined as well as correlations between uncertainties in inputs and outputs. Basic explanations of the MC procedures are provided in several books (e.g., Cullen and Frey 1999), and examples of applications to atmospheric transport and dispersion models have been previously published (e.g., Freeman et al. 1986; Irwin et al. 1987; Hanna et al. 2001, 2005b).

The Houston base case for the year 1996 is used here. It is described in EPA (2002), which used a 150 km × 150 km urban geographic domain surrounding Houston, and used ISCST3 to calculate annually averaged concentrations of five toxic air pollutants. The current study focuses on a 15 km × 15 km domain (see Fig. 1) covering the area around the Houston ship channel, which includes many oil refineries and chemical processing plants, as well as numerous major highways and several residential areas. The modeling addresses two air toxics, benzene and 1,3-butadiene, whose emissions are distributed among mobile sources, industrial sources, and area and volume sources, such as service stations and fugitive emissions. The concentrations in the 15 km × 15 km domain have been calculated using emissions information from sources on a 30 km × 30 km domain, which encompasses the 15 km × 15 km domain.

Annually averaged concentrations at 46 locations (43 at hypothetical census tract population centroids and 3 at monitoring sites) have been calculated on the Houston 15 km × 15 km receptor domain shown in Fig. 1. The averaging time for this modeling exercise is 1 yr, because that is the typical averaging time used for extrapolating lifetime health effects of benzene and 1,3-butadiene (EPA 2000, 2002).

Two alternate straight-line Gaussian plume dispersion models have been run: ISCST3 (EPA 1995) and AERMOD (Cimorelli et al. 2005). ISCST3 has been run in two modes: 1) assume that all sources are located in urban terrain (referred to in the tables and figures as ISCST3-urban), and 2) assume that sources are located in urban or rural terrain, based on their surroundings (referred to in the tables and figures as ISCST3-mixed urban/rural). The model base runs for the Houston domain in Fig. 1 were available from an earlier study (Heinold et al. 2003). However, the emissions files that have been used here were provided by the EPA for the 1996 calendar year and are consistent with those used by the EPA for their similar Monte Carlo uncertainty study, which is ongoing and has not yet been formally published. In the MC exercise reported here, each model has been run 100 times for random choices from the distributions characterizing the input parameters, and the responses of the key model output parameters are analyzed.

The first step in any uncertainty analysis is to clearly define the scientific questions being asked and the model outputs to be analyzed (Cullen and Frey 1999). The following four questions were addressed in this study:

Question 1 (uncertainty of modeling system due to uncertainties in inputs and model parameters): What is the uncertainty in the annually averaged concentration (averaged over the 43 hypothetical census tract population centroid receptors) of benzene and 1,3-butadiene in the Houston ship channel region, and which input variables and model parameters have the most influence on this uncertainty? What is the uncertainty in the magnitudes of the 100 maximum annually averaged concentrations (one from each MC run) found among the 43 population centroid receptors and the 3 monitors?

Question 2 (contributions of emissions versus dispersion models to uncertainty): What are the relative contributions of the emissions module and the transport and dispersion module to the model uncertainty?

Question 3 (contributions of emissions source classes): How do the uncertainties and correlations differ for different source classes, such as mobile versus point, or industrial major point source versus industrial area and volume?

Question 4 (dependence on model): Do the conclusions concerning uncertainty depend on the model being used (e.g., ISCST3 versus AERMOD or ISCST3-urban versus ISCST3-mixed urban/rural)?

The method and the detailed results are given in sections 2 through 5, and succinct conclusions are given in section 6. The work herein presents the highlights of the comprehensive project report by Hanna et al. (2005a). That report contains detailed tables and figures and additional justifications for the assumptions regarding input uncertainties.

## 2. Estimates of input uncertainties for emissions

There are uncertainties and mean biases in the emissions estimates for any type of source, but quantifying these is difficult because few data are available. Uncertainties in emissions can be estimated by a combination of the following two approaches: analysis of available data and expert elicitation (Cullen and Frey 1999). As an example of the first approach, emissions data available for benzene were used to derive probability density functions describing the uncertainties in emissions for several source categories in Houston (Frey and Zhao 2003). Fortunately, the emissions inventory in Houston for benzene and 1,3-butadiene is more complete than in most other parts of the country. Furthermore, a key element of the current study has been a workshop on uncertainties of emissions estimates for benzene and 1,3-butadiene in the Houston area. The participants defined 25 benzene categories, where the “top two” categories, in terms of total mass emissions, are on-road mobile sources (such as cars and trucks) and petroleum refineries. The emissions in other categories are relatively small in comparison with these two. Most of the individual category uncertainties discussed at the workshop were found to be in the range of a factor of 1.5–3. Therefore, it was decided to simply assume a factor-of-3 uncertainty, with a lognormal distribution, for each category. This factor of 3 is assumed to cover the 95% range of the uncertainty (or ±2 standard deviations). Also, after much discussion at the workshop and afterward, it was decided that there was not enough information to assume anything other than zero for the mean biases.

Using the same process, the workshop participants discussed the 1,3-butadiene emissions inventory for the Houston ship channel domain and decided on 13 emissions categories; the top 2 involve industrial processes and plants. A smaller percentage (about 11%) of the 1,3-butadiene emissions comes from on-road mobile sources, as compared with about 29% for benzene.

To prevent unrealistic extremes in emissions from being selected by the MC random number process, it is assumed that there are no emissions that depart from the median by more than ±5 standard deviations (i.e., ± a factor of 7.5 for the emissions uncertainties).
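The emissions sampling described in this section can be sketched as follows. This is only an illustrative sketch, not the code used in the study: the function name, the seed, and the use of rejection sampling to enforce the bounds are assumptions. The factor-of-3 range at ±2 standard deviations is converted to a standard deviation of the logarithm, the median multiplier is 1 (zero mean bias), and the truncation bound is passed as a parameter set to the factor of 7.5 quoted above.

```python
import math
import random

def sample_emission_multiplier(rng, factor_95=3.0, max_factor=7.5):
    """Draw one lognormal multiplier with median 1 (zero mean bias)."""
    # A factor-of-3 range at +/-2 standard deviations implies
    # sigma of ln(multiplier) = ln(3) / 2.
    sigma_ln = math.log(factor_95) / 2.0
    while True:
        multiplier = rng.lognormvariate(0.0, sigma_ln)
        # Reject draws outside the assumed truncation bounds.
        if 1.0 / max_factor <= multiplier <= max_factor:
            return multiplier

rng = random.Random(1996)  # arbitrary seed, for reproducibility only
# One independent multiplier per benzene emissions category (25 categories),
# held fixed for the whole year of a given MC run.
benzene_multipliers = [sample_emission_multiplier(rng) for _ in range(25)]
```

Each multiplier would scale the base-case emissions of its category; the categories are sampled independently, consistent with the no-correlation assumption.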

There are no correlations assumed between any of the emissions categories, because they have been defined to assure the maximum independence. Also, as pointed out by Cullen and Frey (1999), the correlation has little effect on the MC-generated uncertainties unless the magnitude of the correlation is relatively large (i.e., 0.6 or 0.7, or greater).

## 3. Estimates of meteorological and dispersion model input uncertainties for ISCST3 and AERMOD

It is also necessary to estimate the uncertainties of the meteorological and dispersion model input and model parameters for the ISCST3 and AERMOD air quality models. It is the standard procedure with both ISCST3 and AERMOD to run the model for every hour of the year and then to use a postprocessor to calculate the 1-yr-averaged concentration, which is the model output studied in this MC exercise.

The uncertainties in transport and dispersion model inputs and parameters are difficult to estimate because there has not been much research on the topic of uncertainties. There has been some discussion about problems with various methods (Freeman et al. 1986; Irwin et al. 1987; Hanna 2002). As a result of planning meetings with the EPA during collaborations on the current project, it was decided to define the following two components of the uncertainty in meteorological inputs and model parameters: 1) the hour-to-hour variations at a single site and 2) the inevitable variations between sites resulting from nonrepresentativeness. These two components are hypothesized to be independent.

The data reported by Draxler (1984) from field experiments involving the dispersion of tracers can be used to estimate uncertainties in Gaussian plume model variables, such as the lateral and vertical dispersion coefficients *σ*_{y} and *σ*_{z}. Both *σ*_{y} and *σ*_{z} are estimated to have a 95% uncertainty range of about a factor of 2 (Hanna 2002). Furthermore, minimum and maximum limits can be determined for the distributions of inputs so that they cannot be selected outside of a known physical range. For example, a *σ*_{y} of less than 10 m is unlikely to occur at a downwind distance of 1000 m. The data on *σ*_{y} and *σ*_{z} suggest that their maximum range at a given downwind distance and stability class is a factor of 5 (Draxler 1984), and there is no evidence to justify assuming there is a correlation between the two. Wind speed should also be constrained to be within known physical bounds when selected by a random number generator. Most meteorological inputs are limited to be within ±5 standard deviations of the median. Some variables are naturally bounded at the low and/or high ends. For example, cloud cover (CC) cannot be less than 0.0 or greater than 1.0.

As for the emissions inputs, in all cases, the uncertainty is represented by a distribution function (normal or lognormal) and a definition of the 95% range (i.e., ±2 standard deviations). None of the input is assumed to have a mean bias. The lognormal distribution is a good choice for many air quality and meteorological variables that have uncertainties with magnitudes larger than a factor of ±50%.

The local hour-to-hour uncertainties are randomly selected each hour, and the site-to-site uncertainties are randomly selected to apply for the entire year. For the lognormally distributed variables, such as wind speed, the relative (normalized by the median) perturbation expressing the total uncertainty is the product of the relative perturbations for the local hour-to-hour and site-to-site (annual average) uncertainties. In the case of a normally distributed variable, such as wind direction, the total relative perturbation is the sum of the two relative perturbations.
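The combination rule in the preceding paragraph can be illustrated with a short sketch. The helper names and the base-case values are hypothetical, and interpreting a "±30%" lognormal range as a factor of 1.3 at 2 standard deviations in log space is an assumption made for this example. The wind speed and wind direction ranges are the ones specified later in this section.

```python
import math
import random

def lognormal_factor(rng, rel_95):
    """Multiplicative factor with median 1; rel_95=0.30 means +/-30% at 2 sd."""
    sigma_ln = math.log(1.0 + rel_95) / 2.0  # assumed interpretation
    return rng.lognormvariate(0.0, sigma_ln)

def normal_offset(rng, abs_95):
    """Additive offset with mean 0; abs_95 is the +/-2 sd half-width."""
    return rng.gauss(0.0, abs_95 / 2.0)

rng = random.Random(0)  # arbitrary seed
hours = 8760

# Site-to-site terms: drawn once and held for the entire year.
site_speed_factor = lognormal_factor(rng, 0.30)
site_dir_offset = normal_offset(rng, 30.0)

base_speed, base_dir = 3.0, 180.0  # hypothetical base-case values
perturbed = []
for _ in range(hours):
    # Hour-to-hour terms: drawn independently each hour (no serial correlation).
    # Lognormal variable: the two relative perturbations multiply.
    speed = base_speed * site_speed_factor * lognormal_factor(rng, 0.30)
    # Normal variable: the two perturbations add.
    direction = base_dir + site_dir_offset + normal_offset(rng, 30.0)
    perturbed.append((speed, direction))
```

The sketch omits the ±5 standard deviation limits, the treatment of calms, and the correction of wind directions outside 0°–360°.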

It is assumed in this preliminary exercise that the hour-to-hour variations in the uncertainties in a given meteorological input or model parameter are not correlated. That is, the wind speed uncertainty at 0900 local time is not correlated with that at 1000 (the next hour).

The specific recommendations for hour-to-hour and site-to-site perturbations are given below. Because there is little information on these components, in most cases the two types of perturbations are hypothesized to be nearly equal. These assumptions can be refined as additional data are analyzed in future studies.

Wind speed is assumed to have a lognormal distribution with a ±30% uncertainty (covering the 95% range) for each of the hourly and site-to-site components. According to National Weather Service procedures, any wind speed less than 1.2 m s^{−1} is listed as being “calm.” A procedure was devised so that the fraction of calms before and after the MC perturbations were applied would remain the same, on average.

The wind direction (°) is assumed to have a normal distribution with a ±30° uncertainty for each of the hourly and site-to-site components. The new MC-selected wind direction is the sum of the original wind direction, the hourly perturbation, and the site-to-site perturbation. Selected wind direction values greater than 360° or less than 0° are corrected to be in the range of 0°–360° (e.g., −15° is corrected to 345°).

Cloud cover, ranging from 0 (clear) to 1.0 (overcast), is assumed to have a normal distribution with an uncertainty of ±0.1 for each of the hourly and site-to-site components. If the randomly selected cloud cover is greater than 1.0, it is reset to 1.0; if it is less than 0.0, it is reset to 0.0.
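The two bounding rules just described, for wind direction and cloud cover, amount to a wrap and a clip. A minimal sketch with hypothetical function names:

```python
def wrap_direction(deg):
    """Map any angle in degrees onto the interval [0, 360)."""
    # Python's % returns a non-negative result for a positive modulus,
    # so negative angles wrap upward.
    return deg % 360.0

def clip_cloud_cover(cc):
    """Reset cloud cover fractions outside [0, 1] to the nearest bound."""
    return min(1.0, max(0.0, cc))

print(wrap_direction(-15.0))   # 345.0
print(clip_cloud_cover(1.08))  # 1.0
```

One design consequence of clipping rather than resampling is that probability mass accumulates at exactly 0.0 and 1.0 for hours that are already nearly clear or overcast.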

The mixing height is assumed to have a lognormal distribution with a 95% range of ±20% for both hour-to-hour and for site-to-site perturbations.

The surface roughness length *z*_{0} is used by AERMOD (and not ISCST3) and is assumed to have a lognormal distribution with a 95% range of ± a factor of 3 for site-to-site perturbations. No hour-to-hour perturbations are assumed. Note that because ISCST3 does not directly input *z*_{0}, and instead assumes either urban or rural surface conditions, those uncertainties are being handled in the current study by running ISCST3 in the following two modes: 1) assuming all sources are surrounded by urban terrain and 2) assuming sources are surrounded by either rural or urban terrain, following EPA (1995) criteria. The Bowen ratio (BR; defined as the ratio of the sensible heat flux to the latent heat flux) is another input used by AERMOD and not ISCST3, and is assumed to be lognormally distributed with a 95% range of a factor of 2 for both hour-to-hour and site-to-site perturbations.

Vertical temperature gradient *dT*/*dz* is assumed to have a lognormal distribution with a 95% range of a factor of 2 for both hour-to-hour and for site-to-site perturbations. Note that variations in *dT*/*dz* are expected to primarily affect the plume rise calculations. The perturbations to *dT*/*dz* are applied inside the code after the temperature gradients have been calculated by the internal modules. The perturbations to *σ*_{y} and *σ*_{z} are also applied inside the code after the dispersion parameters have been calculated by the internal modules. A lognormal distribution is assumed with a 95% range of ±50% for both hour-to-hour and site-to-site perturbations. No correlation is assumed between *σ*_{y} and *σ*_{z} (i.e., they are varied randomly and independently).

The MC inputs for the AERMOD and ISCST3 models were implemented by writing some simple preprocessing programs and by making a few changes to the codes themselves. For AERMOD, the meteorological data preprocessor (AERMET) was applied along with special preprocessing programs to generate the 100 meteorological data files and profile data used as input to AERMOD, which included both site-to-site and hour-to-hour perturbations. It is noted that a given MC wind perturbation was applied to the entire wind profile. For ISCST3, the meteorological preprocessor for regulatory models (MPRM) was applied along with special preprocessing programs to generate “onsite” meteorological data files. Added to the ISCST3 meteorological files were hourly perturbations for *σ*_{y} and *σ*_{z} and *dT*/*dz*.

## 4. Monte Carlo sampling and analysis methods

### a. MC sampling methods

With simple random sampling from the input distributions with no assumed correlations among input fluctuations, the number of needed MC runs is manageable because it is not dependent on the number of variables. The number *m* of MC runs used in this study is 100, which is a reasonable compromise between the desire to have more runs to narrow the confidence bounds in the results and the desire to have fewer runs to save computer time. Separate model runs were carried out for benzene and 1,3-butadiene.

For the emissions uncertainties, a given perturbation or random number applies for the entire year of ISCST3 or AERMOD runs. As mentioned in the previous section, for several of the meteorological and dispersion model uncertainties a site-to-site (averaged over a year) and hour-to-hour variability are assumed. For the site-to-site component, as for the emissions, a random number is selected that applies for the entire year. For each of the *m* = 100 MC runs, a set of “*n*” random numbers is generated (one for each of the *n* input variables or parameters). To make the model-to-model comparisons more meaningful in the subsequent analysis, the same sets of *n* random numbers for the *m* = 100 MC runs have been used for each model.

For the hour-to-hour component, the random number selection procedures are similar to those for the site-to-site component, except that each new dispersion model run has a new set of random numbers for each input for each of the 8760 h of the year. These random numbers have also been saved for use in correlation calculations, although it is expected that the correlations are minimal for the hour-to-hour variations. This is because, over a year, the plus and minus deviations during each hour will tend to cancel out. The same sets of random numbers have been used for the same MC run for each dispersion model.
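Reusing the same random numbers for each model is straightforward if the deviates are generated once from a fixed seed and stored. The sketch below is an illustration only: the seed, the value of *n*, and the function name are hypothetical, and standard normal deviates are assumed as the stored quantity.

```python
import random

def draw_annual_deviates(seed, m_runs, n_inputs):
    """Return m_runs sets of n_inputs standard normal deviates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
            for _ in range(m_runs)]

# m = 100 runs; n = 8 is a placeholder for the number of inputs and parameters.
deviates = draw_annual_deviates(seed=42, m_runs=100, n_inputs=8)

models = ("ISCST3-urban", "ISCST3-mixed urban/rural", "AERMOD")
# Every model sees the identical deviates for run k, so differences between
# models in run k are not caused by different random draws.
inputs_by_model = {model: deviates for model in models}
```

The hour-to-hour deviates could be handled the same way, with one stored sequence of 8760 values per input for each MC run.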

### b. Description of outputs and analysis techniques

For ISCST3, the following two sets of modeled concentrations were produced for analysis: 1) all urban sources (ISCST3-urban), and 2) mixed urban and rural sources (ISCST3-mixed urban/rural). For each pollutant, a total of 300 ISCST3 runs were made (100 runs for all sources assuming urban terrain, 100 runs for only those sources surrounded by rural terrain, and 100 runs for only those sources surrounded by urban terrain). The ISCST3-mixed urban/rural predicted concentrations at any location were then assumed to be the sum of the predicted concentrations for the urban-source and rural-source runs just mentioned. Note that the urban and rural designations are prescribed by procedures outlined by EPA (1995) and are the same as those used in EPA (2002) studies of Houston. For AERMOD, one set of 100 MC runs was carried out for each pollutant, automatically representing mixed rural and urban sources through its input of a spatially varying surface roughness length.

The outputs are analyzed to determine the characteristics of the variability and the inputs whose variations have the most effect on the variations in the outputs. In most probabilistic MC assessments, the majority of the uncertainty in the output distribution (annually averaged concentrations in this scenario) is attributable to uncertainty in a small subset of the inputs (Cullen and Frey 1999). An identification of this subset of highly significant contributors to output uncertainty can help guide future research. The rank correlation coefficient is used to identify important inputs. The analysis also allows the relative contributions of the uncertainties in the emissions model and the transport and dispersion model to the uncertainties in the output concentrations to be assessed. For *m* = 100 MC runs, a correlation can be said to be significantly different from 0.0, with 95% confidence, if the magnitude of the correlation coefficient exceeds about 2*m*^{−1/2}, or 0.2.
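The rank correlation and the approximate significance threshold can be computed as in the following sketch. The function names are hypothetical, and ties are given their average rank, which is one common convention rather than something specified in this paper.

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_correlation(x, y):
    """Rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def significance_threshold(m_runs):
    """Approximate 95% threshold on |r| for m_runs pairs: 2 / sqrt(m)."""
    return 2.0 / math.sqrt(m_runs)

threshold = significance_threshold(100)  # 0.2
```

Here `x` would hold the 100 perturbations of one input and `y` the 100 corresponding predicted concentrations; an input is flagged when the magnitude of its rank correlation exceeds the threshold.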

The 5–10 parameters with correlation coefficients significant at the 95% confidence level are included in a multiple linear regression analysis. The result is a linear regression equation where the perturbations in predicted concentrations are expressed as linear combinations of the perturbations in input variables or model parameters. The fraction of the variance explained by each parameter is also estimated.
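A regression of this form, Y = b0 + Σ b_i X_i, can be fitted by ordinary least squares. The sketch below solves the normal equations with Gaussian elimination; this solver choice, the function names, and the synthetic data are assumptions for illustration, since the paper does not state which software was used.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_linear(x_rows, y):
    """Least-squares fit of y = b0 + sum_i b_i * x_i; returns [b0, b1, ...]."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Synthetic check with made-up data: y = 2 + 3*x1 - 1*x2 exactly.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [2.0 + 3.0 * a - 1.0 * b for a, b in rows]
coefficients = fit_linear(rows, ys)  # approximately [2.0, 3.0, -1.0]
```

In the application described here, each row would hold the normalized perturbations of the significant inputs for one MC run, and `y` would hold the corresponding predicted concentrations.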

Despite the fact that model simulations at three ambient air monitor locations (see Fig. 1) are included in the outputs, no attempt is made in this paper to compare model simulations with observations. The current paper is focused on the uncertainties in the outputs. The EPA (2000) report and Hanna et al. (2005a) project report include a few comparisons of the model base runs with observations, showing little mean bias for benzene but a mean underprediction bias for 1,3-butadiene.

## 5. Results of analysis of Monte Carlo runs

The results of the MC analysis, including several plots and tables, focus on the maximum (peak) annually averaged concentrations at any of the 46 receptor locations (43 hypothetical census tract population centroids and three monitors, as shown in Fig. 1), and the spatially averaged annually averaged concentrations over the 43 census tract population centroid receptor locations.

The location of the maximum (peak) annually averaged concentration among the 46 receptors was determined for each of the 100 MC runs for each model and pollutant. Given the variability of emission source categories and other model inputs, it is expected that this maximum location will shift. For benzene, the frequency of occurrence of the location of the peak receptor is more or less split between two groups of receptor locations: the first group is at a location in the northern part of the domain next to a major highway [Interstate Highway 10 (I-10)], where mobile sources dominate the emissions, and the second group is in the western part of the domain within the industrial region along the ship channel, where chemical processing plants and oil refineries dominate the emissions. There are slightly more AERMOD and ISCST3-urban maxima near the ship channel than next to I-10, while the reverse is true for ISCST3-mixed urban/rural. For 1,3-butadiene, over 90% of the maxima for all models occur at the receptor location adjacent to the busy highway I-10. Very few maxima occur in the industrial ship channel area, despite the fact that most of the 1,3-butadiene emissions are from industrial sources. This apparent inconsistency can be explained as a combination of several factors, such as the very close proximity of the receptor location to I-10 and the tendency for the industrial sources to be associated with stacks and/or plume rise.

The conclusions in this paper are specific to the pollutants benzene and 1,3-butadiene, which are mainly emitted from near-ground sources, such as cars and trucks and short stacks at industrial plants. Different conclusions regarding uncertainty and its causes may be found for pollutants, such as sulfur dioxide (SO_{2}), that have significant emissions from tall stacks.

### a. Uncertainties as indicated by probability density functions of Monte Carlo concentration outputs

For each set of 100 MC runs (for the three model options and the two pollutants), the 100 values of predicted concentrations have been ranked from low to high, and probability density functions (pdfs) or frequency distributions have been calculated. Figure 2 is an example (for benzene as simulated by AERMOD) of histograms showing the frequency distributions of the maximum (peak) annually averaged concentrations found at a single receptor among all the receptors (top part of figure), and the annually averaged concentration averaged over the hypothetical 43 census tract population centroids (bottom part of figure). The frequency distributions all have longer “tails” at the high-concentration end of the distribution, which is characteristic of a lognormal distribution. Even though the distributions of the 100 MC outputs in Fig. 2 appear somewhat ragged (i.e., not smooth), the distributions are likely to be satisfactorily capturing the main output of interest, namely, the 95% range of the MC outputs. The references on MC methodologies, such as IAEA (1989), Hoffman (1996), and Cullen and Frey (1999), agree that 100 MC runs are sufficient to capture the primary aspects of the spread of the distribution. If more MC runs were carried out (say 500 or 1000), the distributions would smooth out, but there would be only minor improvement in estimation of the 95% spread.

The distributions are presented in a comprehensive format in Fig. 3. Here, each set of 100 ranked MC outputs is used to estimate certain values and percentiles (minimum, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, and maximum) of the cumulative distribution function (CDF). Figure 3 contains plots of these various percentile levels for outputs of maximum (peak) concentrations (top), and concentrations averaged over all census tract population centroid receptors (bottom), for benzene (Fig. 3a) and 1,3-butadiene (Fig. 3b). The discussion below focuses on the range from the 2.5th to the 97.5th percentile, because that defines the 95% range, which is of interest in many statistical analyses.

Figure 3 suggests that the relative ranges (i.e., the 95% range divided by the median or the 50th percentile) are usually about plus or minus a factor of 2 or 3, and are fairly constant from one model to the other, from one pollutant to the other, and from the maximum (peak) *C* to the spatially averaged *C*. There is usually less than a 10% difference between the relative 95% uncertainties (normalized by the median) for benzene and 1,3-butadiene. It is found that the 95% uncertainty range averages about 20% larger for the maximum (peak) concentration over the domain relative to the census tract population centroid average.
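The percentile levels and the relative 95% range defined above can be estimated from a set of 100 outputs as in this sketch. The function names are hypothetical, and linear interpolation between ranked values is an assumed percentile convention; the paper does not state which convention was used.

```python
def percentile(sorted_values, pct):
    """Percentile by linear interpolation between adjacent ranked values."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def summarize(concentrations):
    """Minimum, maximum, selected percentiles, and the relative 95% range."""
    ranked = sorted(concentrations)
    levels = (2.5, 5, 10, 25, 50, 75, 90, 95, 97.5)
    table = {"min": ranked[0], "max": ranked[-1]}
    for p in levels:
        table[p] = percentile(ranked, p)
    # 95% range (97.5th minus 2.5th percentile) divided by the median.
    table["relative_95_range"] = (table[97.5] - table[2.5]) / table[50]
    return table
```

Calling `summarize` on the 100 maximum concentrations and on the 100 spatially averaged concentrations for each model and pollutant would give the quantities plotted in a figure of this kind.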

The figure shows that the 50th percentile (median) predictions of ISCST3-mixed urban/rural are about 25% higher than the corresponding predictions for ISCST3-urban for benzene, and about a factor of 1.5–2 higher for 1,3-butadiene. The difference between the mixed urban/rural and all urban predictions could be attributed to the fact that there is more turbulence, and hence dispersion, for urban conditions. These differences between the two ISCST3 modes reflect a difference in assumed surface roughness conditions.

Generally, the 95% range is about 25% larger for AERMOD than for either ISCST3-mixed urban/rural or ISCST3-urban. This 25% difference can be postulated to be due to the fact that the AERMOD MC runs already include uncertainties in surface roughness, which is shown in later sections to have a significant correlation with variations in AERMOD-predicted concentrations, while the individual ISCST3 MC runs do not, once the terrain surface is selected. If the difference between the predictions of ISCST3-mixed urban/rural and ISCST3-urban is considered to be caused by the effects of uncertainties in surface roughness, then this accounts for the 25% difference in the 95% range. AERMOD is a more state-of-the-art model than ISCST3 in many respects, including AERMOD’s explicit treatments of surface roughness. As mentioned earlier, these conclusions are specific for near-surface sources, typical of benzene and 1,3-butadiene emissions.

### b. Analysis of Monte Carlo outputs to calculate correlations and apply multiple regression methods

Correlations have been calculated in order to identify significant relations between MC variations in the individual inputs and variations in the MC-predicted concentration outputs. In addition, multiple linear regression methods are applied to the few inputs whose variations are significantly correlated with the variations in the outputs.

“Rank” correlation coefficients are used to help reduce the influence of outliers that would occur if actual magnitudes of variables were used. Correlation coefficients that are significantly different from 0.0 at the 95% confidence level can be identified. The magnitude of the correlation coefficient that defines the 95% confidence level for 100 pairs of data is 0.20 [i.e., 2(100)^{−1/2}].

For benzene, the variables with significant correlations include the following: three of the source categories (on-road motor vehicles, refineries, and sewage treatment plants), wind speed, surface roughness (in AERMOD), and *σ*_{z} (vertical dispersion parameter). Some inputs yielded relatively large correlations, with magnitudes exceeding 0.4. These include 1) on-road motor vehicle sources (cars and trucks), with a correlation exceeding 0.6 for benzene for AERMOD and the two ISCST3 options and for the maximum (MAX) and the spatial average over the census tract population centroid receptors (AVG); 2) petroleum refinery sources, with a correlation exceeding 0.4 for benzene from AERMOD runs; 3) surface roughness, with a correlation of −0.55 for benzene for AERMOD runs for the spatial average over the centroid receptors; and 4) *σ*_{z}, with correlations of −0.49 and −0.43 for benzene for the spatial average of the centroid receptors for the ISCST3-mixed urban/rural and ISCST3-urban runs, respectively. The large correlations for benzene both for on-road motor vehicle sources and refinery sources are consistent with the locations of the maximum concentrations for the 100 MC runs, which are roughly equally split between the site next to the busy highway I-10, and a few sites in the industrial area near the Houston ship channel.

For 1,3-butadiene, the variables with significant correlations include industrial sources, rubber and latex production sources, cooling tower sources, and road segment sources, as well as wind speed, surface roughness, and *σ*_{z}. By far, the largest correlations (about 0.9 for the maximum concentration) are for road segment sources. This category is associated with mobile sources on major roads. This result may be related to the fact that nearly all of the maximum 1,3-butadiene concentrations occur at the receptor location near the busy highway I-10. The meteorological and dispersion model variables with significant correlations are the same as for benzene.

Multiple linear regression coefficients were estimated for benzene and 1,3-butadiene, where a variable was included if it was significant for at least one of the models. Table 1 lists the multipliers of the regression equation, the estimated intercept (i.e., the concentration for the situation where all perturbations are zero), and the final correlation coefficient of the regression equation for benzene and 1,3-butadiene.

Here *X _{i}* represents the normalized MC variation for independent parameter *i*, and *Y* represents the difference between the predicted concentration and the concentration that would occur if there were no variations in any of the parameters (the intercept). Concentrations are given in micrograms per cubic meter. For example, the regression coefficients in Table 1 for benzene from AERMOD runs for the maximum concentration imply an equation of the form

$$Y = \sum_{i} a_{i} X_{i}, \qquad (1)$$

where the *a _{i}* are the corresponding multipliers from Table 1 and the *X _{i}* are the normalized variations in the significant emissions categories and in the meteorological and dispersion inputs, including surface roughness (*z*_{0}) and the vertical dispersion parameter (*σ _{z}*). Of course, it must be recognized that these are not “real” measurement data used to develop the regression relations, but only model predictions from the 100 sets of MC perturbations of the assumed input variables.

The magnitudes of the regression coefficients listed in Table 1 more or less parallel the individual correlation coefficients. We note that whereas some of the variables with relatively large coefficients make intuitive sense (e.g., on-road mobile source emissions have the largest fraction of the total benzene emissions), others are less so (e.g., landfills have a much smaller fraction of total benzene emissions). The reasons for some of these higher correlations are not obvious and more study is needed.
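A multiple linear regression of this kind can be sketched with ordinary least squares. The sketch below is illustrative only: `fit_regression`, the three synthetic inputs, and the "true" multipliers are hypothetical and do not reproduce the Table 1 values.

```python
import numpy as np

def fit_regression(X, conc):
    """Least-squares fit of conc = intercept + sum_i a_i * X_i.

    X    : (n_runs, n_inputs) normalized MC perturbations
    conc : (n_runs,) simulated concentrations
    Returns the intercept, the multipliers a_i, and the correlation
    between fitted and simulated concentrations.
    """
    A = np.column_stack([np.ones(len(conc)), X])  # leading column = intercept
    coef, *_ = np.linalg.lstsq(A, conc, rcond=None)
    fitted = A @ coef
    r = np.corrcoef(fitted, conc)[0, 1]
    return coef[0], coef[1:], r

# Synthetic stand-in for 100 MC runs (assumed values, NOT from the paper).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
true_intercept = 2.0
true_mult = np.array([0.5, -0.3, -0.2])
conc = true_intercept + X @ true_mult + 0.05 * rng.standard_normal(100)

intercept, multipliers, r_fit = fit_regression(X, conc)
Y = conc - intercept  # difference from the zero-perturbation concentration
print(intercept, multipliers, r_fit)
```

The intercept plays the role of the concentration when all perturbations are zero, and `Y` is the quantity the multipliers are meant to explain.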

The relative influence on the uncertainties of the source emissions input group versus the meteorological input group can be estimated by comparing the sums of the squares of the regression coefficients in Table 1 for these groups. The numbers listed in Table 2 are the resulting fraction of total explained variance resulting from the emissions category group. It is seen that in all but one case (the spatial average for AERMOD for 1,3-butadiene), the emissions category group has the largest fraction. For the maximum or peak concentration receptor (MAX in the tables), the emissions category group (specifically, the on-road mobile source category) dominates the explained variance, primarily because the maximum occurs almost exclusively at the monitor adjacent to highway I-10. For the spatially averaged census tract population centroids (AVG in the tables), there is more of a balance between the contributions of the emissions and the meteorological groups. For the AVG data, AERMOD has a larger fraction of explained variance than ISCST3 in the meteorological group primarily because AERMOD explicitly accounts for variations in surface roughness, while the two individual ISCST3 options do not. However, as mentioned earlier, if we consider the differences between ISCST3-urban and ISCST3-mixed urban/rural to be due to differences in terrain roughness, then the surface roughness could be considered to have more of an effect on the ISCST3 uncertainty than is indicated in the table.
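The group comparison amounts to a ratio of sums of squared regression coefficients. A minimal sketch follows, with made-up coefficients standing in for Table 1 entries; the function name `emissions_fraction` is an assumption, and the interpretation as a variance fraction presumes that the inputs are normalized and varied independently.

```python
def emissions_fraction(emis_coefs, met_coefs):
    """Fraction of explained variance attributed to the emissions group,
    taken as sum(a_i^2) over emissions inputs divided by the same sum
    over emissions plus meteorological/dispersion inputs."""
    emis = sum(c * c for c in emis_coefs)
    met = sum(c * c for c in met_coefs)
    return emis / (emis + met)

# Hypothetical coefficients, NOT the Table 1 values.
emis_coefs = [0.6, 0.2]        # e.g., on-road mobile, refineries
met_coefs = [-0.3, -0.2, -0.1]  # e.g., wind speed, z0, sigma_z

frac = emissions_fraction(emis_coefs, met_coefs)
print(f"emissions fraction = {frac:.2f}, meteorological fraction = {1 - frac:.2f}")
```

Because the coefficients are squared, their signs do not matter, and a single dominant coefficient (such as the on-road mobile source category at the maximum receptor) can carry most of the fraction by itself.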

## 6. Conclusions

The conclusions are specific to the Houston ship channel domain, the toxic pollutants benzene and 1,3-butadiene, and the AERMOD and ISCST3 turbulence and dispersion models. Most of the sources are near ground, such as cars and short stacks. However, the MC uncertainty method that is described is general and can be applied to other domains, pollutants, and models.

The current MC analysis in the Houston ship channel domain shows that the uncertainty (defined as the 95% range) in the simulated annually averaged concentration for the spatially averaged census tract population centroid receptors, resulting from uncertainties in inputs and model parameters, is about a factor of 2.0, with only slight variation with model or pollutant. When the maximum annually averaged concentration at any receptor location is considered instead of the spatial average, the uncertainty increases by about 20%. In general, the uncertainties for AERMOD are about 20%–30% greater than the uncertainties for either ISCST3-urban or ISCST3-mixed urban/rural, largely because AERMOD directly incorporates surface roughness while ISCST3 requires a preselection of one of two classes (urban or rural) of surface roughness. If the differences in the ISCST3 urban versus rural simulations are included in the model uncertainty estimate, the ISCST3 model uncertainties are slightly larger than the AERMOD uncertainties.

The uncertainties in benzene and 1,3-butadiene concentration are found to be most strongly related to variations in on-road motor vehicle (mobile source) emissions. Some variations in industrial emission categories, such as petroleum refineries, are also found to be significantly correlated with uncertainties in concentrations. The location that shows the strongest correlations with motor vehicle emissions is the receptor sited adjacent to a busy highway (I-10). The uncertainties in simulated benzene and 1,3-butadiene concentrations are also significantly related to variations in three meteorological and dispersion model parameters—wind speed, surface roughness *z*_{0} (for AERMOD), and *σ _{z}* (vertical dispersion). However, surface roughness (implicitly parameterized by assigning either “urban” or “rural” land use to the area around a given source) is also important for ISCST3, because predicted concentrations for the ISCST3-mixed urban/rural option are about 20%–100% greater than for the ISCST3-urban option.

In general, there were about four to six of the emissions categories and three of the transport and dispersion model inputs and parameters whose uncertainties had significant (at the 95% confidence level) correlations with uncertainties in predicted benzene and 1,3-butadiene concentrations. The relative influence on the total uncertainties of the emissions category group versus the meteorological input and dispersion model parameter group was estimated by comparing the sums of the squares of the regression coefficients for the two groups. It is seen that in all but one case (AERMOD, spatially averaged over the census tract centroids for 1,3-butadiene), the emissions category group has the largest relative influence. For the maximum concentration at any receptor, the emissions category group (specifically the on-road mobile source category) dominates the explained variance, primarily because this maximum occurs at the monitor adjacent to highway I-10. For the spatially averaged concentration for the census tract population centroids, there is more of a balance between the contributions of the emissions and the meteorological/dispersion groups.

The differences in uncertainty by source class are difficult to derive from the results, because all inputs were varied simultaneously and independently. However, it is possible to study the correlations for the various source classes for benzene and for 1,3-butadiene. It is found that uncertainties in on-road mobile sources are the major contributor to uncertainties in predicted concentrations of both benzene and 1,3-butadiene. The second-largest contributor to the total uncertainty is refineries (for benzene) and the chemical industry (for 1,3-butadiene). This specific result may stem from the position of one receptor near a busy highway (I-10). It should be noted that the source categories used here closely parallel those from the EPA study (Frey and Zhao 2003), but are not necessarily a one-to-one match with the source classes defined by EPA for regulatory assessments (e.g., major area sources and mobile sources).

There are few differences among the models regarding the relative magnitudes of the uncertainty and the variables whose uncertainties have the strongest correlation with the modeled concentration uncertainties. The absolute concentrations predicted deterministically by the base case runs from the three models may vary by a factor of as much as 2, but the relative uncertainties (normalized by the base run concentrations) show much less variation. The uncertainty for AERMOD is about 20% larger than the uncertainty for the two ISCST3 options. However, because AERMOD accounts for surface roughness and the individual ISCST3 options do not, the explicit inclusion of uncertainties in surface roughness inputs in the MC uncertainty analysis for AERMOD would represent an additional contribution to the uncertainty. Because our results show that the choice of urban or rural terrain in ISCST3 can cause 20%–100% differences in predicted concentrations, the differences between ISCST3-urban and ISCST3-mixed urban/rural can be implicitly considered to be a measure of the uncertainty resulting from surface roughness variations. Consequently, accounting for this difference in ISCST3 options, it can be concluded that the uncertainty of ISCST3 is slightly larger than that of AERMOD.

## Acknowledgments

This research has been sponsored by the American Petroleum Institute (API), with Howard Feldman and Richard Karp as the program managers. Prof. H.C. Frey of North Carolina State University is a consultant on the project and has contributed many useful ideas and comments. The authors appreciate the suggestions and datasets provided by Joseph Touma, James Thurman, and John Irwin of the U.S. EPA.

## REFERENCES

Cimorelli, A. J., and Coauthors, 2005: AERMOD—Description of model formulation. U.S. EPA Tech. Rep. EPA-454/R-03-004, 91 pp. [Available online at http://www.epa.gov/ttn/scram/7thconf/aermod/aermod_mfd.pdf.]

Cullen, A. C., and H. C. Frey, 1999: *The Use of Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs*. Plenum, 335 pp.

Draxler, R. R., 1984: Diffusion and transport experiments. *Atmospheric Science and Power Production*, D. Anderson, Ed., U.S. Department of Energy, 367–422.

EPA, 1995: Description of model algorithms. Vol. II, User’s Guide for the Industrial Source Complex (ISC3) Dispersion Model, revised ed., EPA-454/b-95-003b, 120 pp.

EPA, 2000: National Air Toxics Program: The integrated urban strategy. EPA-453/R-99-007, OAQPS/EPA, 156 pp.

EPA, 2002: Example application of modeling toxic air pollutants in urban areas. EPA-454/R-02-003, OAQPS/EPA, 111 pp. [Available online at http://www.epa.gov/scram001/guidance/guide/uatexample.pdf.]

Freeman, D. L., R. T. Egami, N. F. Robinson, and J. G. Watson, 1986: A method for propagating measurement uncertainty through dispersion models. *J. Air Pollut. Control Assoc.*, **36**, 246–253.

Frey, H. C., and Y. Zhao, 2003: Development of probabilistic emission inventories of benzene, formaldehyde and chromium for the Houston domain. Prepared by North Carolina State University for Carolina Environmental Program and U.S. EPA, 192 pp.

Hanna, S. R., 2002: Meteorological modeling in MACCS2. Hanna Consultants Final Rep. P047, prepared for the U.S. Nuclear Regulatory Commission, 57 pp.

Hanna, S. R., Z. Lu, H. C. Frey, N. Wheeler, J. Vukovich, S. Arumachalam, and M. Fernau, 2001: Uncertainties in predicted ozone concentration due to input uncertainties for UAM-V photochemical grid model applied to the July 1995 OTAG domain. *Atmos. Environ.*, **35**, 891–903.

Hanna, S. R., R. J. Paine, D. Heinold, and E. Kintigh, 2005a: Uncertainties in benzene and 1,3-butadiene emissions in Houston and their effects on uncertainties in concentrations calculated by AERMOD and ISC. Hanna Consultants Rep. P055, prepared for API, 89 pp.

Hanna, S. R., A. G. Russell, J. G. Wilkinson, J. Vukovich, and D. A. Hansen, 2005b: Monte Carlo estimation of uncertainties in BEIS3 emission outputs and their effects on uncertainties in chemical transport model predictions. *J. Geophys. Res.*, **110**, D01302, doi:10.1029/2004JD004986.

Heinold, D., R. Paine, and H. Feldman, 2003: Quantitative evaluation of the EPA urban air toxics modeling strategy: Results of sensitivity studies. *Proc. AWMA Annual Meeting*, Paper 69639, Pittsburgh, PA, AWMA, CD-ROM.

Hoffman, F. O., Ed., 1996: A guide for uncertainty analysis in dose and risk assessments related to environmental contamination. NCRP Commentary No. 14, National Council on Radiation Protection and Measurement, 54 pp.

IAEA, 1989: Evaluating the reliability of predictions made using environmental transfer models. IAEA Safety Series No. 100, 106 pp.

Irwin, J. S., S. T. Rao, W. B. Petersen, and D. B. Turner, 1987: Relating error bounds for maximum concentration estimates to diffusion meteorology uncertainty. *Atmos. Environ.*, **21**, 1927–1937.

AERMOD benzene frequency distributions (occurrences per interval) of annually averaged concentrations for the 100 MC runs for (top) the maximum or peak at any individual receptor anywhere on the domain and (bottom) the spatial average over all 43 census tract population centroids.

Citation: Journal of Applied Meteorology and Climatology 46, 9; 10.1175/JAM2540.1

(a) Significant points on the CDF based on 100 MC runs for predicted annually averaged benzene concentrations (*μ*g m^{−3}) for (top) the maximum or peak at any individual receptor anywhere on the domain and (bottom) the spatial average over all 43 census tract population centroids. (b) As in (a), but for 1,3-butadiene concentrations.

Table 1. Multiple regression coefficients for benzene [e.g., see Eq. (1)] and 1,3-butadiene. Boldface numbers are significant at the 95% level. ISCM refers to ISCST3-mixed urban/rural and ISCU refers to ISCST3-urban.

Table 2. Fraction of explained variance contributed by the group of emissions inputs in Table 1. The remaining fraction is contributed by the group of meteorological inputs. ISCM refers to ISCST3-mixed urban/rural and ISCU refers to ISCST3-urban.