Recent years have seen the increasing inclusion of per-retrieval prognostic (predictive) uncertainty estimates within satellite aerosol optical depth (AOD) data sets, providing users with quantitative tools to assist in the optimal use of these data. Prognostic estimates contrast with diagnostic (i.e. relative to some external truth) ones, which are typically obtained using sensitivity and/or validation analyses. Up to now, however, the quality of these uncertainty estimates has not been routinely assessed. This study presents a review of existing prognostic and diagnostic approaches for quantifying uncertainty in satellite AOD retrievals, and it presents a general framework to evaluate them based on the expected statistical properties of ensembles of estimated uncertainties and actual retrieval errors. It is hoped that this framework will be adopted as a complement to existing AOD validation exercises; it is not restricted to AOD and can in principle be applied to other quantities for which a reference validation data set is available. This framework is then applied to assess the uncertainties provided by several satellite data sets (seven over land, five over water), which draw on methods ranging from empirical relationships to sensitivity analyses to formal error propagation, at 12 Aerosol Robotic Network (AERONET) sites. The AERONET sites are divided into those for which it is expected that the techniques will perform well and those for which some complexity about the site may provide a more severe test. Overall, all techniques show some skill in that larger estimated uncertainties are generally associated with larger observed errors, although they are sometimes poorly calibrated (i.e. too small or too large in magnitude). No technique uniformly performs best. For powerful formal uncertainty propagation approaches such as optimal estimation, the results illustrate some of the difficulties in appropriately populating the covariance matrices required by the technique.
When the data sets are confronted by a situation strongly counter to the retrieval forward model (e.g. potentially mixed land–water surfaces or aerosol optical properties outside the family of assumptions), some algorithms fail to provide a retrieval, while others do but with a quantitatively unreliable uncertainty estimate. The discussion suggests paths forward for the refinement of these techniques.
The capability to quantify atmospheric aerosols from spaceborne measurements arguably goes back to 1972 with the launch of the Multispectral Scanner System (MSS) aboard the first Landsat satellite
At present there are several dozen sensors of various types suitable for the quantification of aerosols in flight, and more that have begun and ended operations in between. In addition to the variety of instruments, a variety of algorithms have been developed to retrieve aerosol properties from these measurements
Satellite instruments which have been used for column AOD retrieval, arranged by sensor type.
As Table
Retrieval algorithms are used to process the calibrated observations (referred to as level 1 or L1 data) to provide level 2 (L2) data products, consisting of geophysical quantities of interest. These L2 products are typically on the L1 satellite observation grid (or a multiple of it) and often further aggregated to level 3 (L3) products on regular space–time grids. For further background and a discussion of satellite data processing levels, see
Acronyms for some aerosol retrieval algorithms, data records, and/or institution names applied to one or more satellite instruments from Tables
L2 retrieval algorithm development is typically guided by information content studies, sensitivity analyses, and retrieval simulations to gauge which quantities a given sensor and algorithmic approach can retrieve and with what uncertainty
Increases in the quality of instrumentation, retrieval algorithms, models, and computational power have prompted an increasing desire for the provision of pixel-level uncertainty estimates in L2 aerosol data products. This has been driven in part by data assimilation (DA) applications, which need a robust error model on data for ingestion into numerical models
Driven by these needs, many AOD data sets now provide prognostic uncertainty estimates; in some cases these additions have been developed to satisfy these user needs, while in others they have always been available as they are inherent to the retrieval technique. Unlike AOD validation, however, which has had a fairly standard methodology for some time, the quality of these uncertainty estimates has not been routinely assessed. The purposes of this study are threefold: to briefly review the ways in which uncertainty information has been conveyed in satellite aerosol data products; to provide a framework for the evaluation of pixel-level AOD uncertainty estimates in satellite remote sensing, which can be adopted as a complement to AOD validation exercises going forward, and to use this framework to assess AOD uncertainty estimates in several AOD retrieval products; and to discuss the strengths and limitations of each of these approaches and suggest paths forward for improving the quality and use of L2 (pixel-level) uncertainty estimates in satellite aerosol remote sensing.
The International Organization for Standardization (ISO) document often known as the GUM (Guide to the Expression of Uncertainty in Measurement) provides standardized terminology for discussing uncertainties.
The error can only be known when the true value of the measurand is also known, which is rare. This is the province of validation exercises:
For validation exercises AERONET AOD data are often taken as a reference truth because the uncertainty on AERONET AOD data
In contrast to error, the uncertainty can be estimated for each individual measured value (retrieval). The term “expected error” (EE) is often used in the aerosol remote sensing literature
AOD and extinction data sets providing prognostic uncertainty estimates as well as associated key references for uncertainty estimate calculation. Where applicable, algorithm names are given first with instrument names in parentheses. See Tables
Examples of existing prognostic uncertainty estimates for AOD or aerosol extinction data sets are given in Table
The formal methods which have been applied to date are in general Bayesian approaches, which can be expressed in the formalism of
Note that here
As
These smoothness and a priori constraints provide a regularization mechanism to suppress “noise-like” variations in the retrieved parameters when they are not well-constrained by the measurements alone, although there is a danger in that overly strong constraints can suppress real variability. As a result, a priori constraints on AOD itself are often intentionally weak compared to those on other retrieved parameters. Strictly, the MAP is a maximum likelihood estimate (MLE) only if the retrieval does not use a priori information, although it is often referred to as an MLE regardless
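The error propagation underlying MAP/OE retrievals can be sketched in a few lines. This is a generic illustration of the standard linear/Gaussian formalism (posterior covariance from a Jacobian K, measurement-error covariance S_e, and a priori covariance S_a), not the implementation of any specific algorithm discussed here; all matrix values below are invented toy numbers.

```python
import numpy as np

# Toy Jacobian: 4 measurements (rows) sensitive to 2 state
# parameters (columns), e.g. AOD and a surface parameter.
K = np.array([[0.8, 0.3],
              [0.6, 0.4],
              [0.5, 0.6],
              [0.3, 0.7]])

S_e = np.diag([0.01**2] * 4)      # measurement + forward model error covariance
S_a = np.diag([2.0**2, 0.5**2])   # intentionally weak prior on AOD, tighter on surface

# Posterior (retrieved-state) covariance in the linear/Gaussian limit:
# S_hat = (K^T S_e^-1 K + S_a^-1)^-1
S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_e) @ K + np.linalg.inv(S_a))

# 1-sigma uncertainty on each retrieved parameter
sigma = np.sqrt(np.diag(S_hat))
print(sigma)
```

Note that the posterior uncertainty is always no larger than the prior uncertainty, which is why overly tight priors or covariance matrices that omit error sources directly translate into overconfident pixel-level uncertainty estimates.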
The rest of the error propagation methods in Table rely on several conditions holding:

- The forward model must be appropriate to the problem at hand and capable of providing unbiased estimates of the observations. Typically if the forward model is fundamentally incorrect, and/or any a priori constraints strongly inappropriate, the retrieval will frequently not converge to a solution or will have unexpectedly large residuals.
- The covariance matrices must contain realistic quantifications of the relevant uncertainty sources.
- The forward model must be approximately linear with Gaussian errors near the solution. This assumption sometimes breaks down if the measurements are uninformative on a parameter and a priori constraints are weak or absent, and the resulting state uncertainty estimates will be invalid. This can be tested.
- The retrieval must have converged to the neighbourhood of the correct solution (i.e. near the global, not a local, minimum of the cost function), which can be a problem if there are degenerate solutions. In practice algorithms try to use reasonable a priori constraints and first guesses and make a careful selection of which quantities to retrieve vs. which to assume.
A detailed further discussion on these conditions from the perspective of temperature and trace gas retrievals, which share some similar conceptual challenges to aerosol remote sensing, is provided by
A particular challenge for the formal error propagation techniques is the second point above: how to quantify the individual contributions to the error budget necessary to calculate the above covariance matrices? This difficulty has motivated some of the empirical approaches in Table
The MISR algorithms use different approaches. Both the land and water AOD retrieval algorithms perform retrieval using each of 74 distinct aerosol optical models (known as “mixtures”) and calculate a cost function for each. In earlier algorithm versions
This approach was refined (for retrievals over water pixels) by
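The general idea of using diversity among candidate aerosol models as an uncertainty proxy can be illustrated schematically. The specific selection rule below (keeping mixtures whose cost is within a factor of the best) and all numbers are hypothetical choices for illustration, not MISR's actual criteria.

```python
import numpy as np

# Hypothetical per-mixture results: retrieved AOD and cost function
# value for each candidate aerosol optical model ("mixture").
aod  = np.array([0.21, 0.24, 0.19, 0.23, 0.35])
cost = np.array([1.1,  1.3,  1.0,  1.2,  6.0])

# Keep mixtures whose cost is within some threshold of the best fit
good = cost <= cost.min() * 3.0

# Report the mean AOD of acceptable mixtures, and use their spread
# (standard deviation) as a first-order uncertainty proxy.
aod_best = aod[good].mean()
unc_proxy = aod[good].std(ddof=1)
print(aod_best, unc_proxy)
```

The proxy is only meaningful when the candidate set spans the true range of aerosol conditions; if all candidates share a wrong assumption (e.g. too little absorption), their AODs can agree closely while all being biased, and the spread then underestimates the true uncertainty.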
AOD and extinction data sets providing sensitivity analyses and/or diagnostic uncertainty estimates, with associated key references for uncertainty. Where applicable, algorithm names are given first with instrument names in parentheses. See Tables
Available AOD data sets which do not currently provide prognostic uncertainty estimates are listed in Table
Sensitivity analyses are often complemented by dedicated validation papers which summarize the results of comparisons against AERONET, MAN, or other networks
Both the diagnostic and prognostic techniques typically (implicitly or explicitly) make the assumption that the sensor and retrieval algorithm are unbiased and that the resulting uncertainty estimates are unbiased and symmetric. However, it is well-known that many of the key factors governing retrieval errors are globally (e.g. sensor calibration,
Uncertainty propagation approaches such as OE can in principle account for systematic uncertainty sources, as they (and any spectral or parameter correlations) can be included in the required covariance matrices. This can produce estimates of total uncertainty which are reasonable for an individual retrieval, but the true (large-scale) error distributions would then not be symmetric, lessening their value. Likewise, systematically biased priors can lead to systematically biased retrievals. As a result, it would be desirable to remove systematic contributions to the retrieval system uncertainty as far as possible. In practice this is often done through validation exercises, whereby diagnostic comparisons can provide clues as to the source of biases, which are then (hopefully) lessened in the next version of the algorithm. Distributions of the residuals of predicted measurements at the retrieval solution can also be indicative of calibration and forward model biases at the wavelength in question.
A possible solution to this is to perform a vicarious calibration, calculating a correction factor to the sensor gain as a function of time and band by matching observed and modelled reflectances at sites where atmospheric and surface conditions are thought to be well-known (e.g. thick anvil clouds, Sun glint, and AERONET sites). The derived correction factor then accounts for the systematic uncertainty on calibration and the radiative transfer forward model, although if this latter term is non-negligible then the vicariously calibrated gains will still be systematically biased (albeit less so for the application at hand). This has the advantage of transforming the calibration uncertainty from a systematic to a more random error source at the expense of creating dependence on the calibration source and radiative transfer model. There is therefore a danger in creating a circular dependence between the vicarious calibration and validation sources as it can hinder understanding of the physics behind observed biases. Further, this has the side effect of potentially increasing the level of systematic error in other quantities or in conditions significantly different from those found at the vicarious calibration location if the forward model contribution to systematic uncertainty is significant
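A minimal sketch of the vicarious calibration step described above: derive a multiplicative gain correction per band by fitting modelled against observed top-of-atmosphere reflectance at a well-characterized site. The slope-through-origin estimator and all reflectance values are invented for illustration; operational schemes are considerably more involved.

```python
import numpy as np

# Hypothetical coincident TOA reflectances at a calibration site:
# modelled (radiative transfer with well-known atmosphere/surface)
# and observed (from the sensor's nominal calibration).
modelled = np.array([0.082, 0.110, 0.095, 0.130, 0.105])
observed = np.array([0.080, 0.106, 0.093, 0.125, 0.102])

# Gain correction: least-squares slope through the origin,
# g = sum(m * o) / sum(o^2); applied as corrected = g * observed.
g = np.sum(modelled * observed) / np.sum(observed ** 2)
corrected = g * observed
print(g)
```

Any bias in the radiative transfer model used to generate the "modelled" reflectances is folded directly into g, which is the circular-dependence risk noted above.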
The notation adopted herein is as follows. The AOD is denoted
Figure
Scatter density joint histogram (on a logarithmic scale) of the simulated expected uncertainties and retrieval errors in Fig.
An important nuance which bears repeating is that the distributions of estimated uncertainty and actual error in Fig.
When comparing satellite and reference data, the total expected discrepancy (ED) between the two for a single matchup, denoted
In the ideal case
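A common assumption, consistent with the framework described here, is that the satellite and reference uncertainties are independent, so the expected discrepancy for a matchup combines them in quadrature; the discrepancy normalized by this quantity should then behave like a standard normal draw over a large ensemble. The variable names and numbers below are illustrative only.

```python
import numpy as np

sigma_sat = 0.05   # prognostic uncertainty reported with the retrieval
sigma_ref = 0.01   # uncertainty of the reference (e.g. AERONET) AOD

# Total expected discrepancy for this matchup, in quadrature
ed = np.sqrt(sigma_sat**2 + sigma_ref**2)

# Normalized discrepancy: ideally ~ N(0, 1) over a large ensemble
aod_sat, aod_ref = 0.23, 0.18
z = (aod_sat - aod_ref) / ed
print(ed, z)
```

When the reference uncertainty is much smaller than the satellite uncertainty, as here, the ED is dominated by the prognostic estimate being tested.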
The above distribution analyses are informative on the overall magnitude of retrieval errors compared to expectations (as well as, in the case of the PDF analysis, whether there is an overall bias on the retrieved AOD). However, alone they say little about the skill in assessing variations in uncertainty across the population. Taking things a step further, the data can be stratified in terms of ED and a quantile analysis performed to assess consistency with expectations. This is equivalent to taking a single location along the
Expected AOD discrepancy against percentiles of absolute AOD retrieval error. Symbols indicate binned results from the numerical simulation; within each bin, paler to darker tones indicate the 38th, 68th, and 95th percentiles (approximate
An example of this is shown in Fig.
The binned analysis is similar to the assessment of forecast calibration in meteorology
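A minimal version of the binned quantile analysis can be demonstrated on simulated data: generate well-calibrated retrievals (errors drawn with standard deviation equal to the expected discrepancy), bin by ED, and check that the 38th, 68th, and 95th percentiles of absolute error track roughly 0.5, 1, and 2 times the bin-centre ED, as expected for a half-normal distribution. The simulation parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated expected discrepancies (ED) and well-calibrated errors:
# each error is drawn with a standard deviation equal to its ED.
ed = rng.uniform(0.02, 0.20, n)
err = rng.normal(0.0, ed)

# Bin matchups by ED and compute percentiles of |error| in each bin
bins = np.linspace(0.02, 0.20, 7)
idx = np.digitize(ed, bins) - 1
ratios = []
for b in range(len(bins) - 1):
    mid = 0.5 * (bins[b] + bins[b + 1])
    p38, p68, p95 = np.percentile(np.abs(err[idx == b]), [38.3, 68.3, 95.4])
    # Half-normal expectation: ~0.5, 1, and 2 times the bin-centre ED
    ratios.append((p38 / mid, p68 / mid, p95 / mid))
    print(f"ED~{mid:.3f}: {p38/mid:.2f} {p68/mid:.2f} {p95/mid:.2f}")
```

For a miscalibrated product the ratios depart from (0.5, 1, 2) by a roughly constant factor, which is how over- or underestimation of uncertainty shows up in this diagnostic.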
Figures
Here, the reference AOD
AERONET sites used and their categorization.
Data from a total of 12 AERONET sites, listed in Table
The reasons for identifying a particular site as complex are as follows. Over land, Ilorin (Nigeria) and Kanpur (India) can exhibit complicated mixtures of aerosols with distinct optical properties and vertical structure
This breakdown is inherently subjective as all retrievals involve approximations; the dozen sites chosen are illustrative of different aerosol and surface regimes but not necessarily indicative of global performance. The purpose of this study is to define and demonstrate the framework for evaluating pixel-level uncertainties and provide some recommendations for their provision and improvement. It is hoped that, with growing acceptance of the need to evaluate pixel-level uncertainties, this approach can be applied on a larger scale. The sites were chosen as they are fairly well-understood and have multi-year data sets (data from all available years were considered in the analysis). Note that some of the satellite data sets considered here do not provide data at some sites for various reasons (discussed later).
Example results of matchup and filtering criteria for MISR data at Ascension Island. Red points indicate matchups included for further analysis on the basis of filters described in the text, and grey indicates those excluded from analysis. Horizontal and vertical error bars indicate the
The matchup protocol is as follows. AERONET data are averaged within
These matchup criteria are stricter than what is commonly applied for AOD validation
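The temporal part of a matchup protocol like the one above can be sketched as follows. The specific window length, radius, and minimum-observation threshold below are hypothetical placeholders (the values actually used in this study are given in the text, where some specifics have been elided here), and `match_aeronet` is an invented helper name.

```python
import numpy as np

# Hypothetical matchup criteria, for illustration only:
TIME_WINDOW_S = 30 * 60      # +/- 30 min around the satellite overpass
MIN_OBS = 2                  # minimum AERONET observations in the window

def match_aeronet(overpass_t, aeronet_t, aeronet_aod, min_obs=MIN_OBS):
    """Average AERONET AOD within the time window of an overpass.

    Returns None when too few observations fall in the window,
    i.e. no valid matchup is formed.
    """
    sel = np.abs(aeronet_t - overpass_t) <= TIME_WINDOW_S
    if sel.sum() < min_obs:
        return None
    return aeronet_aod[sel].mean()

# Toy data: AERONET times (seconds) and AODs around an overpass at t=0
t = np.array([-3000.0, -1200.0, 600.0, 2400.0])
aod = np.array([0.18, 0.20, 0.22, 0.30])
print(match_aeronet(0.0, t, aod))
```

An analogous spatial criterion (satellite pixels within some radius of the site, with a minimum valid-pixel count) would be applied to the satellite side of each matchup.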
This work considers satellite AOD products from seven algorithm teams; five of these contain both land and water retrievals (albeit sometimes with different algorithms), while two only cover land retrievals. Only pixels retrieved as land are used for comparison with AERONET data from land sites in Table
Four of the data sets (three land, one water) are derived from MODIS measurements; there are two MODIS sensors providing data since 2000 and 2002 on the Terra and Aqua satellites, respectively. The sensors have a 2330
The DB algorithm retrieves AOD only over land and was introduced to fill gaps in DT coverage due to bright surfaces such as deserts (although it has since been expanded to include vegetated land surfaces as well). The latest version is described by
BAR also performs retrievals only over land; it uses the same radiative transfer forward model as DT but reformulates the problem to retrieve the MAP solution of aerosol properties and surface reflectance simultaneously for all vegetated pixels in a single granule
For all MODIS products, data from the latest C6.1 are used. All products are provided at nominal (at-nadir)
The MISR sensor also flies on the Terra platform and consists of nine cameras viewing the Earth at different angles, with a fully overlapped swath width around 380
The ATSRs were dual-view instruments measuring near-simultaneously at nadir and near 55
ORAC is a generalized OE retrieval scheme which has been applied to multiple satellite instruments. Here, the version 4.01 ATSR2 and AATSR from the ESA CCI are used
ADV uses the ATSR dual view over land to retrieve the contribution to total AOD from each of three aerosol CCI components (with the fraction of the fourth dust component prescribed from a climatology) by assuming that the ratio of surface reflectance between the sensor's two views is spectrally flat. This has some similarity with the
Aside from pixel and/or swath differences, for both ADV and ORAC the implementation of the algorithms is the same for the three sensors. Matchups from the two (for ADV) or all three (for ORAC) sensors are combined here in the analysis to increase data volume due to the similarity in sensor characteristics and algorithm implementation. Note, however, that the difference in viewing directions between (A)ATSR and SLSTR (i.e. forward vs. rear) means that different scattering angle ranges are probed over the two hemispheres, which influences the geographic distributions of retrieval uncertainties. For both of these data sets, a large majority of matchups (75 % or more) obtained are with AATSR, as the ATSR2 mission ended before the AERONET network became as extensive as it is at present, and the SLSTR record to date is short. The results do not significantly change if only AATSR data are considered.
Unlike the other data sets considered here, the SEVIRI sensors fly on geostationary rather than polar-orbiting platforms. This analysis uses data from the first version of the CISAR algorithm
Site-to-site corrected sampling
CISAR is also an OE retrieval scheme, which in its SEVIRI application accumulates cloud-free measurements from three solar bands over a period of 5 d and simultaneously retrieves aerosol and surface properties, reporting at each SEVIRI time step. Surface reflectance is modelled following
Number of matchups obtained for each AERONET site and data set, together with climatological cloud fraction.
With the above criteria, the number of matchups
To make the counts more comparable between sites a sampling-corrected count
Evaluation of pixel-level uncertainty estimates for overland retrievals. Each row corresponds to a different AERONET site, and colours are used to distinguish data sets. The left-hand column shows a CDF of the absolute normalized retrieval error
As Fig.
Graphical evaluations of the pixel-level uncertainties are shown in Figs.
Mean and standard deviation of normalized error
A further way to look at the data is provided by Fig.
Calibration skill scores
Turning to the land sites (Fig.
For the complex land sites, the picture is different. At Ilorin, MODIS DB and ADV tend to overestimate uncertainty, while the others underestimate it. This site was chosen as a test case because of the complexity of its aerosol optical properties, which are more absorbing than assumed by many retrieval algorithms and can show large spatiotemporal heterogeneity due to a complex mix of sources
The most absorbing component in the MISR aerosol mixtures has an SSA of 0.80 at 558
The case at Pickle Lake is more diverse: similar to the straightforward sites, MODIS DT, DB, and BAR all overestimate uncertainty. ADV and MISR are fairly close to theoretical values; despite this, their skill scores are fairly low (Table
Aside from DB, DT, and MISR, skill scores (Table
For the water sites (Fig.
ADV and ORAC are more systematic in their underestimation of uncertainty over water compared to over land, although as the over-water errors are often fairly small in absolute terms, they appear fairly large in relative terms. This difference in the ATSR-based records between land and ocean sites is intriguing. ADV assumes 5 % uncertainty in the TOA signal, while ORAC includes separate measurement and forward model terms for a slightly lower total uncertainty overall (typically 3 %–4 % dependent on band and view), which in part explains ORAC's larger normalized errors. The common behaviour either implies (1) that the calibration of the sensors may be biased or more uncertain than expected for these fairly dark ocean scenes or (2) that the over-water surface reflectance models or (for ORAC) their uncertainties (either in their contribution to forward model error in
Despite the expected complexities at Cape Verde from mixtures of low-level sea spray and higher-altitude nonspherical mineral dust
Mbita is in some sense the inverse of the land site Pickle Lake, and similar comments apply. MODIS DT uncertainties are reasonable, although the data volume is fairly low relative to expectations from Fig.
Venice is sampled close to the expected rates by ADV, CISAR, MODIS DT, and ORAC (Fig.
Pixel-level uncertainty estimates in AOD products are an important complement to the retrievals themselves to allow users to make informed decisions about data use for data assimilation and other applications. Ideal estimates are prognostic (predictive), and these are increasingly being provided within data sets; when they are absent, diagnostic estimates can be used as a stopgap. This study has reviewed existing diagnostic and prognostic approaches, provided a framework for their evaluation against AERONET data, and demonstrated this framework using a variety of satellite data products and AERONET sites. It is hoped that this methodology can be adopted by the broader community as an additional component of data product validation efforts. Several conclusions about the performance of these existing estimates follow.
- All tested techniques show skill in some situations (in that the association between estimated uncertainty and observed error is positive, and on average magnitudes are reasonable), although none are perfect, and there is no clear single best technique. Small data volumes for some sensors and locations limit the extent to which performance in the high-uncertainty regime can be probed. The points in Fig.
- While skilful, the uncertainties are not always well-calibrated; i.e. they are often systematically too large or too small. If characterization of the error budgets of the retrievals cannot be significantly improved, it is plausible that a simple scaling (using e.g. averages of the standard deviations on the
- The formal error propagation techniques (employed here by BAR, CISAR, and ORAC) are very powerful. Their differing behaviour and performance illustrate the difficulties in appropriately quantifying terms for the forward model, a priori covariance matrices, and appropriate smoothness constraints. For these sites, CISAR tends to overestimate the uncertainty most strongly, BAR to overestimate slightly, and ORAC to underestimate (more strongly over water than land). The simpler approach taken by ADV (Jacobians from a flat 5 % error on TOA reflectance) tends to be about right over land but also underestimates the true uncertainty over water.
- The empirical validation-based MODIS DB approach works well but on average overestimates the total uncertainty and at these sites has little bias overall. That may indicate that the sites used here are coincidentally better-performing than the global results used to fit the expression. This points to the fact that the expression (which draws on AOD, geometry, quality flag, and surface types) captures many, but not all, of the factors relevant for quantifying total uncertainty.
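The simple rescaling idea mentioned above can be sketched: if the standard deviation of the normalized errors over a validation ensemble is s, scaling all reported uncertainties by s brings them to the correct average magnitude, under the assumption that the miscalibration is a uniform scale factor. The data below are simulated, with reported uncertainties deliberately 40 % too small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulate a product whose reported uncertainties are 40 % too small
true_sigma = rng.uniform(0.03, 0.15, n)
reported = true_sigma / 1.4
err = rng.normal(0.0, true_sigma)

# Scale factor: standard deviation of the normalized errors
s = np.std(err / reported)
recalibrated = s * reported

# After rescaling, the normalized errors should have unit spread
print(s, np.std(err / recalibrated))
```

Such a correction fixes only the overall magnitude; it cannot repair situation-dependent miscalibration (e.g. uncertainties that are too small only in the high-error regime), which requires improving the underlying error budget.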
The diagnostic MODIS DT approaches perform reasonably well if used instead as prognostic uncertainty estimates; they have a tendency to be insufficiently confident (overestimate uncertainty) on the low end and overconfident (underestimate uncertainty) on the high end. Despite the possibility for unphysical negative AOD retrievals in the DT land product, both land and ocean results indicate a systematic positive bias in the retrievals. MISR's two approaches (applied for land and water surfaces) are both based on diversity between different candidate aerosol optical models. They both perform well at most sites, although they have a tendency to underestimate the total uncertainty slightly. The implication from this is that the diversity in AOD retrievals from different candidate optical models does capture the leading cause of uncertainty in the MISR retrievals. The fact that they are underestimates does imply at least one remaining important factor which is not captured by this diversity, which could perhaps be a systematic error source such as a calibration or retrieval forward model bias.
More broadly, these results suggest paths for the development and refinement of pixel-level AOD uncertainty estimates for existing and new data sets. For algorithms attempting AOD retrievals from multiple candidate aerosol optical models, the diversity in retrieved AOD between these different models could be a good proxy for part of the retrieval uncertainty. The MODIS DT ocean and ORAC algorithms both perform retrievals for multiple optical models. As ORAC is already an OE retrieval, this aerosol-model-related uncertainty is one of the few components not directly included in the existing error budget, so it could perhaps be added in quadrature to the existing uncertainty estimate. MODIS DT provides only a diagnostic AOD uncertainty estimate; diversity between possible solutions (which draw from 20 possible combinations of four fine modes and five coarse modes) could be explored as a first-order prognostic extension or replacement of that. One caveat is that this metric is only useful when the candidate set of optical models is representative; results at Ilorin, where aerosol absorption is often stronger than assumed in retrieval algorithms and the MISR approach does not perform well, illustrate that this is not always the case.
A general principle behind the error propagation techniques is the assumption of Gaussian departures from some underlying forward model. When this is not true, the techniques tend to fail. The Ilorin case is one such example of this. Another is the higher-level issue of coastal or lake areas, as most algorithms make binary retrieval decisions with non-linear implications (e.g. treat pixel as land or water for surface reflectance modelling), which cause problems if pixels are either misflagged or “contaminated” and contain mixed water or land. The algorithms tested here tend to deal with this in one of two ways. The first is simply to fail to provide a valid retrieval at all; in this case, the uncertainty estimates for available retrievals tend to be reasonable, although the data volume is significantly less than expected. The second option is to provide a retrieval but consequently provide a poor estimate (and typically an underestimate) of the associated uncertainty. Neither is entirely satisfactory. Performing retrievals at a higher spatial resolution with strict filtering might ameliorate these issues, as a smaller fraction might then be contaminated or misflagged; however, the resolutions of the sensor measurements and land mask (and its quality) place hard constraints on what could be achieved. Another option might be to attempt retrievals using both land and water algorithms for these pixels and either report both or an average (including the difference between them as an additional contribution to the uncertainty estimate). This would provide some measure of the potential effect of surface misclassification and at the least provide a larger uncertainty estimate to alert the data user about problematic retrieval conditions. A deeper understanding of the representativity of AERONET sites on satellite retrieval scales would be useful to better understand the distributions of retrieval success rates and errors. This is a topic of current research
A further difficulty in the assumption of Gaussian random errors is that sensor calibration uncertainty tends to be dominated by systematic effects rather than random noise. While in practice it is often (as in the algorithms assessed here) treated as a random error source, when it is a dominant contribution to the retrieval error budget it will tend to skew the retrievals toward one end of the notional uncertainty envelopes. This may explain some of the systematic behaviour along the
The framework for evaluating uncertainties here is general and not restricted to AOD. In practice, however, it is difficult to extend it to other aerosol-related quantities at the present time. For profiling data sets (such as lidar), uncertainties in extinction profiles are often strongly vertically correlated as the effects of assumptions propagate down the profile
For the total column, other key quantities of interest include the Ångström exponent (AE), fine-mode fraction (FMF) of AOD, and aerosol SSA. The AE can easily be assessed using this framework, although AERONET AE itself can be quite uncertain in the low-AOD conditions which predominate in many locations around the globe
Issues with SSA are somewhat more difficult; AERONET almucantar inversions have an uncertainty in SSA around
AERONET data are available from
AMS conceptualized the study, provided MODIS DB data, performed the analysis, and led the writing of the paper. ACP provided ORAC data. PK, AL, and TM provided ADV and BAR data. FP provided MODIS DT data. MW provided MISR data. YG and ML provided CISAR data. TP and KS provided general guidance and insight through ESA aerosol CCI and AeroSat validation and uncertainty characterization activities; TP also contributed significantly to the table outlining and referencing approaches to uncertainty characterization. All authors contributed to editing the paper.
The authors declare that they have no conflict of interest.
The work of lead author Andrew M. Sayer was performed as part of development for the forthcoming NASA Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission (
This research has been supported by NASA.
This paper was edited by Alexander Kokhanovsky and reviewed by three anonymous referees.