Tropospheric Emission Spectrometer (TES) satellite observations of ammonia, methanol, formic acid, and carbon monoxide over the Canadian oil sands: validation and model evaluation

The wealth of air quality information provided by satellite infrared observations of ammonia (NH3), carbon monoxide (CO), formic acid (HCOOH), and methanol (CH3OH) is currently being explored and used for a number of applications, especially at regional or global scales. These applications include air quality monitoring, trend analysis, emissions, and model evaluation. This study provides one of the first direct validations of Tropospheric Emission Spectrometer (TES) satellite-retrieved profiles of NH3, CH3OH, and HCOOH through comparisons with coincident aircraft profiles. The comparisons are performed over the Canadian oil sands region during the intensive field campaign (August–September 2013) in support of the Joint Canada–Alberta Implementation Plan for Oil Sands Monitoring (JOSM). The satellite/aircraft comparisons over this region during this period produced errors of (i) +0.08 ± 0.25 ppbv for NH3, (ii) +7.5 ± 23 ppbv for CO, (iii) +0.19 ± 0.46 ppbv for HCOOH, and (iv) −1.1 ± 0.39 ppbv for CH3OH. These values mostly agree with previously estimated retrieval errors; however, the relatively large negative bias in CH3OH and the significantly greater positive bias for larger HCOOH and CO values observed during this study warrant further investigation. Satellite and aircraft ammonia observations during the field campaign are also used in a preliminary evaluation of Environment Canada’s Global Environmental Multi-scale – Modelling Air quality and CHemistry (GEM-MACH) air quality modelling system at high resolution (2.5 × 2.5 km). These initial results indicate a model underprediction of ∼ 0.6 ppbv (∼ 60 %) for NH3 during the field campaign period. The TES/model CO comparison differences are ∼ +20 ppbv (∼ +20 %); however, because the TES/aircraft comparisons under these conditions also show a small positive TES CO bias, the overall model underprediction of CO is likely closer to ∼ 10 % at 681 hPa (∼ 3 km) during this period.
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
There is a total of more than ∼ 170 billion barrels (∼ 2.7 × 10¹⁰ m³) of proven fossil fuel reserves in the Canadian oil sands region in Alberta, Canada. The bitumen (thick and sticky oil often mixed with sand, water, or clay) located near the surface may be surface mined, but the deeper deposits need to be extracted through different stimulation methods, either by heating or water flooding, and then pumped to the surface. The Canadian Association of Petroleum Producers (CAPP) states that production from the oil sands has grown from 0.1 mBPD (million barrels per day) in 1980 to 1.8 mBPD in 2012 and is expected to more than double, reaching 4.5 mBPD by the year 2025 (CAPP, 2013). This growth brings increasing environmental and health concerns associated with petroleum extraction development and operations (e.g. Kelly et al., 2010), including air quality issues and acid deposition. Despite these concerns, there are relatively few published studies on air quality monitoring from the Canadian oil sands (e.g. Percy et al., 2013; Gordon et al., 2015; McLinden et al., 2012, 2014), and additional monitoring and emission modelling efforts are required to better understand the oil sands emissions and their impacts. To help address this need, the joint Canada and Alberta plan for monitoring of the air, water, and wildlife in and around the oil sands was created (Abbatt et al., 2011). Satellite observations are included in this effort.
Satellite observations can provide regional and global scale coverage over relatively long time periods (typically 5–15 years for a single sensor). They provide unique observations for air quality monitoring in and around the Canadian oil sands, as has previously been demonstrated by the NASA Aura Ozone Monitoring Instrument (OMI) nitrogen dioxide (NO2) and sulphur dioxide (SO2) measurements (McLinden et al., 2012, 2014). The Aura Tropospheric Emission Spectrometer (TES) sensor has also been collecting special observations directly over the oil sands petroleum extraction regions of Alberta, Canada, for more than 2 years. These special satellite observations, in conjunction with specifically designed coincident aircraft vertical profile measurements over the oil sands, provide a rare opportunity for direct validation of satellite NH3, HCOOH, and CH3OH retrievals, and for evaluation of air quality model simulations (e.g. emissions and processes) of ammonia.
Ammonia is a short-lived gas, often only residing in the atmosphere from hours to a day (Seinfeld and Pandis, 1998; Aneja et al., 2001). It is an important base that reacts in the atmosphere with sulphuric acid (H2SO4) and nitric acid (HNO3) to form ammonium sulphate ((NH4)2SO4) and ammonium nitrate (NH4NO3) respectively, which are significant constituents of fine particulate matter (PM2.5). In this aerosol phase NH3 can last from days to several weeks in the atmosphere (Galperin and Sofiev, 1998; Park et al., 2004; Paulot et al., 2014) and can be responsible for long-range transport of reactive nitrogen over hundreds of kilometres (Galloway et al., 2008). Deposition of these aerosols can lead to soil acidification (e.g. Carfrae et al., 2004). Despite ammonia's contribution to adverse health impacts (e.g. Lee et al., 2015) and to climate radiative forcing by aerosols, and its significant environmental role in the deposition of reactive nitrogen, anthropogenic emissions of NH3 have historically been largely unregulated. The lack of regulation has contributed to the lack of observations and to large uncertainties in our knowledge of ammonia emissions. Global ammonia emissions have increased severalfold since preindustrial times, and ammonia is the only aerosol precursor whose global emissions are projected to rise throughout the next century (Moss et al., 2010; Lamarque et al., 2010; Ciais et al., 2013). Thus, ammonia is expected to play an even more significant role in the future in determining air quality, climate change, and environmental degradation. Recent satellite observations are providing valuable insight on ammonia concentrations and emissions on both regional and global scales (e.g. Beer et al., 2008; Clarisse et al., 2009; Shephard et al., 2011; Shephard and Cady-Pereira, 2015; Van Damme et al., 2014; Zhu et al., 2013). Furthermore, ammonia and particulate matter are listed as Canadian criteria air contaminants (CACs; Environment Canada, 2013) in order to help address air quality issues such as smog and acid rain.
Methanol (CH3OH) is the most abundant non-methane volatile organic compound (VOC) and a source (precursor) of carbon monoxide (CO), formaldehyde (HCHO), and tropospheric ozone (O3) through secondary photochemical production (Singh et al., 1995, 2001; Tie et al., 2003; Millet et al., 2006; Duncan et al., 2007; Choi et al., 2010; Hu et al., 2011). The main source of methanol emissions on the global scale is generally considered to be terrestrial plants (Millet et al., 2008a; Stavrakou et al., 2011; Guenther et al., 2012) during cell wall growth (Fall and Benson, 1996; Fall, 2003), with other more minor sources being biomass burning (Holzinger et al., 1999; Andreae and Merlet, 2001) and anthropogenic emissions (Holzinger et al., 1999; de Gouw et al., 2005; Hu et al., 2011), which can be important at regional scales. Methanol plays a pronounced photochemical role early in the plant growth season when its emissions are high and when isoprene emissions are still relatively low (Wells et al., 2012). For example, Wells et al. (2014) showed that in April in the northern midlatitudes methanol contributes up to 25 % of the secondary production of CO and HCHO; with the later onset of the growing season in the more northern boreal regions methanol can contribute up to ∼ 50 % of the local CO and HCHO production. The lifetime of methanol in the atmosphere is on the order of 5–6 days (Millet et al., 2008b; Stavrakou et al., 2011), which is why methanol is generally more abundant in the atmosphere than isoprene, which can have up to 4 times greater emissions but has a lifetime of just hours (Paulot et al., 2012; Xie et al., 2013).
Formic acid (HCOOH) is a dominant source of atmospheric acidity and is the dominant contributor (60–80 %) to acid rain over boreal forest regions (i.e. those surrounding the oil sands operations; Stavrakou et al., 2012). Thus, it is important for pH-dependent processes in the atmosphere. The main source of atmospheric formic acid is secondary photochemical production (Millet et al., 2015) from precursors including isoprene, monoterpenes, other terminal alkenes (e.g. Neeb et al., 1997; Lee et al., 2006; Paulot et al., 2011), and alkynes (Hatakeyama et al., 1986; Bohn et al., 1996). Direct emissions of formic acid are thought to be smaller and include biomass and biofuel burning (e.g. Goode et al., 2000), biogenic emissions from plants and soils (e.g. Kesselmeier et al., 1998; Kuhn et al., 2002; Jardine et al., 2011; Sanhueza and Andreae, 1991), agriculture (e.g. Ngwabie et al., 2008), and urban emissions (e.g. Kawamura et al., 1985; Talbot et al., 1988). Formic acid is a major contributor to acid rain in remote environments (Keene and Galloway, 1988; Andreae et al., 1988) and reduces the pH in rainwater by 0.25–0.5 units over boreal forests and Amazonia in the summertime, accounting for as much as 60–80 % of the rainwater acidity over these remote regions in the summer (Stavrakou et al., 2012). The average lifetime of formic acid is ∼ 3–4 days (Stavrakou et al., 2012), and it is mainly removed through wet and dry deposition. For regions where there is also a significant source of dust, such as fugitive dust from large transport vehicles in the oil sands mining locations (Watson et al., 2014), there can be an irreversible uptake of formic acid on dust (Falkovich et al., 2004; Hatch et al., 2007; Paulot et al., 2011). Recent work has shown that the atmospheric abundance of formic acid is much larger than expected based on current knowledge of its budget (Millet et al., 2015; Stavrakou et al., 2012; Paulot et al., 2011). The fact that the discrepancy is widespread, manifesting over forests (Stavrakou et al., 2012), cities (Le Breton et al., 2012; Yuan et al., 2015), oil and gas fields (Yuan et al., 2015), and in the free troposphere (Paulot et al., 2011), implies a key gap in present understanding and the presence of one or more substantial missing sources (Millet et al., 2015).
Carbon monoxide (CO) is one of the primary atmospheric pollutants and is listed as a Canadian CAC (Environment Canada, 2013). CO is a colourless toxic gas that can have severe effects on human health (e.g. Burnett et al., 1998a, b). The role of CO in tropospheric chemistry and climate is well established (Logan et al., 1981; Shindell et al., 2006). In addition to its photochemical source from the oxidation of methane and other VOCs, a major source of CO is incomplete combustion, which occurs in open fires, domestic biofuel use, vehicle use, and industrial activities. Reaction with the hydroxyl radical (OH) is the main removal process for CO. The lifetime of CO is a few weeks in mid-to-high latitudes, long enough to allow intercontinental transport. Satellite observations of global CO have been made by multiple sensors over the past decades (Deeter et al., 2014; McMillan et al., 2011; Luo et al., 2007a; George et al., 2009).
In addition to the information provided by each satellite-retrieved species on its own, the relatively short-lived species including ammonia, methanol, and formic acid can be used with other simultaneously retrieved species to provide ratios (tracers) that can be used for identifying and constraining sources (i.e. biomass burning or biogenic emissions; e.g. Coheur et al., 2009; Wells et al., 2014; Luo et al., 2015); if the species has a longer lifetime, as does CO, the ratios can also be used for determining loss rates. As an example of source identification, a high correlation between HCOOH and CH3OH along with a weak correlation between HCOOH and CO might indicate a dominance of biogenic emissions over a region, season, or episode.
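The source-identification idea in the preceding paragraph can be illustrated with a small correlation check. This is only a sketch: the mixing ratios are made-up illustrative values, and the thresholds (0.8 and 0.3) are arbitrary assumptions, not criteria taken from this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical co-located retrievals (ppbv); illustrative values only.
hcooh = [0.5, 0.9, 1.4, 2.0, 2.6, 3.1]
ch3oh = [1.1, 1.9, 2.8, 4.1, 5.0, 6.2]            # rises with HCOOH
co = [105.0, 98.0, 110.0, 101.0, 107.0, 99.0]     # nearly unrelated to HCOOH

r_methanol = pearson(hcooh, ch3oh)
r_co = pearson(hcooh, co)

# Assumed, arbitrary thresholds for "high" and "weak" correlation.
biogenic_like = r_methanol > 0.8 and abs(r_co) < 0.3
```

With these invented values the HCOOH/CH3OH correlation is strong and the HCOOH/CO correlation is weak, which is the pattern the text associates with a possible dominance of biogenic emissions.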
Satellite observations of these species are inferred from measured spectral radiances, which generally require a complex retrieval inversion process with assumptions on the profile shape and its variability (e.g. Bowman et al., 2006; Shephard et al., 2011; Cady-Pereira et al., 2012, 2014). The available retrieval information from these species is limited as the infrared spectral signal is often less than 0.3 % (or less than 1 K in brightness temperature) of the total background signal (on the order of 300 K in brightness temperature). Thus, these satellite retrievals can be challenging and require validation against other available observations. To this end, aircraft observations from the intensive Joint Canada–Alberta Oil Sands Monitoring (JOSM) air component field campaign held over the oil sands region during August and September of 2013 are used. One of the goals of the aircraft campaign was to validate satellite observations with coincident aircraft in situ observations in order to obtain better estimates of the capabilities and errors of the satellite retrievals in this environment.
In general it is inherently difficult to validate satellite data with in situ observations due to the nature of the remote sensing sampling, especially if the species of interest varies significantly in space or time in the atmosphere (e.g. Shephard et al., 2008a). Also, in situ vertical profile measurements of these compounds from aircraft require fast-response instrumentation that has not been available until recently. Thus, to date there have been relatively few coincident "validation" profiles for the more recently developed NH3, CH3OH, and HCOOH retrieval algorithms that can be used to directly evaluate the TES satellite performance. The evaluations of these molecules performed thus far, although very useful, can be seen as more qualitative or "indirect" comparisons due to sampling differences (e.g. surface vs. boundary layer) or non-coincident observations (e.g. Pinder et al., 2011; Wells et al., 2012; Shephard and Cady-Pereira, 2015; Sun et al., 2015). There have also been some general NH3, CH3OH, and HCOOH comparisons between satellites, such as TES and the Infrared Atmospheric Sounding Interferometer (IASI) (e.g. Clarisse et al., 2010; Wells et al., 2012) and TES and the Cross-track Infrared Sounder (CrIS) (Shephard and Cady-Pereira, 2015). Presented in this analysis are direct comparisons of TES-retrieved NH3, CH3OH, and HCOOH profiles, along with CO, with the coincident aircraft profile observations over a small targeted oil sands region during early September of 2013. These direct satellite/aircraft comparisons provide actual error values in terms of bias and uncertainties that are used to evaluate the estimated errors reported for the TES operational retrieval (e.g. observational error) and from simulations (Shephard et al., 2011; Cady-Pereira et al., 2012, 2014) under conditions representative of summertime/autumn atmospheric conditions over the oil sands region.
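As a minimal sketch of what "bias and uncertainties" means for a set of paired comparisons, the mean and standard deviation of the satellite-minus-aircraft differences can be computed as below. The paired values are invented for illustration, and the use of the sample (n − 1) standard deviation is an assumption; the text does not state which estimator was used.

```python
import statistics

def bias_and_spread(satellite, aircraft):
    """Return (mean, sample standard deviation) of satellite - aircraft."""
    diffs = [s - a for s, a in zip(satellite, aircraft)]
    return statistics.mean(diffs), statistics.stdev(diffs)

# Hypothetical paired values (ppbv) at matched retrieval levels.
tes = [1.0, 1.4, 0.9, 1.6]
aircraft_est = [0.9, 1.2, 1.0, 1.4]

bias, spread = bias_and_spread(tes, aircraft_est)
summary = f"{bias:+.2f} ± {spread:.2f} ppbv"
```

A positive bias in this convention means the satellite values are, on average, higher than the aircraft values.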
Also provided are initial TES comparisons against Environment Canada's Global Environmental Multi-scale – Modelling Air quality and CHemistry (GEM-MACH) model (Makar et al., 2015a, b) runs simulated at a high resolution of 2.5 × 2.5 km² over the oil sands region during the JOSM field campaign. GEM-MACH is also being used by Environment Canada to provide ongoing experimental air quality forecasts at 2.5 km resolution for a domain covering Alberta and Saskatchewan, and to assess the human health and acidifying deposition impacts of oil sands sources; hence, obtaining accurate emissions of these trace gases is essential for the success of the model simulations. Here we demonstrate the satellite's potential value in evaluating the model performance for these trace gases.

Satellite
TES is a Fourier transform spectrometer (FTS) sensor that was launched on the NASA Aura satellite on 15 July 2004 (Beer et al., 2001). It is a well-calibrated high-spectral-resolution FTS (0.10 cm⁻¹ apodized) instrument with four bands spanning the infrared spectral region from 650 to 2250 cm⁻¹ with good radiometric accuracy (Worden et al., 2006; Shephard et al., 2008b; Connor et al., 2011). It has the capability to simultaneously observe a number of species with atmospheric signatures in the infrared portion of the spectrum.
In addition to TES's original standard products, NH3 (implemented in Version 5), CH3OH, and HCOOH (implemented in Version 6) have relatively recently become standard TES operational product additions. The TES-retrieved products are generated using an optimal estimation retrieval method (Bowman et al., 2006). The specific retrieval details as well as retrieval characteristics for NH3, CH3OH, and HCOOH are provided in Shephard et al. (2011), Cady-Pereira et al. (2012), and Cady-Pereira et al. (2014) respectively. Some general summary characteristics of all three of these retrievals are provided here. Due to the relatively weak atmospheric signal of NH3, CH3OH, and HCOOH in the infrared spectra, the individual retrievals generally provide at most ∼ 1 independent piece of information (represented by degrees of freedom for signal (DOFS)). CO typically has slightly more information but still less than 2 DOFS. These retrievals are most sensitive to atmospheric concentrations in the lower troposphere, generally between 900 and 600 hPa (1–4 km; Luo et al., 2007a, b).
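In optimal estimation, the DOFS is conventionally the trace of the averaging kernel matrix. The short sketch below shows that calculation for a small, made-up 3 × 3 averaging kernel; the matrix values are hypothetical and are not taken from any TES retrieval.

```python
def dofs(averaging_kernel):
    """Degrees of freedom for signal: the trace of the averaging kernel."""
    return sum(averaging_kernel[i][i] for i in range(len(averaging_kernel)))

# Hypothetical averaging kernel on three retrieval levels.
A = [
    [0.30, 0.20, 0.05],
    [0.20, 0.35, 0.15],
    [0.05, 0.15, 0.25],
]

n_pieces = dofs(A)  # below 1, as described for NH3, CH3OH, and HCOOH
```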
As presently there are no actual errors based on direct profile comparisons for NH3, CH3OH, and HCOOH, we summarize the current estimated retrieval errors for these species in Table 1. Most of these values were obtained from Observing System Simulation Experiments (OSSEs). For HCOOH we also present statistics of the estimated retrieval errors from the set of TES Global Surveys from July 2009. The OSSEs have the advantage of a known true state with which to compare; however, they do not include additional systematic errors (e.g. spectroscopic errors, propagated errors from parameters such as temperature, and interfering species), which generally results in an underestimate of the true error. CO is one of the original TES standard products and has been more extensively evaluated against in situ measurements and data from other satellites (Luo et al., 2007a, b; Lopez et al., 2008), from which actual errors have been derived. These previous studies show that the TES-retrieved lower-to-mid-tropospheric CO is biased slightly low compared to other satellite measurements but within the variability and the observation errors of all the data analysed. The estimated TES CO retrieval error is 10–20 % in the troposphere.
Another characteristic that needs to be taken into consideration for satellite infrared retrievals of NH3, CH3OH, and HCOOH is the minimum detection limit, since background surface concentrations for these species can be below 1 ppbv. The minimum detection limit for NH3 is a profile with a surface value of at least ∼ 1 ppbv, which, given the typical profile shape in which concentrations decrease sharply with altitude, corresponds to a profile value of ∼ 0.4 ppbv at ∼ 825 hPa where the TES NH3 observations are typically most sensitive (Shephard et al., 2011). TES's minimum detection limit for HCOOH is a peak profile value of ∼ 0.7 ppbv (typically at the surface) under conditions with good thermal contrast between the atmosphere and surface (∼ 10 K), with reduced sensitivity under conditions with weaker thermal contrast (Cady-Pereira et al., 2014). Similarly, the minimum detection limit for CH3OH is a profile with a peak value of ∼ 0.5–1 ppbv (Cady-Pereira et al., 2012).
This analysis mainly focuses on the TES satellite observations of NH3, CO, HCOOH, and CH3OH over the Canadian oil sands region. TES started performing special oil sands transect observations on 14 July 2012 and in general makes a special observation over the Canadian oil sands region every 2 to 7 days depending on the TES observation schedule. The transects consist of 20 consecutive 5 × 8 km² pixels spanning 240 km in a nearly south/north direction centred on the surface mining region near Fort MacKay in Alberta, Canada (e.g. see pixels in Figs. 1 or 2).

Aircraft
The aircraft profiles near Fort MacKay, Alberta, that were coincident with the TES overpass for satellite validation purposes were flown on 3 September (flight 18) and 5 September (flight 20) 2013. These days were selected during the campaign for periods when there were scheduled TES oil sands special observations and the atmosphere was relatively cloud free. Figure 1 shows the flight tracks coloured as a function of relative aircraft altitude for flights 18 and 20. Since the TES special oil sands transects were designed so that the oil sands surface mining region was near the middle of the ∼ 240 km transect, the spatial difference between the aircraft and the furthest pixel in the TES transect was less than ∼ 120 km at the TES overpass time. Flight 20 was designed as a "transformation flight" in which the aircraft sampled the same air mass leaving the main oil sands refining facilities, starting at the TES overpass time and sampling several times downwind throughout the afternoon. This resulted in some aircraft observations being further than 120 km away, but none more than ∼ 200 km away. Limited-altitude (partial) profiles provided additional data that can be compared with the satellite observations. Details of the aircraft observations for the satellite species validated in this analysis are provided in the subsequent sections.

Ammonia (NH3)
NH3 measurements were conducted with a dual quantum cascade laser (QCL) trace gas monitor (Aerodyne Inc., Billerica, MA, USA; McManus et al., 2008), collecting data at 1 Hz. Outside air was sampled through a heated Teflon inlet tube shared with a high-resolution time-of-flight chemical ionization mass spectrometer (HR-ToF-CIMS; see Sect. 2.2.4); the flow rate through the QCL was 10.8 L min⁻¹. The response time was approximately 60 s. Calibrations were performed before, once during, and after the project using a zero-air generator (Sabio, Model 1001, Georgetown, TX) and permeation tubes with known release rates (Vici Metronics, Poulsbo, WA). In-flight zero checks were done before, 2–3 times during, and after each flight by switching the flow from the inlet to an activated charcoal scrubber (model Junior King, Koby, Marlboro, MA). The average ammonia volume mixing ratio was 1.2 ± 0.2 (standard deviation) ppbv, with a median of 1.0 ppbv. The lower quartile was 0.5 ppbv and the upper quartile 1.7 ppbv. The aircraft data for the whole project were compared with a stationary surface NH3 instrument running simultaneously near Fort MacKay (ambient ion monitor/ion chromatograph; J. Murphy, personal communication, 2015), with the distributions of the surface and aircraft mixing ratios comparing well (not shown). Aircraft data gaps for NH3 occurred during flight 18 and parts of flight 20 due to instrumental problems in flight. The 1σ uncertainty for a 1 Hz measurement during flight 20 is estimated to be ±0.3 ppbv (∼ ±35 %).

Carbon monoxide (CO)
CO measurements were made with an off-axis integrated cavity output spectrometer (CO-23r; Los Gatos Research Inc., Mountain View, CA; Provencal et al., 2005) at 2 Hz and averaged to 1 Hz. CO mixing ratios for the project ranged from 74 to 774 ppbv with a mean of 110 ± 20 (standard deviation) ppbv and a median of 107 ppbv. The lower quartile (25 %) was 96 ppbv and the upper quartile (75 %) 119 ppbv. Based on instrument calibrations the CO measurements can have a bias error of up to 2 ppbv and a 1σ standard deviation of ∼ 0.5 ppbv.

Methanol (CH3OH)
A proton-transfer-reaction time-of-flight mass spectrometer (PTR-ToF-MS, Ionicon Analytik) was used to measure VOCs on the aircraft. Details of the PTR-ToF-MS technique have been described previously (Jordan et al., 2009; Graus et al., 2010). Briefly, this instrument uses soft ionization of target VOC compounds with H3O+ as the reagent ion. Methanol was detected as CH3OH(H+) at m/z 33.03. VOC data were collected at a sampling rate of 0.5 Hz. During the flights, the PTR drift tube pressure and temperature were maintained constant at 2.15 mbar and 60 °C respectively. Ambient air was sampled through a 6.35 mm (1/4 inch) Teflon tube at a flow rate of 6 L min⁻¹. A portion of this ambient air (270 mL min⁻¹) was drawn into the PTR inlet at standard pressure and temperature. The response time for the instrument was 2 s. Instrumental backgrounds were determined using a custom-built zero-air generating unit containing a catalytic converter heated to 350 °C with a continuous flow of 1 L min⁻¹ of ambient air. The catalyst removed methanol and other VOCs from the ambient air while maintaining the humidity of the sampled air. A total of four instrument zeros were sampled during each flight for 5 min each. Zeros were interpolated and subtracted from the methanol peak. Methanol was calibrated on the ground with a 1.01 ppm gas standard mixture containing 17 VOCs (Ionimed) diluted with zero air. The detection limit for methanol, defined as 2 times the standard deviation of the blank catalyst value, was 0.64 ppbv. The uncertainty in the aircraft CH3OH observations during this period is ∼ ±20 %. The data were processed using TOFWARE (Tofwerk AG, Switzerland) with peak fitting that is able to accurately integrate and separate the methanol peak from adjacent peaks and from the baseline. This method was previously described by Moussa et al. (2015). Above 5800 m altitude, the PTR-ToF-MS was unable to maintain a constant drift pressure, and therefore data collected while the aircraft flew above this altitude were removed and reported as invalid.

Formic acid (HCOOH)
Formic acid measurements were conducted with an HR-ToF-CIMS (Aerodyne Research Inc.) using acetate reagent ion (A-CIMS). A detailed description of the instrument and principles of operation has been given elsewhere (Bertram et al., 2011; Lee et al., 2014). To reduce the residence time in the overall sampling manifold, the total flow was maintained at > 15 L min⁻¹, resulting in a residence time of less than 1 s. Instrumental backgrounds were determined 3–5 times per flight for a duration of 5 min each by diverting the sample flow through dual acidic gas traps. Calibrations of formic acid were conducted both in the field and post-study using a liquid calibration unit (Ionimed Analytic), which provided stable gas streams of analyte by volatilizing a known aqueous standard of formic acid. A constant flow of 1 mL min⁻¹ containing a known gaseous concentration of isotopically labelled (¹³C) formic acid was also introduced into the A-CIMS to correct for any dynamic fluctuations in response factors. The detection limit for formic acid, defined as 2 times the standard deviation of the blank value, was estimated to be 20 pptv, with a 2 s time resolution. At higher altitudes (above ≈ 1500 m), the pressure of the ion-molecule reaction region of the chemical ionization mass spectrometer (CIMS) could not be reliably controlled due to pumping limitations, resulting in portions of the data at upper altitudes being invalidated and not available for the satellite comparisons. The uncertainty (1σ) in the CIMS HCOOH is primarily contributed by the uncertainty in derived response factors (±10–15 %), although other factors may introduce unknown systematic biases that have not been fully quantified. Such factors include variations in flow, pressure and temperature, transmission through lines, degradation of calibration standards, and uncertainty in fitting mass spectral peaks in software. The overall uncertainty is estimated to be ∼ 20–25 %.

Global Environmental Multi-scale – Modelling Air quality and CHemistry model
The model used by Environment Canada for the JOSM oil sands simulations is GEM-MACH. GEM-MACH is a comprehensive air quality simulation system which operates in an online configuration with Environment Canada's meteorological forecast model (GEM). It was first described in Moran et al. (2010), and a recent intercomparison between GEM-MACH and other air quality models using annual observations can be found in Im et al. (2015a, b) and Makar et al. (2015a, b). Note that the direct and indirect aerosol feedback effects were not included in these simulations. A three-level nested-grid version of the GEM-MACH model is used in the simulations over the Canadian oil sands region, where the innermost and highest-resolution grid has 643 × 544 grid cells with a horizontal resolution of 2.5 × 2.5 km², covering the provinces of Alberta and Saskatchewan (domain of 2 186 200 km²). The time steps for the high-resolution simulations were 2 min for the chemistry and 1 min for the meteorology. Formic acid and methanol are lumped model VOC species in GEM-MACH; therefore they are not specifically modelled or readily available for evaluation against satellite observations. For these initial comparisons we focused on GEM-MACH ammonia and carbon monoxide simulations over the oil sands region. GEM-MACH anthropogenic emissions, including ammonia and carbon monoxide, are generated using the Sparse Matrix Operator Kernel Emissions (SMOKE) emissions processing system (Houyoux et al., 2000; CEP, 2003) Makar et al. (2009). These high-resolution GEM-MACH oil sands runs did not include any biomass burning or natural emissions sources and presently do not include an ammonia bidirectional flux.
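The grid numbers quoted above are mutually consistent, which the following short check makes explicit. It uses only the values stated in the text (643 × 544 cells, 2.5 km spacing, 2 min and 1 min time steps); the variable names are illustrative.

```python
# Innermost GEM-MACH grid as quoted in the text.
NX, NY = 643, 544        # number of grid cells in each horizontal direction
DX_KM = 2.5              # horizontal grid spacing (km)

n_cells = NX * NY                        # 349 792 cells
domain_area_km2 = n_cells * DX_KM ** 2   # 2 186 200 km2, matching the text

# Time steps quoted in the text (minutes).
CHEM_STEP_MIN = 2
MET_STEP_MIN = 1
chem_steps_per_hour = 60 // CHEM_STEP_MIN   # 30
met_steps_per_hour = 60 // MET_STEP_MIN     # 60
```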

Satellite and aircraft comparisons

Comparison methodology
The comparison approach selected depends on the goals of the study and the quantities being compared. Since the main goal here is to validate just the retrieved information provided by satellite measurements, it is often desirable to perform a profile comparison using the satellite observation operator, especially for species with limited information content. This approach provides direct comparisons of the satellite-retrieved quantities by taking into consideration the reduced vertical resolution of the retrieved values, as well as removing the influence of the a priori information (e.g. profile shape) used in the inversion of the satellite-observed radiances to concentration values at each level. Alternatively, if the comparison is performed on the retrieved profiles (observed atmospheric state + a priori) without taking into consideration the a priori profile, $x_a$, one would get a different comparison result for each selected a priori profile, which can easily be changed even post-retrieval (Rodgers and Connor, 2003; Kulawik et al., 2008). Since the TES retrievals use an optimal estimation approach, this direct comparison is achieved in a straightforward manner by applying the satellite observation operator to the comparison profile, $x_c$. The observation operator applies the a priori vector, $x_a$, used in the retrieval and the satellite-retrieved averaging kernel, $A$, which specifies the satellite sensitivity and vertical resolution (half-width at half-maximum of the rows of the averaging kernel). This method generates an estimated profile, $x_c^{\mathrm{est}}$, representing what the satellite would measure for the atmospheric profile sampled by either the aircraft or model mapped onto the retrieval pressure levels, $x_c^{\mathrm{mapped}}$, with the following operation:
$$x_c^{\mathrm{est}} = x_a + A\,(x_c^{\mathrm{mapped}} - x_a).$$
Thus, differencing $x_c^{\mathrm{est}}$ and the retrieved profile, $x$, removes the effect of the a priori, with the remaining differences presumed to be associated with the satellite measurement error on the retrieval and systematic errors resulting from parameters that were not well represented in the radiative transfer forward model (e.g. temperature errors, interfering gases, spectroscopic errors, and instrument calibration).
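The observation operator described above (a priori plus averaging kernel applied to the departure of the mapped comparison profile from the a priori) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name is hypothetical, the profiles are plain lists on the retrieval levels, and the sketch is agnostic about whether the profile values are mixing ratios or their logarithms, which depends on the retrieval configuration.

```python
def apply_observation_operator(x_a, A, x_mapped):
    """Estimate what the satellite would retrieve for a comparison profile.

    x_a      : a priori profile on the retrieval levels
    A        : averaging kernel (square matrix as a list of rows)
    x_mapped : aircraft or model profile mapped onto the retrieval levels
    Returns x_a + A (x_mapped - x_a).
    """
    n = len(x_a)
    departure = [x_mapped[j] - x_a[j] for j in range(n)]
    return [
        x_a[i] + sum(A[i][j] * departure[j] for j in range(n))
        for i in range(n)
    ]

# Limiting cases with illustrative values.
x_a = [1.0, 0.5, 0.2]
x_mapped = [2.0, 0.8, 0.1]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [[0.0] * 3 for _ in range(3)]

perfect = apply_observation_operator(x_a, identity, x_mapped)  # equals x_mapped
blind = apply_observation_operator(x_a, zeros, x_mapped)       # equals x_a
```

The two limiting cases show the intent of the operator: where the satellite has full sensitivity the estimate follows the comparison profile, and where it has none the estimate falls back to the a priori, so the a priori cancels when the estimate is differenced with the retrieval.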
There are typically more than 200 instantaneous aircraft observations averaged onto each coarse satellite profile level used in these comparisons. Thus, assuming uncorrelated aircraft observations with similar levels of uncertainty, the weighted mean aircraft values, $x_c^{\mathrm{mapped}}$, used in these satellite comparisons have at least a ∼ 10× reduction in the single-value uncertainties (reported in Sect. 2.2). This reduces the uncertainties in the aircraft-estimated comparison profile, $x_c^{\mathrm{est}}$, down to a few percent, rendering them much less than the satellite uncertainties and allowing them to be neglected in the satellite/aircraft comparison differences. Also, for this analysis we assume the in situ aircraft data are unbiased and attribute any systematic differences in the satellite/aircraft comparisons to satellite biases.
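The ∼ 10× reduction quoted above follows from the standard error of a mean of N uncorrelated values with equal uncertainty, which scales as 1/√N. The short check below is a sketch under that assumption; the 20 % single-value uncertainty is a hypothetical placeholder, not one of the instrument uncertainties reported in Sect. 2.2.

```python
import math


def uncertainty_of_mean(sigma_single: float, n_obs: int) -> float:
    """Standard error of the mean of n_obs uncorrelated, equal-uncertainty values."""
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    return sigma_single / math.sqrt(n_obs)


n_obs = 200                          # lower bound on observations per retrieval level
reduction_factor = math.sqrt(n_obs)  # about 14, i.e. at least a ~10x reduction
sigma_single = 0.20                  # hypothetical 20 % single-value uncertainty
sigma_mean = uncertainty_of_mean(sigma_single, n_obs)  # about 0.014, i.e. ~1.4 %
```

Correlated measurement errors would reduce the effective number of independent observations, so the factor of √N should be read as an upper bound on the reduction.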

Back trajectories
Flight 20 on 5 September 2013 was a transformation flight in which the plume from the oil sands surface mining region was tracked and sampled downwind for several hours following the TES overpass time. In order to match the instantaneous satellite overpass observations along the ∼ 240 km transect with the aircraft observations, we performed model back trajectories from the aircraft "profiles". The model trajectories were computed using the Canadian Meteorological Centre's trajectory model procedure (Environment Canada, 2012), which uses the 3-D wind field output of a numerical weather prediction model. These trajectories used the wind fields from the 2.5 km GEM-MACH model. The model was run with a 2 min time step; thus, each trajectory includes latitude, longitude, and altitude information every 2 min up to 2 days prior to the trajectory arrival time. The back trajectories in Fig. 2 show that the aircraft profiles, labelled by letters in measurement succession, are sampling the same air mass as it is advected eastward from Alberta into Saskatchewan during the afternoon. The back trajectories bracketing the bottom to the ∼ 750 hPa levels of each aircraft profile show that the profiles span approximately six of the TES satellite footprints as the aircraft approaches Saskatchewan. This indicates that for flight 20 the aircraft profiles match up best with the available TES observations from pixels 9-14 counting from the south. However, it should be noted that for the coincident aircraft spirals timed with the TES overpass, the spatial difference between the aircraft and the furthest pixel in the TES transect is still < ∼ 120 km.

Altitude comparisons
The aircraft profiles used in the comparison are the two coincident spirals from flight 18 and, from flight 20, the five sets of upward and downward profiles consisting of the high-altitude spirals at the TES overpass time and four smaller lower-altitude partial sets of profiles later in the afternoon. Each of these aircraft profiles was compared against as many valid TES-retrieved profiles as possible that were ≤ ∼ 35 km away from the spirals at the TES overpass time for both flights. In addition, the six TES pixels deemed suitable based on the back-trajectory analysis in Fig. 2 were compared against the partial aircraft profiles sampling the same plume downwind later in the afternoon for flight 20. Note that for ammonia there are no aircraft measurements available for flight 18.
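The ≤ ∼ 35 km spatial criterion can be sketched as a great-circle distance filter between a spiral location and the TES pixel centres. This is only an illustration: the function names, the use of the haversine formula, and all coordinates below are assumptions, and the study does not state how distances were computed.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coincident_pixels(
    spiral: Tuple[float, float],
    pixels: Sequence[Tuple[float, float]],
    max_km: float = 35.0,
) -> List[int]:
    """Indices of pixel centres within max_km of the spiral location."""
    lat0, lon0 = spiral
    return [
        i for i, (lat, lon) in enumerate(pixels)
        if great_circle_km(lat0, lon0, lat, lon) <= max_km
    ]


# Hypothetical coordinates (degrees); not the actual flight or TES geolocation.
spiral_centre = (57.0, -111.5)
tes_pixels = [(56.5, -111.6), (56.8, -111.5), (57.0, -111.5), (57.2, -111.4), (57.6, -111.3)]
selected = coincident_pixels(spiral_centre, tes_pixels)
```

For the downwind partial profiles of flight 20, the pixel selection in the text relies on the back trajectories rather than on a fixed distance, so a filter like this would apply only to the spirals flown at the overpass time.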
Although back trajectories were conducted to provide guidance on the spatial and temporal coincidence criteria for the comparisons, the lack of variability of these short-lived species over this region during these two flights greatly reduces the sensitivity of the results to the selected coincidence criteria. This can be seen in the aircraft observations as a function of time (refer to the aircraft flight observations in plot (b) of the individual comparison figures shown below). Also, as an additional test we repeated the summary comparison analysis for each species without applying any coincidence criteria, and the statistical results (not shown) do not vary significantly from the results obtained using the coincidence criteria based on the back trajectories shown in this analysis. It should be noted that this is not generally the case for short-lived minor species with localized emission sources such as ammonia. The low variability seen here is more indicative of "background" regional amounts, which is consistent with the more homogeneous regional nature of the concentrations typically seen across the TES special transect observations over this region during the 2012-2014 period.
As the total number of profiles in the summary statistics is relatively small, we report a median value for the bias and the standard deviation derived from the robust median absolute deviation for the variability (Leys et al., 2013), which are more robust statistics that are less influenced by outliers.
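As a sketch of these robust statistics, the median gives the bias and the median absolute deviation (MAD), scaled by 1.4826, gives a standard-deviation estimate. The scaling constant is the usual one for normally distributed data (as in Leys et al., 2013); the text does not state which constant was used, so it is an assumption here, as are the function name and the sample values.

```python
import statistics
from typing import Sequence, Tuple

MAD_TO_SIGMA = 1.4826  # consistency constant for normally distributed data (assumed)


def robust_bias_and_sigma(differences: Sequence[float]) -> Tuple[float, float]:
    """Median bias and MAD-derived 1-sigma spread of satellite-minus-aircraft differences."""
    bias = statistics.median(differences)
    mad = statistics.median([abs(d - bias) for d in differences])
    return bias, MAD_TO_SIGMA * mad


# Hypothetical differences with one large outlier; not data from the study.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
bias, sigma = robust_bias_and_sigma(sample)   # median 3.0, MAD 1.0
mean_bias = statistics.mean(sample)           # 22.0, dominated by the outlier
```

The contrast between the median (3.0) and the mean (22.0) in this toy sample shows why a single outlier matters when the number of profiles is small.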
As the goal of the comparisons in this study is to validate the satellite observations, the TES observation operator in Eq. (2) was applied to all the aircraft profiles to account for both the reduced vertical resolution of the satellite data and the influence of any a priori information (i.e. profile shape). The aircraft profiles were extended to the full vertical range of the satellite by scaling the a priori profile to match the ends of the aircraft profile (using the shape of the a priori profile). To reduce the impact of numerical errors when applying the log-space observation operator at upper levels, where the concentrations are orders of magnitude smaller than in the troposphere with virtually no associated averaging kernel values (i.e. Worden et al., 2013), a linearized observation operator was applied and the levels between 100 hPa and 0.1 hPa were combined into one. It is also valuable to compare the actual error statistics derived from these TES/aircraft comparisons with the estimated profile errors routinely calculated and reported for each observation. Note that the observation error estimates from the operational TES retrieval are reported and plotted in this analysis for comparison purposes (as opposed to the total error estimates) because the TES observation operator has already been applied to the comparison profiles, which takes into consideration the smoothing error component (Shephard and Cady-Pereira, 2015). The retrieval observation error estimates vary depending on the atmospheric conditions. Thus, for representative comparison purposes during JOSM, the operational retrieval estimated observation errors at selected levels from the examples in the following sections are provided in Table 2 for reference.
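One possible reading of the profile extension step is sketched below: levels outside the aircraft's vertical range are filled with the a priori profile multiplied by the aircraft/a priori ratio at the nearest measured end level. The exact scaling used in the study is not specified beyond the description above, so the function name, the treatment of interior gaps, and the numbers are all assumptions.

```python
from typing import List, Optional, Sequence


def extend_with_scaled_a_priori(
    aircraft: Sequence[Optional[float]],
    a_priori: Sequence[float],
) -> List[float]:
    """Fill levels below/above the aircraft range with the a priori scaled to match the ends.

    Levels are ordered from the surface upward; None marks a level with no aircraft data.
    """
    if len(aircraft) != len(a_priori):
        raise ValueError("aircraft and a priori must be on the same levels")
    measured = [i for i, v in enumerate(aircraft) if v is not None]
    if not measured:
        raise ValueError("aircraft profile has no measured levels")
    lo, hi = measured[0], measured[-1]
    scale_lo = aircraft[lo] / a_priori[lo]   # ratio at the lowest measured level
    scale_hi = aircraft[hi] / a_priori[hi]   # ratio at the highest measured level
    extended = []
    for i, (value, xa) in enumerate(zip(aircraft, a_priori)):
        if value is not None:
            extended.append(value)
        elif i < lo:
            extended.append(scale_lo * xa)
        elif i > hi:
            extended.append(scale_hi * xa)
        else:
            # Gaps inside the measured range are outside the scope of this sketch.
            raise ValueError("gap inside the measured range")
    return extended


# Hypothetical four-level example (ppbv); not values from the study.
full_profile = extend_with_scaled_a_priori([None, 6.0, 3.0, None], [4.0, 3.0, 2.0, 1.0])
```

Scaling rather than splicing keeps the extended profile continuous at the ends of the aircraft data while preserving the a priori shape outside the measured range.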

Methanol (CH3OH)
Presented in Fig. 3 is a comparison for a single methanol TES/aircraft example profile from flight 18 for the downward part of the spiral that was coincident with the TES overpass. The rows of the TES averaging kernel in Fig. 3a show that the peak CH3OH sensitivity occurs at 825 hPa. The aircraft observations plotted in Fig. 3b show that there was one dedicated aircraft spiral up and down between 12:45 and 13:30 LST and another smaller "profile" later in the afternoon (∼ 16:00 LST) that was not used in this study. The same aircraft profile plotted as a grey line in Fig. 3c shows the fine vertical structure of the CH3OH observations. Applying the TES observation operator to the aircraft profile smooths the aircraft profile to the coarse TES vertical resolution and inserts the TES a priori information, with the resulting profile shown in red (essentially an estimate of what the satellite would report for the atmospheric profile measured by the aircraft). In this example the a priori profile (plotted in green) is larger than the aircraft-measured atmospheric state; thus, in the region of the profile where there is limited information, the satellite observation operator will smooth and "pull" the aircraft profile towards the a priori. Figure 3d shows the difference between the TES-retrieved profile (purple) and the aircraft profile with the observation operator applied (red). In this example the retrieved TES profile is less than the profile measured by the aircraft, with a maximum difference of ∼ −1.1 ppbv for a value of ∼ 3.2 ppbv at 825 hPa.

A summary of the CH3OH profile comparisons in flights 18 and 20 is plotted in Fig. 4. Similar to the example single profile shown in Fig. 3, the TES profile values are generally less than the aircraft-observed profiles, with a maximum median difference of −1.23 ppbv (∼ −45 %) for a median value of 2.76 ppbv at 825 hPa. At the TES peak sensitivity retrieval level of 750 hPa the bias is −1.1 ppbv (∼ −54 %) for a median value of 2.06 ppbv. This actual TES negative bias differs from the estimated positive bias based on simulations (Cady-Pereira et al., 2012); however, the simulations did not include any systematic errors. A possible source of error could be errors in the retrieved ozone profile. The CH3OH spectral feature is located within the ozone band, and any interfering errors from the ozone retrieval will propagate into the CH3OH retrieval. The corresponding standard deviations at 750 and 825 hPa are ±0.39 ppbv (∼ ±20 %) and ±0.41 ppbv (∼ ±15 %), respectively. These values are consistent with the TES-estimated uncertainty errors based on simulations of ±0.52 ppbv (±22 %) at 825 hPa under conditions with a mean 825 hPa value of 2.3 ppbv (Cady-Pereira et al., 2012). These values are also consistent with (slightly lower than) the reported TES observation error estimate of ∼ 25-30 % for these atmospheric conditions (Fig. 3c). Note that for these conditions no retrieved surface level values pass the minimum information content threshold of having a diagonal element of the averaging kernel ≥ 0.05.

Carbon monoxide (CO)
A similar comparison to the one reported above for CH3OH was repeated for CO. Even though CO also has limited information content, it generally has slightly more information than CH3OH, HCOOH, and NH3. The representative example in Fig. 5a has 1.18 DOFS, with the rows of the averaging kernel peaking in sensitivity around 700 hPa. The downward spiral of profile "A" shown in Fig. 5b is plotted in detail as a profile in Fig. 5c along with the comparison pixel 12 profile from the TES transect. Again, one can see the smoothing of the original aircraft profile as it is mapped onto the TES retrieval levels and the TES observation operator is applied. Figure 5c shows that TES does well in capturing the general profile shape of the smoothed aircraft profile, but the retrieved profile is slightly larger than the aircraft observations. The differences between the aircraft and TES plotted in Fig. 5d show a maximum difference of +10 ppbv at ∼ 700 hPa, which corresponds to a relative difference of +10 %.
A summary of all the flight 18 and 20 comparisons for CO is provided in Fig. 6. Under the atmospheric loading conditions during this intensive observation period, TES retrieved a median value of 100 ppbv with a TES/aircraft bias difference of +7.5 ppbv (7.5 %) and a standard deviation of ±22.8 ppbv (23 %) at the TES peak CO retrieval sensitivity level of 681 hPa. Compared to previous error estimates, under these conditions we have a slight positive bias, whereas previous studies have shown a small negative bias in the mid- to upper troposphere (i.e. Luo et al., 2007a, b). The uncertainty estimates are similar to those previously reported, which range from 10 to 20 %. Also, these results over the oil sands have slightly higher errors than the typical TES CO operational retrieval estimated observation error of ∼ 10 % (Table 2) under these atmospheric conditions.

Formic acid (HCOOH)
The formic acid profile comparisons are somewhat limited due to the aircraft instrument issues at higher altitudes, as noted in Sect. 2.2.4. However, there were still many partial profile comparison opportunities where the aircraft observations extended to the TES peak sensitivity level (∼ 750 hPa; Fig. 7a), which is a big advantage over only using surface values where there is little satellite measurement information. Figure 7 is a comparison example of the downward part of the partial aircraft profile "D" from flight 20 at ∼ 16:20 LST (Fig. 7b) with the TES transect pixel 14 from 13:19 LST. As shown by the trajectory analysis in Fig. 2, the aircraft profile is sampling approximately the same air mass that was previously measured by TES pixel 14 at the satellite overpass time ∼ 3 h earlier. The detailed comparison in Fig. 7c shows that both the selected a priori profile and the TES-retrieved profile (∼ 1.5 ppbv) are higher than the native aircraft profile (∼ 1 ppbv). Thus, when the TES observation operator is applied to the aircraft profile it shifts the aircraft observations to larger values, providing a good comparison to the actual TES observations (not its a priori information). In other words, if an a priori profile with smaller values closer to the aircraft observations were swapped into the TES-retrieved profile (either prior to or post-retrieval), the resultant retrieved profile (purple line comprised of observation + a priori) would approach the atmospheric state measured by the aircraft (blue line). In this example, the difference between TES and the aircraft is ∼ 0.07 ppbv (or 6 %) for a value of ∼ 1.1 ppbv at the peak TES sensitivity level of 750 hPa.
The summary values generated from all the available profile comparison values from flights 18 and 20 are presented in Fig. 8. This figure shows that at the peak TES sensitivity level of 750 hPa the median retrieved profile value is 1.04 ppbv with a bias of 0.19 ppbv (∼ 20 %) and a standard deviation of ±0.46 ppbv (∼ 45 %). Note that the differences between the mean and median values can be large, indicating that there are a few large outliers in the sample, which is the reason why the more robust statistics are reported for these comparisons. The actual uncertainty errors are similar in magnitude to the errors of ∼ ±0.4 ppbv for values in the 1.0-2.0 ppbv range (∼ ±30 %) previously reported from TES retrieval simulations in Table 1. The aircraft comparison results show that under these conditions the TES retrieval has a small positive bias of ∼ +0.2 ppbv, which differs slightly from the very little to no bias reported from simulation analysis (Cady-Pereira et al., 2014), but as noted previously, there were no systematic errors included in those simulations. Both the actual errors presented in this study and the previous simulated error values (Table 1) appear to generally be a little higher than the TES-reported observation error estimate of ∼ 25 % under these conditions (Table 2).

Ammonia (NH3)
Figure 9 contains an example profile comparison of TES pixel 7 with partial aircraft profile "B" (Fig. 9b). Figure 9a shows the peak sensitivity level of the TES NH3 retrieval to be 825 hPa for this example. The detailed profile comparison in Fig. 9c shows that the TES-retrieved profile (purple) has higher concentrations than the original aircraft observations below ∼ 825 hPa and lower concentrations above. However, the a priori profile (green) selected in the retrieval has much higher concentrations than the observations (blue), "pulling" the retrieved profile (purple) to larger values. In fact, once the influence of the a priori and the coarse vertical resolution of the satellite are taken into consideration, the TES observations themselves are slightly lower than the aircraft below 750 hPa. This is another example of how detailed comparisons are required if the goal of the comparison is to validate what the satellite observations themselves are providing and not just the retrieved product, which can contain a significant amount of a priori information when there is limited measurement information content. Figure 9c and d show that for a value of ∼ 0.7 ppbv at 825 hPa the TES/aircraft difference is −0.16 ppbv (∼ −23 %).
Figure 10 contains the summary results from all the available comparisons for NH3. Note that for NH3 there are no available aircraft profiles from flight 18, so all the comparisons are from flight 20. Also, given the relatively small amounts of NH3 detected in this region on this day, there are only 5 out of the possible 20 pixels along the TES transect that contain enough retrieval information for the comparison. Thus, in order to make statistical inferences, all the available pixels along the TES transect are compared with all the available aircraft profiles (A-E; Fig. 10b). Even though NH3 can be short-lived and the emission sources localized, there is not a lot of variability seen in the 5 × 8 km2 pixels across the TES 240 km transect when looking at all the observations available from the TES oil sands special observations taken over the 2012-2014 period (Shephard et al., 2014). Therefore, selecting all the available observations as being representative is a reasonable approach. The summary of the TES/aircraft profile comparison results in Fig. 10 shows that for the median NH3 profile value of 0.97 ppbv at 825 hPa there is a small positive bias of 0.08 ppbv (∼ 8 %) with a standard deviation of ±0.25 ppbv (∼ 25 %). This bias of +7 % is the same as the value reported by Shephard et al. (2011), with the standard deviation being about twice as large as the ±10 % reported from their simulations but more in line with the TES-estimated observation error uncertainty of ±15-30 % reported by the operational algorithm under these conditions (Fig. 9c).

Altitude comparison summary
For convenience, all the altitude comparisons previously presented and discussed in detail for each species are provided in Fig. 11. This allows for the intercomparison of the errors associated with each of the species analysed in this study for this period over the Canadian oil sands region as a function of pressure. It also presents the results in a format similar to the magnitude summary figure provided in the next section.

Magnitude comparisons
In the previous sections we showed the actual errors as a function of height. In addition, it is useful to report the actual comparison errors as a function of the species volume mixing ratio in both absolute and relative terms. Figure 12 shows the results from the satellite/aircraft comparisons with the differences binned by the magnitude of the observations, as opposed to by altitude as shown previously. For consistency, the same data screening was used as before in that each profile selected has at least 0.5 DOFS and each level selected has a diagonal averaging kernel value of at least 0.05. Note that bins were only reported when they had at least 10 data points, and data points were not included for retrieval pressure levels above ∼ 380 hPa (close to the maximum aircraft observational level). These overall results are generally consistent with the previous results presented as a function of altitude, likely because the mixing ratios for these species typically decrease with increasing altitude (decreasing pressure), but there are some differences resulting from the different binning. One general point that should be highlighted is the magnitude range (e.g. the limited range of NH3 and CO) over which the comparisons were performed. Ammonia values below 2.0 ppbv typically have a bias of ∼ 10 % with an uncertainty of ∼ ±25 %. Methanol values in the range of ∼ 1 to 3 ppbv generally have a bias of ∼ −40 to −50 % with an uncertainty of ∼ ±10 to ±20 %. The formic acid results show that for values under 1.5 ppbv there is a positive bias of ∼ +10 to +20 % with an uncertainty of ∼ ±20 %. However, for larger values from 2 to 3 ppbv the positive bias jumps to ∼ +60 % with a smaller uncertainty of < ±10 %. Carbon monoxide values below 135 ppbv tend to have a small bias varying between ∼ −7 and +7 % depending on the magnitude bin. However, there is also an increase in the bias to ∼ +30 % for values between 135 and 170 ppbv.
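The screening and binning described above can be sketched as follows. The thresholds (DOFS ≥ 0.5, averaging kernel diagonal ≥ 0.05, at least 10 points per bin, no levels above ∼ 380 hPa) are taken from the text; the record layout `LevelMatch`, the choice of which mixing ratio defines the bin, the use of the bin median as the denominator for the relative bias, and the synthetic data are all assumptions made for illustration.

```python
import statistics
from typing import Dict, List, NamedTuple, Sequence, Tuple


class LevelMatch(NamedTuple):
    vmr: float           # mixing ratio used for binning (ppbv); choice is an assumption
    diff: float          # TES minus aircraft-with-operator difference (ppbv)
    ak_diag: float       # diagonal averaging kernel value at this level
    dofs: float          # degrees of freedom for signal of the parent profile
    pressure_hpa: float  # retrieval level pressure


def passes_screening(m: LevelMatch) -> bool:
    """Screening from the text: DOFS >= 0.5, AK diagonal >= 0.05, pressure >= ~380 hPa."""
    return m.dofs >= 0.5 and m.ak_diag >= 0.05 and m.pressure_hpa >= 380.0


def bin_by_magnitude(
    matches: Sequence[LevelMatch],
    edges: Sequence[float],
    min_count: int = 10,
) -> Dict[Tuple[float, float], Tuple[int, float, float]]:
    """Map each [lo, hi) bin to (count, median bias in ppbv, relative bias in %)."""
    kept = [m for m in matches if passes_screening(m)]
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [m for m in kept if lo <= m.vmr < hi]
        if len(in_bin) < min_count:
            continue  # bins with fewer than min_count points are not reported
        bias = statistics.median([m.diff for m in in_bin])
        reference = statistics.median([m.vmr for m in in_bin])
        result[(lo, hi)] = (len(in_bin), bias, 100.0 * bias / reference)
    return result


# Synthetic records for illustration only; not data from the study.
records: List[LevelMatch] = (
    [LevelMatch(0.5, 0.1, 0.2, 0.8, 825.0)] * 12    # well-populated bin
    + [LevelMatch(1.5, 0.3, 0.2, 0.8, 825.0)] * 5   # too few points to report
    + [LevelMatch(0.5, 9.9, 0.2, 0.3, 825.0)] * 3   # rejected: DOFS below 0.5
)
summary = bin_by_magnitude(records, [0.0, 1.0, 2.0])
```

Because sparsely populated bins are dropped rather than merged, the reported magnitude range can be narrower than the range of the underlying observations, which is consistent with the limited NH3 and CO ranges noted in the text.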

Initial GEM-MACH model evaluation
The validated TES observations over the oil sands region can be used with more confidence for a variety of applications. Provided here are examples of using the satellite observations for initial model evaluation. Satellite/model comparisons are performed for both ammonia and carbon monoxide, as formic acid and methanol are not specifically modelled in GEM-MACH and are therefore not available for satellite comparisons. The satellite/model comparisons were performed following the same procedure as the satellite/aircraft comparisons in that the TES observation operator was also applied to the model profile, which accounted for the satellite retrieval a priori and vertical sensitivity (i.e. vertical resolution). The main difference is that the match-ups do not have the same space and time constraints as the satellite/aircraft comparisons, since the model provides a 3-D field of simulated values at a time step of 2 min for the chemistry. All the available model simulations for the full JOSM campaign period were searched for match-ups with the TES transects collected on seven different days. Unlike the aircraft comparisons, each TES pixel was compared against just the closest simulation. Note that it would be possible to extend these comparisons to cover the already completed 2-year period of the TES special oil sands observations, provided that the high-resolution oil sands model simulations are generated. For this initial comparison just the 2.5 × 2.5 km2 model profile closest to the centre of the TES 5 × 8 km2 footprint was used (i.e. the model profiles were not aggregated to a spatial extent similar to that of the satellite footprint, as this would not impact the results). As done with the aircraft comparisons, the comparisons were restricted to TES retrievals that contained at least 0.5 DOFS.

Carbon monoxide (CO)
Presented in Figs. 13 and 14 are the initial satellite/model CO comparison results. The single profile comparison example is from 3 September 2013 for TES pixel 12 at 13:31 LST, which corresponds to the TES/aircraft comparison in Fig. 5 and is one of the pixels directly over the oil sands mining region. For this profile both the TES/aircraft (+9.8 ppbv at 681 hPa) and TES/model (+8.3 ppbv at 681 hPa) comparisons show similar differences, indicating that the model is doing very well at capturing the aircraft-observed CO concentrations (at the TES resolution and sensitivity) for this example.
A summary of the CO satellite/model comparisons for all co-located and coincident profiles that meet the DOFS ≥ 0.5 criterion for the JOSM period is provided in Fig. 14. These results show that the model underpredicts the CO concentrations relative to what is observed by the satellite; for example, at 681 hPa the median TES/model difference is +19.6 ppbv (+19 %). Comparing this to the corresponding summary TES/aircraft bias difference of +7.5 ppbv (+7.5 %) indicates that the GEM-MACH model underprediction of CO is closer to ∼ 10 % under these conditions. The model underprediction is reduced near the surface, with a bias of +11.0 ppbv (+9 %) at the 908 hPa level, which is approximately double the TES/aircraft difference of +6.1 ppbv (∼ +5 %). However, it should be noted that the TES sensitivity near the surface is reduced, as shown by the reduced averaging kernel diagonal values at 908 hPa in Fig. 14.

Ammonia (NH3)

Figure 15 is a single profile comparison example from 5 September 2013, which is the same day as aircraft flight 20. Pixel 7 is compared with the coincident and co-located model profile, which corresponds to the same TES pixel that was compared with the aircraft profile in Fig. 9 (note that the aircraft observations were taken about 1 h after the satellite overpass). The noticeable difference in this model comparison is the much lower ammonia levels (∼ < 0.05 ppbv) in the model simulations compared with both the satellite (Fig. 15) and the aircraft (Fig. 9) values, which are ∼ 0.6 ppbv at 825 hPa. It should also be noted that the model surface values never get above ∼ 0.2 ppbv across the oil sands mining region.
A summary of the NH3 model/satellite comparisons for co-located and coincident profiles that meet the DOFS ≥ 0.5 criterion for the JOSM period is provided in Fig. 16. These results show that the model underpredicts the ammonia concentrations relative to what is observed by the satellite; for example, at 825 hPa the median difference is +0.59 ppbv (∼ 60 %). Presently there are a number of updates being investigated to address this apparent underprediction of NH3 by GEM-MACH over the oil sands region during the JOSM period: the inclusion of biomass burning in the GEM-MACH 2.5 × 2.5 km2 special oil sands simulations (even though there were no large forest fires burning nearby during this period), inclusion of a NH3 bidirectional flux model (Bash et al., 2013a; Zhu et al., 2015), updating the diurnal emission profile of NH3 (Bash et al., 2013b), inclusion of natural sources, and potential underestimates in the CAC NH3 inventory. For example, the compensation point (the concentration at which emissions from the surface are equal to atmospheric deposition) for NH3 over conifers ranges from ∼ 0.2 to ∼ 0.6 ppb in unpolluted conditions at 10 °C (Zhang et al., 2010) and could possibly account for some of the model underestimate.

Table note: The uncertainty values are 1 sigma standard deviations computed from the more robust median absolute deviation statistic. Pressure levels in bold are the average TES peak sensitivity levels for the conditions during these JOSM observations. Additional reported levels are provided for comparison purposes with previous studies.

Conclusions
Presented in this study are TES actual errors derived from comparisons with aircraft observations taken during the intensive field campaign over the oil sands region in Alberta, Canada. The comparison results are from the aircraft observations designed to be coincident with the Aura TES overpass times for two flights with clear-sky conditions at the beginning of September 2013. Even with the dedicated validation satellite/aircraft observations, the comparison results represent a limited range of sampling conditions that occurred during this intensive study period (i.e. they do not span the full magnitude range that can be observed by TES globally under many atmospheric conditions). In this analysis we are fortunate to have comparison values of the exact quantity being retrieved (i.e. volume mixing ratio values at profile levels) and a retrieval procedure that provides the vertical sensitivity (i.e. averaging kernels) for each profile, so that we can directly validate the satellite observations and not the impact of the a priori profile selection. Thus, we do not need to rely on other indirect methods to try to account for the vertical resolution and the influence of the a priori information (i.e. compute the representative volume mixing ratio; Shephard et al., 2011), which is often required when comparing different quantities (i.e. single column or surface observations) when there is limited information content. The TES/aircraft profile comparison average differences for these atmospheric conditions are presented in Table 3. These actual errors generally compare well with both the estimated retrieval observation errors from previous studies (Table 1) and the estimated errors reported in the TES operational retrieval product for these atmospheric conditions (Table 2). However, there are some notable exceptions that require further investigation with additional validation observations: (i) the relatively large negative bias of ∼ −45 % for CH3OH, (ii) the jump of ∼ +50 % in the relative bias of HCOOH for values > 2.0 ppbv, and (iii) the sharp increase in the relative bias reported for CO values > 135 ppbv during this study (possibly due to the small sample size).
In addition to the aircraft comparisons, the satellite retrievals of ammonia and carbon monoxide were compared against special high-resolution model simulations carried out over the oil sands region during the JOSM field campaign. Only ammonia and carbon monoxide model comparisons were performed, as GEM-MACH does not explicitly model formic acid and methanol. These initial comparisons identified a general underprediction of ammonia concentrations by the model relative to both aircraft and satellite observations. This apparent underprediction of ammonia concentrations from the satellite/model comparisons of ∼ 0.6 ppbv over the oil sands region is currently being investigated both with the high-resolution regional GEM-MACH 2.5 km model and the lower spatial resolution global GEOS-Chem model, which incorporates biomass burning, bidirectional fluxes, and the newest diurnal variability model and has an adjoint model to help identify where the ammonia over the oil sands originates (Zhu et al., 2015). The CO is much better predicted in the model, with TES/model comparison differences of ∼ +20 ppbv (∼ +20 %), but the slight positive bias from the TES/aircraft comparisons of ∼ 7.5 % indicates that the overall model underprediction of CO is closer to ∼ 10 % at 681 hPa (∼ 3 km). Also, since biomass burning was not included in these GEM-MACH simulations, any additional contribution from potential long-range transport of CO from biomass burning would further improve the model prediction of CO during this period over the oil sands region.
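The CO adjustment above amounts to subtracting the TES/aircraft bias from the TES/model difference. The sketch below uses the 681 hPa median values quoted in the earlier sections (+19.6 ppbv TES/model, +7.5 ppbv TES/aircraft, 100 ppbv median TES value). It assumes that the two comparison samples are comparable and that the biases combine additively, which the study does not demonstrate; the function name is hypothetical.

```python
def implied_model_bias(tes_minus_model: float, tes_minus_aircraft: float) -> float:
    """Aircraft-minus-model difference implied by the two TES comparisons (ppbv)."""
    return tes_minus_model - tes_minus_aircraft


tes_minus_model = 19.6     # ppbv, median TES/model difference at 681 hPa
tes_minus_aircraft = 7.5   # ppbv, median TES/aircraft bias at 681 hPa
tes_median = 100.0         # ppbv, median TES-retrieved CO at 681 hPa

underprediction_ppbv = implied_model_bias(tes_minus_model, tes_minus_aircraft)  # about 12.1
underprediction_pct = 100.0 * underprediction_ppbv / tes_median                 # about 12 %
```

The result of roughly 12 ppbv (about 12 % of the median TES value) is in line with the statement that the model underprediction is closer to ∼ 10 % than to ∼ 20 %.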
Acknowledgements. We would like to thank Susan S. Kulawik for helping provide us with the original updated TES version 6 lite files and Craig Stroud for his helpful discussions and support with the back-trajectory analysis. We would like to acknowledge other members of the aircraft team for their contributions to the airborne measurements used to validate the satellite observations, in particular Andrew Budden, Stewart Cober, Andrea Darlington, Andrew Elford, Anthony Liu, Peter Liu, Aaron McCay, Robert McLaren, Bill McMurty, Richard L. Mittermeier, Julie Narayan, Jason O'Brien, Andrew Sheppard, Ka Sung, Danny Wang, Mohammad Wasey, and the National Research Council Flight Research Laboratory team. We would also like to thank the other members of the oil sands modelling working group for their insights on the initial GEM-MACH ammonia simulations over the oil sands region, in particular Michael Moran. Work at the University of Minnesota was supported by NSF (grant no. 1148951). Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official agency policy. This study was supported in part by the Joint Canada-Alberta Implementation Plan for Oil Sands Monitoring, the Clean Air Regulatory Agenda (CARA), and NASA ACMAP (grant no. NNX10AG63G).
Edited by: M. Weber

to ± 20% Actual errors derived from comparisons with observations: Luo et al. (2007a, b), Lopez et al. (2008) * The uncertainties in these studies are reported as 1 sigma standard deviations.

Figure 1. The aircraft flight tracks for flights 18 and 20 and the TES transect of 5 × 8 km pixels (black polygons) spanning a total distance of ∼ 240 km overplotted on Google Earth images. The aircraft flights are colour coded as a function of relative altitude going from the lower-altitude blue colours (from as low as 150 m) to the higher-altitude red colours (reaching 6400 m).
the aircraft component of the JOSM field campaign there were dedicated aircraft observations made from the National Research Council Institute for Aerospace Research (NRC Aerospace) Convair-580 research aircraft that included flights designed for satellite validations. The unusually large number of cloudy days during the first part of the campaign limited the number of flights suitable for TES validation purposes. The dedicated aircraft spiral pro-

Figure 2. Model-generated back trajectories for JOSM flight 20 on 5 September 2013 over the Canadian oil sands region plotted spatially as a function of local standard time (LST). The boundaries of the oil sands region are outlined with black lines, with the surface mining areas indicated within this region near the centre of the plot. Each aircraft "profile" (either the up or down profile) is indicated alphabetically in measurement succession during the afternoon (e.g. "A" is at 13:22 LST (TES overpass time) and "E" later in the afternoon at 17:00 LST). Two back trajectories are plotted for each of these aircraft profiles, corresponding to the lowest aircraft altitude and the ∼ 750 hPa aircraft profile levels, which span the general vertical range where the satellite is most sensitive for NH3, HCOOH, and CH3OH. Also plotted on the map are the TES footprints, colour coded by the overpass time (13:17-13:20 LST).

Figure 3. A representative aircraft/satellite comparison for a single CH3OH profile using the downward aircraft spiral from profile "A" with pixel 10 along the TES transect (counted from south to north). (a) contains the rows of the satellite averaging kernels at each retrieval level. (b) shows the aircraft flight observations for the day as a function of altitude and local standard time (LST), with each aircraft "profile" (pair of the up or down profiles) indicated alphabetically in measurement succession during the afternoon. The two dotted lines bound the observations selected to generate the comparison aircraft profile. (c) shows the original aircraft (grey) profile, the aircraft profile mapped onto the TES retrieval levels (blue), and this same profile with the TES observation operator (Eq. 2) applied (red); this latter profile can be directly compared with the TES-retrieved profile (purple). The TES retrieval observation error estimates are also plotted as error bars. The TES a priori profile is provided in green. (d) contains the difference between the TES-retrieved profile (purple) and the aircraft profile (red) using the same colour scale as (a) for the retrieval altitude levels.
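As a rough illustration of the comparison step described in the Fig. 3 caption, the sketch below applies a satellite observation operator to an aircraft profile that has already been mapped onto the retrieval levels. Eq. (2) is not reproduced in this caption, so the commonly used form x_est = x_a + A (x_mapped - x_a), evaluated in ln(VMR), is assumed here. The function name and the three-level numbers are hypothetical and are not JOSM data.

```python
import math


def apply_observation_operator(x_mapped, x_apriori, avg_kernel):
    """Smooth a mapped profile with the averaging kernel (assumed ln(VMR) form).

    x_mapped, x_apriori: VMR profiles on the retrieval levels (same length n).
    avg_kernel: n x n averaging kernel, one row per retrieval level.
    """
    # Departure of the mapped profile from the a priori, in ln(VMR).
    ln_diff = [math.log(m) - math.log(a) for m, a in zip(x_mapped, x_apriori)]
    smoothed = []
    for row, a in zip(avg_kernel, x_apriori):
        # Each kernel row weights the departures at all levels.
        delta = sum(k * d for k, d in zip(row, ln_diff))
        smoothed.append(math.exp(math.log(a) + delta))
    return smoothed


# Hypothetical values for illustration only (ppbv).
x_apriori = [2.0, 1.5, 1.0]
x_mapped = [4.0, 2.0, 1.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [[0.0] * 3 for _ in range(3)]

full_sensitivity = apply_observation_operator(x_mapped, x_apriori, identity)
no_sensitivity = apply_observation_operator(x_mapped, x_apriori, zeros)
```

Under this assumed form, an identity averaging kernel returns the mapped aircraft profile unchanged, while a zero kernel collapses the result onto the a priori. This is why the smoothed (red) profile, rather than the raw aircraft profile, is the appropriate one to difference against the retrieval.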

Figure 4. Summary box-and-whisker plots of the satellite/aircraft comparisons during JOSM for CH3OH binned at the various retrieval altitudes. Only the TES pixels from 9 to 14 (counting from south to north) near the middle of the TES transect were included (based on trajectory results). The box edges are the 25th and 75th percentiles, the line in the box is the median, the diamond is the mean, the whiskers are the 10th and 90th percentiles, and the circles are the outlier values outside the whiskers. The left panel contains a summary of the retrieved TES profile values, the middle panel contains the TES/aircraft profile differences (with the satellite observation operator applied), and the right panel is the diagonal of the averaging kernel as an indication of the TES's vertical sensitivity.
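The summary statistics named in the Fig. 4 caption can be computed for each retrieval level as in the sketch below. The percentile rule (linear interpolation between order statistics) is an assumption, since the caption does not specify one, and the sample differences are hypothetical values rather than JOSM results.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (0 <= p <= 100) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = rank - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def box_whisker_summary(values):
    """Return the quantities drawn in the box-and-whisker plots."""
    vals = sorted(values)
    p10 = percentile(vals, 10)
    p90 = percentile(vals, 90)
    return {
        "mean": sum(vals) / len(vals),       # diamond
        "median": percentile(vals, 50),      # line in the box
        "q25": percentile(vals, 25),         # lower box edge
        "q75": percentile(vals, 75),         # upper box edge
        "p10": p10,                          # lower whisker
        "p90": p90,                          # upper whisker
        # Circles: values falling outside the whiskers.
        "outliers": [v for v in vals if v < p10 or v > p90],
    }


# Hypothetical TES minus aircraft differences at one level (ppbv).
diffs = [-1.5, -1.2, -1.1, -1.0, -0.9, -0.8, -0.6, -0.4, 0.3]
summary = box_whisker_summary(diffs)
```

Using the 10th and 90th percentiles for the whiskers, rather than a multiple of the interquartile range, means that roughly a fifth of the values in each bin are drawn as outliers by construction.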

Figure 5. Representative single CO profile aircraft/satellite comparison and associated plots. Plotted is the downward aircraft spiral of profile "A" compared with pixel 12 along the TES transect. Plotting convention is the same as Fig. 3.

Figure 6. Summary box-and-whisker plots of the satellite and aircraft comparisons during JOSM for CO, with the same plotting convention as Fig. 4.

Figure 7. Representative single HCOOH profile aircraft/satellite comparison and associated plots. Plotted as in Fig. 3, but this is profile "D" from the transformation flight 20 compared with pixel 14 from the TES transect.

Figure 8. Summary box-and-whisker plots of the satellite and aircraft comparisons during JOSM for HCOOH, with the same plotting convention as Fig. 4.

Figure 9. Representative single NH3 profile aircraft/satellite comparison and associated plots. Similar to Fig. 3, but this is example profile "B" of NH3 from the transformation flight 20, compared with pixel 7 from the TES transect.

Figure 10. Summary box-and-whisker plots of the satellite and aircraft comparisons during JOSM flight 20 for NH3. All the available TES pixels from along the TES transect were included using the same plotting convention as Fig. 4.

Figure 11. Summary plot of the actual errors (TES/aircraft) from the JOSM comparisons plotted as a function of pressure for NH3, CH3OH, HCOOH, and CO.

Figure 12. Summary plot of the actual errors (TES/aircraft) from the JOSM comparisons plotted as a function of volume mixing ratio (VMR) for NH3, CH3OH, HCOOH, and CO.

Figure 13. Single CO profile GEM-MACH model/satellite comparison and associated plots. (a) follows the same convention as Fig. 3, with the model profile replacing the aircraft profile. (b) contains the difference between the TES-retrieved profile (purple) and the model profile (red). (c) contains the rows of the satellite averaging kernels at each retrieval level. (d) shows the 2-D simulated NH3 model field at 962 hPa that corresponds most closely to the TES overpass at 13:30 LST on 3 September 2013. The profiles being compared are for the locations outlined in magenta, with the larger box showing the TES footprint and the smaller inner box the model grid box.

Figure 14. Summary box-and-whisker plots of the satellite and model comparisons during JOSM for CO using the same plotting convention as Fig. 4.

Figure 15. Single NH3 profile GEM-MACH model/satellite comparison and associated plots. (a) follows the same convention as Fig. 3, with the model profile replacing the aircraft profile (note that, because the grey model line is smooth, it is obscured by the mapped blue line on the plot). (b) contains the difference between the TES-retrieved profile (purple) and the model profile (red). (c) contains the rows of the satellite averaging kernels at each retrieval level. (d) shows the 2-D simulated NH3 model field at 956 hPa that corresponds most closely to the TES overpass at 13:17 LST. The profiles being compared are for the locations outlined in magenta, with the larger box showing the TES footprint and the smaller inner box the model grid box.

Figure 16. Summary box-and-whisker plots of the satellite and model comparisons during JOSM for NH3 using the same plotting convention as Fig. 4.

4.2 Ammonia (NH3)
Ammonia has not been extensively validated in the GEM-MACH model. Presented in Figs. 15 and 16 are the initial satellite/model comparison results. Figure

Table 1. Summary of reported estimates of TES retrieval errors.
based on the 2010 Canadian Air Pollutant Emission Inventory (obtained from Environment Canada, http://www.ec.gc.ca/pollution/default.asp?lang=En&n=E96450C4-1) and the projected 2012 US (obtained from US EPA; http://www.epa.gov/ttn/chief/emch/index.html#2005) national emissions inventories based on 2005. The NH3 chemistry used in the model is described in detail in

Table 2. TES operational retrieval observation error estimates for JOSM examples. Note: pressure levels in bold are the average TES peak sensitivity levels for the conditions during these JOSM observations. Additional reported levels are provided for comparison purposes with previous studies.

Table 3. TES/aircraft comparison statistics (actual errors) at peak satellite sensitivity level during JOSM.