MAX-DOAS NO2 observations over Guangzhou, China; ground-based and satellite comparisons

In this study, the tropospheric NO2 vertical column density (VCD) over an urban site in the Guangzhou megacity, China, is investigated by means of MAX-DOAS measurements performed during a campaign from late March 2015 to mid-March 2016. A MAX-DOAS system was deployed at the Guangzhou Institute of Geochemistry of the Chinese Academy of Sciences and operated there for about one year, with measurements available mainly during the spring and summer months. The tropospheric NO2 VCDs retrieved by the MAX-DOAS are presented and compared with space-borne observations from the GOME-2/MetOp-A, GOME-2/MetOp-B and OMI/Aura satellite sensors. The comparisons reveal good agreement between satellite and MAX-DOAS observations over Guangzhou, with correlation coefficients ranging between 0.76 for GOME-2B and 0.99 for GOME-2A. However, the tropospheric NO2 loadings are underestimated by the satellite sensors, on average by 25.1 %, 10.3 % and 5.7 % for OMI, GOME-2A and GOME-2B, respectively. Our results indicate that GOME-2B retrievals are closer to those of the MAX-DOAS instrument, likely owing to the lower tropospheric NO2 concentrations during the days with valid GOME-2B observations. In addition, the effect of the main coincidence criteria is investigated, namely the cloud fraction (CF), the distance (d) between the satellite pixel center and the ground-based measurement site, and the time window within which the MAX-DOAS data are averaged around the satellite overpass time. The CF and time-window criteria have a more pronounced effect on the selection of OMI overpass data, probably due to its smaller pixel size. For OMI, the available data pairs are reduced to half and to about one-third for CF ≤ 0.3 and CF ≤ 0.2, respectively, while, compared to larger CF thresholds, the correlation coefficient improves from about 0.6 to 0.99, the slope almost doubles (to ~0.8) and the mean satellite underestimation is roughly halved (from ~7 to ~3.5 × 10¹⁵ molecules cm⁻²).
On the other hand, the distance criterion mostly affects the GOME-2B data selection, because GOME-2B pixels are quite evenly distributed among the different radii used in the sensitivity test. More specifically, the number of collocations is notably reduced when stricter radius limits are applied, the r value improves from 0.76 (d ≤ 50 km) to 0.93 (d ≤ 20 km), and the absolute mean bias decreases about sixfold for d ≤ 30 km compared to the reference case (d ≤ 50 km).

Atmos. Meas. Tech. Discuss., https://doi.org/10.5194/amt-2017-394. Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 30 November 2017. © Author(s) 2017. CC BY 4.0 License.

by natural and anthropogenic sources. The former category includes lightning (Schumann and Huntrieser, 2007), agricultural fertilization and the use of nitrogen-fixing plants (Vinken et al., 2014, and references therein), and biomass burning (Mebust et al., 2014). The latter includes fossil fuel and biofuel combustion, power plant and industrial emissions, and ground and air transport, among others (Olivier and Berdowski, 2001).
The rapid growth of the Chinese economy during the last decades has led to an increase in emissions of air pollutants, and air quality in Chinese megacities has become a major concern for the atmospheric and environmental science community. NO2 is an important tropospheric trace gas in Chinese megacities (…; Ma et al., 2013; Jin et al., 2016), and there is significant evidence that secondary aerosols formed from NOx, as well as from SO2 and volatile organic compounds, contribute to the haze pollution events frequently observed over urban agglomerations in China (Fu et al., 2014; Jiang et al., 2015; Huang et al., 2014). Investigating the global and regional spatial gradients and temporal variations of trace gases, and identifying their main emission sources, can lead to a better understanding of haze pollution events and the mechanisms driving them, offering a useful tool for governments and policy makers in planning and implementing control regulations (…).
Guangzhou is the capital of Guangdong province in southeastern China. It is the third most populous city in China, after Shanghai and Beijing, and one of the most populous metropolitan agglomerations globally. It is located in the Pearl River Delta (PRD), about 120 km north-northwest of Hong Kong. The PRD is one of the most economically developed regions and one of the largest urban areas in China; it includes nine cities with a combined population of about 60 million. It is a heavily industrialized area and a major port serving as a transportation and trade hub. The PRD suffers from poor air quality and visibility due to rapid industrialization, a massive increase in the vehicle population and the transport of air pollutants from the nearby cities of Hong Kong and Macau (Wang et al., 2005; Guo et al., 2009). Air quality in the PRD region is characterized by high concentrations of primary pollutants, such as NOx and SO2, as well as of secondary air pollutants, e.g., ozone and fine particulate matter (Chan and Yao, 2008; Wang et al., 2008; Huang et al., 2012). Shao et al. (2009) demonstrated the significant contribution of high NOx levels to the formation of ground-level ozone.
Due to its important role as an air quality indicator, NO2 has been observed and monitored by space-borne instruments for the past three decades. Although satellite sensors observed a rapid growth in NOx emissions over China during the previous two decades (Zhang et al., 2007; Liu et al., 2017), a sharp decline has been evident in recent years (…; van der A et al., 2017). Satellite observations are an important tool for investigating air pollution levels and trends on global (e.g., Velders et al., 2001; Schneider and van der A, 2012) and regional (e.g., Zyrichidou et al., 2009; Hilboll et al., 2013) scales. However, satellite retrievals are subject to several sources of uncertainty, related to the spectral analysis and the air mass factor (AMF) calculation, which affect the retrieved lower-tropospheric content. The errors introduced by the AMF calculation can be attributed to the assumed a priori profile, aerosol and cloud properties, and surface albedo (Boersma et al., 2004; Leitão et al., 2010; Heckel et al., 2011; Lin et al., 2014, 2015). Several validation studies show a significant underestimation of tropospheric trace gases, such as NO2, by satellite observations over regions with strong spatial gradients in tropospheric pollution (e.g., Celarier et al., 2008; Kramer et al., 2008; Chen et al., 2009; Irie et al., 2012; Ma et al., 2013). For example, Irie et al. (2012) reported a bias of < 10 % between satellite and ground-based tropospheric NO2 observations over Tokyo, Japan. Moreover, Kramer et al. (2008) calculated a negative difference of 1.78 × 10¹⁵ molecules cm⁻² between OMI tropospheric NO2 columns and the corresponding CMAX-DOAS data over Leicester, UK. Drosoglou et al. (2017) calculated an underestimation of 6.60 ± 5.71 × 10¹⁵ molecules cm⁻² for OMI and of about 10 ± 8 × 10¹⁵ molecules cm⁻² for both GOME-2A and GOME-2B tropospheric NO2 observations over Thessaloniki, Greece.
Considering that NO2 is mainly distributed within the planetary boundary layer (PBL), well-established ground-based measurements of tropospheric NO2 vertical columns and profiles are essential for the validation and, subsequently, the improvement of satellite retrievals.
Several studies have validated satellite NO2 products over North China and the Yangtze River Delta region using ground-based observations (e.g., Ma et al., 2013; Chan et al., 2015; Jin et al., 2016; Wang et al., 2017b) or have used satellite NO2 measurements to estimate NOx emissions (e.g., Ding et al., 2015; Han et al., 2015). However, there are only a few such studies for the PRD area (e.g., …). In most cases, an underestimation of tropospheric NO2 by satellite sensors is reported. For example, Ma et al. (2013) derived a systematic underestimation of 43 % for SCIAMACHY and 26-38 % for OMI over the Beijing area, depending on the data set used. Chan et al. (2015) found MAX-DOAS tropospheric NO2 measurements performed in Shanghai to be 2-3× higher than the corresponding OMI data. Jin et al. (2016) also reported an underestimation of tropospheric NO2 by space-borne observations during winter months. However, Wang et al. (2017b) presented different results over the city of Wuxi: a systematic positive bias of 1 % for OMI and of ∼ 30 % for both GOME-2A and GOME-2B.
Within the framework of the EU FP7 MarcoPolo project (Monitoring and Assessment of Regional air quality in China using space Observations, Project Of Long-term Sino-European co-Operation), a MAX-DOAS system was installed by the Aristotle University of Thessaloniki (AUTH) in Guangzhou and operated there for about 1 year. In this study, the tropospheric NO2 vertical column densities derived from the MAX-DOAS are presented and compared with tropospheric NO2 retrievals from the OMI/Aura, GOME-2/MetOp-A and GOME-2/MetOp-B satellite sensors. The instrument comprises a thermoelectrically cooled miniature CCD spectrograph that detects radiation in the wavelength range ∼ 300-450 nm with a resolution of about 0.35 nm and acquires fast spectral measurements of both direct sunlight and sky radiance. The prototype system was developed in 2006 at the Laboratory of Atmospheric Physics of the Aristotle University of Thessaloniki (LAP-AUTH), Greece (Kouremeti et al., 2008, 2013). Currently, three MAX-DOAS systems are routinely operating in the greater area of Thessaloniki, Greece. Their operation and their capability to retrieve tropospheric NO2 have been tested successfully under different air pollution conditions and NO2 loadings (Drosoglou et al., 2017).
Guangzhou is the largest city in the Pearl River Delta region and is affected by elevated NOx concentrations (e.g., Zhou et al., 2007; Chan and Yao, 2008). Guangzhou has a humid subtropical monsoon climate and experiences occasional typhoons and frequent afternoon thunderstorms from early March to mid-October. Under such weather conditions, instrument operation had to be interrupted and the outdoor part of the system dismounted and brought indoors, which resulted in significant gaps in the NO2 data series. In addition, the instrument did not operate from late August 2015 to late February 2016, first due to accidental damage to the optical fiber and subsequently due to problems with the remote access to the system, which was essential for controlling its operation. Nevertheless, the MAX-DOAS observations of tropospheric NO2 were sufficient for a comparison with the satellite datasets and provide useful information for future validation work in the Guangzhou area. In Guangzhou, the system performed sky radiance measurements at different elevation angles between 2° and the zenith and at several selected azimuth angles free of significant obstacles in the surrounding area. Around 40 % of the scattered-light measurements were performed at two main azimuthal directions (115° and 315°; Fig. 2a and b). Additional elevation sequences were performed at azimuth angles of 80° relative to the solar azimuth, as presented in Fig. 2b. The derived tropospheric NO2 columns show a homogeneous spatial distribution along the effective light paths of the MAX-DOAS (Fig. 2b). Thus, observations at all available azimuthal directions were used for the comparisons with the satellite datasets.
The acquired spectral measurements were analyzed according to the DOAS method (Platt, 1994; Platt and Stutz, 2008) with the QDOAS v2.111 software (http://uv-vis.aeronomie.be/software/QDOAS/, last access: 16 April 2018), developed by the Royal Belgian Institute for Space Aeronomy (BIRA-IASB) and S&T (https://www.stcorp.nl/, last access: 16 April 2018; Danckaert et al., 2016). The zenith spectrum of each sequence, interpolated to the time of the off-axis measurement, was used as the Fraunhofer reference in order to minimize the stratospheric contribution to the resulting differential slant column density (dSCD; Hönninger et al., 2004). The main DOAS analysis settings are summarized in Fig. 3. The method used in this study to derive the NO2 vertical column density (VCD) is similar to the one applied in Drosoglou et al. (2017). For the conversion of dSCDs into VCDs, a look-up table (LUT) of differential air mass factors (dAMFs) was constructed from simulations performed with the uvspec radiative transfer model (RTM) of libRadtran version 1.7 (Mayer and Kylling, 2005), using a pseudo-spherical discrete ordinates radiative transfer solver (Buras et al., 2011). The dAMFs are calculated by subtracting the AMF at 90° elevation from the AMF at the off-axis elevation viewing angles. The aerosol single-scattering albedo was assumed to be 0.9, a typical value for urban areas in China (e.g., Li et al., 2007, and references therein), while a value of 0.7 was used for the aerosol asymmetry factor (e.g., Xia et al., 2007). A surface albedo of 0.1 was assumed to be representative of an urban area (Feister and Grewe, 1995; Webb et al., 2000). Moreover, NO2 was assumed to be distributed uniformly in a well-mixed layer extending from the surface up to 1 km height. The vertical profile of aerosol extinction used for the RTM simulations was extracted from the CALIPSO climatology database (…).
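The dSCD-to-VCD conversion described above can be illustrated with a minimal sketch. It assumes a LUT indexed only by SZA for a single off-axis elevation angle (the actual libRadtran-based LUT is likely indexed by further geometry and aerosol parameters), and all AMF values below are hypothetical placeholders, not RTM output. Following the text, dAMF = AMF(α) − AMF(90°), and the tropospheric VCD is obtained as VCD = dSCD / dAMF.

```python
from bisect import bisect_left

# Hypothetical LUT for one off-axis elevation angle (e.g., 15 deg).
# SZA grid in degrees; AMF values are illustrative placeholders only.
SZA_GRID = [20.0, 40.0, 60.0, 75.0]
AMF_OFFAXIS = [3.60, 3.62, 3.66, 3.70]  # AMF at the off-axis elevation
AMF_ZENITH = [1.05, 1.10, 1.20, 1.35]   # AMF at 90 deg elevation (zenith)


def interp(x, xs, ys):
    """Piecewise-linear interpolation on an ascending grid, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def differential_amf(sza):
    """dAMF = AMF(off-axis) - AMF(zenith) at the given SZA."""
    return interp(sza, SZA_GRID, AMF_OFFAXIS) - interp(sza, SZA_GRID, AMF_ZENITH)


def tropospheric_vcd(dscd, sza):
    """Convert an NO2 dSCD (molecules cm^-2) into a tropospheric VCD."""
    damf = differential_amf(sza)
    if damf <= 0.0:
        raise ValueError("non-positive dAMF: geometry outside LUT validity")
    return dscd / damf


print(round(differential_amf(40.0), 2))         # 2.52
print(f"{tropospheric_vcd(4.0e16, 40.0):.2e}")  # 1.59e+16
```

As a rough sanity check on LUT values for a trace gas confined to a low, well-mixed layer, the geometric approximation dAMF ≈ 1/sin(α) − 1 gives about 2.86 for α = 15° and 1.0 for α = 30°.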

Satellite tropospheric NO2 observations
Tropospheric NO2 columns derived from observations by the GOME-2/MetOp-A, GOME-2/MetOp-B and OMI/Aura space-borne instruments, provided by the European Space Agency Tropospheric Emission Monitoring Internet Service (www.temis.nl, last access: 16 April 2018), have been used in this study. The two EUMETSAT MetOp satellites, launched in 2006 and 2012, respectively, fly in Sun-synchronous orbits with Equator-crossing times of approximately 09:30 LT and a repeat cycle of 29 days. The default GOME-2 scan has a swath width of 1920 km, which gives a nadir pixel size of 80 km × 40 km (across-track × along-track) and enables global coverage in about 1.5 days. The current primary instrument, GOME-2B, is operated in this mode, whereas since June 2013 the older GOME-2A has been operated in a reduced-swath mode with a swath width of 960 km and a nadir ground pixel size of 40 km × 40 km. Further descriptions of the GOME-2 instruments can be found in Munro et al. (2015) and Hassinen et al. (2016). The NASA Aura satellite was launched in 2004, also into a polar orbit, with an Equator-crossing time of 13:30 LT. The Ozone Monitoring Instrument (OMI) is a compact, nadir-viewing, wide-swath (daily global coverage), ultraviolet-visible (270 to 500 nm) imaging spectrometer with a nadir footprint of 13 km × 24 km; in contrast to the GOME-2 instruments, the footprint size is not constant but increases toward off-nadir positions. Further descriptions of the OMI instrument can be found in Levelt et al. (…, 2017). Tropospheric NO2 overpass data from the OMI, GOME-2A and GOME-2B satellite sensors were collected from www.temis.nl for the city of Guangzhou over the operational period of the MAX-DOAS system. The tropospheric NO2 columns are derived from satellite observations by combining slant column NO2 retrievals performed with the DOAS technique and the KNMI combined modeling/retrieval/assimilation approach.
The slant columns from the GOME-2 observations are derived by BIRA-IASB, whereas those from OMI are derived by KNMI/NASA. The OMI NO2 product was retrieved with the DOMINO v2.0 algorithm (…). The algorithm used for the generation of the GOME-2A and GOME-2B products (TM4NO2A version 2.3) is described by Boersma et al. (2004). Apart from the overpass datasets, monthly mean values averaged on different spatial grids are also provided by the www.temis.nl service. For visualization purposes, such monthly mean gridded data for July 2015 were downloaded and plotted for the area surrounding Guangzhou (Fig. 5). The values given are the result of averaging and gridding mostly clear-sky retrievals (cloud radiance fraction < 50 %, i.e., cloud fraction approximately < 20 %). White areas in the plots indicate that no meaningful measurement was available during the month, either because a location was persistently covered by clouds or because of instrument failure. The gridding procedure accounts for the fraction of a satellite pixel overlapping a particular grid cell, so that the contribution of every pixel to the monthly mean is weighted by the overlap fraction. Note that the mean tropospheric NO2 columns of different grid cells may have very different overlap statistics; e.g., grid cell x may have been covered by only one meaningful retrieval, whereas grid cell y may be the average of 30 successful cloud-free retrievals.

Figure 5. Monthly averages of GOME-2A, GOME-2B and OMI tropospheric NO2 VCDs for July 2015. The GOME-2A monthly tropospheric NO2 mean values (on a 0.25° × 0.25° grid) are shown in (a), the GOME-2B values, also on a 0.25° × 0.25° grid, in (b) and the OMI/Aura values, on a finer 0.125° × 0.125° grid, in (c). The star symbol shows the MAX-DOAS location. The NO2 observations averaged and gridded here correspond to cloud radiance fraction < 50 %. Note that the color bars have different ranges.
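The overlap weighting used for the gridded monthly means can be illustrated with a simplified sketch. It assumes axis-aligned rectangular pixels and grid cells in a flat longitude/latitude plane (real footprints are quadrilaterals on the sphere), and the function names are hypothetical rather than part of the TEMIS processing. Each pixel's weight is the fraction of its own area that falls inside the grid cell.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes (lon0, lat0, lon1, lat1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def gridded_mean(cell, pixels):
    """Overlap-weighted mean VCD for one grid cell.

    pixels: list of (box, vcd) for cloud-screened retrievals.
    Weight = fraction of the satellite pixel area lying inside the cell.
    Returns None when no pixel overlaps the cell (a 'white area' in the map).
    """
    num = den = 0.0
    for box, vcd in pixels:
        pixel_area = (box[2] - box[0]) * (box[3] - box[1])
        w = overlap_area(cell, box) / pixel_area
        num += w * vcd
        den += w
    return num / den if den > 0.0 else None


cell = (113.25, 23.0, 113.5, 23.25)                  # one 0.25 deg cell
pixels = [((113.3, 23.05, 113.4, 23.15), 20.0),      # fully inside: weight 1
          ((113.0, 23.0, 113.5, 23.25), 10.0),       # half inside: weight 0.5
          ((114.0, 24.0, 114.1, 24.1), 99.0)]        # outside: weight 0
print(gridded_mean(cell, pixels))
```

In this scheme, a cell covered by one pixel and a cell averaging 30 pixels each yield a single value, which is why the overlap statistics noted above matter when interpreting the maps.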

Comparisons of ground-based and space-borne tropospheric NO2 data sets
Observations of tropospheric NO2 from three satellite sensors (OMI, GOME-2A and GOME-2B) have been compared with the tropospheric columns derived from the MAX-DOAS system. For the comparison, we used space-borne retrievals with the satellite pixel center located within a distance (d) of up to 50 km from the ground-based site and with SZA ≤ 75°. In the case of OMI, the closest pixel was selected, whereas in the case of the GOME-2 sensors, the average of all pixels within 50 km was calculated. For the OMI dataset, only pixels unaffected by the so-called "row anomaly" (OMI, 2012) and with a cross-track dimension smaller than 60 km were used. In addition, the satellite data were screened for clouds, and only observations with a cloud fraction (CF) ≤ 30 % were used. Under the tropical conditions prevailing in Guangzhou, this is the lowest CF threshold that still leaves a sufficient number of data for reliable comparisons, as scenes with smaller CFs are rather rare. Each satellite observation is compared with the mean of the MAX-DOAS measurements recorded within a 1 h window centered at the satellite overpass time. The effect of the choice of these criteria on the comparisons of the ground-based and satellite data pairs is discussed at length in Sect. 4; the coincidence criteria applied here are used as the reference case in that sensitivity study (Table 2). The tropospheric NO2 VCDs retrieved from the ground-based radiance spectra measured at 15° and 30° elevation angles and at all available azimuth angles were used in the comparison with the corresponding space-borne observations. The system has been shown to retrieve NO2 with a spectral fitting residual of the order of 10⁻³, typical of mini MAX-DOAS systems (Drosoglou et al., 2017).
A residual threshold of 1 × 10⁻² was used to filter out disturbed retrievals under variable conditions, such as when fast-moving clouds of mist emerge from the nearby river in the Guangzhou area.
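The reference coincidence criteria can be summarized in a short sketch. This is an illustrative implementation, not the authors' processing chain: the record layouts (`SatPixel`, `GroundObs`) and the function `collocate` are hypothetical, distances are great-circle distances between the site and the pixel centers, and the satellite value is the mean of all accepted pixels (the GOME-2 approach; for OMI, only the closest pixel would be kept). The DOAS fit-residual threshold of 1 × 10⁻² is applied to the ground-based data.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import fmean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class SatPixel:  # hypothetical satellite overpass record
    lat: float
    lon: float
    vcd: float             # tropospheric NO2 VCD
    cloud_fraction: float  # 0-1
    sza: float             # solar zenith angle, degrees


@dataclass
class GroundObs:  # hypothetical MAX-DOAS record
    time: datetime
    vcd: float
    fit_residual: float    # DOAS fit residual


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocate(site_lat, site_lon, overpass, pixels, ground,
              max_dist_km=50.0, max_cf=0.3, max_sza=75.0,
              window=timedelta(hours=1), max_residual=1e-2):
    """Return (satellite VCD, MAX-DOAS mean VCD), or None if no valid pair.

    Defaults mirror the reference case: d <= 50 km, CF <= 0.3, SZA <= 75 deg,
    a 1 h window centered on the overpass and a fit residual <= 1e-2.
    """
    sat = [p.vcd for p in pixels
           if haversine_km(site_lat, site_lon, p.lat, p.lon) <= max_dist_km
           and p.cloud_fraction <= max_cf and p.sza <= max_sza]
    half = window / 2
    gb = [g.vcd for g in ground
          if abs(g.time - overpass) <= half and g.fit_residual <= max_residual]
    if not sat or not gb:
        return None
    return fmean(sat), fmean(gb)
```

Passing a different `window`, `max_cf` or `max_dist_km` reproduces the kind of criteria changes examined in the sensitivity study.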
Table 2. Statistics of the comparison of tropospheric NO2 VCDs derived from Phaethon and the three satellite sensors, using the reference coincidence criteria (first data column) and for several different cases of CF filtering, time period around overpass and distance limit between the MAX-DOAS station and the satellite pixel center.

Tropospheric NO2 in Guangzhou exhibits large variability, both in single measurements and in hourly averages, with maximum values exceeding 60 × 10¹⁵ molecules cm⁻² (see right plot of Fig. 2). The hourly averaged values range between 10 and 40 × 10¹⁵ molecules cm⁻². Several studies have shown similar tropospheric NO2 VCD levels over other Chinese cities. For example, Jin et al. (2016) reported monthly averaged tropospheric NO2 VCDs within the same range over Gucheng in North China for the spring and summer period. Ma et al. (2013) showed that the daytime mean tropospheric NO2 VCD over Beijing varies from 5 to 133 × 10¹⁵ molecules cm⁻², with an average of 36 × 10¹⁵ molecules cm⁻² during summertime. The average diurnal variation of the tropospheric NO2 column derived from the MAX-DOAS measurements at the elevation angles of 15° and 30° in Guangzhou is shown in Fig. 6 as hourly averages (±1σ) over three different MAX-DOAS data subsets, each including the overpass days of one of the three satellite sensors, i.e., GOME-2A, GOME-2B and OMI. More specifically, the three subsets have been extracted from the whole operational period of the MAX-DOAS instrument, considering only the days for which the satellite NO2 overpass data met the selection criteria described above.
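The hourly averaging (±1σ) over satellite-specific day subsets can be sketched as follows. The function name and the (timestamp, VCD) input format are assumptions; timestamps are taken to be in local time, and σ is the sample standard deviation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean, stdev


def hourly_diurnal(obs, days=None):
    """Hourly mean, 1-sigma and count of VCDs.

    obs:  iterable of (local datetime, VCD) pairs.
    days: optional set of dates (e.g., the collocation days of one satellite);
          observations on other days are ignored.
    Returns {hour: (mean, sigma, n)}; sigma is 0.0 when n == 1.
    """
    bins = defaultdict(list)
    for t, vcd in obs:
        if days is not None and t.date() not in days:
            continue
        bins[t.hour].append(vcd)
    return {h: (fmean(v), stdev(v) if len(v) > 1 else 0.0, len(v))
            for h, v in sorted(bins.items())}


obs = [(datetime(2015, 5, 1, 10, 10), 20.0),
       (datetime(2015, 5, 2, 10, 40), 30.0),
       (datetime(2015, 5, 1, 13, 5), 15.0)]
print(hourly_diurnal(obs))
print(hourly_diurnal(obs, days={datetime(2015, 5, 1).date()}))
```

Calling the function once per satellite, each time with that sensor's set of collocation days, yields the three subsets compared in the diurnal plot.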
A double peak appears at around 10:00 and 18:00 LT, indicating enhanced anthropogenic emissions at these times. The minimum NO2 levels around local noon reflect the photochemical destruction of NO2 (Seinfeld and Pandis, 1998). Unfortunately, our MAX-DOAS dataset covers only spring and summer months and cannot reveal possibly different diurnal patterns during late autumn and winter, such as those observed over industrial areas at midlatitudes due to different emission strengths and NO2 lifetimes (…). A double-peak diurnal cycle has also been reported for other Chinese cities in previous studies, e.g., for Beijing (Ma et al., 2013) and Wuxi (Wang et al., 2017a) in spring and summer. Qin et al. (2009) found a similar pattern for the NO2 surface concentration in Guangzhou city using measurements performed by a long-path DOAS instrument from 10 to 24 July 2006. The large day-to-day variability mentioned above is also evident in this figure from the large standard deviations (up to ±19 × 10¹⁵ molecules cm⁻²). Most of the satellite retrievals fall well within the standard deviations of the MAX-DOAS measurements close in time to the satellite overpass, indicating generally good agreement between the NO2 levels observed in the Guangzhou area from space and from the ground. Interestingly, on the collocation days of GOME-2B, the tropospheric NO2 levels observed by the MAX-DOAS close to the GOME-2B overpass time are lower, both in terms of average value and standard deviation, than those measured around the same time on GOME-2A and OMI collocation days. Moreover, there is only one common collocation day between the MAX-DOAS and both the GOME-2A and GOME-2B sensors. These findings could partly explain the very good agreement of the averaged GOME-2B tropospheric NO2 column with the MAX-DOAS hourly data, despite the larger pixel size of GOME-2B (80 km × 40 km) compared to GOME-2A (40 km × 40 km in reduced-swath mode) and OMI (13 km × 24 km at nadir).

Figure 6. Average diurnal variation of the MAX-DOAS tropospheric NO2 VCD at 15° and 30° elevation for the collocation days with OMI (green), GOME-2A (blue) and GOME-2B (red) data. The error bars indicate the standard deviation of the hourly averages (±1σ). The filled circles represent the NO2 overpass data of the three satellite sensors that are included in the comparison with the MAX-DOAS measurements, and the filled diamonds their average. The symbols are color-coded as the diurnal curves; blue, red and green indicate GOME-2A, GOME-2B and OMI overpass data, respectively.
The results of the comparison between the space-borne and ground-based collocations are summarized in Table 2 and presented as time series in Fig. 7 and scatter plots in Fig. 8. These figures, as well as the first data column of Table 2, refer to the reference coincidence criteria described at the beginning of this section. An error-weighted fit has been applied for the linear regression. Evidently, the number of coincident data pairs is rather small and differs among the three satellite sensors (about double for GOME-2B), due to gaps in the MAX-DOAS data in conjunction with the different satellite overpass times. Also connected to the overpass time are the larger NO2 values reported by the MAX-DOAS in the case of the GOME-2 sensors (overpass around 10:00 LT) compared to OMI (overpass around 13:30 LT), as is also evident in Fig. 6. MAX-DOAS and satellite observations are qualitatively in good agreement, with correlation coefficients ranging between 0.996 for OMI and 0.795 for GOME-2B. OMI shows a slope closer to unity (0.98) than GOME-2A and GOME-2B (0.94 and 0.83, respectively). GOME-2B shows the smallest mean difference from the ground-based measurements, i.e., −1.8 × 10¹⁵ molecules cm⁻² (−5.7 %), probably due to the relatively lower tropospheric NO2 loading observed in Guangzhou by the MAX-DOAS during the GOME-2B overpass days (Fig. 6). In contrast to GOME-2B, our comparison results indicate a systematic underestimation by OMI at higher tropospheric NO2 VCDs (mean bias of −3.52 × 10¹⁵ molecules cm⁻² or −25.1 %) and a similar negative bias of GOME-2A relative to the ground-based observations (−3.9 × 10¹⁵ molecules cm⁻² or −10.3 %). The statistical results of the MAX-DOAS comparison with OMI are more significant owing to the lower scatter of the data pairs: the 95 % confidence interval (CI) of r is 0.985-0.999, while the 95 % CI of the mean bias ranges between −4.935 and −2.098 × 10¹⁵ molecules cm⁻².
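The text does not specify which error-weighted regression was used. As one common option, the sketch below implements weighted least squares with weights 1/σ² on the satellite (y) values only; methods that also account for MAX-DOAS (x) errors, such as York-type regressions, would give different estimates. The function name is hypothetical.

```python
import math


def weighted_linear_fit(x, y, sigma_y):
    """Fit y = a + b*x by weighted least squares with weights 1/sigma_y**2.

    Returns (a, b, sigma_a, sigma_b); the standard errors assume sigma_y
    are correct absolute uncertainties of the y values.
    """
    w = [1.0 / s ** 2 for s in sigma_y]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    if delta == 0:
        raise ValueError("degenerate x values: cannot fit a slope")
    b = (S * Sxy - Sx * Sy) / delta          # slope
    a = (Sxx * Sy - Sx * Sxy) / delta        # intercept
    return a, b, math.sqrt(Sxx / delta), math.sqrt(S / delta)


# MAX-DOAS VCDs as x, satellite VCDs as y (illustrative values, 1e15 units).
print(weighted_linear_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 1.0, 1.0]))
```

The intercept standard error σ_a grows when only a few pairs span a narrow x range, which is why intercepts from small collocation samples should be interpreted with caution.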
In the case of GOME-2B, the 95 % CI range is comparable to the one calculated for OMI (∼ 7 × 10¹⁵ molecules cm⁻²). For GOME-2A, this range is estimated to be ∼ 12 × 10¹⁵ molecules cm⁻², almost double that for GOME-2B and OMI, possibly due to the short collocation dataset in combination with its large variability. However, we stress again that these statistics have been derived from a very small number of data points.
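The comparison metrics discussed above (r, mean bias, relative bias and their 95 % CIs) can be computed with a sketch like the following. Assumptions: the CI of r uses the Fisher z-transformation with a normal quantile; the CI of the mean bias uses a Student-t critical value supplied by the caller (e.g., 2.262 for 10 pairs at 95 %); and the relative bias is taken as the mean of the pairwise percentage differences, which is only one of several possible definitions. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_confidence_interval(r, n, level=0.95):
    """CI of r via the Fisher z-transformation (needs n > 3 and |r| < 1)."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def bias_statistics(ground, sat, t_crit):
    """Mean bias (sat - ground), mean relative bias in %, and CI of the mean bias.

    t_crit: two-sided Student-t critical value for len(ground) - 1 degrees
    of freedom at the desired confidence level (supplied by the caller).
    """
    diffs = [s - g for g, s in zip(ground, sat)]
    mean_bias = fmean(diffs)
    rel_bias = fmean([100.0 * d / g for d, g in zip(diffs, ground)])
    half = t_crit * stdev(diffs) / math.sqrt(len(diffs))
    return mean_bias, rel_bias, (mean_bias - half, mean_bias + half)
```

With very few pairs, both intervals widen quickly (through 1/√(n − 3) and the growing t value), consistent with the caution about small samples expressed above.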
Our findings agree well with the results of other studies over Chinese areas, which in most cases report an underestimation by satellite data. For example, Ma et al. (2013) showed an underestimation of tropospheric NO2 over Beijing by the OMI DOMINO NO2 product of between 26 and 38 %, depending on the DOMINO algorithm version and the time period, with monthly mean MAX-DOAS NO2 1.1-1.5× higher than the DOMINO v2.0 product. They also reported similar correlation coefficients, between 0.91 and 0.93. In the study of Wu et al. (2013), VCDs from mobile DOAS measurements are compared with the corresponding OMI retrievals, revealing an underestimation of high NO2 values by the satellite sensor and an r of about 0.97. Chan et al. (2015) reported MAX-DOAS NO2 VCDs 2-3× higher than OMI data over Shanghai during the Shanghai World Expo 2010, and correlation coefficients between 0.67 and 0.93 at four different sites, depending on the air pollution levels. In Wang et al. (2017b), although good consistency is found between the MAX-DOAS and OMI DOMINO v2.0 NO2 retrievals, with r = 0.85 and a systematic bias of 1 %, a significant overestimation of ∼ 30 % is reported for both GOME-2A and GOME-2B, with r of 0.57 and 0.45, respectively.

Fig. 7. Ground-based observations are compared with OMI (a), GOME-2A (b) and GOME-2B (c) overpass data. The grey and black lines stand for the y = x reference line and the error-weighted linear fit, respectively. The 95 % confidence interval of the fit is also shown (green area). The corresponding comparison statistics are presented in the first data column of Table 2.
In general, satellite retrievals represent a weighted average over all atmospheric layers contributing to the signal observed by the satellite sensor and thus suffer from relatively low sensitivity near the surface. This, in combination with an unrealistic a priori profile, can lead to an underestimation of high NO2 loadings from local emission sources in polluted areas such as Guangzhou (Eskes and Boersma, 2003). Part of the satellite underestimation can also be attributed to the so-called gradient-smoothing effect (Ma et al., 2013) and the aerosol shielding effect (Jin et al., 2016, and references therein), as well as to measurements contaminated by clouds.
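The near-surface sensitivity argument above can be made concrete with the commonly used formulation of a tropospheric AMF as the profile-weighted mean of altitude-resolved box AMFs, AMF = Σ m_l x_l / Σ x_l, where m_l is the box AMF and x_l the partial NO2 column of layer l. The numbers below are purely hypothetical; they only illustrate how an a priori profile that is less surface-concentrated than the true one inflates the AMF and therefore lowers the retrieved VCD.

```python
def tropospheric_amf(box_amfs, partial_columns):
    """Profile-weighted AMF: sum(m_l * x_l) / sum(x_l), layers bottom to top."""
    total = sum(partial_columns)
    return sum(m * x for m, x in zip(box_amfs, partial_columns)) / total


# Hypothetical box AMFs: low sensitivity near the surface, higher aloft.
box_amfs = [0.6, 0.9, 1.2, 1.4]
true_profile = [8.0, 1.0, 0.5, 0.5]  # NO2 concentrated near the surface
a_priori = [4.0, 3.0, 2.0, 1.0]      # smoother assumed profile

true_vcd = sum(true_profile)  # 10.0 (arbitrary units)
observed_scd = true_vcd * tropospheric_amf(box_amfs, true_profile)
retrieved_vcd = observed_scd / tropospheric_amf(box_amfs, a_priori)
print(f"{100 * (retrieved_vcd / true_vcd - 1):.1f} %")  # -21.3 %
```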

Effects of the coincidence criteria selection on the comparisons
In order to investigate the effect of the coincidence criteria, the comparisons between the MAX-DOAS and the satellite datasets were repeated for various CF thresholds, namely 0.5, 0.4 and 0.2; for different time windows for averaging the ground-based data around the overpass time, i.e., 2, 3 and 4 h; and for different radius limits around the MAX-DOAS station within which the satellite pixel center is located, i.e., 20, 30 and 40 km. In each case, all other criteria were kept at their reference values. The statistical results of each comparison case, including the reference case, are reported in Table 2, and bar plots of the statistics for the different coincidence criteria thresholds are presented in Fig. 9. The agreement between the ground-based and both GOME-2 sensors seems to be only slightly affected by the cloud screening applied (see also Table 2, data columns 1-4), likely due to their large pixel sizes. The average difference of GOME-2B from the MAX-DOAS tropospheric NO2 VCD is reduced to −1.80 × 10¹⁵ molecules cm⁻² (−5.73 %) and −0.37 × 10¹⁵ molecules cm⁻² (0.71 %) for CF ≤ 0.3 and CF ≤ 0.2, respectively. In the case of GOME-2A, the mean bias relative to the MAX-DOAS observations increases for stricter CF thresholds, and only the r value improves, from 0.88 to 0.94, when CF ≤ 0.2 is used. However, the GOME-2A metrics can be considered more reliable and statistically significant, given the smaller 95 % CI values estimated for both r and mean bias compared to those for GOME-2B. Interestingly, the intercept values for both GOME-2A and GOME-2B are much higher for CF ≤ 0.2 than for larger CF thresholds. However, the intercept cannot be reliably estimated when only a few data pairs are available (< 10 in the case of GOME-2A), and their dispersion should not be ignored. In fact, the intercept standard errors calculated in this study for GOME-2A and GOME-2B are relatively high.
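The one-at-a-time design of the sensitivity test (vary one criterion, keep the others at their reference values) can be enumerated as below. The dictionary keys are hypothetical names; each resulting set of criteria would then be passed to the collocation and statistics steps to fill one column of a summary table.

```python
REFERENCE = {"max_cf": 0.3, "max_dist_km": 50.0, "window_h": 1.0}
ALTERNATIVES = {
    "max_cf": [0.5, 0.4, 0.2],
    "max_dist_km": [20.0, 30.0, 40.0],
    "window_h": [2.0, 3.0, 4.0],
}


def sensitivity_cases(reference, alternatives):
    """Yield (label, criteria): the reference case first, then every case in
    which exactly one criterion differs from its reference value."""
    yield "reference", dict(reference)
    for key, values in alternatives.items():
        for value in values:
            case = dict(reference)
            case[key] = value
            yield f"{key}={value:g}", case


for label, criteria in sensitivity_cases(REFERENCE, ALTERNATIVES):
    print(label, criteria)
```

This yields ten cases (the reference plus nine variants), matching the number of criteria settings examined here; the design does not capture interactions between criteria, which would require varying several of them at once.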
In contrast, the choice of CF has a more significant effect on the comparisons of the MAX-DOAS data with OMI observations: the available data points are reduced to half and to about one-third for CF ≤ 0.3 and CF ≤ 0.2, respectively, while the metrics improve considerably. This can be attributed to the higher spatial resolution of OMI compared to the GOME-2 instruments (13 km × 24 km at nadir). The correlation coefficient and the slope of the linear regression improve, respectively, from 0.86 and 1.15 for CF ≤ 0.5 to 0.996 and 0.98 for CF ≤ 0.3, and to 0.996 and 0.97 for CF ≤ 0.2. Moreover, the intercept improves from −1.68 × 10¹⁵ molecules cm⁻² (CF ≤ 0.5) to −0.243 and 0.07 × 10¹⁵ molecules cm⁻² (CF ≤ 0.3 and CF ≤ 0.2, respectively), while the mean bias is roughly halved when either CF ≤ 0.2 or CF ≤ 0.3 is chosen. The statistical significance of the comparisons with OMI is also considerably higher, due to the lower variability of the data pairs. These results confirm that clouds are an important factor affecting both satellite and ground-based measurements, and that, under clear skies, at least the OMI sensor probes the tropospheric NO2 column more accurately, even in strongly polluted environments such as the area around Guangzhou. Wang et al. (2017b) showed that the effects of cloud contamination become significant for CF > 40 % and > 30 % for the OMI and GOME-2 NO2 products, respectively. Also, Jin et al. (2016) found a significant improvement in the correlation between daily MAX-DOAS and OMI products at Gucheng, a rural site in North China, when stricter cloud-screening criteria were applied. More specifically, the correlation coefficient increased from 0.74 to 0.90 for the KNMI OMI DOMINO product and from 0.75 to 0.95 for the NASA OMNO2 level 2 product.
Based on the results of our analysis, a relatively low CF threshold (30 % or lower) is recommended for future validation studies, especially for OMI products.
The MAX-DOAS data are averaged over a period of time around the satellite overpass time in order to account for the horizontal gradients of tropospheric NO₂, which are smoothed out in space-borne measurements because of the large satellite footprint. The selection of the time window depends on the satellite ground pixel size and on the lifetime of the trace gas under investigation, in combination with the prevailing local weather conditions. For simplicity, fixed values are used in this study for each satellite and for the whole collocation dataset. Four different time windows centered at the overpass time have been investigated, including the reference value: 1, 2, 3 and 4 h. The results of the comparison between satellite and MAX-DOAS data are presented in data columns 1 and 8-10 of Table 2. For GOME-2A and GOME-2B, the mean difference from the ground-based retrievals is in general lower when a window longer than 1 h around the overpass time is used, and it is reduced by more than half for a 4 h window. The 95 % confidence range for the bias, although shifted, remains fairly stable in all cases. The lower differences calculated for wider time windows are consistent with the large pixel sizes of these two satellite sensors. However, the correlation between the MAX-DOAS and satellite datasets is lower for longer windows, which indicates greater dispersion of the data pairs. The effect on the comparisons with OMI is statistically more significant, as expected from its smaller pixel size. The correlation coefficient is reduced from 0.99 to values < 0.7, and the absolute mean difference is almost three times higher for the 3 and 4 h time windows than for the reference case. Thus, we suggest that a short time window be used in such studies over areas with strong local NOₓ emission sources, depending on the satellite pixel size: about 1 h for OMI and GOME-2A, and 1 or 2 h for GOME-2B.

Figure 9. Bar plots of the statistical results of the comparisons between ground-based and satellite tropospheric NO₂ data pairs for the different coincidence criteria used in this study: cloud fraction (a, d, g, j), distance of the satellite pixel center from the ground-based measurement location (b, e, h, k) and time window around the satellite overpass (c, f, i, l). The mean difference (a, b, c) refers to the absolute difference of the satellite-derived tropospheric NO₂ from the MAX-DOAS observations. All statistics are presented in Table 2.

For satellite validation, two options are available for selecting the satellite overpass data: either the values of all satellite pixels within a predefined radius of the MAX-DOAS station are averaged, or only the pixel closest to the station is selected. The KNMI/NASA OMI overpass dataset used in this study has already been filtered by distance from Guangzhou, i.e., only the closest pixel is reported. For the BIRA GOME-2 datasets, the average of all pixels within an optimum distance has been used, in order to account for the large GOME-2 pixel size and the random noise of the satellite data. The optimum distance criterion may vary for different satellite sensors and measurement locations, because it depends on many factors, such as the satellite footprint, the trace gas under investigation and its horizontal gradients, and the time period selected for averaging the MAX-DOAS data. In the present study, four different radii around the MAX-DOAS location have been investigated, namely 20, 30 and 40 km and the reference value of 50 km.
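The two pixel-selection options described above, closest pixel versus average over a radius, can be illustrated with a minimal Python sketch. Distances are great-circle (haversine) distances on a spherical Earth with a mean radius of 6371 km. The pixel dictionaries with `lat`, `lon` and `vcd` keys and all function names are hypothetical and do not reflect the format of the KNMI, NASA or BIRA overpass files.

```python
import math
from statistics import fmean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pixels_within(pixels, site, max_dist_km):
    """Yield (distance_km, pixel) for pixels whose center lies within max_dist_km of site."""
    lat0, lon0 = site
    for p in pixels:
        d = haversine_km(lat0, lon0, p["lat"], p["lon"])
        if d <= max_dist_km:
            yield d, p


def closest_pixel_vcd(pixels, site, max_dist_km):
    """Option 1: VCD of the pixel nearest to the site (None if none qualifies)."""
    candidates = list(pixels_within(pixels, site, max_dist_km))
    if not candidates:
        return None
    # key= compares distances only, never the pixel dictionaries
    return min(candidates, key=lambda dp: dp[0])[1]["vcd"]


def radius_mean_vcd(pixels, site, max_dist_km):
    """Option 2: mean VCD of all pixels within the radius (None if none qualifies)."""
    vcds = [p["vcd"] for _, p in pixels_within(pixels, site, max_dist_km)]
    return fmean(vcds) if vcds else None
```

Averaging over a radius reduces random pixel noise at the cost of mixing in air masses farther from the station, which is why the choice of radius interacts with the NO₂ horizontal gradients discussed above.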
The statistics estimated for the investigation of the distance criterion are reported in data columns 1 and 5-7 of Table 2. The effect of the distance criterion on the comparison of MAX-DOAS retrievals with OMI observations is rather weak: all statistics remain unchanged for distances of 30-50 km. The correlation coefficient, slope, intercept and mean bias of OMI relative to MAX-DOAS are only slightly affected for d ≤ 20 km, changing from 0.996 to 0.937, from 0.98 to 0.92, from −0.24 × 10¹⁵ to 0.45 × 10¹⁵ and from 3.52 × 10¹⁵ to 3.11 × 10¹⁵ molecules cm⁻², respectively. These results can be attributed to the fact that the majority of the satellite pixels included in the comparisons are centered within 20 km of the ground-based location. The effect of the radius selection on GOME-2A differs from that on GOME-2B. The comparison of GOME-2A with the MAX-DOAS observations is only slightly affected for a distance limit of 30 km and is somewhat improved for d ≤ 20 km. In contrast, the comparison with GOME-2B appears to be more sensitive to the distance criterion applied. The number of collocations is reduced by one-quarter and by two-thirds for d ≤ 30 km and d ≤ 20 km, respectively, compared to the reference case. In addition, the r value improves notably from 0.795 to 0.953 for d ≤ 20 km, slopes closer to unity (between 0.94 and 0.99) are obtained for d ≤ 40 km and d ≤ 30 km, and the absolute mean bias decreases about sixfold for d ≤ 30 km. The large change in bias from negative for d ≤ 30 km to positive for d ≤ 20 km cannot be easily explained. Nevertheless, considering the high r value and the slope well above unity (1.18), GOME-2B seems to overestimate NO₂ columns when ground-based NO₂ observations are high.
In general, the distance of the satellite pixel center from the ground-based location depends on the pixel size; for smaller satellite footprints, e.g., OMI and GOME-2A, the pixel center is mostly located within a radius of 20 km, while for coarser satellite spatial resolution, e.g., GOME-2B, the pixel center can be within a distance of up to 40 km from the MAX-DOAS location. Thus, an upper distance threshold of 30 km seems to be an optimal selection, considering also the statistical results of this study.
Based on our results, a set of coincidence criteria is recommended for the validation of space-borne measurements with MAX-DOAS observations over polluted areas. More specifically, an upper cloud fraction limit of 30 % and a maximum radius of 30 km around the ground-based location led to very good agreement at acceptable levels of confidence. Moreover, a time window centered at the satellite overpass time of 1 h for OMI and GOME-2A and of 1 or 2 h for GOME-2B is recommended. However, these thresholds have been tested on a limited amount of ground-based and satellite collocation data. In addition, the statistical significance of the comparisons with the GOME-2A and GOME-2B sensors is restricted by the higher dispersion of the coincident measurements.
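The recommended criteria can be combined into a single collocation step, as in the following Python sketch. It assumes that the time window value denotes the total width of the interval centered at the overpass (e.g., 1 h means ±30 min), which the text does not state explicitly, and it uses 2 h for GOME-2B, where 1 or 2 h is recommended. The record layout (`time`, `cf`, `dist_km`, `vcd` keys) and the function names are hypothetical; for GOME-2 the spatial averaging of pixels is assumed to have been done beforehand.

```python
from datetime import datetime, timedelta
from statistics import fmean

# Coincidence criteria recommended in the text for polluted sites.
# window_h is interpreted here (assumption) as the total width centered at the overpass.
RECOMMENDED = {
    "OMI": {"max_cf": 0.3, "max_dist_km": 30.0, "window_h": 1.0},
    "GOME-2A": {"max_cf": 0.3, "max_dist_km": 30.0, "window_h": 1.0},
    "GOME-2B": {"max_cf": 0.3, "max_dist_km": 30.0, "window_h": 2.0},  # 1 or 2 h suggested
}


def window_mean(maxdoas, overpass_time, window_h):
    """Mean MAX-DOAS VCD within +/- window_h/2 of the overpass (None if empty)."""
    half = timedelta(hours=window_h / 2)
    vals = [vcd for t, vcd in maxdoas if overpass_time - half <= t <= overpass_time + half]
    return fmean(vals) if vals else None


def collocate(overpass, maxdoas, sensor):
    """Return a (satellite VCD, MAX-DOAS mean VCD) pair, or None if a criterion fails.

    overpass: dict with hypothetical keys 'time', 'cf', 'dist_km' and 'vcd'.
    maxdoas: list of (datetime, vcd) tuples in the same time zone as overpass['time'].
    """
    crit = RECOMMENDED[sensor]
    if overpass["cf"] > crit["max_cf"] or overpass["dist_km"] > crit["max_dist_km"]:
        return None
    ground = window_mean(maxdoas, overpass["time"], crit["window_h"])
    if ground is None:
        return None
    return overpass["vcd"], ground


if __name__ == "__main__":
    # Illustrative values only, not campaign data.
    day = datetime(2015, 5, 1)
    series = [(day.replace(hour=13), 40.0), (day.replace(hour=13, minute=20), 44.0)]
    omi = {"time": day.replace(hour=13, minute=30), "cf": 0.1, "dist_km": 12.0, "vcd": 35.0}
    print(collocate(omi, series, "OMI"))
```

Keeping the thresholds in one table per sensor makes it straightforward to rerun the whole comparison with a different window or radius, as done in the sensitivity analysis.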

Conclusions
In this study, tropospheric NO₂ VCD measurements performed with the MAX-DOAS system of AUTH in Guangzhou, China, are presented and used for comparisons with relevant satellite products. The data were collected during a 1-year campaign held in the framework of the EU FP7 MarcoPolo project (Monitoring and Assessment of Regional air quality in China using space Observations, Project Of Long-term sino-european co-Operation). The MAX-DOAS data are compared with the corresponding OMI/Aura, GOME-2/MetOp-A and GOME-2/MetOp-B overpass data, revealing high correlation coefficients of 0.996, 0.882 and 0.795, respectively, and slope values ranging between 0.83 and 0.98. However, the satellite sensors underestimate the tropospheric NO₂ levels on average by 3.5 × 10¹⁵ (25.1 %), 3.9 × 10¹⁵ (10.3 %) and 1.8 × 10¹⁵ (5.7 %) molecules cm⁻² for OMI, GOME-2A and GOME-2B, respectively. Similar results have been reported for OMI observations over other Chinese cities (Ma et al., 2013; Wu et al., 2013; Wang et al., 2017b). In contrast, the agreement of our MAX-DOAS measurements with the GOME-2A and GOME-2B retrievals is better than that reported in other studies (e.g., Wang et al., 2017b). The underestimation of tropospheric NO₂ by the satellite sensors can mainly be explained by the relatively low sensitivity of space-borne measurements near the surface, the a priori profile assumed in the AMF calculations, the gradient-smoothing effect and the aerosol shielding effect.
Interestingly, GOME-2B shows the smallest underestimation despite its large pixel size (80 km × 40 km). By investigating the diurnal cycles of the ground-based tropospheric NO₂ VCD in Guangzhou, averaged over the collocation days of each satellite separately, we conclude that the better agreement between the MAX-DOAS and GOME-2B retrievals can be partly attributed to the significantly lower tropospheric NO₂ loadings observed by MAX-DOAS on the GOME-2B overpass days. We found a diurnal pattern of tropospheric NO₂ with two maxima, in the late morning (around 10:00 LT) and the late afternoon (around 18:00 LT), indicating higher anthropogenic emissions, and a minimum in the early afternoon (∼14:00 LT), reflecting photochemical sinks of tropospheric NO₂. A similar diurnal variation of the NO₂ surface concentration in Guangzhou has been found by Qin et al. (2009). A double-peak diurnal cycle has also been reported for other Chinese cities in spring and summer, e.g., Beijing (Ma et al., 2013) and Wuxi (Wang et al., 2017a).
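The diurnal-cycle analysis described above, averaging MAX-DOAS VCDs per local hour over the collocation days of one satellite, can be sketched as follows. The function names are hypothetical, timestamps are assumed to be in local time, and the hourly values in the usage example are synthetic illustrations of a double-peak cycle, not campaign data.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean


def hourly_diurnal_cycle(records, days):
    """Mean VCD per local hour, using only records whose date is in `days`.

    records: iterable of (local datetime, vcd); days: set of datetime.date.
    """
    bins = defaultdict(list)
    for t, vcd in records:
        if t.date() in days:  # keep only the collocation days of one satellite
            bins[t.hour].append(vcd)
    return {hour: fmean(vals) for hour, vals in sorted(bins.items())}


def interior_extrema(cycle):
    """Hours that are strict local maxima / minima of an hourly cycle.

    The first and last hours are not classified because they lack a neighbour.
    """
    hours = sorted(cycle)
    maxima, minima = [], []
    for prev, h, nxt in zip(hours, hours[1:], hours[2:]):
        if cycle[h] > cycle[prev] and cycle[h] > cycle[nxt]:
            maxima.append(h)
        elif cycle[h] < cycle[prev] and cycle[h] < cycle[nxt]:
            minima.append(h)
    return maxima, minima


if __name__ == "__main__":
    # Synthetic hourly values (illustrative only) with two peaks and one trough.
    synthetic = {8: 20, 9: 25, 10: 30, 11: 26, 12: 22, 13: 18,
                 14: 15, 15: 17, 16: 20, 17: 24, 18: 28, 19: 22}
    recs = [(datetime(2015, 6, d, h), float(v)) for d in (1, 2) for h, v in synthetic.items()]
    cycle = hourly_diurnal_cycle(recs, {date(2015, 6, 1), date(2015, 6, 2)})
    print(interior_extrema(cycle))  # ([10, 18], [14])
```

Computing the cycle separately for each satellite's set of collocation days is what allows the mean NO₂ loading on, for example, GOME-2B overpass days to be compared with that on OMI overpass days.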
In order to investigate the effect of the coincidence criteria, the comparisons between ground-based and space-borne tropospheric NO₂ retrievals were repeated for various CF thresholds, different time windows for averaging the MAX-DOAS data around the overpass time, and different upper limits for the distance of the satellite pixel center from the MAX-DOAS site. The effect of the MAX-DOAS averaging time window is more significant for the comparisons with OMI, probably due to its smaller pixel size. While the agreement between OMI and MAX-DOAS deteriorates for longer time windows, the results for the GOME-2 sensors improve slightly. This finding can be explained by the smoothing of the horizontal NO₂ gradients caused by the large GOME-2 pixel size. On the other hand, the distance criterion has no significant effect on the OMI and GOME-2A results, because most of the overpass data are located within 20 km of the ground-based station. In the case of GOME-2B, better slope values and mean biases are achieved for d ≤ 40 km and d ≤ 30 km, while the correlation coefficient is better for d ≤ 20 km. The CF threshold appears to have the most profound effect on the comparisons between the satellite and MAX-DOAS datasets. Especially in the case of OMI, the underestimation is substantially reduced when more stringent cloud screening is applied (CF ≤ 20 %), which lowers the average difference to −3.44 × 10¹⁵ molecules cm⁻² (less than half the value for CF ≤ 50 %) and improves the correlation coefficient from 0.862 to 0.996 and the slope from 1.15 to 0.98.
It should be noted that in this study the MAX-DOAS tropospheric NO₂ time series covers about 1 year in total, with observations during the spring and summer months only. Consequently, all our findings are representative of the spring-summer seasons only, and no information is available on the NO₂ patterns in the area during late autumn and winter, which are characterized by different emission strengths and NO₂ lifetimes. Moreover, only a limited number of coincident space-borne and ground-based data are available, which, combined with the relatively high scatter of the data in the case of GOME-2A and GOME-2B, leads to a lower level of statistical confidence. Nevertheless, the findings of this study can be useful for future validation efforts.
Competing interests. The authors declare that they have no conflict of interest.