Inter-technique validation of tropospheric slant total delays

An extensive validation of line-of-sight tropospheric slant total delays (STD) from Global Navigation Satellite Systems (GNSS), ray tracing in numerical weather prediction model (NWM) fields and microwave water vapour radiometer (WVR) is presented. Ten GNSS reference stations, including collocated sites, and almost 2 months of data from 2013, including severe weather events were used for comparison. Seven institutions delivered their STDs based on GNSS observations processed using 5 software programs and 11 strategies enabling to compare rather different solutions and to assess the impact of several aspects of the processing strategy. STDs from NWM ray tracing came from three institutions using three different NWMs and ray-tracing software. Inter-techniques evaluations demonstrated a good mutual agreement of various GNSS STD solutions compared to NWM and WVR STDs. The mean bias among GNSS solutions not considering post-fit residuals in STDs was−0.6 mm for STDs scaled in the zenith direction and the mean standard deviation was 3.7 mm. Standard deviations of comparisons between GNSS and NWM ray-tracing solutions were typically 10 mm± 2 mm (scaled in the zenith direction), depending on the NWM model and the GNSS station. Comparing GNSS versus WVR STDs reached standard deviations of 12 mm± 2 mm also scaled in the zenith direction. Impacts of raw GNSS post-fit residuals and cleaned residuals on optimal reconstructing of GNSS STDs were evaluated at intertechnique comparison and for GNSS at collocated sites. The use of raw post-fit residuals is not generally recommended as they might contain strong systematic effects, as demonstrated in the case of station LDB0. Simplified STDs reconstructed only from estimated GNSS tropospheric parameters, i.e. without applying post-fit residuals, performed the best in all the comparisons; however, it obviously missed part of tropospheric signals due to non-linear temporal and spatial variations in the troposphere. Although the post-fit residuals cleaned of visible systematic errors generally showed a slightly worse performance, they contained significant tropospheric signal on top of the simplified model. They are thus recommended for the reconstruction of STDs, particularly during high variability in the troposphere. Cleaned residuals also showed a stable performance during ordinary days while containing promising information about the troposphere at low-elevation angles.

Abstract. An extensive validation of line-of-sight tropospheric slant total delays (STD) from Global Navigation Satellite Systems (GNSS), ray tracing in numerical weather prediction model (NWM) fields and microwave water vapour radiometer (WVR) is presented. Ten GNSS reference stations, including collocated sites, and almost 2 months of data from 2013, including severe weather events were used for comparison. Seven institutions delivered their STDs based on GNSS observations processed using 5 software programs and 11 strategies enabling to compare rather different solutions and to assess the impact of several aspects of the processing strategy. STDs from NWM ray tracing came from three institutions using three different NWMs and ray-tracing software. Inter-techniques evaluations demonstrated a good mutual agreement of various GNSS STD solutions compared to NWM and WVR STDs. The mean bias among GNSS solutions not considering post-fit residuals in STDs was −0.6 mm for STDs scaled in the zenith direction and the mean standard deviation was 3.7 mm. Standard deviations of comparisons between GNSS and NWM ray-tracing solutions were typically 10 mm ± 2 mm (scaled in the zenith direction), depending on the NWM model and the GNSS station. Comparing GNSS versus WVR STDs reached standard deviations of 12 mm ± 2 mm also scaled in the zenith direction. Impacts of raw GNSS post-fit residuals and cleaned residuals on optimal reconstructing of GNSS STDs were evaluated at intertechnique comparison and for GNSS at collocated sites. The use of raw post-fit residuals is not generally recommended as they might contain strong systematic effects, as demonstrated in the case of station LDB0. Simplified STDs reconstructed only from estimated GNSS tropospheric parameters, i.e. without applying post-fit residuals, performed the best in all the comparisons; however, it obviously missed part of tropospheric signals due to non-linear temporal and spatial variations in the troposphere. Although the post-fit residuals cleaned of visible systematic errors generally showed a slightly worse performance, they contained significant tropospheric signal on top of the simplified model. They are thus recommended for the reconstruction of STDs, particularly during high variability in the troposphere. Cleaned residuals also showed a stable performance during ordinary days while containing promising information about the troposphere at low-elevation angles. and wet parts, it is possible to retrieve the amount of water vapour in the atmosphere along the path followed by the GNSS signal.
During the processing of GNSS observations only the total delay in the zenith direction (zenith total delay, ZTD) above the GNSS antenna can be estimated for each epoch or for a time interval. ZTDs from GNSS reference stations are operationally assimilated into numerical weather prediction models (NWMs) for almost a decade (Bennitt and Jupp, 2012;Mahfouf et al., 2015). In Europe, this activity is coordinated mainly in the framework of the EUMETNET EIG GNSS Water Vapour Programme (E-GVAP, 2005(E-GVAP, -2017, phases I-III, http://egvap.dmi.dk). Many recent studies demonstrated a positive impact of the ZTD or integrated water vapour (IWV) assimilation on precipitation weather forecasts, especially of the short-time ones (Vedel and Huang, 2004;Guerova et al., 2006;Shoji et al., 2009;Guerova et al., 2016). In contrast, continuous developments in NWM forecasting and nowcasting tools, as well as increasing needs for better predictions of severe weather events, stress the demand of high-quality humidity observations with high spatial and high temporal resolutions. While ZTDs provide information in zenith directions above GNSS stations, linear horizontal tropospheric gradients give information about the first-order spatial asymmetry around the station. Besides, slant tropospheric delays (STDs) can provide additional details about the horizontal asymmetry in the troposphere, more specifically in the directions from a receiver to all observed GNSS satellites. With the increasing number of GNSS systems and satellites, the atmosphere scanning will be more complete, hence gaining even more interest. Bauer et al. (2011) showed a positive impact of STD assimilation into the Mesoscale Model 5 (MM5) and Kawabata et al. (2013) demonstrated a significant advantage of assimilating STDs into a high-resolution model in the case of forecasting local heavy rainfall event against the scenario of assimilating ZTDs only. Also, Shoji et al. (2014) and Brenot et al. (2013) showed promising techniques for prediction of severe weather events using advanced GNSS tropospheric products such as horizontal gradients and STDs. The GNSS tomography technique aiming at the three-dimensional reconstruction of the water vapour field (Flores et al., 2001) uses STDs as input data as well.
Obviously, the quality of the tomography depends on both the accuracy of the STDs (Bender et al., 2009) and the observation geometry .
Validation of GNSS slant delays with independent measurements is not a new research topic. GNSS slant delays were validated against water vapour radiometer (WVR) measurements in Braun et al. (2001Braun et al. ( , 2002 and Gradinarsky (2002). First attempts to derive slant delays from NWM fields and to compare them with GNSS STDs were carried out by De Haan et al. (2002) and Ha et al. (2002). Additional effort to evaluate GNSS slant delays using WVR and NWM data was done at GFZ Potsdam over the last few years. Bender et al. (2008) showed an existing high correla-tion within the three sources (GPS, WVR, NWM) of slant wet delays (SWDs) and tried to quantify the effect of removing multipath from GPS post-fit residuals using a stacking method what was also done by Kačmařík et al. (2012). Deng et al. (2011) validated tropospheric slant path delays derived from single-and dual-frequency GPS receivers with NWM and WVR data. Shangguan et al. (2015) compared GPS versus WVR slant IWV values (SIWVs) using a 184day dataset. They also analysed the influence of the elevation angle setting and the meteorological parameters (used for the conversion to IWV) on the comparison results. More recently, a validation of multi-GNSS slant total delays retrieved in real time from GPS, GLONASS, Galileo and BeiDou was presented by Li et al. (2015a) using WVR and NWM as independent techniques for the assessment. Using multiple GNSS constellations brought a visible advantage, in terms of not only the number of available slants but also their higher accuracy and robustness.
Nevertheless, most of the studies presented thus far were limited to only a single strategy for obtaining GNSS STDs and usually restricted to a limited set of stations and/or a relatively short time period. The main purpose of this study is an extensive comparison of various solutions from GNSS processing, NWM ray tracing and WVR measurements using one common dataset as well as a comparison of results from collocated stations. The GNSS solutions evaluated in this work used 5 different software programs and 11 strategies and exploited the GNSS4SWEC benchmark dataset . Then, the paper studies the impact of various approaches on STD estimates and aims to find the most suitable strategy for estimating the GNSS-based STDs.
Section 2 briefly introduces the validation study dataset, and Sect. 3 describes the process of retrieving GNSS STDs including an overview of the different GNSS solutions. Section 4 provides a description of STDs generated from NWMs, and Sect. 5 summarizes WVR principals and WVRbased STD solutions. Section 6 introduces the methodology used in the validation of STDs, and Sects. 7 and 8 study the results achieved at single GNSS reference stations and at closely collocated stations, respectively.

Experiment description
The presented work has been carried out in the context of the EU COST Action ES1206 "Advanced Global Navigation Satellite Systems tropospheric products for monitoring severe weather events and climate" (GNSS4SWEC; http: //www.cost.eu/COST_Actions/essem/ES1206, 2013-2017). Three mutually cooperating working groups (WG) have been established to cover the proposed topics: (1) WG1 for Advanced GNSS processing techniques, (2) WG2 for GNSS for severe weather monitoring, and (3) WG2 for GNSS for climate monitoring. Validation of STDs belongs mainly under WG1, which is oriented toward the development of new advanced tropospheric products, among other topics. The idea of preparing a common benchmark dataset, which could serve efficiently for most planned activities, was designed in the beginning of the project; the data were collected, cleaned and documented, and reference products were generated and assessed . The selected geographical area is situated in central Europe (Austria, Germany, the Czech Republic, Poland) where severe weather events, including extensive floods on Danube, Moldau and Elbe rivers, occurred between May and June 2013. The benchmark dataset gathers observations from 430 GNSS reference stations, 610 meteorological synoptic stations, 21 radiosonde launching sites, 2 WVR, 2 meteorological radars and output fields from the ALADIN-CZ NWM over a period of 56 days. ZTDs and horizontal tropospheric gradients from the reference GNSS and NWM-derived tropospheric products were already evaluated, and all resulted in very good agreement . All STDs used in this paper were computed by exploiting the benchmark dataset.
From the complete benchmark dataset, we selected a subset of 10 GNSS reference stations situated at six different locations (Table 1). The selection was based on the following requirements: (1) long-term quality of observations and its stability, (2) availability of another GNSS reference station in the site vicinity, (3) availability of another instrument capable of STD measurements in the site vicinity and (4) the location of the station with respect to its altitude and the weather events which occurred during the evaluation period. The subset also includes collocated (dual) GNSS stations that played an important role in the validation. The collocated stations observed GNSS satellites with the same azimuth and elevation angles, so that they should theoretically deliver the same or very similar tropospheric parameters -ZTD, linear horizontal gradients and slant delays. Post-fit residuals of carrier-phase observations at the collocated stations should represent common effects due to the local tropospheric anisotropy, while systematic differences could remain due to instrumentation and environmental effects such as antenna and receiver characteristics and multipath. Only STDs from the WVR at Potsdam, collocated with the GNSS stations POTM and POTS, were available for this study because the second WVR, located at Lindenberg and collocated with the GNSS stations LDB0 and LDB2, was operated only in the zenith direction during the period of the study.

STD retrievals from GNSS observations
The STD cannot be estimated directly from GNSS data since the total number of unknown parameters in the solution would be higher than the number of observations. Instead, the total delays in the zenith direction above the GNSS station (i.e. ZTD) are adjusted together with, optionally, total tropospheric linear horizontal gradients (G) to account for the first-order asymmetry of the local troposphere. The es-timates are valid for individual processing epochs whenever using a stochastic approach or for a given time interval when modelling the troposphere with a deterministic process, e.g. by a piece-wise constant or linear model.
In practice, the ZTD is decomposed into an a priori model, usually by introducing the zenith hydrostatic delay (ZHD; see Saastamoinen, 1972), and the estimated corrections, representing (mainly) the zenith wet delay (ZWD). Similarly, the STD is decomposed to the ZHD, ZWD, G and post-fit residuals (RES) as described in Eq. (1), where ele is the elevation angle and azi is the azimuth angle in degrees. The STD value is given in metres.
The elevation angle dependency of STD is described by the mapping functions, separately for the hydrostatic (mf h ) and the wet (mf w ) components. Nowadays, the Vienna Mapping Function (VMF1; see Böhm et al., 2006a) -or VMF1-like concept -is commonly used in GNSS data processing. Also, the empirical mapping function Global Mapping Function (GMF; see Böhm et al., 2006b) is popular since it is consistent with VMF1 and easier to implement (independent on external data needing updates). Both the VMF1 and the GMF are applicable down to 3 • -elevation angles. The first-order horizontally asymmetric delay G(ele, azi) in Eq. (1) reflects local changes in temperature and particularly in water vapour. MacMillan (1995) proposed a model describing the gradient delay as a function of the elevation and azimuth angles: where mf g (ele) = mf h (ele) · cot (ele). Chen and Herring (1997) replaced the elevation-dependent term mf h (ele) · cot (ele) by the gradient mapping function mf g (ele) = 1/(sin (ele) · tan (ele) + C), with C = 0.0032, nowadays commonly used in GNSS data processing. Typical range for G N and G E is below 1-2 mm, but gradients can reach up to 7 mm during extreme weather events. The gradient of 1 mm corresponds to about 55 mm slant delay correction when projected to 7 • -elevation angle. Additionally, post-fit residuals RES may contain unmodelled tropospheric effects not covered by the estimated tropospheric parameters. Such remaining effects are supposed to be caused mainly by higher spatial and temporal variations of the humidity or its significant horizontal asymmetry in the troposphere. Obviously, residuals contain also other un-modelled effects such as multipath, errors in antenna-phase centre variations or satellite clocks. For eliminating such systematic effects, cleaning of post-fit residuals is applied by generating elevation-and azimuth-dependent correction maps as described by Shoji et al. (2004). For each solution and each station, we thus computed mean values of post-fit residuals in 1 × 1 • bins using the whole benchmark For the analysis of GNSS L1 and L2 carrier-phase observations, the least-squares adjustment or Kalman-filter approach was applied to estimate the ZWDs and the two horizontal gradient components G N and G E at each GNSS site (Table 1). Afterwards, Eq. (1) was used to compute STDs for each satellite in view. Whenever zero-differenced (ZD) post-fit residuals were available for any solution, three variants of the solution are presented in the paper: (1) solution without residuals (nonRES), (2) solution with raw residuals (rawRES) and (3) Table 2 with a few specific notes important for the interpretation of the results.
GOP delivered two solutions based on the Precise Point Positioning (PPP) technique (Zumberge et al., 1997) and using the in-house developed application Tefnut (Douša and Václavovic, 2014) derived from the G-Nut core library (Václavovic et al., 2013). Considering all available GNSS solutions, only GOP used a stochastic modelling approach to estimate all parameters. Additionally, GOP provided two solutions: (1) GOP_F using Kalman filter (forward filter only), i.e. capable of providing ZTD, tropospheric gradients and STDs in real time; and (2) GOP_S applying the backward smoothing algorithm (Václavovic and Douša, 2015) on top of the Kalman filter in order to improve the quality of all estimated parameters during the batch-processing interval and to avoid effects such as the PPP convergence or re-convergence. Some institutions also delivered two STD solutions which differ in a single processing setting. The aim was to evaluate their impact on STDs: (a) TUO_G and TUO_R exploit GPS-only and GPS + GLONASS observations, respectively; (b) TUW_3 and TUW_7 apply an elevation cut-off angle of 3 and 7 • respectively; and (c) ROB_G and ROB_V use the GMF and VMF1 mapping functions, respectively. Additionally, ROB solutions are the only ones based on the processing of double-difference (DD) observations and providing ZD carrier-phase post-fit residuals converted from the original DD residuals using the technique described in Alber et al. (2000). For other DD solutions, variants without adding residuals were compared only.
In total, we validated 11 solutions computed with five different GNSS processing software. Five of the solutions used GPS and GLONASS observations and six solutions used GPS-only observations; five of them are based on DD observations and six of them are computed using zero-difference data in PPP analysis. More information about TUW solutions can be found in Möller et al. (2016), about GFZ in Bender et al. (2009 and Deng et al. (2011) and about CNAM in Morel et al. (2014). For ROB, TUO and WUE solutions we refer the reader to Dach et al. (2015).

Description of ERA-Interim STD solution (ERA/GFZ) and NCEP-GPS STD solution (GFS/GFZ)
The ERA-Interim and NCEP-GFS STD solutions by GFZ are based on "assembled" STDs. At first, for the considered station and epoch, a set of ray-traced STDs (various elevation and azimuth angles) is computed using technique described in Zus et al. (2014). Secondly, from this set of raytraced STDs, the tropospheric parameters (i.e. zenith delays, mapping function coefficients, first-and higher-order gradient components) are determined. Finally, for the required azimuth and elevation angle the STD is "assembled" using the tropospheric parameters. For a detailed description of the tropospheric parameter determination the reader is referred to Douša et al. (2016). The differences between the "assembled" and ray-traced STDs are sufficiently small in particular for elevation angles above 10 • . In essence, the largest uncertainty in the "assembled" (and ray-traced) STDs remains the uncertainty of the underlying NWM refractivity field. This uncertainty is estimated to be about 8-10 mm close to the zenith, increasing to about 8-10 cm at an elevation angle of 5 • . Similar uncertainty of around 8 mm for the zenith direction was also found for ALADIN-CZ model in Douša et al. (2016).

Description of ALADIN-CZ STD solution from BIRA (ALA/BIRA)
To compute STDs from ALADIN-CZ, a simplified strategy has been used to model the curve path followed by GNSS signals through the neutral atmosphere, as suggested by Saastamoinen (1972). The delays simulated with this strategy show small differences in comparison to straight-line simulations (differences of about 4, 5 and 10 mm, respectively, at 15, 10 and 5 • elevation). Simulations have computed STDs down to 3 • elevation; however, under an elevation of 15 • a proper ray-tracing strategy, as mentioned in Sect. 4.1, should be applied.
For each latitude-longitude grid point and each level of ALADIN-CZ model, the NWM outputs considered to compute STDs are geopotential height (geopotH in m), pressure (P in Pa), temperature (T in K), partial pressure of water vapour (e in Pa) and mixing ratio of liquid and solid water (in kg kg −1 ). The ground pressure of each column is also retrieved. The geopotential height is converted to the altitude above the geoid: where g 0 is a standard gravity acceleration (mean value of 9.80665 m s −2 from the World Meteorological Organization, WMO); R e = 6378137/(1.006803 − 0.006706 × sin 2 (lat)) is the radius of the ellipsoid in metres for the latitude (lat in degrees); g is the gravity acceleration (in m s −2 ) of the considered location given as Then, the height above the geoid is converted to height above the WGS84 ellipsoid (in m) with the use of the EGM96 (Earth Gravitational Model; Lemoine et al., 1998) undulation. Note that for the region of the benchmark campaign the difference between geoid and WGS84 altitude is about 47 m.
Using the hypsometric equation, the ground pressure and the pressure of each level are considered to estimate the altitude for the different levels. In total ALADIN-CZ outputs provide 87 levels up to an altitude of about 55 km. However, to assess STDs from ALADIN-CZ, the integration was stopped at 15 km since the contribution of water vapour above this altitude is negligible. An adaptive step is considered (100, 200, 250, 500, 1000 m, respectively, for vertical altitudes between 0 and 1, 1 and 3, 3 and 5, 5 and 10, and 10 and 15 km). Bi-linear interpolations of ALADIN-CZ parameters at the altitude of the GNSS station and for each step of the integration were proceeded. Note that there is no station selected for the validation located below the first layer of ALADIN-CZ.
The expression of simulated STDs from ALADIN-CZ is the summation of these four contributions: where SHD int , SWD int and SHMD int are, respectively, the inside-model integration contribution of the hydrostatic, wet and hydrometeor delays, and STD ext is the external model contribution (over 15 km).
with k 2 = k 2 − k 1 · R d /R w , where k 1 (in K Pa −1 ), k 2 (in K Pa −1 ) and k 3 (in K 2 Pa −1 ) are the empirical refractivity coefficients of Bevis et al. (1994); R d and R w the gas constants, respectively, for dry air and water vapour (in J kmol −1 K −1 ); and T v is the virtual temperature (in K). For the estimation of the hydrometeor contribution inside the model, as presented in Eq. (8), (N lw , M lw ) and (N ice , M ice ) are atmospheric refractivity and mass content per unit of air volume of the liquid and ice water, respectively.
The estimation of coefficients α lw ∼ 1.45 and α ice ∼ 0.69 is presented in Brenot (2006). The ALADIN-CZ model provides mixing ratios of cloud water (liquid components) and pristine ice (solid water components). The mass content per unit of air volume is obtained using the associated mixing ratio, pressure, water vapour partial pressure and temperature. STD ext is obtained with the hydrostatic formulation (Saastamoinen, 1972) mapped with mf h (see Eq. 1) and using the elevation, latitude and pressure of the last step of the integration (i.e. at 15 km). Note that the wet contribution over 15 km is neglected since it is practically zero. The estimation of STD ext (about 0.21 m) provides sufficiently accurate modelling for the hydrostatic contribution over 15 km (as shown by the sensitivity test from Brenot, 2006).

Description of ALADIN-CZ solution from WUELS (ALA/WUELS)
The ray-traced tropospheric delays for WUELS' solution are based on piece-wise bent-2-D model propagation. Thus, it prevents us from knowing the exact trajectory in advance, in contrast to straight-line models, and must be solved iteratively based on the preceding ray refractive index. Similar examples are given by Böhm and Schuh (2003) and Hobiger et al. (2008). We assume the ray path does not leave the plane of constant azimuth for a given elevation angle to a satellite. The out-of-plane contribution to the delay is thus neglected, making the propagation two-dimensional (hence 2-D). The real ray path is then approximated by a finite number of linear ray pieces in WGS84 coordinates using Euler's formula for the Earth radius: where A is the azimuth angle between a satellite and a receiver, and M and N are radii of curvature along meridian and prime vertical, respectively. We follow height-dependent increments as presented in Rocken et al. (2001): 10, 20, 50, 100 and 500 m, respectively, for geometric altitudes between 0 and 2, 2 and 6, 6 and 16, and 16 and 36 km and above 36 km, which require meteorological parameters to be vertically interpolated in order to obtain finer resolution. Both P and e are interpolated exponentially from the two nearest layers, while the temperature change is considered linear. Horizontally, we find the four nearest nodes for each ray to perform weighted mean interpolation, where the weighting function equals the inverse squared distance. The reference hybrid level of the ALADIN-CZ model is determined by surface geopotential, which is converted to geopotential metres by dividing the geopotential values by g 0 . Meteorological parameters are expressed on pressure levels which represent standard vertical coordinates. The hypsometric equation is used to calculate geometric thickness between consecutive isobaric surfaces: where R d = 287.058 JK −1 kg −1 is the gas constant for dry air, and T m is the mean virtual temperature of the layer between P 1 and P 2 pressure levels in Kelvin. The conversion from ALADIN-CZ vertical coordinate system to geometric altitudes is then consistent with the BIRA approach described in Sect. 4.2. In WUELS' solution, the signal tracking is performed exploiting a full model vertical resolution up to the uppermost ALADIN-CZ layer at 55 km. Above the top layer, we adopt the US Standard Atmosphere (1976) to provide supplementary meteorological data up to 86 km. For each ray-path coordinate, the refractive index is calculated as a function of P (in hPa), e (in hPa) and T (in K) with empirically derived "best available" coefficients k given by Rueger (2002). N = (n−1)×10 6 = k 1 ·(P −e)/T +k 2 ·e/T +k 3 ·e/T 2 (11) The contribution of water droplets and ice crystals in the atmosphere is neglected in the total delay. All tropospheric delays are traced with respect to vacuum elevation angles. The electromagnetic delay is calculated for a given chord length (s) using the mean refractive index n between two consecutive rays, yielding the total delay in metres: which can be separated on hydrostatic and wet part using respective refractive indices. Additionally to the radio path length, the accumulated bending effect (bend) along the ray path is added to the hydrostatic mapping function which, together with the wet mapping function, can be calculated as follows: where ele i is the elevation angle for a given model layer and ele k is the outgoing elevation angle at uppermost altitude.

Assessment of the hydrostatic, wet and hydrometeor contributions to the slant delays
ALADIN-CZ NWM has been used to estimate the hydrostatic, wet and hydrometeor contributions to slant delays. During the whole period of the benchmark campaign, the maximum contribution of hydrometeors reached 17 mm at the zenith during the extreme weather events on 20-23 June . The 2-D fields of ZTD, ZHD, ZWD and ZHMD (zenith hydrometeor delays) are presented in Fig. 1. They illustrate the large-scale convection with the presence of hydrometeors along the convergence line associated with a strong contrast of dry and wet air masses. The contribution of hydrometeors to ZTD reached up to 7 mm (as scaled in the zenith direction) for the stations POTS and POTM at 15:00 UTC on 23 June 2013 (see Fig. 1d). According to satellite trajectories at this time for the station POTS, a maximum SHMD of 25.6 mm is observed for a satellite at 22 • -elevation angle. Figure 2 shows simulated differential STDs for a cone with a 10 • -elevation angle during the severe weather condition of the 23 June 2013 and mapped in the zenith direction (at 90 • ) using the mapping functions of Eq. (1): mf h for SHD and mf w for SWD and SHMD. For this 10 • cone, the minimum present values of total, hydrostatic, wet and hydrometeors delays simulated at 15:00 UTC on 23 June are given as STDmin, SHDmin, SWDmin and SHMDmin in Fig. 2. The respective differences of STD, SHD, SWD and SHMD and corresponding minimum values simulated at 15:00 UTC are presented in Fig. 2. The anisotropic variation of total, hydrostatic, wet and hydrometeor delays can be visualized on a skyplot. As a confirmation of Figs. 1b and 1d, 2 shows weak hydrostatic anisotropy. This anisotropy (up to 5.8 mm) is almost the same as the hydrometeors one (up to 6 mm). The area within the red curve is larger than the purple area (hydrostatic anisotropy), meaning that the total effect of the hydrometeor anisotropy is slightly larger than the one from the hydrostatic component. Note that Fig. 2 shows the anisotropies simulated at 10 • and mapped at 90 • (giving an idea of the variations in the zenith direction). The largest anisotropy is clearly induced by water vapour (values up to 20 mm in the south-east direction of POTS, also shown in Fig. 1c). With mean hydrostatic and hydrometeor anisotropies oriented in the opposite direction of the wet one, Fig. 2 presents a total anisotropy with weaker values (up to 12 mm) than the wet anisotropy.
To complement the snapshot of Fig. 2 the time evolutions of SHD, SWD, SHMD and STD in the direction of all observed GNSS satellites for the station POTS are presented in Fig. 3. Slant delays have been simulated in the direction of observed satellites (hydrostatic, wet, hydrometeor and total contributions) and, to avoid the effect of the elevation and to look at the same order of magnitude of delays, corresponding delays in the zenith direction have been computed and mapped using mapping functions presented in Eq. (1) (mf h for SHD and mf w for SWD and SHMD). These values (STD min = 2310.6 mm, SHD min = 2240.8 mm, SWD min = 43.1 mm and SHMD min = 0 mm), obtained during the whole period of the benchmark campaign, have been subtracted from their corresponding values simulated in direction of satellites. Then, the differences have been mapped back at 90 • . For day of year (DOY) 174 (i.e. 23 June 2013), we can see a contribution of hydrometeors up to 10 mm. Looking at the whole period of the benchmark campaign, the variation ranges of STD, SHD and SWD (mapped at 90 • ) are 275, 80 and 230 mm, respectively. Figure 3 illustrates the use of GNSS delay observations in meteorology (detection of variation of water vapour represented by the wet delay, as well as detection of heterogeneities from hydrostatic delays and occasionally from hydrometeors in specific severe weather cases). Note that there are no data available for POTS station between DOY 121 and 125 and DOY 160 and 163. For this reason, we have not simulated the slant delays for this period, as shown by the gaps in Fig. 3. The simplified strategy used to simulate curve slant paths gives some inaccurate simulations of slant delays for elevations between 3 and 5 • , shown by isolated values in Fig. 3. Such inaccuracies could be avoided by using a ray-tracing algorithm. For a comprehensive overview on ray-tracing algorithms and comparisons the reader is referred to Nafisi et al. (2012).

Water vapour radiometer measurements
During the benchmark period, the WVR located at GFZ Potsdam operated in a mode that scanned the atmosphere at selected elevation and azimuth angles. The instrument is situated on the same roof as the GNSS reference stations POTM and POTS. All three devices are within 10 m from each other. The HATPRO WVR from Radiometer Physics was set up to scan the atmosphere to extract profiles of atmospheric temperature, water vapour and liquid water using frequencies between 22.24 and 27.84 GHz and a window channel at 31.4 GHz. The WVR switches between "zenith mode" when it measures IWV and "slant mode" when it tracks GPS satellites using an in-built GPS receiver. In the latter case, SIWV values are delivered for the direction of satellites. Since the instrument can track only one satellite at one moment the number of observations is quite limited compared to slants from GNSS that are simultaneously observed from several GNSS satellites.
Our study focuses on the comparison of STDs, not SIWV. It was thus necessary to convert the WVR SIWV into STDs.
Firstly, WVR observations with rain flag and atmospheric liquid water (ALW) values exceeding 1 kg m −2 were rejected. Both rain and high values of ALW can significantly distort the quality of WVR measurements. Secondly, SIWV values were converted into SWDs using the Askne and Nordius (1987) formula and the refractivity constants from Bevis et al. (1994). ZHD values were computed with the precise model given by Saastamoinen (1972). For the described conversions, we used values of the atmospheric pressure and temperature measured in situ of the GNSS reference station POTS. A hydrostatic correction for the altitude difference between the meteorological station and the WVR position was applied to the atmospheric pressure values. ZHD values were mapped to elevation angles of the WVR using the hydrostatic mapping function derived from the NCEP-GFS (Douša et al., Figure 2. Skyplot of differential slant delays simulated at 10 • and mapped at 90 • , for a 360 • azimuthal range (at 15:00 UTC on 23 June 2013). For total, hydrostatic, wet and hydrometeors delays, a differential slant delay is the difference between a slant delay simulated and the respective minimum value (obtained considering slant delays simulated at 10 • elevation along all the azimuthal directions). 2016). In order to convert accurately SIWV to STDs, we took into account the influence of the hydrostatic horizontal gradients (see e.g. Li et al., 2015b). We used the hydrostatic horizontal gradients derived from the NCEP-GFS for that purpose. Finally, SHD and SWD values were summed up to deliver STDs. The described conversion of WVR SIWV to STDs aimed to minimally distort the accuracy of original WVR observations.

Methodology of STD comparisons
We provide the specificities of each type of technique comparisons in this section. Since NWM outputs are restricted to the time resolution of their predictions (typically 1, 3 or 6 h) and, since WVR is able to track only one satellite at one moment, all three sources provide different numbers of STDs per day. Therefore, three different comparisons are presented: (1) results for GNSS versus GNSS comparisons, (2) results for GNSS versus NWM comparisons and (3) results for GNSS versus WVR comparisons. Section 7 presents the validation at individual stations and Sect. 8 intercompares results obtained at GNSS dual stations. All the given results are obtained over the whole benchmark period. No outlier detection and removal procedure was applied during the statistics computation within the study.
Two variants of the comparisons are presented: "ZENITH" and "SLANT". "ZENITH" stands for original STDs mapped back to zenith direction using 1/sin(e) formula. Such mapping aimed to normalize STD differences for their evaluation in a single unit. The "SLANT" type of comparison denotes an evaluation of STDs at their actual elevation angles. To be more specific, slant delays were grouped into individual elevation bins of 5 • ; i.e. for example all slants with an elevation angle between 10 and 15 • were evaluated as a single unit. There was one exception regarding the size of a bin since the lowest one contained slants from elevation angles of 7 to 10 • , 7 • being the lowest elevation angle common to all GNSS STD solutions. This cut-off angle was thus used in all GNSS versus GNSS and GNSS versus NWM comparisons.
Presented values of biases and standard deviations were computed directly from all STDs within the processed benchmark campaign period, and therefore they are not based on any kind of daily or other averaging. In some tables, only median values of bias and standard deviation over all GNSS STD solutions (Tables 5, 7 and 8) or over all processed stations (Tables 3 and 4) are given to consolidate the presentation of validation results. Median was used as a parameter minimally affected by outliers.

GNSS versus GNSS comparisons
In the case of individual inter-GNSS solutions validation, the situation was straightforward and no interpolation nor specific hypothesis was necessary: the comparisons were done on a direct point-to-point basis of observations coming from identical azimuth and elevation directions.

GNSS versus WVR comparisons
To find pairs of STDs observations between WVR and GNSS, the following rules were used: (1) the time difference between both observations had to be shorter than 120 s and (2) the difference between both azimuth and elevation angles had to be smaller than 2.5 and 0.25 • , respectively. From these criteria, the maximum difference in elevation angle has the largest impact on the number of observation pairs found. Hence the smaller values for these settings, the smaller number of pairs found and the smaller standard deviations resulted between GNSS and WVR STDs. As an illustration, a change from 0.35 to 0.25 • led to the decrease of the number of STD pairs between the GNSS GFZ solution and the WVR at station POTS from 63 703 to 48 583 pairs; the standard deviation of the projected STD differences in the zenith direction then decreased from 14.6 to 11.7 mm too.
Since the bias practically remained unchanged (−6.1 mm versus −5.9 mm), the applied selection procedure mainly influenced the stability of the comparison between WVR and other sources of slant delays. When comparing GNSS versus WVR STDs, a cut-off elevation angle was set to 15 • to exclude low-elevation angle observations from WVR as their quality could be further degraded by a ground radiation or other local environment conditions.

GNSS versus NWM comparisons
Given the very small distances between collocated antennas and the coarse resolution of the global NWM models, STDs from NWM ray tracing using the ERA-Interim and the NCEP GFS models were derived only for one of the collocated stations. The same set of NWM-derived STDs was then used for the validation of the results at the collocated receivers.
7 Results at individual stations

GNSS versus GNSS
The total of STD pairs available for this part of the validation is roughly 1.7 million and varies from 140 987 to 206 320 according to the station.

Evaluation of all GNSS solutions versus the reference GNSS solution
Individual GNSS solutions were first compared to the GFZ solution in the zenith direction (ZENITH). We chose the "GFZ" solution as the reference because GFZ Potsdam has long-term experience in producing GPS slant delays and because the GFZ near-real-time solution for German GNSS reference stations is already being operationally delivered to the Deutscher Wetterdienst (German Meteorological Service) for NWM assimilation testing purposes (Bender et al., 2016). Figure 4 shows all the solutions using STDs calculated from the estimated ZTD and horizontal gradient parameters, i.e. without adding post-fit residuals. Adding raw or clean residuals, applied consistently to both compared and reference solutions, provided very similar graphs (not displayed). Colours in Fig. 4 indicate the processing software used in individual solutions. Medians of all solutions (dotted lines in each bin) are displayed for each station in order to highlight differences among the stations. These were observed mainly as biases ranging from −3.6 to 0.6 mm. The better agreement between GOP and GFZ solutions could be attributed to a similar strategy of both solutions compared to others. It is particularly visible for LDB0 and POTM stations where median values over all solutions differ by −2.3 and −3.6 mm, respectively. The reason for the divergent behaviour at the two stations has not been identified although site metadata were cross-checked carefully. A significant difference can also be noticed for TUW_3 and TUW_7 at the station KIBG where these solutions used individual antenna calibration files while all others solution used type mean calibration (Schmid et al., 2016). However, plots with standard deviations show agreements within 3-5 mm among all the stations and all solutions. The only exception is the GOP_F solution representing a simulated real-time analysis applying only a Kalman filter (not backward smoothing) and providing results by a factor of 2 worse compared to the others in terms of precision.

Impact of post-fit residuals
All individual GNSS STD solutions were compared independently using none (nonRES), raw (rawRES) and clean (clnRES) residuals. The comparison aimed to assess the impact of different strategies for reconstructing GNSS STDs. Figure 5 displays biases and standard deviations for all solutions when comparing STDs with and without raw residuals. Similarly, Fig. 6 shows results for STDs with and without clean residuals. Both comparisons demonstrate biases at a sub-millimetre level over all stations and solutions. Smaller biases are, however, observed in the latter case (clnRES), which demonstrates the presence of station-specific systematic errors in raw residuals (over all days of the benchmark campaign) projected into zenith directions. Although the decrease of biases is visible for all solutions, several solutions (GFZ, GOP, WUE) resulted in almost zero values over all the stations. It could be attributed to easier removal of systematic effects in PPP as absolute residuals are accessible directly. This is in contrast to the DD solutions by ROB with ZD residuals reconstructed using relative information in original values. Interestingly, the TUW PPP solutions seem to perform similarly to the ROB DD solution in this case. Comparing standard deviations in both figures demonstrates that the impact of cleaning residuals led to the standard deviations reduced by the factor of 1.2-1.5 over all stations and solutions, namely reaching 2.5-4.5 mm for clean residuals compared to 3.0-6.5 mm resulting from raw residuals. The station-specific behaviour is more obvious for the latter rather than for the former and, generally, the relative performance over all stations is in a good agreement among different solutions applying clean residuals (see Fig. 6). In particular, LDB0 and LDB2 stations show high discrepancies for raw residuals (see Fig. 5), while their standard deviations were significantly reduced after cleaning the residuals becoming more homogeneous with other stations. In this context it should be noted that the station LDB0 is missing in both ROB solutions since it has been excluded from the network solution during the pre-processing phase due to a lower quality of observations. Besides the GOP_F demonstrating simulated real-time solution, showing about 25 % worse standard deviations compared to other solutions in Fig. 6, we can also observe by a 12 % worse performance of the GOP_S solution using forward filtering and backward smoother. Both can be attributed to the stochastic model applied in the GOP software with epoch-wise parameter estimation and partly also to remaining deficiencies in imple-  mentations of all applied models -the only in-house software has been developed from scratch recently and, in contrast to others, could not have been extensively used in a variety of applications. Finally, there are rather small differences observed due to the applied strategy, namely forward versus backward filtering, GPS versus GPS + GLO and the cut-off 3 versus 7 • for elevation angles (statistically compared for STDs above the elevation cut-off angle of 7 • ). Table 3 summarizes statistics related to the figures providing medians and standard deviations over all stations. Notably, biases of STDs (over all stations) expressed in the zenith direction are negligible in all solutions, i.e. not affected by adding raw or clean residuals. The impact of adding raw residuals to the estimated model can be characterized by the median standard deviation of 3.9 mm (first two data columns), which may vary for different stations, e.g. as evident for stations LDB0 and LDB2 in Figs. 5 and 6. Adding cleaned residuals shows an overall impact of 2.8 mm (middle data columns) corresponding to the reduction of 29 % compared to raw residuals and up to 50 % for problematic stations such as LDB0 and LDB2. The comparison is understood as the impact of removing systematic errors from the residuals -in other words, as a degradation of STD quality when applying uncleaned residuals due to the contamination by systematic errors. From this reason, we would not recommend adding uncleaned (raw) residuals, but cleaned only, when providing STDs from GNSS. However, this comparison does not suggest any preference for using the estimated model without residuals or for adding clean residuals to reconstruct STDs. Both approaches still comprise of various errors due to approximations, local environmental effects, instrumentation effects or applied models. Additionally, the impact of cleaning the post-fit residuals for the reconstruction of STDs can be characterized by a median standard deviation of 2.6 mm when projected into the zenith direction, roughly 25 mm at the elevation of 6 • , which is estimated from differences between STDs using raw and clean residuals over all solutions and stations (last data columns).

Evaluation of ZTD processing settings
Individual GNSS solutions also provided variants using the same software and strategy but with modified settings. This allows us to assess its impact on the estimated parameters; see Table 4. Consequently, we evaluated STDs calculated without residuals expecting the impact (mainly) on estimated ZTDs and horizontal gradients. Biases reached a submillimetre level and were almost insignificant, with the ex- ception of using GMF versus VMF1 mapping function resulting in a positive bias of +0.94 ± 0.28 mm over all days and stations. Studied effects were sorted by the magnitude of standard deviations. The impact of the elevation angle cutoff (3 • versus 7 • ) resulted in a median of standard deviation below 1 mm; see TUW_3 and TUW_7. In this regard, it is necessary to mention that the difference between those two solutions comes mainly from estimated horizontal tropospheric gradients since no STDs below 7 • entered the STD validation. The impact of cut-off angle is also dependent on number and quality of observations below 7 • used in TUW_3 solution. The use of mapping functions based on climatology (GMF) or meteorological (VMF1) data resulted in a slightly larger impact, at the level of 2 mm, which is similar as the impact found for using single (GPS) or dual (GPS + GLO) GNSS constellations. The use of different temporal resolutions of ZTDs and gradients could not be avoided among various contributions due to limited capabilities of handling a high number of parameters. An assessment of the temporal resolution is also influenced by applying relative constrains in deterministic approach or setting a noise level in stochastic process. We compared two solutions (ROB_V and TUO_R) using the Bernese software and DD method with the same settings but different temporal resolutions of ZTDs and gradients. The results show discrepancies at a level of 3 mm which could be partly explained by different sampling, but we also assume contributions from specific differences in strategies such as data pre-processing. Last but not least, the impact of using a Kalman filter for simulating real-time solutions compared to the back-smoother (offline) solution resulted in the discrepancies represented by a standard deviation of 4.8 mm. For example, when the solution from GFZ (taken here as the reference) was compared against TUO, the standard deviation was computed from all valid absolute differences given as

Evaluation in the slant direction
and normalized standard deviation from all valid relative differences given as Since STDs are reconstructed mainly from ZTDs and horizontal gradients, any small differences between the two solutions in the zenith direction should become much larger after mapping down to lower elevations. Therefore, higher values of bias and standard deviation are expected with the decreasing of elevation angle. Indeed, we found that the agreement among individual solutions compared to the GFZ STDs is rather stable above the elevation angle of 30 • . Corresponding biases of individual elevation bins are within ±4 mm and standard deviations are slowly increasing up to 10 mm at 30 • . With elevation angles decreasing below 30 • the biases slightly increase for some solutions. In terms of standard deviation, the presumption about the dependency of statistics on the elevation angle is clearly visible in the increasing errors with the decreasing elevation angles (Fig. 7) while following an exponential decay up to 45 mm at 7 • . Normalized standard deviation remains almost constant over all elevation angles, indicating a very consistent relative performance of STDs among all the solutions. A similar behaviour is present at all stations although the absolute values can be higher for some stations or solutions, namely GOP_F for LDB0 and WTZZ with standard deviations reaching up to 72 mm.

GNSS versus NWM
STDs from four individual NWM ray-tracing solutions delivered by three different institutions entered the validation (see Sects. 4.1-4.3 for more information). Even though the time resolution of NWM is not continuous (only NWMbased results given at 00:00, 06:00, 12:00 and 18:00 UTC were used), the comparison with GNSS STDs measurements can be used to estimate the quality of the weather prediction. However, when the meteorological situation is well simulated by NWM, it is relevant for this study to compare the model with GNSS observations. To ensure the consistency of the comparison, only epochs for which STD values were available in all GNSS solutions were considered; i.e. if a single STD value was missing in any GNSS solution, then the STD values at the same epoch were also removed from all other GNSS solutions. This selection of observations and the low time resolution of the NWM models (6 h) led to a restricted set of STDs available for the validation consisting of 9866 observations in total. Both stations are situated in the mountainous area south-west of Salzburg. Since the same biases occur at neither GNSS versus ERA/GFZ nor GNSS versus GFS/GFZ comparisons, they are most likely due to a deficiency of the ALADIN-CZ orography representation. Note that ALA/BIRA and ALA/WUELS STDs show an unexpected opposite behaviour for KIBG and SAAL stations (Fig. 8), which is related to the difference in the strategy used. This is possibly due to the estimation of the altitude of parameters, their interpolations and the difference in the step of integration. Except at those two stations, similar biases as for ALA/BIRA can be also found for the GNSS versus ERA/GFZ comparison, ranging from −3 to +7 mm (+11 mm at POTM). Although the bias characteristics for GFS/GFZ are practically identical to those obtained for ERA/GFZ, the results for the NCEP GFS model are shifted by approximately +5 mm, resulting in biases ranging from +3 to +12 mm (+17 mm at POTM). The origin of this systematic deviation was identified in ZWD values estimated from the GFS model  and understood as the effect of the lower vertical resolution of NCEP GFS model compared to other NWMs, leading to larger errors in vertical interpolations. Standard deviations between GNSS STDs and ALA/BIRA, ERA/GFZ and GFS/GFZ solutions are usually around 10 mm when projected into the zenith. Generally, they are higher than the comparison of individual GNSS solutions presented in Sect. 7.1 and they are also more station dependent. Degradations can be observed at mountainous stations KIBG and SAAL for the ERA/GFZ, GFS/GFZ and ALA/BIRA STDs, reaching standard deviations up to 18 mm in the case of the ERA-Interim NWM.

Evaluation of all GNSS solutions without residuals in the zenith direction
The solution of ALA/WUELS performed differently compared to all other NWM solutions. It is biased against GNSS solutions, with biases ranging from +9 to +25 mm and highest values observed at stations KIBG and SAAL. Standard deviation values are also much higher by about a factor of 2.5 worse compared to values obtained from the GNSS versus GFS/GFZ comparison. The probable reason for this is that signal tracking was performed for vacuum elevation angles. As we discuss in the following subsection this impact is especially visible at low-elevation ray paths at which the signal has to travel through the troposphere for a longer time, enhancing the negative effect of underestimated delays.
Finally, comparisons between the three versions of GNSS solutions (nonRES, clnRES, rawRES) and the ALA/BIRA, ERA/GFZ and GFS/GFZ NWM solutions were done to test the influence of post-fit residuals on GNSS STDs. The ALA/WUELS solution was excluded from this comparison because of the lower quality of its STDs. All GNSS solutions without post-fit residuals reached slightly lower standard deviation values than the solutions which included either raw or cleaned post-fit residuals, while differences in biases were negligible (not displayed). An average increase of standard deviation was 4.5 % for clean residuals and 8.3 % for raw residuals. Indeed, because of their low horizontal and time resolution, the used NWMs can barely capture the very finescale tropospheric structures which are supposed to be included in the GNSS residuals. As a consequence, this comparison does not allow us to draw a clear conclusion about the potential benefits of post-fit residuals in the reconstruction of the GNSS STDs.

Evaluation in the slant direction
Statistics from the comparison of ALA/BIRA, ERA/GFZ and GFS/GFZ against all three versions of GNSS GFZ solution expressed at original elevation angles of slant delays are presented for the station POTS in Fig. 9. Significantly higher biases can be found at the lowest-elevation bin in all three solutions and at all stations (not displayed). At some stations, sudden increases of bias at individual elevation bins were observed. They happened at any elevation angle (different for each NWM STD solution) and were particularly visible in terms of normalized bias. These sudden increases of the bias might be either because the model sometimes cannot render the tropospheric structures at their exact locations (unexpected location of high/low values of water vapour partial pressure) or because models running at these resolutions have a tendency to smooth out such tropospheric heterogeneities. Comparing with a model running at convectivepermitting scale (e.g. 1 to 4 km) would help to sort out if the origin of such behaviours is the NWM STD or the GNSS STD.
For all stations, standard deviations present the shape with significantly higher values at elevations below 30 • followed by more gentle decrease towards the zenith direction. An exception was found at stations WTZR, WTZS and WTZZ where a rather smooth shape of the curve is disrupted with sudden changes of standard deviation at particular bins over all elevation angles. This is true mainly for GNSS versus ALA/BIRA solution, while results for GNSS versus ERA/GFZ and GNSS versus GFS/GFZ results show such changes less frequently and with lower magnitude by a factor of 2 or 3. Normalized standard deviations vary at all elevation angles for all validated stations without distinct common characteristics. Values range between 0.2 and 0.9 % with the highest values occurring usually at lower-elevation angles.
Results from the GNSS versus ALA/WUELS solutions (not displayed) show an enormous increase of both absolute and normalized bias and standard deviations at low-elevation angles below 25 • at all stations. They reached biases up to 350 mm and standard deviations up to 300 mm at some stations. Statistical parameters became more stable above 25 • , with occasional disturbances similar to those observed in other NWM-based solutions.

Summary of results for GNSS versus NWM
A summary of the GNSS versus NWM validation is presented in Table 5. For each reference station a median of bias and a median of standard deviation in the zenith direction between all GNSS solutions and a particular NWMbased solution are given. If we consider ALA/BIRA and ERA/GFZ only, without the two mountainous stations KIBG and SAAL, absolute biases between NWM and GNSS solutions stay mostly below 3 mm, which represents a very M. Kačmařík et al.: Inter-technique validation of tropospheric slant total delays good agreement between these independent sources used for retrieving slant delays. Standard deviations generally range from 8 to 12 mm in the zenith projection, with the exception of ALA/WUELS, which shows lower precision by a factor of 2.5. Statistics stem from the complete benchmark period, and it should be noted that the daily variation of GNSS STDs was much lower than of NWM ray-traced STDs. Significantly higher values of biases and standard deviations were observed at particular days for NWM solutions. A detail evaluation of daily statistics with respect to the extreme weather conditions is one of the topics that we will study in future.   Fig. 8, indicates a common issue with the GNSS data processing at the station POTM. It was particularly increased for GOP_F, GOP_S and GFZ PPP solutions. Secondly, a bias of about 5.5 mm in the zenith direction can be found between WVR and GNSS solutions even at station POTS. This bias roughly corresponds to 1 kg m −2 of IWV, which can be considered the achievable accuracy of either of the two techniques; however, WVR accuracy is more dependent on a proper instrument calibration.

GNSS versus WVR
Values of standard deviation, resulting mostly in 12 mm, are higher than those observed in GNSS versus GNSS comparisons (Sect. 7.1) and slightly higher than from GNSS versus NWM comparisons (Sect. 7.2). A cut-off elevation angle of 15 • was used for the comparison with WVR STDs instead of 7 • used in other validations. Additionally, it has to be noted that the results can be partly influenced with the settings applied for finding pairs between GNSS and WVR STDs (Sect. 6). STDs from WVR can thus originate from slightly different azimuth/elevation angles and times than the GNSS ones. All GNSS solutions perform similarly against WVR, with the exception of GOP_F due to the application of a real-time capable strategy.
The GNSS versus WVR validation at the station POTS using original elevation angles is displayed in Fig. 11. Although some differences between GNSS solutions are visible, all of them performed in a very similar manner. The decrease of values of four statistical parameters strongly follows the increase of elevation angle and, generally, it is steeper than statistics dependency of GNSS versus NWM. It indicates that slant delays from WVR below 40 • become generally unreliable, which is particularly clear from normalized biases and standard deviations at lower-elevation angles. A sudden increase of the values is observed at elevations of 55-60 • , most likely originating from WVR observations which are not yet understood.
Generally, standard deviations for all solutions using cleaned residuals (raw residuals) are on average 1.7 % (3.8 %) higher than for the solutions without residuals. Differences between solutions variants are smaller due to an overall higher uncertainty of WVR observations, but the results are in a good agreement with those obtained for GNSS versus NWM comparisons presented in Sect. 7.1.

Validation of results at collocated stations
Two erroneous techniques for STD retrievals have been compared in previous sections (GNSS vs. NWM, GNSS vs. WVR) without knowing the true reference. The errors stem from the observation noise on one hand and from the processing models including the model for adjusted parameters on the other hand. From this perspective, the higher standard deviations for GNSS STD solutions applying clean residuals compared to those using adjusted GNSS parameters only (without residuals) do not necessarily mean the lower quality of the former. GNSS and NWM models with limited temporal and spatial approximations are not able to represent true signal tropospheric delays between a receiver and all visible satellites. The simplifications certainly result in better agreement of STDs without residuals in Eq. (1), but they hardly represent the true tropospheric path delays, deviating particularly during the events with high spatiotemporal variations in the troposphere.
For this reason, we assessed all GNSS solutions at the collocated (dual) stations because for such constellation we are able to provide troposphere-free differences of STDs to evaluate noise of GNSS STD retrievals. We particularly focused on days with a high variability in the troposphere selected from the benchmark period. Dual stations were available in the benchmark campaign at three different locations in Germany. The first two sites collocate twin GNSS reference stations (LDB0 + LDB2 and POTM + POTS) and the third location collocates three individual reference stations (WTZR + WTZS + WTZZ). Nevertheless, in the case of Wettzell, only results for WTZR+WTZS are presented due to their similarity with the two other combinations at the same place. Characteristics of the stations are summarized in Table 6.

Slant residuals and slant delay differences
STD validations in this paper were done for 2 months of the benchmark period during which heavy rain events occurred for some days, particularly 31 May-3 June, 9-11 and 21-26 June, all causing severe flooding in central Europe. During normal weather conditions, the tropospheric variation is reasonably smooth, meaning it can be well represented by GNSS STDs reconstructed from ZTDs and horizontal gradients. However, during high temporal or spatial variabilities in the troposphere, post-fit residuals certainly contain tropospheric signals which were not modelled. If they surpass the observation noise and other residual errors from GNSS models, cleaned residuals should be considered in the GNSS STD model as described in Eq. (1).
In order to initially address optimal STD modelling under different weather conditions within the benchmark period, we tried to identify days with a high variability in the troposphere. Daily standard deviations of cleaned post-fit residuals were computed individually for each day of the benchmark period, for every station and GNSS solution for 1 • -elevationangle bins. We studied their daily variations considering the GNSS model applied. If cleaned post-fit residuals consist of the noise of observations only, the variation in time should be negligible. However, the days showing significantly higher values, correlated at all collocated stations, indicate highly variable tropospheric conditions. Three such days were identified at LDB0, LDB2, POTM and POTS stations (31 May, 20 June, 23 June) and 2 days at WTZR and WTZS stations (19 and 20 June). They all very well correspond to the days initiating heavy precipitations in the domain . Typical differences between raw and clean residuals are displayed in Fig. 12 for all elevations during the normal day (19 June, DOY 170) and the day with high variability in the troposphere (DOY 171, 20 June) for LDB0, LDB2, POTM and POTS stations using GFZ solution. Obviously, the variability of clean residuals (black dots) and their 2σ envelopes are higher by a factor of 2 for the day of year 171 compared to 170. The variability is clearly visible over all elevations, but the increase is slightly higher at low elevations. The plots for these four stations clearly demonstrate the different quality of GNSS observations, particularly related to a multipath effect displayed by 2σ envelope (green curves). A low multipath is common to the stations using choke ring antennas, in our case POTS and LDB2, but LDB2 still suffers from unknown systematic effects at 35-55 • elevations. A very high multipath effect was observed at LDB0 station over all elevations. Variability of 2σ envelopes of clean residuals (red curves) indicates a higher sensitivity of clean residuals to the weather conditions compared to station selection and observation quality, thus suggesting a significant contribution from the troposphere to the cleaned residuals. In the same context, raw residuals show much higher sensitivity to the observation quality compared to different weather conditions, which is particularly true in the case of LDB0 and LDB2 stations.
Elevation-dependent differences of STDs using clean residuals (black dots) are displayed in Fig. 13 for the same days as in Fig. 12, selecting GFZ solution and station pairs WTZS-WTZR and LDB0-LDB2. Additionally, 2σ envelopes are plotted for differences without residuals (red  curves), clean residuals (green curves) and raw residuals (blue curves). Firstly, we note that STD differences are more or less similar for both days, i.e. not significantly different between days with normal and high variations in the troposphere, which is also found for other days of the benchmark period. It suggests that increased residuals in Fig. 12 for DOY 171 contain strong contributions from the tropospheric effect that could not have been assimilated into ZTDs and tropospheric horizontal gradients. An alternative explanation suggests a possible contribution of satellite-specific errors common to both receivers, thus easily eliminated in STD differences at the dual stations. However, systematic errors at satellites are well absorbed by initial phase ambiguities in PPP and short-term or random errors, e.g. due to satellite clocks, are in this study eliminated by the use of final products, i.e. stable enough to avoid observed day-to-day variability in cleaned residuals. The DOY 171 thus shows the situation when cleaned residuals contain a tropospheric signal that should be added to the STD retrievals. In the case of GFZ, the contribution from residuals is particularly important due to local troposphere variation in time when using model of piece-wise constant function with 15 min time resolution for ZTD and 60 min for horizontal gradients. It is not so obvious in the case of a stochastic process used for epoch-wise estimates of all tropospheric parameters. However, the uncertainty of estimated parameters is then higher compared to the deterministic model, which makes it more difficult to separate errors in estimated parameter and errors due to insufficiency of the linearized tropospheric model in time.
Secondly, we can see that envelopes of differences using raw residuals are always the largest ones. Raw residuals vary more with the elevation angle, which is particularly visible for differences between LDB0 and LDB2. Obviously, it is due to the large systematic errors at LDB0 station and additional contribution from LDB2 errors observed at 35- 55 • elevations. The 2σ envelopes of STD differences with clean residuals smoothly follows the 2σ envelope of STDs differences without residuals, keeping the difference within ±15 mm over all elevations. This indicates a stable and reliable usage of clean residuals under any conditions. In contrast, applying raw residuals at problematic sites may seriously degrade STDs as observed at LDB0 station. Finally, we can consider error contribution from both stations to STD differences at dual stations equal, i.e.
with δ 2 STD_dif variance calculated from cleaned STD differences at specific elevations when using the same processing strategy at both dual stations and with δ 2 STD_res characterizing the variance over errors in GNSS STD retrievals corresponding to the observation elevation angle and the applied strategy. Although we can note some differences in δ 2 STD_dif in collocations, partly due to differences in contributions from both stations, the relative performance of differences from STDs with clean residuals (green curves) and without residuals (red curves) for different days remains similar. Uncertainties of the simplified STDs at low elevations surpass additional uncertainties due to applying clean residuals (green curves vs. red curves). According to the magnitude of clean residuals at these elevations (Fig. 12), the small uncertainties from calculated differences indicate the presence of tropospheric signals in the residuals at low elevations, roughly below 30 • . It seems to be almost independent from the weather conditions and is supposed to represent mainly unmodelled horizontal asymmetry in the troposphere. However, further study on detail impact of residuals on GNSS STDs modelling during severe weather conditions requires longer datasets, which will be subject of our upcoming study. Figure 14 displays results for comparisons of individual dual stations in slant directions calculated from all days of the benchmark period. The same statistics and plots (not displayed) were prepared also for days identified with "severe" weather conditions, but only minor differences were observed. Strong variations are observed mainly in normal- ized biases over all elevation angles for the solutions using raw post-fit residuals (rawRES) regardless weather conditions. These are clearly related to local effects such as multipath or modelling instrument-related effects (phase centre offsets and variations) and disappear after using the cleaned residuals (clnRES). The standard deviations and normalized standard deviations at all stations are clearly the lowest for variants without using post-fit residuals (nonRES), slightly higher using cleaned residuals and significantly higher when using raw residuals, i.e. corresponding to above-performed inter-technique validations.

Differences in zenith direction
Tables 7 and 8 show the statistics expressed in the zenith direction for observations ranging in elevation angles from 7 to 15 and from 15 to 90 • , respectively. Median values computed over all GNSS solutions for which residuals were available are presented. Results for the identified days with high daily tropospheric variation are given in the upper part of the table -days are stated in the previous section. In the bottom part results for selected days with low daily variation of post-fit residuals are presented. These days were the same for all collocated stations: 25 May, 30 May and 6 June (DOY 145,150,157). Biases remain stable regardless of the severe weather occurrence and whether post-fit residuals are used. The lowest standard deviations for all dual stations are always related to the solutions without using post-fit residuals. Interestingly, when comparing the statistics for STDs evaluated separately for ranges of 7-15 and 15-90 elevation degrees, standard deviations are smaller in high compared to low elevations for variants without using residuals and vice versa for variants using either cleaned or raw residuals. This can be interpreted as the standard GNSS tropospheric model (ZTD and horizontal gradients) representing well observations at eleva- Figure 13. Elevation-dependent variability in STD differences of clean residuals (black dots) and their 2σ envelopes (green curves) are showed for 19 June (DOY 171) and 20 June (DOY 170) and two dual-stations: WTZS-WTZR and LDB0-LDB2. Plots also display 2σ envelopes for differences of raw residuals (blue curves) and without residuals (red curves).
tions above 15 • but suffering under the modelling deficiencies mainly at low elevations. These statistics also support the above statement that cleaned residuals are valuable particularly for reconstructing low-elevation STDs regardless of the weather conditions as they certainly contain non-negligible tropospheric signals from high-order horizontal asymmetry. During the days with high variation of residuals, the standard deviations are usually a little bit higher than during the days with low variation, but there is no difference between these two regarding the above-mentioned behaviour.

Conclusions
We presented results of validating tropospheric slant total delays obtained from GNSS data processing with those obtained from NWM ray tracing, WVR measurements and collocated GNSS stations, in search of the optimal method for estimating GNSS STDs. Ten GNSS reference stations were selected, exploiting data from the 56-day COST ES1206 benchmark campaign. Eleven GNSS solutions, four NWMbased solutions and one WVR-based dataset entered this validation study. Eight out of 11 GNSS solutions delivered STDs in three variants: (1) without post-fit residuals, (2) with raw post-fit residuals and (3) with cleaned post-fit residuals. The comparisons were carried out into two scenarios, firstly for STDs at their true elevation angles and secondly for STD differences mapped into the zenith direction using a simple mapping function.
Comparisons of STD solutions without residuals, with raw or with cleaned residuals were used to study the impact of different strategies for optimally retrieving STDs from GNSS. The impact of cleaning residuals led to the standard deviations reduced by a factor of 1.2-1.5 over all stations and solutions, namely reaching 2.5-4.5 mm in the zenith direction for clean residuals compared to 3.0-6.5 mm for raw residuals, the latter also being highly dependent on the station. The impact of adding raw or cleaned residuals was practically negligible in terms of biases, which always remained within ±0.1 mm.
Biases and standard deviations between GNSS and NWM solutions depended on applied ray-tracing method, NWM source and station location. Worse results, by a factor of 2.5 in terms of standard deviation, were observed for the ALA/WUELS solution originating from the deficiency of the applied ray-tracing method. Generally, biases in the zenith direction were below ±3 mm for other solutions with the exception of a positive bias of 5 mm observed for GFS NWM model. Standard deviations for all GNSS versus NWM STD comparisons were at the level of 10 mm, excluding the ALA/WUELS solution. Contrary to the GNSS versus GNSS comparisons, normalized standard deviations showed pronounced variability with the elevation angle.
Using the simulation of delays from ALADIN-CZ weather model, we illustrated the impact of the hydrostatic, wet and hydrometeors contributions to zenith and slant delays. These showed strong horizontal variations that allowed relevant characterization of mesoscale meteorological situations. Visualizing the slant anisotropic variation of total, hydrostatic, wet and hydrometeor delays in a common skyplot illustrated a weak hydrostatic anisotropy (up to 5.8 mm) that was almost the same as that of the hydrometeor (up to 6 mm). The largest anisotropy was induced by water vapour (up to 20 mm), but the total anisotropy was much weaker (12 mm) due to the compensation of mean hydrostatic and hydrometeor anisotropies oriented in the opposite direction. GNSS STDs from stations POTM and POTS were validated against collocated WVR observations pointed to GNSS satellites. A positive bias of about 5.5 and 10 mm was observed for POTS and POTM station, respectively. Standard deviations from comparisons of GNSS versus WVR STDs reached 12 mm in the zenith direction, thus higher compare to NWM solutions. Normalized standard deviations revealed a strong elevation dependency, indicating the WVR observations lack this quality at low elevations, particularly below 40 • .
Collocated GNSS stations at three different locations were used to evaluate the quality of GNSS STD retrievals applying statistics over troposphere-free STD differences from theoretical point of view. We could observe strong systematic errors in raw residuals at any elevation angles, particularly at stations without the choke ring antenna, such as LDB0 and POTM. We found a strong elevation dependency of bias when using raw residuals which almost vanished when cleaning the residuals from visible systematic errors. This suggests that the use of raw residuals should not be recommended, at least not without any information about possible systematic errors. Although the simplified STDs reconstructed from the estimated GNSS tropospheric parameters performed the best in all the comparisons, it obviously missed part of tropospheric signals due to non-linear temporal and spatial variations in the troposphere. By identifying low and high variability in the troposphere during all days in the benchmark period, we showed that residuals contain significant tropospheric signals in addition to the simplified model, particularly during high variability in the troposphere. Additionally, we also identified tropospheric signals at low elevations due to a non-linear horizontal asymmetry in cleaned residuals regardless of the station selection and the quality of its observations. From such findings, we recommend the use of cleaned residuals for an optimal STD retrievals from GNSS, at least for low-elevation angles and during high variability in the troposphere. We also have not seen any obvious degradation of STD retrievals in other conditions.
The better inter-solution and inter-technique agreements of STDs without residuals compared to those using clean residuals are attributed to the too-simple tropospheric model resulting in smooth and robust STDs and, consequently, not containing all interesting signals from the troposphere. The majority of evaluated GNSS solutions used deterministic models with rather long validity of estimated tropospheric parameters for which the residuals are important to overcome modelling deficiencies of low-resolution parameter estimates in time. Our future study will focus on the evaluation of GNSS STDs estimated using a stochastic process easily applicable in real time and on a long-term evaluation of azimuthal dependency of post-fit residuals under severe weather conditions. Data availability. GNSS data from the EUREF Permanent Network (EPN) stations are freely available through the anonymous FTP, e.g. from the EPN historical data centre at ftp://epncb.oma.be/ pub/obs/ maintained by the Royal Observatory of Belgium. Other GNSS and WVR data were primarily collected for the purpose of the COST Action ES1206 (GNSS4SWEC project; see Dousa et al. 2016) and cannot be distributed. Numerical weather model data fields from ALADIN-CZ were provided by the Czech Hydrometeorological Institute only for the purpose of the GNSS4SWEC project. Data were available only to the project members until the end of 2017 through the licence signed by the Research Institute of Geodesy, Topography and Cartography (RIGTC). The ERA-Interim data from the European Centre for Medium-Range Weather Forecasts (ECMWF, http://www.ecmwf.int/) were provided to GFZ by the German Weather Service. The Global Forecast System data were provided by the National Centers for Environmental Prediction (http://nomads.ncdc.noaa.gov/data/gfsanl). All the validation results in the form of figures and tables for all types of presented comparisons and stations can be provided by request to michal.kacmarik@vsb.cz.
Competing interests. The authors declare that they have no conflict of interest.