A new voxel-based model for the determination of atmospheric weighted mean temperature in GPS atmospheric sounding

The Global Positioning System (GPS) is a powerful atmospheric observing system for determining precipitable water vapour (PWV). In the detection of PWV using GPS, the atmospheric weighted mean temperature (Tm) is a crucial parameter for the conversion of zenith tropospheric delay (ZTD) to PWV since the quality of PWV is affected by the accuracy of Tm. In this study, an improved voxel-based Tm model, named GWMT-D, was developed using global reanalysis data over a 4-year period from 2010 to 2013 provided by the United States National Centers for Environmental Prediction (NCEP). The performance of GWMT-D was assessed against three existing empirical Tm models – GTmIII, GWMT-IV, and GTm_N – using different data sources in 2014 – the NCEP reanalysis data, surface Tm data provided by Global Geodetic Observing System and radiosonde measurements. The results show that the new GWMT-D model outperforms all the other three models with a root-meansquare error of less than 5.0 K at different altitudes over the globe. The new GWMT-D model can provide a practical alternative Tm determination method in real-time GPS-PWV remote sensing systems.


Introduction
Water vapour (WV), a component of the Earth's atmosphere, plays a crucial role in global atmospheric radiation, energy equilibrium, and the hydrological cycle (Wang et al., 2007).Since the Global Positioning System (GPS) became fully operational in 1994, it has been possible to use GPS measurements to retrieve precipitable WV (PWV) information in the atmosphere (Duan et al., 1996).The main advantages of using GPS technique to derive PWV are its high quality, wide coverage, and all-time availability under all-weather conditions.These features are vital for meteorological applications of GPS such as the prediction of short-term rainstorms and rainy seasons (Song et al., 2003;Zhang et al., 2007) and the monitoring of severe weather events, including thunderstorms, hailstorms, strong winds, and hurricanes (Choy et al., 2001;Zhang et al., 2015).
PWV is defined as the equivalent height to a column of liquid water.GPS-derived PWV values above a given GPS station, i.e.GPS-PWV, are converted from the zenith tropospheric delay (ZTD) estimated from GPS measurements.The GPS-PWV can be used to compare different techniques of WV detection, such as radiosonde, WV radiometer, Moderate-Resolution Imaging Spectroradiometer (MODIS), and sun photometer (Yang et al., 1999;Li et al., 2003;Prasad and Singh, 2009;Kwon et al., 2010).It can also be used for evaluating improvements in numerical weather prediction (NWP) systems (Gutman and Benjamin, 2001;Song et al., 2004).Moreover, temporal and spatial variations in PWV can be precisely identified using GPS-PWV over GPS networks (Champollion et al., 2004;Jin and Luo, 2009;Van Baelen and Penide, 2009).
The GPS-derived ZTD (GPS-ZTD) generally consists of two components: the zenith hydrostatic delay (ZHD) and the zenith wet delay (ZWD).The ZWD is caused by WV in the atmosphere below ∼ 10 km height (Saastamoinen, 1972).Furthermore the ZWD can be converted to PWV by multiplying it with a dimensionless conversion factor.This factor is a function of atmospheric weighted mean temperature (T m ), as expressed below (Askne and Nordius, 1987;Davis et al., 1985;Jade et al., 2005). (1) where is the conversion factor; ρ w and ρ v are the density of liquid water and WV, respectively; R v is the specific gas constant for water vapour; k 2 and k 3 are the atmospheric refractivity constants given in Bevis et al. (1994); and T is the absolute temperature of the atmosphere in kelvin (K).Using the ideal gas law for water vapour, ρ v can be written as ρ v = e T /R v , where e is the WV pressure in hPa (Picard et al., 2008).The heights of the tropopause and the GPS station are h T and h, respectively.Note that both PWV and ZWD are in millimetres in Eqs. ( 1)-( 3).
The T m over a GPS station or at any given point can be accurately determined using a ray tracing method that takes profiles of atmospheric temperature and WV pressure.However, atmospheric profiles are almost impossible to obtain in real-time/near-real-time (RT/NRT) (Wang et al., 2016).Traditionally the following two alternatives have been used in practical applications: the Bevis formula (T m = a + b • T s where T s is the atmospheric temperature) and the empirical model (Bevis et al., 1994;Ross andRosenfeld, 1997, 1999).
In the Bevis formula T m = a + b • T s , the coefficients (a and b) are season and location dependent and can be estimated from meteorological measurements (Wang et al., 2011;Bevis et al., 1992;Schueler et al., 2001;Mendes et al., 2000;Emardson and Derks, 2000).The root-mean-square error (RMSE) of T m from the Bevis formula is 2-5 K. Nevertheless, the Bevis formula becomes invalid when in situ temperature measurements are unavailable at some GPS stations, which could adversely affect the continuous operation of a RT/NRT GPS-PWV remote sensing system.Therefore, an empirical T m model, fed only by coordinates of the site and the time, is essential for ground-based GPS atmospheric sounding.Although the accuracy of empirical models has been shown to be lower than that of the ray tracing method and the Bevis formula, it is still used as a practical alternative to calculate the T m in RT/NRT if meteorological measurements are unavailable.
Table 1 summarises existing empirical T m models developed in the last decade.The data source column presents the type and time span of the data used to develop the models: NCEP-DOE Atmospheric Model Intercomparison 2 data (NCEP2) released by National Centers for Environmental Prediction (NCEP), ERA-Interim data from the European Centre for Medium-range Weather Forecasts (ECMWF), and the Global Geodetic Observing System (GGOS) data generated from ECMWF reanalysis data.
Building upon the global pressure and temperature (GPT) model proposed by Böhm et al. (2007), Yao et al. (2012) developed the season-specific Global Weighted Mean Temperature (GWMT) model based on radiosonde data of 135 global stations in the period 2005-2009.The RMSE of T m over the ground is around 4.6 K.However, due to its poor performance in the southern Pacific Ocean, the coefficients were recalculated for an updated model -GTm-II using T m over the ocean calculated from the Bevis formula, where T s is derived from the GPT model (Yao et al., 2013).This GTm-II model was further improved into GTm-III using GGOS surface T m by taking semi-annual and diurnal variations of T m into account (Yao et al., 2014a).In the later GWMT-IV model, the T m lapse rate is considered as a function of geodetic coordinates instead of a constant value adopted in the former models (He et al., 2013).In a study by Chen et al. (2014), the non-linear model in GTm-III was expressed into a linear one and developed it further into GTm_N.Unlike the spherical harmonics applied in GTm_N, Chen and Yao (2015) developed the GTm-X model based on the semiannual and diurnal variations in T m with a global resolution of a 1 • × 1 • geographical grid.More details for these three models can be found in Appendix B. Note that UNB3m and GPT2w are not specific T m models even though they can output T m values (Leandro et al., 2008;Böhm et al., 2015).In addition, no comprehensive intercomparison with the same reference T m has been carried out among existing empirical T m models.
However, given that the diurnal variation and the lapse rate of T m are either ill-modelled or neglected in most of these empirical models, this study presents a recent development towards an improved T m model (i.e.GWMT-D).This paper is structured as follows.Section 2 reviews the data sets used for determination of T m and validation of empirical T m models.Section 3 presents the new model GWMT-D and its procedure of calculation.The performance of the new model GWMT-D is assessed against three other selected models using reference T m derived from 2014 NCEP2, radiosonde, and GGOS data in Sect.4, followed by conclusions in Sect. 5.

The determination of T m
Three data sets with various temporal and spatial resolutions are used to calculate T m : NCEP2 reanalysis data, GGOS data, and radiosonde measurements.NCEP2 data in the period 2010-2013 are used to develop the new GWMT-D model, while all these three data sets in 2014 are used to evaluate GWMT-D as well as the other three selected empirical T m models.a The inputs are day of year (DOY), hour of day (HOD), latitude (ϕ), longitude (θ), and surface height (h) of a site; the values in the surface error column are the RMSEs of the model on the surface given by the authors, except for the 4.0 * of GPT2w, which is a post-fit standard deviation according to the reference.

NCEP2 data
A state-of-the-art analysis and forecast system has been used to assimilate multi-source data since 1948 and the NCEP2 data set is an updated version from its former reanalysis data (available on www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.pressure.html)(Kanamitsu et al., 2002).The NCEP2 global reanalysis data set has a latitude and longitude grid resolution of 2.5 • × 2.5 • at 17 different pressure layer from 1000 to 10 hPa and also has a temporal resolution of 6 h (namely, at 00:00, 06:00, 12:00, 18:00 UTC).In this study, temperature, geopotential height, pressure, and humidity from NCEP2 data over the period of 2010-2014 are used for the development and validation of the new GWMT-D model.

Radiosonde data
Radiosondes released from ground-based stations can directly measure the atmospheric profiles.Radiosonde records from 585 global Integrated Global Radiosonde Archive (IGRA) stations (Fig. 1) in 2014 are utilised to validate the new GWMT-D model.They are retrieved from the upper-air archive at the website of University of Wyoming (available on http://weather.uwyo.edu/upperair/sounding.html).The daily observations at a site usually consist of 1-4 radiosonde observations, containing pressure, temperature, geopotential height, dew point temperature, relative humidity (RH), and mixing ratio at the surface, tropopause, and standard pressure levels (i.e. 1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, and 10 hPa) (Wang et al., 2005).T m values are obtained through numerical integration (see Appendix A) under the assumption that the collected pressure, temperature, and humidity measurements are along the zenith direction, even though radiosonde balloons often drift away from the vertical direction, especially in windy conditions.
In addition, raw radiosonde measurements are rejected as outliers in the data preprocessing under the following conditions: 1. the height of the first data record in the profile is greater than 20 m above the ground; 2. the difference in height between two successive pressure levels is greater than 10 km; 3. the gap between two successive atmospheric pressure levels is greater than 200 hPa; 4. the total number of valid radiosonde levels is less than 20; 5. the highest humidity level (at the pressure level of 200-350 hPa) is less than µ − 4σ (where µ and σ are the mean height of the tropopause and its standard deviation) obtained from an empirical model (Liu, 2015); 6. the height of the last data record in the profile is lower than 20 km.

Surface T m from GGOS Atmosphere
In this study, global surface T m values are used for the validation of the new GWMT-D model.GGOS Atmosphere provides the daily global surface T m with a horizontal resolution of 2 • × 2.5 • (latitude and longitude) at 00:00, 06:00, 12:00, and 18:00 UTC (available on http://ggosatm.hg.tuwien.ac.at/DELAY/ETC/TMEAN/).This data set has been applied in the development of GTm-III and will be also used in the performance assessment of this study.Nevertheless, the discrepancies between these different data sets are noticeable and may affect the validation results, which will be shown in Sect. 4.

GWMT-D model
The NCEP2 data from the 4-year period 2010-2013 are employed to develop the new GWMT-D (D stands for diurnal variation) model.All global T m values with a latitudinal and longitudinal grid resolution of 2.5 • at 17 pressure levels are first calculated (see Eq. 3).For more details of the calculation, refer to Appendix A. Note that the geopotential height in the radiosonde and NCEP2 data needs to be converted to ellipsoidal height (refer to Appendix A), which is simplified as "height" hereafter.

Improvements in GWMT-D
Compared with other empirical T m models, the improvement achieved by the new GWMT-D model are the modelling of diurnal variation and lapse rate in T m .The T m lapse rate in this paper is the decreasing rate of T m (Bevis et al., 1994;Yao et al., 2012).The heights of 17 pressure levels in NECP2 data are time-variant.In order to investigate a time series of T m for a specific location, NCEP2-derived T m at the pressure level for each grid points are first interpolated at four selected reference heights: 0, 2, 5, and 9 km.The spline interpolation is carried out in this procedure to avoid the Runge's phenomenon (Fornberg and Zuev, 2007).Also the T m time series at each of the reference times (00:00, 06:00, 12:00, 18:00 UTC) of day for a specific reference location are assumed to follow a seasonal cycle: where DOY is the day of year, α 1 is the annual mean value; α 2 and α 3 are the coefficients of the annual variation, and α 4 and α 5 are that of the semi-annual variation.Here the reference locations mean the geodetic coordinates of the grid points on the four reference height levels.These coefficients are estimated from the time series of T m values at the specific reference time using the leastsquares method.The new model is developed upon a fourdimensional (4-D) global T m field with a resampled horizontal resolution of 5 • × 5 • at the four reference height levels and the four reference times.The voxel-based feature of GWMT-D's coefficients is its primary difference from the other models.

Diurnal variation
Annual and semi-annual variations in a NCEP2-derived T m time series for a reference location can be detected using the spectrum analysis (Chen et al., 2014).Although simple sine and cosine functions have been widely used to model the diurnal variations of T m , few studies have been conducted to Figure 2 shows an example of the diurnal variation at 2 km above the ground for 30 • N, 5 • W. It clearly shows that, on the one hand, the diurnal variation in T m is season and location dependent and is difficult to be modelled through simple trigonometric functions as used in the GTm-III model (see Appendix B).On the other hand, the plot of power spectrum density (PSD) in Fig. 2d shows that GWMT-D efficiently captures the diurnal variations in T m but GTm-III does not.This study takes into account this feature by modelling T m at each of the four reference times so that T m values at any other times can be obtained by the spline interpolation.

Vertical lapse rate of T m
The T m lapse rate along the vertical direction can be affected by several factors, e.g. the moisture content of air, atmospheric pressure and the surface height.Figure 3 illustrates the global distribution of annual mean T m lapse rate in the height layer from the ground up to 2 km in 2013.It shows that global annual mean T m lapse rate varies with latitude and land-sea distribution.Therefore, it is essential to consider season-and location-dependent T m lapse rate in the development of empirical T m models.More results can be found in Sect.4.3.
Four specific reference height levels (0, 2, 5, and 9 km) are selected covering most of the troposphere in the new GWMT-D model.All global T m values on these reference heights are first calculated (see Eq. 3).Then T m value for any other heights can be obtained by piecewise linear interpolating its two nearest height levels.This improvement is a distinguished feature of the new model in comparison with the aforementioned empirical models where a constant T m lapse rate is adopted for the different heights over the globe.This step has been taken for each reference time.

Data span used in T m modelling
Another important task is to determine the optimal length of reanalysis data required for the development of empirical T m models.Long-term T m data (> 10 years) over the globe can be used for climatological analysis, but the temporal correlation of T m time series may be too weak to be considered in the T m modelling process.However, a set of short-term T m data (< 1 year) may be insufficiently for reliable results.
Different sets of coefficients of the GWMT-D are calculated using the NCEP2-derived T m data for a period of 1 (2013) to 9 years (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013).The GWMT-D with different sets of coefficients are compared with 1-year T m time series at five pressure levels (1000, 925, 850, 700, and 600 hPa) derived from NCEP2 data in 2014.Table 1 lists the statistical results of the comparison.In this research, the NCEP2 T m time series from the 4-year period are adopted to develop the GWMT-D model for its best-fitting results (shown in bold fonts).
2. For each of the four reference time, a vertical interpolation is performed for the four grid points at the height of h (four corners of the dashed rectangular in Fig. 4).At each grid point, T m is linearly interpolated from the T m values on the two nearest reference height levels h l and h l+1 : 3. The T m values on the four corners are horizontally interpolated to obtain the T m at the target point (ϕ, λ, h) using p = (λ − λ j )/5, q = (ϕ − ϕ i )/5, (8) Now T m values of the target point are determined on the four reference times of the day (i.e.00:00, 06:00, 12:00, and 18:00 UTC).All of notations in Eqs. ( 7) and ( 8) can be found in Fig. 4. The number "5" in Eq. ( 7) is the horizontal resolution of the new model.
4. After the aforementioned spatial interpolations, a spline interpolation in the time domain is carried out to find the T m of the target location (ϕ, λ, h) for the specific time of the day t k using the T m values from the previous step.

Validation of T m models
Different empirical T m models (   et al., 2014a).Consequently, cross comparisons of these accuracy values for their performance may not be appropriate.In this study, the performance of three selected empirical T m models and the new GWMT-D model is assessed using the same reference T m values derived from NCEP2, GGOS, and radiosonde data sets.Due to the fact that GTm_X is unavailable to the public and GWMT and GTm-II have been proven inferior to GTm-III, GWMT-IV, and GTm_N, only GTm-III, GWMT-IV, GTm_N, and the new GWMT-D model are assessed.The methodologies for obtaining T m from NCEP2 radiosonde data sets are given in Appendix A. The two statistical quantities used to measure the performance of these models are bias and RMSE, which are calculated by where T C i m and T i m are the T m values from the models and the reference, respectively, and N is the number of the samples.

Comparison with NCEP2 data
The globally mean biases and RMSEs of the differences between NCEP2-derived and model-derived T m at all global grid points are summarised in Table 3.Only the results of pressure levels of 925 hPa (∼ 0.6 km) and 600 hPa (∼ 5 km)   −1.31 [−5.19, 9.63] 98.0 3.91 [1.26, 15.38] 77.1 GWMT-IV −1.89 [−11.40, 4.77] 96.2 4.36 [1.36, 14.61]  are given since similar results can be obtained from the new GWMT-D model on the other pressure levels less than 600 hPa (refer to Table 2).One can find from Table 3 that GWMT-D significantly outperforms all the other three empirical models.
Figures 5-6 illustrate the distribution of the RMSE (not the mean RMSE of all grid points listed in Table 3) of the differences between the T m derived from the models and the NCEP2 data in 2014 on two pressure levels.Figures 5d and  6d present the best agreement for GWMT-D over the globe.On the pressure level of 925 hPa, more than 91 % of the grid points had RMSE less than 5 K, compared with 77 % from GTm_N and GTm-III, and less than 71 % from GWMT-IV.Whilst on the 600 hPa level, GWMT-IV is worse, especially in the Arctic Circle.The RMSE values of GWMT-D ranged from 1.27 K to 11.55 K, outperforming the other three mod-els with a global average RMSE of only 4.73 K and an approximately 25 % improvement over the other models.
It is worth pointing out that all these four models have relatively low RMSE values near the tropical areas, and all have a similar performance globally except for the Antarctic.This finding is consistent with recent studies, (e.g.He et al., 2013;Chen et al., 2014;Yao et al., 2014a).It may be explained by the fact that the T m on the pressure levels under the terrain of Antarctica (∼ 6 km) may contain large systematic biases caused by the extrapolation of actual meteorological records.

Comparison with GGOS data
The GGOS surface grid T m data in 2014 are used as the reference in this section to evaluate the performance of the four models.The statistical results of the four selected models are shown in Table 4 and Fig. 7.The GTm-III performed the best this time because the GGOS Atmosphere data derived from ECMWF reanalysis data are used in the development of the GTm-III.From Table 3, GWMT-D is almost unbiased while the GTm-III showed a bias of −1.25 to −1.31 K in comparison with the NCEP2-based T m .In contrast, a bias of about +1.2 K (warmer) compared to the GGOS-based T m is found with GWMT-D (see Table 4).This discrepancy of 1.2 K between the NCEP2-derived and GGOS-derived T m may result from differences of NWP systems, e.g.different observations, physical models, data assimilation processes, and boundary conditions (Buizza et al., 2005).Nevertheless, the good performance of GWMT-D indicates that the modelling method of T m can significantly improve the model's accuracy.Figure 7d illustrates that GWMT-D has RMSE values of less than 6 K at most grids, except the areas in the Antarctic, northeast North America, and Middle East (6-10 K).

Comparison with radiosonde data
These four empirical T m models of interest are also evaluated using independent measurements (i.e.radiosonde).A number of comparisons are carried out in this section: 1. Surface T m values calculated from radiosonde measurements are used as the reference to compare with various model-derived surface T m .
2. T m derived from the GWMT-D and three other selected empirical models is compared with radiosondederived T m to investigate models' performance in different heights.
3. The accuracy of the T m models in different seasons is also investigated.
Figure 8 illustrates the RMSE of model-derived surface T m in 2014 at the 585 selected radiosonde stations.The spatial (horizontal) variation in the accuracy of these models can be seen from this figure.An accuracy of better than 8 K has been achieved at most stations for the GWMT-D (Fig. 8d) and a similar accuracy can be achieved by the GTm_N as well (Fig. 8c).These two models outperformed the other two models, GTm-III and GWMT-IV, especially in the Middle East, Siberia, and South Africa regions.
Figure 9 shows the histogram of the difference (i.e.modelderived T m minus radiosonde-derived T m ) at all heights from 0 to 9 km in terms of the mean, standard deviation, median, and mode values.As one of the new trials in this study, we use mode and median values to estimate the sample bias.The  main advantage of using mode and median values is that they are more robust than the arithmetic mean value, especially in skewed distributions (see Fig. 9b and c).As a result, a warm (cold) bias of 3.8 K (−4.4 K) can be found in the GTm_N (GWMT-IV).The histograms of both GTm-III and GWMT-D (Fig. 9a and d) are normally distributed and the GWMT-D is slightly better than the GTm-III.
The entire radiosonde-derived T m is grouped into three height intervals 0-2, 2-5, and 5-9 km according to their station heights.The accuracy comparisons between the GWMT-D and other models in different height intervals are listed in Table 5.It can be concluded that the accuracies of all the models except for the GWMT-D are significantly degraded with the increase of the height of the site.In contrast, it shows that the accuracy of GWMT-D is nearly stable in three different height ranges.Compared with the GTm_N model, the better performance of GTm-III may result from the fact that GGOS T m , which was derived from ECMWF reanalysis data, is more consistent with the radiosonde data than the NCEPderived T m .The RMSE values of GWMT-D, GTm-III, and GTm_N are plotted in the Fig. 10 as a function of height relative to the ground surface in order to reveal the represen-tative effect of terrain on the models.The GWMT-D model's RMSEs are in the range of 4-5 K, while the other two models have large RMSE values at high altitudes.It is noted that the GWMT-IV model is excluded due to its poorer performance shown in Table 5.It shows that the accuracy of GWMT-D is better than 5 K, even at the top of the troposphere.
Figure 11 shows the monthly or seasonal performance of these four selected models.The monthly-mean RMSEs of all the models vary with month (or season) and only the GTm_N shows a variation pattern opposite to that of the other three models.The GWMT-D and GWMT-III present very similar results in both pattern of variation and monthly-mean RM-SEs.The GWMT-IV performs the worst and GWMT-D performs the best among all these four models.

Impact of T m on GPS-derived PWV
The purpose of determining T m is to convert the ZWD of GPS signals to PWV.From Eq. ( 1), the relationship of the RMSEs between T m and PWV can be obtained as

RMSE PWV
where the three RMSEs are defined for the differences between observed and true values (more details see Appendix C) and the relative error of PWV can be defined as RMSE PWV /PWV here.bly due to the systematic difference between NCEP2 and ECMWF reanalysis data.This difference is significant, especially for the Antarctic.These comparisons also confirm that the piecewise linear interpolation of T m used in GWMT-D is better than the direct modelling of T m lapse rate in GWMT-IV and the constant-value method in GTm-III and GTm_N.An approximate 0.3 mm global mean RMSE in PWV read as 1.3 % relative error will be brought about by GWMT-D for ground stations.
It is suggested that the sets of coefficient for empirical T m models (e.g.GWMT-D) need to be redetermined regularly using state-of-the-art data sets with appropriate length of time periods.The new GWMT-D model is an essential alternative T m determination method to RT/NRT GPS PWV remote sensing system.The continuous operation of this system can be maintained even if in situ meteorological measurements on the GPS station are unavailable.Further inter-comparisons are recommended between empirical T m models and other measurements over the ocean (for instance with the Constellation Observation System of Meteorology, Ionosphere, and Climate, COSMIC), since radiosonde measurements used in this study are mainly collected over the land.An optimal integral approach for the determination of T m and quantification of the stochastic characteristics of T m time series are the main focus of our future work.data sources and improved methodologies (Yao et al., 2015(Yao et al., , 2014a(Yao et al., , b, 2013(Yao et al., , 2012)).
The GWMT model was based on spherical harmonics of degree nine and order nine and is a function of the geodetic coordinates of the site, as expressed below: ) where the globally mean lapse rate of T m and α 2 is −4.1 K km −1 ; ϕ, λ, and h are the latitude, longitude, and height of the site, respectively; DOY is the day of year; P nm is the Legendre function; A i nm and B i nm in Eq. ( 6) are two coefficients estimated from the least-squares estimation.
The GTm-II and GWMT models are developed using the same methodology but with different data.
Considering the semi-annual and diurnal variations in T m , the GTm-III model can be expressed as where HOD is the hour of the day.The coefficients α i (i = 1, 2, . . ., 3) are expended to spherical harmonics similar with the case in GWMT and GTm-II.
Since the adjustment model in Eq. (B4) for the GTm-III is non-linear, the coefficients determined may be unstable or biased.Chen et al. (2014)  All the aforementioned models are based on such an assumption that the vertical lapse rate of T m is the same over the globe, i.e. the α 2 in these equations are constants.In fact, this assumption is not always true (He et al., 2013).Therefore, the horizontal variation of T m lapse rate (β) is considered in the GWMT-IV model.It is a function of the horizontal location.Thus, the global T m at the height of h can be expressed as a function of the mean sea level T m (T 0 m ) and β in Eq. ( 9), both of which can be further separated into annual and semiannual components.Parameters of both amplitude and initial phase for annual and semi-annual variations are similarly expended into a spherical harmonics form.
Appendix C: Approximated propagation of RMSE Given a series of observations V collected at the same time (Ning et al., 2016): where M is a time-independent bias (systematic error); V is the true value of observations, V i ; and ε i is the zero-mean stationary Gaussian random error.Hence, the RMSE of the difference between estimates and true values is given by where N is the total number of observations.Since the mean value of ε i will be close to zero for massive repeated observations, Eq. ( C2) can be approximately reduced to where σ ε is the standard deviation of ε.As can be seen from this equation, the RMSE will be identical to standard deviation when observations are free of systematic bias.Consider a linear or non-linear function W = f (V ) : R −→ R whose RMSE can be expressed by Using first-order Taylor expansion, we have Substituting Eq. (C5) into Eq.(C4), As a result, the RMSE of f (V ) can be approximately propagated from that of V .

C
Figure 1.Distribution of the 585 radiosonde stations selected to validate the new GWMT-D model (only those data that pass a quality check are used).

Figure 2 .
Figure 2. Statistical results of diurnal T m (mean ± standard deviation) at 2 km height and four reference times during (a) December-January-February (DJF) and (b) June-July-August (JJA) in 2013, (c) the range of daily T m (max − min), and (d) power spectrum density (PSD) of T m residuals.

Figure 3 .
Figure 3. Global annual mean T m lapse rate in the height interval 0-2 km from NCEP2 in 2013.

3. 2
The procedure to determine T m using GWMT-D Assuming T m is a function of the target location (ϕ, λ, h), day of year (DOY), and UTC hour (HOD), the key steps of determining T m in GWMT-D can be described as follows:1.Two nearest reference height levels close to h, (see Fig.4in grey) and the other four vertical surfaces containing the eight voxels closest to (ϕ, λ) are determined.Then T m values for the reference times on the eight voxels are given by

Figure 4 .
Figure 4. Spatial interpolation for the target point located at (ϕ, λ, h) using the T m values at the eight closest voxels determined by the GWMT-D model.The first interpolation is for each of the four vertical edges of the box, and the second interpolation is on the 2-D plane at the height of the target point (the dashed rectangular).

Figure 5 .
Figure 5.The global RMSE distribution of the differences between the T m derived from each model and the NCEP2 data on pressure level of 925 hPa in 2014.

Figure 6 .
Figure 6.The global RMSE distribution of the differences between the T m derived from each model and the NCEP2 data on pressure level of 600 hPa in 2014.

Table 4 .Figure 7 .
Figure 7.The global RMSE distribution of the differences between surface T m derived from each model and GGOS data in 2014.
Figure 8. RMSE of model-derived surface T m in 2014 at 585 global radiosonde sites.

Figure 9 .
Figure 9. Histogram of model-derived T m minus radiosonde-derived T m in 2014 at different heights.

Figure 10 .
Figure 10.RMSE profile of the T m from GTm-III, GTm_N, and GWMT-D models.The reference T m values are derived from records of 585 radiosonde sites in 2014.

Figure 12 Figure 11 .
Figure 12 illustrates the global distribution of both RMSE PWV and RMSE PWV /PWV obtained from Eq. (11) and radiosonde data in 2014.The value of RMSE T m is obtained from Sect.4.3 and PWV and T m are set to annual mean values.Some radiosonde stations have been removed with insufficient observations or near the polar areas.As we can see the global mean values of RMSE PWV and RMSE PWV /PWV are around 0.25 mm and 1.3 %, respectively.

Figure 12 .
Figure 12.The theoretical RMSE (a) and relative error (b) of PWV resulting from the GWMT-D model using radiosonde observations in 2014.
established the GTm_N model with a global grid of 2.5 • × 2.5 • NCEP reanalysis data neglecting the diurnal variation in T m .The GTm_N model linearises the Eq.(B4) as(Chen et al., 2014):

Table 1 .
A list of the latest global empirical T m models a .

Table 2 .
The global mean RMSE of various GWMT-D models built with different lengths of time periods(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)of NCEP2 data at five pressure levels (in K).The reference T m are derived from NCEP2 data in 2014.The values inside square brackets are the minimum and maximum, and the fourth row (in bold) shows the best fitting results.

Table 3 .
The globally mean biases and RMSEs of the differences between the T m (in K) derived from four empirical models and 2014 NCEP2 data on pressure levels of 925 and 600 hPa.Values within square brackets are the minimum and maximum, and the % column is the percentage of those global grids with a value ≤ 5 K.

Table 5 .
Statistics of the differences between model-derived and radiosonde-derived T m in various height intervals for the year 2014 at 585 global radiosonde sites (in K).The % column in this table is the percentage of radiosonde records within the height interval in that of all height intervals.