An intercalibrated dataset of total column water vapour and wet tropospheric correction based on MWR on board ERS-1 , ERS-2 , and Envisat

The microwave radiometers (MWRs) on board the European Remote Sensing Satellites 1 and 2 (ERS-1 and ERS-2) and Envisat provide a continuous time series of brightness temperature observations between 1991 and 2012. Here we report on a new total column water vapour (TCWV) and wet tropospheric correction (WTC) dataset that builds on this time series. We use a one-dimensional variational approach to derive TCWV from MWR observations and ERAInterim background information. A particular focus of this study lies on the intercalibration of the three different instruments, which is performed using constraints on liquid water path (LWP) and TCWV. Comparing our MWR-derived time series of TCWV against TCWV derived from Global Navigation Satellite System (GNSS) we find that the MWR-derived TCWV time series is stable over time. However, observations potentially affected by precipitation show a degraded performance compared to precipitation-free observations in terms of the accuracy of retrieved TCWV. An analysis of WTC shows further that the retrieved WTC is superior to purely ERA-Interim-derived WTC for all satellites and for the entire time series. Even compared to the European Space Agency’s (ESA) operational WTC retrievals, which incorporate in addition to MWR additional observational data, the here-described dataset shows improvements in particular for the mid-latitudes and for the two earlier satellites, ERS-1 and ERS-2. The dataset is publicly available under doi:10.5676/DWD_EMIR/V001 (Bennartz et al., 2016).


Introduction
ESA's altimetry missions are at the heart of significant progress on oceanography.The combined coverage of highquality observations by the European Remote Sensing Satellites 1 and 2 (ERS-1 and ERS-2) and Envisat spans over more than 20 years from 1991 to 2012.During this period, improvements in instrument data processing as well as orbit and geophysical corrections allowed reaching an accuracy/sensitivity of 1 cm on instantaneous sea surface height (SSH) measurements and demonstrated the capability to observe a 3 mm yr −1 sea level rise (Ablain et al., 2009).
A major source of uncertainty for radar altimetry is the wet tropospheric correction (WTC).The spatial and temporal variability of water vapour is such that an instantaneous estimation of its impact is needed.To provide the observations required for the WTC is the primary role of the nadirlooking microwave radiometer (MWR) embedded into the altimetry missions on board ERS-1, ERS-2, and Envisat.In this context, requirements on accuracy, sensitivity, and longterm stability of the atmospheric water vapour observations are particularly strong since altimetry missions require a precision better than 1 cm in WTC (Eymard et al., 2005) and a temporal stability better than 1 mm yr −1 (Ablain et al., 2009).Note that a total column water vapour (TCWV) contribution of 1 kg m −2 is equivalent to a WTC of about 6.4 mm.
Water vapour also is a highly important climate variable in its own right.The atmospheric water vapour feedback is believed to be the strongest feedback mechanism in climate change, approximately doubling the direct warming impact of increased CO 2 forcing (Cess et al., 1990;Forster et al., 2007).Various groups have reported trends in the amount of columnar water vapour.In particular, over the oceans, a strong trend in TCWV has been observed (Trenberth et al., 2005).TCWV also appears to be a key factor regulating tropical precipitation (Bretherton et al., 2004).
The importance of water vapour in the climate system is recognized by the Global Energy and Water Cycle Experiment (GEWEX), which is currently performing an assessment of long-term water vapour products (the GEWEX Water Vapour Assessment, G-VAP).Of particular importance for the current study is the recent publication by Schroeder et al. (2016), who provide an overview on existing TCWV datasets and assess their long-term stability as well as issues caused, for example, by changes in the observation systems.
The MWR instruments on board ERS-1, ERS-2, and Envisat are based on very similar architectures and have measured water vapour over the ocean between 1991 and 2012 and therefore provide a dataset of 20+ years of water vapour observations.Data continuity into the future is ensured with ESA's Sentinel-3 series of satellites, the first of which was launched in February 2016.MWR's two channels located at 23.8 and 36.5 GHz allow for the simultaneous retrieval of TCWV and cloud liquid water path (LWP) as outlined later in this publication.Through the REAPER (REprocessing of Altimeter Products for ERS) project (Gilbert, 2014) 1 , ESA has provided significant efforts to produce an up-to-date data record of well-calibrated MWR observations.
The efforts described herein build on the REAPER dataset and address three interconnected issues.Firstly, the homogenization and intercalibration of the MWR data record is studied and an improved intercalibration is developed.Secondly, using state-of-the-art one-dimensional variational (1D-VAR) retrievals, a TCWV data product is developed and made 1 https://earth.esa.int/web/sppa/activities/multi-sensors-timeseries/reaper/ available to the community.A revised WTC product accompanies this dataset also.Thirdly, the dataset is validated both against Global Navigation Satellite System (GNSS) observations of TCWV as well as in terms of its mesoscale stability with respect to WTC.The so-derived dataset is made available to the community.This publication is organized as follows: in Sect. 2 we describe the MWR brightness temperature time series as well as the methods used for retrieving TCWV and WTC.Section 3 addresses the central issue of intercalibration and Sect. 4 summarizes the validation results.The sensitivity of the retrieval with respect to the background (a priori) temperature and water vapour profiles is discussed in the Appendix.

The MWR dataset
The MWR flown on board the ERS-1, ERS-2 and Envisat missions are two-channel nadir-pointing passive microwave instruments measuring top-of-atmosphere (TOA) brightness temperatures (see Table 1 for more details).A similar instrument is also flown on ESA's current Sentinel-3 mission.The MWR is nadir-pointing only as its main use to support wet tropospheric correction for the collocated radar altimeter.ERS-1, ERS-2, and Envisat were flown in 98.4-98.5 • inclination orbits at altitudes around 780 km, providing nearglobal MWR coverage with the exception of a narrow region near the poles.
The Envisat MWR brightness temperatures underlying the have been generated by Collecte Localisation Satellite (CLS) in 2014 in the framework of the Envisat MWR L1B Expert Support Laboratory (ESL) activities funded by ESA.It consists of a corrected dataset that removes the anomaly that has been observed in version 2.1 (Ollivier and Guibbaud, 2014).The ERS-1 and ERS-2 MWR brightness temperatures used herein are based on the REAPER project (Gilbert, 2014) but have been entirely reprocessed in the framework of the EMiR (ERS/Envisat MWR Recalibration and Water Vapour Thematic Data Record Generation) project.The so-called "first run" REAPER L1B data have been the basis for this reprocessing.Land measurements are discarded and no specific processing is applied in coastal areas so that contamination from land may occur above coastal waters at distances of less than ca.50 km from land.Such potentially land-contaminated pixels are excluded from the analysis presented here.

GNSS dataset
MWR estimates of TCWV are evaluated against a 2-hourly dataset of TCWV measured by ground-based GPS (Wang et al., 2007), composed of the International GNSS Service (IGS), SuomiNet, and GPS Earth Observation Network (GEONET), and hosted by NCEP/NCAR.While the focus of SuomiNet (see e.g.Ware et al., 2000) is the US and Central America and GEONET (see e.g.Shoji, 2009) is based on Japanese stations, IGS (see e.g.Byun and Bar-Sever, 2009) includes about 500 stations distributed globally 2 .The methodology of how TCWV is derived from the measured GPS zenith path delay is described in detail in Wang et al. (2007).They also found the individual errors to be less than 1 mm and total errors to be less than 1.44 mm.The GNSS database version 721.1 includes data from 1995 to 2014.In total, 997 stations are specified.However, not all of them cover the full period.Every station is characterized by latitude, longitude and altitude and provides 2-hourly data that contain day and time (UTC), surface pressure (hPa), atmospheric weighted-mean temperature (K), TCWV, and information about zenith delay.

TCWV and LWP
Combined TCWV and LWP retrievals were performed here are based on a 1D-VAR scheme initially developed at ECMWF by Phalippou (1996).This scheme originally focused on microwave observations from SSMIS and AMSU.It was extended by Deblonde and English (2003) towards a stand-alone scheme applicable to SSM/I, SSMIS, and AMSU.This scheme was also used in the ESA DUE Glob-Vapour project (http://www.globvapour.info).Here, it has been modified to derive TCWV from brightness temperatures specifically from the MWR sensor family on board ERS-1/2 and Envisat over the ice-free ocean.All other components of the retrieval scheme are identical to the Glob-Vapour scheme.The underlying radiative transfer model is the TIROS Operational Vertical Sounder radiative transfer model (RTTOV, version 7) (Saunders et al., 2007;Hocking et al., 2011), which is widely used both in the operational and research community.All physical parameterizations including absorption by gases and liquid water and rough ocean surface scattering are used unchanged from RTTOV.The op-2 www.igs.org/networktimal estimation scheme follows optimal estimation theory considering the uncertainties in the required meteorological background information, forward modelling (radiative transfer simulations), and satellite observations.The 1D-VAR scheme employed here uses daily global ERA-Interim (Dee et al., 2011) TCWV and atmospheric temperature and cloud water content profiles and various surface fields as a priori (background) and first guess information.Information on the choices of various input parameters is found in the next subsections.
For the particular case of the MWR, the observation vector consists of the two brightness temperatures at 23.8 and 36.5 GHz, from hereon referred to as 23 and 36 GHz.The 23 GHz channel is close to the 22.231 GHz water vapour line and the 36 GHz channel only experiences weak continuum absorption by gases but stronger absorption by cloud liquid water.The 1D-VAR uses a 74-element state vector including 43 temperature levels, 26 moisture levels, surface air temperature, surface specific humidity, sea surface temperature, wind speed, and cloud liquid water path.While the interdependencies especially of the vertical levels are constrained by the background error covariance, the retrieval obviously remains heavily under-constrained, so that ultimately only TCWV and LWP are constrained by the two observations at 23 and 36 GHz.
Given just these two observations, care must be taken not to overfit the solution by optimizing parameters only weakly related to the two observables (say, the atmospheric temperature profile).While the Jacobian matrix for such lessconstrained parameters is in any case small, a cleaner way of eliminating this issue is to set the Jacobian to zero for all parameters that are not desired to be retrieved.This was done for this study for the atmospheric temperature profile and all surface parameters.Therefore only the water vapour profile and cloud liquid water are actively adjusted during the 1D-VAR process, whereas all other parameters were treated as fixed background parameters, set to the values prescribed by ERA-Interim.
The cost function J (x) is defined as where the first part on the right-hand side determines the cost of the solution with respect to the background and the second part determines the cost with respect to the observations.For any given retrieval the expectation value of J (x) is equal to the number of observations.In the above formulation y is the observation vector, x the state vector, x b the background state, H (x) the forward model, S b the background error covariance matrix, and S o the observation error covariance matrix.For further details on optimal estimation, see Rodgers (2000).-2 Example of histograms of retrieved LWP and fitted Gaussian for different bias correction values at 36 GHz.The example shows MWR on Envisat, January 2005.For the example shown here, the bias correction at 23 GHz is set to −3 K.

WTC
The derivation of the WTC follows the analytical procedure laid out in Appendix A. The value of the wet tropospheric delay depends on the MWR-derived TCWV and on the value Tm (Eq.15), which in turn depends on the ERA-Interim temperature profile and the MWR-derived TCWV.

Generation of gridded data
The individual retrievals discussed above were used to calculate global fields of monthly averages of total column water vapour, liquid water path, and brightness temperatures at 23 and 36 GHz on a 2 × 2 • as well as a 3 × 3 • latitude-longitude grid.
Before calculating the monthly means, some filters were applied.A retrieved data pixel was used if a valid total column water vapour retrieval was available, meeting two additional conditions: (1) liquid water path larger than −1 kg m −2 and (2) a cost function value lower than five.The last condition effectively removes heavily precipitation-contaminated pixels as well as observations with remaining sea ice or land contribution.Data of at least 20 days were required within a grid cell for monthly-mean values to be reported.
In a first step, the grid was set up according to the considered spatial resolution (2 For each MWR footprint, the corresponding indices of the global fields were calculated from the latitude-longitude information.If all conditions were met (see above), the retrieved and auxiliary values were added to the corresponding grid box.The daily averages were then calculated as the arithmetic mean of all observations within that grid box within 1 day.If, for a given month and grid box, more than 20 days had valid observations, the arithmetic mean of those was assigned to be the monthly-mean value.The so-derived monthly-mean fields, along with the individual retrievals, are part of the published dataset.All analysis reported within this publication was, however, performed on individual retrievals.

Intercalibration and bias correction
Retrieval of geophysical variables using physical models and optimal estimation procedures requires the elements of the observation vector to be on average unbiased compared to the forward model applied to the true state of the atmosphere.Comparing first guess simulations with observations, biases include contributions from the following error sources: representativeness of the state vector in the first guess (e.g.representation of clouds in the model); spatial and temporal colocation errors between first guess and MWR observations; calibration biases/errors of the different MWRs; systematic errors and uncertainties in the surface emissivity model; systematic errors and uncertainties in spectroscopy of liquid water absorption, dry air absorption, and water vapour absorption; impact of precipitation contamination and precipitation-ice scattering not accounted for in forward model, While the first two items on this list have a significant impact on the values reported here, they only play a secondary role for the retrieval accuracy.As pointed out in Appendix B, the retrieval is sufficiently independent from the first guess to allow for valid retrievals even if the first guess and prior are relatively far away from the true state of the atmosphere.Also shown are the temporal mean values for each channel and instrument (straight lines in slightly lighter colours than the corresponding monthly values) and a regression line (dashed lines).For ERS-2 two separate fits were performed.One corresponding to the period before the 23 GHz gain drop and another one for the period after the gain drop.The blue arrow highlights the time the day drop happened (26 June 1996).A negative bias correction value means that the observations are warmer than the simulations and thus need to be corrected downwards.Furthermore, the relaxed background error covariance criteria formulated also in Appendix B allow for the retrieval to converge to values corresponding to the observed brightness temperatures.
The latter four items in the above list, while having smaller contributions to the overall bias, are of crucial importance to the accuracy and long-term stability of a climate data record.These are addressed in an empirical bias correction scheme as outlined below.

Method
Because of the low sensitivity of the retrieval to the first guess as well as to the background state (see Appendix B), the bias correction proposed here relies on two main assumptions.
First, the globally averaged TCWV from ERA-Interim is considered reasonably accurate in terms of its absolute value to provide a reference against which to gauge the average brightness temperature biases of the MWR time series.Note that we do not make any claims about the long-term stability of the ERA-Interim TCWV or any trends and discontinuities of the dataset.The only assumption we make is that ERA-Interim TCWV on a globally and monthly averaged basis is accurate to within ±2 kg m −2 .This assumption is justified from intercomparison efforts such as Schröder et al. (2013).
The second assumption is that histograms of instantaneous retrieved LWP must show a significant fraction of negative retrieved LWP, which corresponds to measurement noise around zero LWP for cloud-free situations.For typical biasfree optimal estimation retrievals LWP for cloud-free cases is centred around zero g m −2 with a standard deviation of about 30-40 g m −2 (see e.g.Greenwald, 2009;Bennartz et al., 2010).
These two constraints can be used to find an optimal bias correction for both channels of each instrument in the following way: for each instrument and month a certain amount of observations out of all observations available were randomly sub-selected.We chose 4 % of the total number of observations, which presented a compromise between statistical significance and computational efficiency.For these 4 % we ran a series of retrievals with different biases at 23 and 36 GHz.We ran retrievals for bias values running from −8 to 0 K in steps of 1 K for both channels, so that in total 9 × 9 = 81 retrievals were performed on the set of 4 % of each data per month.
We then found the best choice of biases for 23 and 36 GHz, i.e. the combination of biases for which the difference between background TCWV (ERA-Interim) and retrieved TCWV is smallest and the LWP shows the Gaussian behaviour around zero.This will be the optimal bias value for this particular month.
Step 3 was repeated for all months and satellites to find monthly optimal average bias values.
Figures 1 and 2 highlight some key methodological issues related to the method.In particular, Fig. 1 shows how the histogram of LWP shifts as the bias at 36 GHz varies.It also shows the histograms fitted to the retrieved LWP histograms.This fit was performed on the part of the histogram left of its peak, assuming that all values left of the peak correspond to cloud-free scenes.The super-Gaussian distribution to the right of the peak corresponds to actual clouds.In the particular case shown in Fig. 1 the best bias value for 36 GHz would be very close to −6 K, thereby centring the histogram on zero as outlined above.
Figure 2 shows isoplots of TCWV and LWP histogram biases for a full set of monthly retrievals and all combinations of 23 and 36 GHz biases.The optimal set of biases can now be inferred from this histogram as the intersect between the zero TCWV bias isoline (thick, solid) and the zero LWP histogram bias line (thick, dashed) and is located near (−4, −7 K).The same analysis was performed for all months and instruments.The results are shown in Fig. 3 and discussed in the next section.
From Fig. 2 it is also noteworthy that the sensitivity of TCWV to biases in 23 GHz brightness temperatures is roughly 1 kg m −2 per 1 K bias.The sensitivity of LWP to biases at 36 GHz is roughly 25 g m −2 per 1 K bias.Note that both variables also exhibit sensitivity to the other frequency although the sensitivity is somewhat smaller as expected.

Comparison against ERA-Interim
Comparison against operational Envisat WTC  The sensitivities of brightness temperatures to changes in LWP and TCWV together with the Gaussian plots shown in Fig. 1 also allow for a rough estimation of the total uncertainty of the brightness temperatures at 23 and 36 GHz for cloud-free cases.The width of the Gaussian distributions seen in Fig. 1 is 41 g m −2 .Using the sensitivities of LWP with respect to 23 GHz (∼ 14 g m −2 per 1 K) and 36 GHz (∼ 25 g m −2 per 1 K) and Gaussian error propagation, an observation error of 1.01 K in both channels would explain the resulting width of the cloud-free Gaussian histograms seen in Fig. 1.As stated in Appendix B, in our retrieval we use a diagonal observation error covariance matrix of 1 K for both channels, which is consistent with this estimate.

Bias analysis
Figure 3 shows the outcome of the above-described bias analysis for all instruments and channels.Mean bias values as well as slopes are also listed in Table 1.A negative bias correction value means that the observations are warmer than the simulations, and thus the observations need to be corrected downwards.We found the following: -Biases correction values at 23 GHz are between about −4 K for ERS-1 and −2 K for ERS-2 with Envisat being in between these two values.
-ERS-2 23 GHz brightness temperatures show a significant decrease in bias exactly at the time of the instru-   ment's gain drop (indicated by the blue arrow in Fig. 3) and a downward trend in bias (slope) for the period afterwards.This strong drop prompted us to separate ERS-2 in a pre-and post-gain time period for which performed a separate analysis (listed in Table 1).
-Envisat shows a slight upward slope possibly over the first half of its lifetime or over its entire lifetime.
-Brightness temperature biases at 36 GHz are comparably stable over time for both ERS-1 and ERS-2; i.e. the regression slopes are very small.
-Envisat shows a strong annual cycle in bias at 36 GHz, which diminishes somewhat after 2008.It also shows a similar trend as it does for 23 GHz.There was no explanation for this behaviour at the time of writing.
The bias values given here are based on optimal comparison between retrieved versus background TCWV as well as constraints made on the histogram of retrieved LWP.While these constraints are physically reasonable and justifiable on average, it is not advisable to perform monthly bias corrections based on the individual values derived for that particular month.This would by example of TCWV likely result in an overfitting to the ERA-Interim time series.
However, different biases between two instruments observed in their overlap periods can be clearly attributed to the instrument calibration, as the background is identical.Similarly the mean bias values reported in Table 1 provide for a reasonable way of addressing the principal error sources outlined above.A first-order bias correction is therefore performed by subtracting the bias values in Table 1 from the observations.
We note here that biases on the order of −2 to −5 K (simulations too warm compared to observations) are also reported by ECMWF for monitoring of AMSU-A against their operational forecasting system3 .It is therefore likely that ERS-2 is calibrated to within the absolute calibration accuracy of 3 K stated for the instrument.In contrast, ERS-1 and Envisat both show much larger biases, which might be indicative of remaining calibration issues.
As pointed out above, no assumptions about the long-term stability of ERA-Interim should be made in this analysis.The regressions listed in Table 1 therefore cannot conclusively be interpreted as being caused either by natural variability in TCWV or by instrument drifts.
While a conclusive statement of the origins of the regression slopes and related trends cannot be made, it is interesting to relate the slopes back to the aforementioned retrieval sensitivities.Ignoring the relatively short ERS-2 period before the gain drop, the regression slopes found in Table 1 show values between −0.12 and +0.1 K yr −1 .Factoring in the sensitivities of the retrieval the retrieved TCWV, trends observed in global TCWV between the constant bias correction and the regression are expected to be different by roughly −1.2 to +1.0 kg m −2 decade −1 .These numbers provide bounds on how large observed TCWV trends have to be in order to be considered real, if the current dataset were being used.

TCWV
MWR-retrieved TCWV was validated against TCWV derived from coastal GNSS stations.Validation was performed in terms of biases and root-mean-square errors (RMSE) as well as in terms of long-term stability of the dataset.A total of 30 712 valid collocations were found.The temporal collocation difference was chosen to be at maximum 1 h.Note that our dataset excludes observations close to land, so that the distance between a coastal GNSS station and the nearest MWR observation is somewhere around 100 km.The maximum collocation distance was chosen to be 150 km. Figure 4 shows the coastal GNSS stations used.Figure 5 and Table 1 summarize the validation results.For the entire dataset, we find a relatively small bias of 0.63 kg m −2 and an RMSE of 4.68 kg m −2 .If the comparisons were restricted to exclude MWR retrievals with high LWP (> 200 g m −2 ) and GNSS stations with altitudes higher than 50 m above mean sea level (MSL) were also excluded, the RMSE was reduced to 3.95 kg m −2 .LWP values larger than about 200 g m −2 likely represent precipitating clouds (e.g.Wentz and Spencer, 1998).We therefore reason that remaining issues with precipitation contamination might deteriorate the TCWV retrievals.
The lower panels in Fig. 5 provide time series of average relative differences between MWR and GNSS.These were calculated by first computing the relative difference for each individual observation and then averaging these individual relative differences.Note that for the same absolute difference relative differences are larger if the absolute value is smaller; e.g. for an absolute value of 20 kg m −2 , an absolute difference of 2 kg m −2 would correspond to a 10 % relative difference, whereas for an absolute difference value of 60 kg m −2 that same absolute difference would only correspond to a 3.3 % relative error.Therefore relative biases can be different from absolute biases as can be observed in Fig. 5.We provide the stability here in relative terms because the World Meteorological Organization's (WMO) Global Climate Observing System (GCOS) requirements for temporal stability of TCWV are formulated in relative term as well (GCOS-107, 2006).Comparing time series of MWR-GNSS, the expected stability (i.e. the slope of regression line) should be zero.The observed long-term stabilities are +0.68 % decade −1 and −0.38 % for the entire dataset and for

Envisat
. Zonally averaged patterns of improvement (green) and deterioration (red) of our retrieval over the operational ESA retrieval.the reduced LWP < 200 dataset respectively.Both stability values are not statistically significantly different from zero (P values for two-sided t test are 0.148 and 0.572 respectively), so that at least for the comparison against GNSS the dataset can be considered stable over time.
Several other tests including separation of daytime from night-time observations did not yield any additional insights into the uncertainties.2.
Table 1 further compares the validation against GNSS for between the MWR retrievals and the ERA-Interim profiles used as a background.For the less stringent screening including all GNSS-matches, the RMSE is reduced only slightly from 4.87 to 4.68 kg m −2 .For the more stringent screening (LWP < 200, H < 50), the RMSE for the MWR-retrieved TCWV is reduced to 3.95 kg m −2 , compared to 4.53 kg m −2 when directly compared to ERA-Interim.The retrieval biases show a less clear improvement.While the RMSE shows improvements, the assessment of the bias is inconclusive, with an improvement in bias on average for all cases but degradation in bias, if the more stringent screening criteria are applied.The long-term stability of the dataset of the differences also shows somewhat inconclusive results.It has to be noted though that none of the stability estimates are statistically significantly different from zero.

Wet tropospheric delay
To assess the accuracy of WTC, we employ the "cross-over approach" outlined by Legeais et al. (2014).The idea of this approach is to find cross-over points over the same region within a relatively short time period (10 days or below).The SSH is assumed to be on average constant within this relatively short time interval.Thus, sequential SSH observations should ideally give the same answer within the expected uncertainties.Based on this analysis method, the accuracy of different WTC retrievals can be compared.In particular, the WTC retrieval showing the smallest variability in retrieved SSH will be most accurate.
Here, we compare our WTC retrieval with two independent retrievals.Following Legeais et al. (2014), we compare our WTC retrieval firstly to WTC calculated only on the basis of ERA-Interim observations.Secondly, we compare to ESA's operational retrieval, which employs a neural network for WTC retrieval and, importantly, uses the MWR brightness temperatures together with altimeter-derived information about the state of the sea surface (the altimeter backscatter coefficient).This additional information allows for a better characterization of surface emissivity and is not used in our retrieval.
Figure 6 summarizes the validation results for Envisat.Comparisons against ERA-Interim WTC show overall improvements over the entire data record consistent with similar results shown in Legeais et al. (2014) for other sensors.The lower left panel of Fig. 6 shows improvements nearly everywhere over the oceans with the exception of the Gulf Stream off the coast of North America, the Falkland current, and the confluence areas of the Agulhas and Benguela currents.All of these areas show exceptionally high variability in SSH and might therefore pose particular challenges to the validation method used here.
Comparing our retrievals with the operational Envisat retrievals (Fig. 6, upper and lower right panels), our algorithm performs slightly inferior than the operational algorithm in most areas and only shows some improvements in the Southern Ocean near the ice edge.Between the three satellites studied here, Envisat shows the strongest deterioration of results compared to the operational retrieval.In fact, ERS-1 and ERS-2 show consistent improvements over the operational retrieval in the northern and southern mid-latitudes (see Fig. 7).We note that in contrast to ESA's operational retrieval our algorithm does not use any altimeter information to constrain sea surface emissivity and solely relies on the MWR observations and ERA-Interim as background information.We therefore expect our algorithm to perform slightly worse than ESA's operational algorithm.In this context it is surprising that our results in terms of WTC are improved for significant areas of the globe at least for ERS-1 and ERS-2.

Conclusions
We have established a new long-term (1992-2012) dataset of TCWV and WTC over the global oceans, which is readily available for users.The main difference between our dataset and earlier datasets, such as ESA's operational dataset, is that we explicitly intercalibrate the MWRs on the three different satellite platforms.Such an intercalibration is a key requirement for the generation of satellite-based climatologies.We have introduced a novel intercalibration method that allows for the adjustment of biases between the different satellites based on environmental conditions over ocean.Similar to operational data assimilation systems, we estimate calibration offsets at 23 and 36 GHz by comparing simulated and observed cloud-free brightness temperatures over the lifetime of each satellite.We only adjust for long-term, global calibration trends and offset but do not perform any regional, monthly, or seasonal bias corrections.
A potential disadvantage of our intercalibration method is that it hinges the long-term mean retrieved TCWV to the TCWV fields that were used in the simulations used in the intercalibration process (in our case ERA-Interim).While this does limit the use of our dataset to assess the global longterm mean TCWV as well as linear trends, the dataset can be used for anomaly analyses and to provide WTCs for altimeter correction.We note that limitations on absolute accuracy or linear trends are in no way unique to our dataset.For example, Schroeder et al. (2016) investigate trends and discontinuities in six different global TCWV climatologies and find significant artefacts that inhibit trend analyses in nearly all datasets.
Independent validation of our dataset was performed against GNSS radiosondes.This validation shows the TCWV dataset to be stable over time and generally yielding an improvement in RMSE over the background TCWV used in the retrievals.The dataset has also been submitted to the ongoing G-VAP (Schroeder et al., 2016) for evaluation in the con-text of several other TCWV datasets.A crossover analysis of altimeter corrections using the here-derived WTC provided further insights into the accuracy of the dataset.We found that for ERS-1 and ERS-2 the results of our WTC retrievals are superior to ESA's operational WTC retrieval for the midlatitudes and inferior for the tropics.For Envisat, ESA's operational WTC retrieval is generally slightly better than our retrieval.ESA's retrieval does use additional collocated altimeter backscatter information to constrain sea surface emissivity.In contrast, our algorithm currently relies only on ERA-Interim surface wind speed, so that a slight deterioration of our results compared to ESA's should be expected.A 1D-VAR retrieval of surface wind speed, water vapour, and cloud liquid water based on combined radar altimeter and MWR data could also be envisioned.This will likely have a significant positive impact on retrievals from both instruments.It would require extending the surface emissivity model as well as the 1D-VAR retrieval to include altimeter backscatter, which would also have to be intercalibrated.
A number of further limitations exist: in the current implementation, ERA-Interim fields are only used once a day at 12:00 UTC as background profiles.Rapid changes of atmospheric conditions and surface properties are not accounted for.We have studied the impact of the background state on retrieval quality (see Appendix B) and conclude that this limitation has only a marginal impact on data quality, as the algorithm is only weakly dependent on the choice of the background.
Since MWR is nadir-looking only, it does not provide any polarization information.Compared to other microwave sensors, its spectral range is also limited the two frequencies at 23 and 36 GHz.Therefore, screening observations affected by frozen hydrometeor scattering will not be possible.Thus, in cases of moderate to heavy frozen hydrometeor load, such as in deep convective cores, retrieval results will likely be degraded.Here we employ a screening based on the final value of the cost function, which has proven efficient in eliminating outliers.However, validation results show that the comparisons against GNSS are deteriorated for larger LWP, which would be indicative of remaining issues with precipitation screening.
An extension of the approach outlined here to include the MWR on board the Sentinel-3 satellites and potentially other altimeter/radiometer combinations appears highly desirable and would allow extending the current time series forward in time.Feedbacks between improved calibration efforts for single instruments and subsequent intercalibration need to be accounted for.Ideally, a processing chain would be set up to allow for re-processing the intercalibration for a single instrument if the underlying calibration for that instrument had improved.
The first term in the brackets in Eq. (A7) can be split as follows: Note that T v /T 1.

A2 Dry delay
Integrating the dry tropospheric part of Eq. (A9) yields The dry delay is on the order of 2.3 m for a straight vertical path through the atmosphere whereas the wet tropospheric delay is only on the order of 0.4 m at maximum.

A3 Wet delay
Integrating the wet tropospheric terms in Eq. (A9) yields The water vapour mass mixing ratio is defined as As can be seen in Fig. 8.As long as the actual TCWV is less than 5-10 kg m −2 away from the chosen background, the retrieval will perform quite well, especially under relaxed background error covariance conditions.This is due to the high information content of the passive microwave observations with respect to both TCWV and LWP.We note that the use of a single global background profile is not necessarily the best choice for a climatological background.A possible compromise could consist of less stringent choices of climatological backgrounds, allowing the background water vapour and temperature profile to vary with geographical position and latitude.
Compared to the FIXED cases both the ERA-NEW and ERA-OLD case show much better results.The trade-off here lies mainly between an increased RMSE (NEW) and an increased number of profiles with large remaining Tb residuals after convergence (OLD).The ERA_NEW case allows the 1D-VAR to find low-cost solutions further away from the background water vapour profile.This will enhance the RMSE because we compare the retrieved TCWV to the background TCWV.ERA_NEW in contrast provides tighter constraints on the background error covariance matrix and thus minimizes the RMSE better but at the cost of having a larger fraction of retrievals not converge as closely toward the observed brightness temperatures.For the particular case shown here 6.24 % of retrievals still show a residual deviation of simulated from observed Tb-s larger than 1 K.
A design choice for the final retrieved time series was therefore the extent to which it adheres to the prescribed background compared to perfectly minimizing the observed brightness temperatures.We note that in an ideal world with perfect knowledge about background and observation error covariance matrix this choice could not be made and the retrieval would provide a perfect a posteriori estimate of the true state of the atmosphere accounting for correct background and observational information.However, as is always the case the actual retrieval will have to be tuned to some degree.In particular, one wants to minimize the risk of artifacts in the background data to affect the final TCWV time series.Such artifacts can include slight discontinuities in the ERA TCWV time series at time steps where new sensors are added to the reanalysis.
With these considerations in mind we have chosen the ERA_NEW 1D-VAR set-up to be used as the basis for the full time series.The modified background error matrix allows for large deviations from the background profile; i.e. it gives stronger weight to the observations.At the same time, it provides good convergence over the entire range of variability of TCWV and allows for a high number of converged profiles, therefore providing very little sensitivity to the choice of the background profile.In particular, the choice of the background profile is uncritical as long as it represents the general conditions for the geographical region and season.

Figure 2 .
Figure 2. The coloured contour plot in (a) shows contours of LWP bias as function of bias correction values for 23 GHz (x axis) and 36 GHz (y axis).The coloured contour plot in (b) shows biases in TCWV.In addition, labelled isolines of both TCWV bias (dashed) and LWP bias (solid) are overlaid in both plots.Data are shown for ERS-1, June 1994.

Figure 3 .
Figure3.Optimal bias correction values for the entire time series shown as coloured lines with filled circles marking each monthly value.Also shown are the temporal mean values for each channel and instrument (straight lines in slightly lighter colours than the corresponding monthly values) and a regression line (dashed lines).For ERS-2 two separate fits were performed.One corresponding to the period before the 23 GHz gain drop and another one for the period after the gain drop.The blue arrow highlights the time the day drop happened (26 June 1996).A negative bias correction value means that the observations are warmer than the simulations and thus need to be corrected downwards.

Figure 4 .
Figure 4. Overview of location and data density of GNSS stations used for TCWV analysis.

Figure 5 .
Figure 5.Comparison of GNSS and MWR-derived TCWV.(a) and (c) show comparisons for all collocations.(b) and (d) show comparisons for a subset of collocations where (1) MWR-retrieved LWP was smaller than 200 g m −2 and (2) the GNSS station height was below 50 m above MSL.(a) and (b) show data density plots.(c) and (d) show time series of monthly-mean differences between MWR and GNSS for the entire dataset.The blue lines give linear fits to the data.

Figure 6 .
Figure 6.Validation statistics for WTC sea surface height variability.Data shown here are for Envisat only.(a) and (c) show comparisons against WTC derived purely from ERA-Interim.(b) and (d) show comparisons against the operational WTC retrieval from Envisat.(a) and (b) show globally averaged monthly statistics.(c) and (d) show long-term spatial statistics.Values smaller than zero indicate an improvement.Values larger than zero indicate a deterioration of the results relative to the comparison dataset.

Figure 8 .
Figure8.Scatterplots of retrievals obtained with the four different 1D-VAR configurations described in Sect.2.3.1.In all four panels the retrieved TCWV is plotted against ERA-Interim TCWV.The green line is the 1-1 line.The red error bars show the mean and standard deviation in bins of 5 kg m −2 .The vertical line in the upper two plots shows the TCWV of the fixed climate background used for these plots (mid-latitude summer atmosphere).Total number of retrievals was 35 584.Reported values are for valid retrievals with cost function lower than 5. Corresponding statistics are listed in Table2.
A11)Replacing e/p accordingly with r into Eq.(A11) yields z w = 10 −6 R H 2 further define a "water-vapour-averaged mean inverse atmospheric temperature", T m , asT m =

Table 1 .
Main characteristics of the microwave radiometer series.
For implementation details regarding S b and S o , see Appendix B.

Table 2 .
Summary of comparisons between ERA-Interim versus GNSS and MWR versus GNSS in terms of bias, RMSE, and stability of the dataset.

Table 3 .
Mean optimal bias correction values and regression slopes for the all instruments and time periods.The values given here correspond to the straight lines and dashed lines in Fig.3.A negative bias correction value means that the observations are warmer than the simulations and thus need to be corrected downwards.The regression bias is calculated using bias_corr(t) = slope * t+ offset, where t is the decimal year since 1990.For example, 2 July 1991 is day 183 in the year 1991 and therefore corresponds to a value of t = 1.5.

Table 4 .
Retrieval statistics for the four different 1D-VAR configurations described in Sect.2.3.1.Corresponding scatterplots are shown in Fig. 8.Total number of retrievals was 35 584.Reported bias and RMSE values are for valid only for the fraction of retrievals with cost function lower than 5.