Optimal estimation of water vapour profiles using a combination of Raman lidar and microwave radiometer

In this work, a two-step algorithm to obtain water vapour profiles from a combination of Raman lidar and microwave radiometer is presented. Both instruments were operated during an intensive 2-month measurement campaign (HOPE) close to Jülich, western Germany, during spring 2013. To retrieve reliable water vapour information from inside or above the cloud, a two-step algorithm is applied. The first step is a Kalman filter that extends the profiles, truncated at cloud base, to the full height range (up to 10 km) by combining previous information and the current measurement. Then the complete water vapour profile serves as input to the one-dimensional variational (1D-VAR) method, also known as optimal estimation. A forward model simulates the brightness temperatures which would be observed by the microwave radiometer for the given atmospheric state. The profile is iteratively modified according to its error bars until the modelled and the actually measured brightness temperatures sufficiently agree. The functionality of the retrieval is presented in detail by means of case studies under different conditions. A statistical analysis shows that the availability of Raman lidar data (night) improves the accuracy of the profiles even under cloudy conditions. During the day, the absence of lidar data results in larger differences in comparison to reference radiosondes. The data availability of the full-height water vapour lidar profiles of 17 % during the 2-month campaign is significantly enhanced to 60 % by applying the retrieval. The bias with respect to radiosonde and the retrieved a posteriori uncertainty of the retrieved profiles clearly show that the application of the Kalman filter considerably improves the accuracy and quality of the retrieved mixing ratio profiles.


Introduction
In accordance with the latest report of the Intergovernmental Panel on Climate Change (IPCC), water vapour plays a key role in the description of the thermodynamic state of the atmosphere (Hartmann et al., 2013) and is the most important greenhouse gas (Twomey, 1991). Its amount in the atmosphere is controlled mostly by the air temperature, rather than by emissions (Hartmann et al., 2013). Therefore, tropospheric water vapour is considered a feedback agent more than a forcing to climate change (Soden and Held, 2006). The water vapour amount is highly variable in space and time, since it can considerably increase due to evaporation or decrease due to condensation and precipitation (Stevens and Bony, 2013). Furthermore, the latent heat strongly influences the energy cycle. The typical residence time of water vapour in the atmosphere amounts to 10 days (Myhre et al., 2013). Due to its spatio-temporal variability and its involvement in many atmospheric processes (e.g. cloud formation) it is difficult to properly implement water vapour in climate models (Held and Soden, 2000; Tompkins, 2002).
In the last decades, the resolution of atmospheric circulation models has been improved, more atmospheric processes have been incorporated and the parametrizations of physical processes have been improved (Randall et al., 2007). In order to evaluate and improve model forecasts, parametrization schemes and satellite retrievals, the observations need to be enhanced. Uncertainties in both observations and modelling of water vapour strongly affect the representation of clouds and precipitation in climate models and predictions. For that reason the German research project High Definition Clouds and Precipitation for advancing Climate Prediction (HD(CP)²) was initiated, aiming to improve the representation of clouds and precipitation in models and to quantify the associated errors. One part within the HD(CP)² initiative was the intensive observation campaign HD(CP)² Observational Prototype Experiment (HOPE) in Jülich (Macke et al., 2016). Data from this campaign are used in this work, which presents a retrieval of water vapour profiles from ground-based remote sensing. During HOPE, different remote sensing instruments to measure water vapour, both active and passive, were deployed.
A. Foth and B. Pospichal: Optimal estimation of water vapour profiles. Published by Copernicus Publications on behalf of the European Geosciences Union.
An active method is given by the Raman lidar technique (Ansmann et al., 1992; Whiteman et al., 1992; Wandinger, 2005). Water vapour mixing ratio has been determined for several decades using this technique (Melfi et al., 1969; Cooney, 1970; Melfi, 1972). With advancing technology, Raman lidars enabled high vertical resolution measurements of water vapour and extended their operation to the whole troposphere (Ferrare et al., 1995; Sherlock et al., 1999; Di Girolamo et al., 2009; Leblanc et al., 2012), to daytime (Renaut et al., 1980; Ferrare et al., 2006) and to automated measurements (Goldsmith et al., 1998; Turner et al., 2002). However, water vapour Raman lidars should be calibrated using simultaneous and collocated measurements from, for example, a microwave radiometer (MWR) or radiosonde (RS) (Mattis et al., 2002; Madonna et al., 2011; Foth et al., 2015). Until now, Raman lidars were mostly used as research instruments that did not work unattended or automatically on a routine basis. Another major drawback of Raman lidars is that they do not provide any water vapour information from inside or above the cloud due to the strong signal attenuation, especially in liquid clouds. Hence, these measurements are limited to the range from the surface to the cloud base. Furthermore, daytime measurements are limited in height due to the presence of scattered solar radiation (Turner and Goldsmith, 1999).
Another approach is to use passive remote sensing to sound the thermodynamic state of the atmosphere. Passive microwave radiometry can provide atmospheric water vapour observations with high temporal resolution, but limited vertical information (Solheim et al., 1998; Westwater et al., 2005). However, the integrated water vapour (IWV) can be retrieved very accurately. Microwave radiometers can be operated during all weather conditions except for precipitation (Güldner and Spänkuch, 1999). As with many remote sensing techniques, accurate calibrations are crucial for obtaining precise measurements (Maschwitz et al., 2013; Küchler et al., 2016).
In contrast to the remote sensing observations presented above, water vapour profiles can be measured in situ using RS (Miloshevich et al., 2006). Routine RS launches are mostly performed by national weather services, usually twice a day at selected locations. Therefore, both horizontal and temporal resolution of routine measurements are rather low. However, these profiles can serve as reference for remote sensing observations.
As described above, it is a challenge to provide continuous high-resolution water vapour profiles with a single instrument. In recent years, the Leipzig Aerosol and Cloud Remote Observations System (LACROS) (Bühl et al., 2013) has installed a combination of ground-based remote sensing systems. The synergy of complementary information from both active and passive instruments can provide a more comprehensive understanding of atmospheric processes (Stankov, 1998; Furumoto et al., 2003; Bianco et al., 2005; Delanoë and Hogan, 2008). From a combination of radar reflectivities and liquid water path from MWR, Frisch et al. (1998) successfully derived liquid water content (LWC) profiles. Han et al. (1997) presented a method based on a Kalman filter (Kalman, 1960; Kalman and Bucy, 1961) that incorporates current and past measurements, followed by a statistical inversion that combines the lidar with the radiometric and climatological data. The Cloudnet project comprises a number of algorithms for the continuous analysis of cloud properties by means of remote sensing with lidar, MWR and cloud radar (Illingworth et al., 2007). The instrument synergy allows for a continuous evaluation of the representation of clouds in climate and weather forecast models (Sengupta et al., 2004; Hogan et al., 2009; Bouniol et al., 2010). Additionally, the data set enables the development and validation of new cloud remote sensing synergy algorithms. Löhnert et al. (2004, 2008) developed the so-called integrated profiling technique (IPT) that integrates a ground-based MWR, a cloud radar and a priori information, e.g. from RS. This approach enables the derivation of temperature, humidity and liquid water content profiles (Ebell et al., 2010) and their associated error estimates. The IPT is based on a variational scheme, also known as optimal estimation (Rodgers, 2000). Cimini et al. (2010) as well as Hewison and Gaffard (2006) used an approach similar to that of Löhnert et al. (2004) but with background information from a short-range numerical weather prediction model instead of RS climatology.
The synergy of Raman lidar and MWR is beneficial for continuously observing the vertical water vapour distribution. When both Raman lidar and MWR are measuring collocated and simultaneously, continuous water vapour profiles can be obtained operationally (Ferrare et al., 2006; Adam and Venable, 2007; Adam et al., 2010). However, the Raman lidar needs to be calibrated on a routine basis. A calibration method that is based on the IWV from MWR is suited for this purpose (Foth et al., 2015). In previous approaches the total precipitable water from MWR in combination with RS has been used to calibrate the water vapour profiles (Turner and Goldsmith, 1999; Turner et al., 2002). Calibration methods based only on RS (England et al., 1992; Mattis et al., 2002; Reichardt et al., 2012) are often inappropriate for continuous monitoring of the tropospheric water vapour with Raman lidar because of their low temporal resolution and the requirement of regular RS launches.
The aim of this study is to present a two-step algorithm that combines a Raman lidar and an MWR by using an optimal estimation approach. The retrieval can be seen as an extension of the IPT by Löhnert et al. (2009). Barrera-Verdejo et al. (2016) also developed a variational retrieval based on these two instruments. At first glance, both approaches seem to be similar, but they are fundamentally different with regard to the optimal estimation method. Barrera-Verdejo et al. (2016) used both instruments (Raman lidar and MWR) as part of the observation vector. Because the water vapour profiles from Raman lidar are strongly disturbed by clouds, they are truncated at the cloud base. In the present work, the truncated Raman lidar profiles are extended to the full height range by using a Kalman filter in a first step. Then the Kalman-filtered profiles serve as input to the optimal estimation. This approach is based on studies of Schneebeli (2009). Additionally, the focus of the presented work is to develop a method that enables routine retrieval of a continuous time series of water vapour profiles and their error estimates during all non-precipitating conditions.

Instrumentation
In the framework of the HD(CP)² initiative, HOPE was conducted around Jülich in western Germany during April and May 2013 (Macke et al., 2016). The goal of HOPE was to probe the atmosphere with a specific focus on boundary layer development and the development of clouds and precipitation. Two observatories were set up in addition to JOYCE (Löhnert et al., 2015). The LACROS site (Wandinger et al., 2012; Bühl et al., 2013) was temporarily built up in Krauthausen, which is about 4 km south of JOYCE. Both JOYCE and LACROS observatories are equipped with a set of active and passive remote sensing instruments such as lidars and MWRs, which allow the application of the proposed retrieval. Radiosondes were launched at the KIT (Maurer et al., 2016) station in Hambach, which is about 4 km away from JOYCE and LACROS. Furthermore, a 120 m tower provides surface meteorological data such as pressure, temperature and humidity.

Raman lidar PollyXT
At LACROS, the lidar measurements were conducted with the fully automatic portable multiwavelength Raman and polarization lidar PollyXT (Althausen et al., 2009) by the Leibniz Institute for Tropospheric Research (TROPOS). PollyXT measures backscattered light at wavelengths of 355, 532 and 1064 nm and Raman scattered light at 387, 407 and 607 nm. From that, water vapour profiles can be determined (Whiteman, 2003; Wandinger, 2005). In the lowermost heights the overlap of the laser beam with the receiver field of view of the bistatic system is incomplete. However, the overlap of both Raman channels is assumed to be identical and for that reason the overlap effect should be negligible regarding water vapour measurements. Nevertheless, there are some uncertainties in the lowermost 600 m. Therefore, the signal ratio is set constant there to account for the overlap problem. Additionally, the mixing ratio error is artificially increased in this range, which reduces the impact of erroneous profiles near the surface and enlarges the influence of both the Kalman filter and the optimal estimation. During daytime, no water vapour measurements can be performed due to the high daylight background and the weak signal from Raman scattering. The PollyXT raw data (30 m and 30 s resolution) are processed and calibrated to mixing ratio profiles as explained in Foth et al. (2015). The vertical and temporal resolution of the calibrated profiles is reduced to 90 m and 5 min to decrease the measurement noise and to retrieve water vapour from higher altitudes. The calibrated water vapour profiles are then used for the proposed retrieval.
An extensive overview of the areas of operation and the automated measurement capabilities of Polly systems all over the world is given by Baars et al. (2016).

Microwave radiometer HATPRO
The humidity and temperature profiler (HATPRO), built by Radiometer Physics GmbH, Germany, is a passive instrument that measures atmospheric emission in two frequency bands in the microwave spectrum. Seven channels are located along the 22.235 GHz H2O absorption line. From these observations humidity information can be retrieved. The seven channels of the other band from 51 to 58 GHz along the O2 absorption complex contain the vertical temperature profile information. The fully automatic microwave radiometer HATPRO makes it possible to derive temperature and humidity profiles as well as integrated quantities such as integrated water vapour (IWV) and liquid water path (LWP) with a high temporal resolution of up to 1 s (Rose et al., 2005). Their uncertainties are 0.5 kg m^-2 for IWV (Steinke et al., 2015) and 22 g m^-2 for low LWP values, increasing up to 45 g m^-2 for LWP values higher than 500 g m^-2 (Ebell et al., 2011). Observations are possible during nearly all weather conditions except precipitation.
Statistical algorithms were used to retrieve temperature profiles, IWV and LWP from the measured brightness temperatures by means of a multi-linear regression between modelled brightness temperatures and atmospheric profiles. These algorithms are based on a long-term data set of De Bilt radiosondes (Löhnert and Crewell, 2003).
Weighting functions are well suited to describe the ability for humidity profiling. Figure 1 shows the weighting functions for the seven HATPRO frequencies along the H2O absorption band. Generally, the measured brightness temperatures do not originate from an isolated height level. The weighting functions describe the contribution of a certain height to the observed signal. Ideally, the weighting functions are peaked functions and several frequencies contribute information from different height levels. Three weighting functions (22.24, 23.04 and 23.84 GHz) differ considerably from each other. The weighting functions of the higher frequencies have a similar shape because the atmosphere is optically thin at these frequencies. For that reason they add only little information, and the information on the vertical distribution of humidity is limited.
www.atmos-meas-tech.net/10/3325/2017/ Atmos. Meas. Tech., 10, 3325-3344, 2017
The usage of the 31 GHz channel caused unrealistic results. The reason for that behaviour was not identified but might be induced by the forward model or a faulty calibration.
The MWR was also equipped with a standard meteorological weather station measuring temperature, pressure and relative humidity. These values are only used to calculate the pressure profile that is used in the forward model. The surface values needed for the optimal estimation originate from the surface tower measurement, which is much more accurate. The resulting pressure uncertainties cause negligible deviations in the modelled brightness temperatures.

Radiosondes
During HOPE, radiosondes (RSs) were launched at least twice a day (11:00 and 23:00 UTC) and more often during intensive observation periods (IOPs) at the KITCube site in Hambach. The RS (type Graw DFM-09) measures temperature, humidity, pressure and wind velocity (Nash et al., 2011; Wang and Zhang, 2008). Due to the vicinity of the RS station to an open-cast mine with a depth of nearly 400 m, horizontal inhomogeneities between the RS launch station and LACROS are likely (Foth et al., 2015).

Retrieval methodology
The focus of this work is to retrieve a continuous time series of water vapour profiles from a combination of ground-based remote sensing with Raman lidar and MWR in a straightforward way to offer a broad application. Most of this section has already been described and presented in Foth (2017) without explicit citation. The retrieval is a two-step algorithm that combines the Raman lidar mixing ratio profile with the MWR brightness temperatures. The Kalman filter (first step) eliminates measurement disruptions (e.g. clouds) to provide a full-height mixing ratio profile that serves as input to the one-dimensional variational assimilation (optimal estimation method). The retrieval can be applied to raw data (photon counts) using the calibration method based on Foth et al. (2015) or using already calibrated profiles.
Figure 2 gives a brief overview of the retrieval framework. It starts with the latest analysed state $\hat{x}_{k-1}$, which is advanced to the estimated state $x^E_k$, with $k$ being the time index. This state is then combined with the current lidar measurement $y_k$ to obtain the filtered state $x^F_k$ using the Kalman filter. $x^F_k$ is then used as the a priori input to the one-dimensional variational assimilation. The a priori profile is modified such that the modelled brightness temperatures match those measured with the microwave radiometer (MWR), $z_k$, resulting in the most probable estimated state $\hat{x}_k$, which is again projected in time in the consecutive step. Inverse methods for atmospheric sounding are well described in Rodgers (2000). For clarity the same notation is used.

Definition of quantities
In this section the state vector and the two measurement vectors are described. The first measurement vector contains the mixing ratio profile from the lidar measurement. It is used in the first retrieval step (Kalman filter). The second measurement vector consists of the brightness temperatures from the MWR measurement and a surface mixing ratio from a standard meteorological station. This vector is used in the optimal estimation. The atmospheric state is described by the state vector
$$x = [q_1, \ldots, q_n]^T, \tag{1}$$
which contains the humidity variable $q$ at different height levels from the surface to the uppermost level $n$ (e.g. 10 km). The vertical resolution originates from the lidar measurements and is equal to 90 m. The humidity variable $q$ is given as the natural logarithm of the water vapour mixing ratio. The benefit of using the logarithm is the limited range of variation and the prevention of negative, unphysical values, resulting in a lower number of unrealistic states (Phalippou, 1996).
The lidar measurement vector of length $m_y$,
$$y = [q_1, \ldots, q_{m_y}]^T, \tag{2}$$
contains the water vapour mixing ratio at each height level from the ground up to a possible cloud base. The lidar profiles $y$ and the associated errors $\epsilon_y$ are usually given in mixing ratio. For the reasons mentioned above, both have to be transformed into $q$ values. The transformed errors define the diagonal elements of the lidar measurement covariance matrix $S_y$. The off-diagonal elements are assumed to be zero, which means that no correlation exists between the errors at different height levels.
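The transformation into $q$ values can be sketched as follows. This is a minimal illustration, not the authors' code: the first-order error propagation $\sigma_q = \sigma_w / w$ is an assumption (the text does not state how the errors are transformed), and the function names and sample values are hypothetical.

```python
import math

def to_log_space(mixing_ratio, mixing_ratio_error):
    """Transform mixing ratios (g/kg) and 1-sigma errors into q = ln(w) space.

    Assumption: first-order error propagation, sigma_q = sigma_w / w.
    Returns the q values and the variances of q.
    """
    q, var_q = [], []
    for w, sigma_w in zip(mixing_ratio, mixing_ratio_error):
        if w <= 0.0:
            raise ValueError("mixing ratio must be positive to take the logarithm")
        q.append(math.log(w))
        var_q.append((sigma_w / w) ** 2)
    return q, var_q

def diagonal_covariance(variances):
    """Build S_y with the variances on the diagonal and zeros elsewhere."""
    n = len(variances)
    return [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Illustrative values only (not campaign data)
w = [8.0, 6.5, 4.0]        # mixing ratio in g/kg
sigma_w = [0.4, 0.5, 0.6]  # 1-sigma error in g/kg
q, var_q = to_log_space(w, sigma_w)
S_y = diagonal_covariance(var_q)
```

Working in logarithmic space means that a fixed absolute error becomes a larger relative error where the air is dry, which is why the variances grow with height in this example.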
The second measurement vector, called from now on observation vector, is given as
$$z = [T_{B,\nu_1}, \ldots, T_{B,\nu_{m_z-1}}, q_s]^T, \tag{3}$$
with the dimension $m_z$. It contains the brightness temperature $T_B$ at a certain frequency $\nu$ and the surface mixing ratio $q_s$ from a standard meteorological station. In this study only zenith observations and frequencies along the water vapour absorption band are chosen. The combined measurement and forward model covariance matrix $S_z$ contains the errors from the MWR observation, from the surface mixing ratio measurement and from the forward model. The errors from the MWR observation are the radiometric noise. Its variance is set to 0.25 K^2 at each frequency. The off-diagonal elements are set to 0.01 K^2, meaning small covariances between the frequencies (Barrera-Verdejo et al., 2016). The determination of the forward model error is described in Sect. 3.3. Forward model uncertainties that occur due to assumptions in the LWC profiles are illustrated in Sect. 3.4. The measurement uncertainty of the surface mixing ratio is roughly assumed to be 0.1 g kg^-1. However, the uncertainty is increased due to the distance between the measurement site and the surface humidity sensor (see Sect. 2) and is therefore assumed to be 0.3 g kg^-1.
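The radiometric-noise and surface-humidity part of $S_z$ described above could be assembled as in the following sketch. The function name, the channel count in the example and the choice to keep the surface entry in (g kg^-1)^2 are assumptions for illustration; the forward model and liquid water covariances mentioned in the text would have to be added on top and are not included here.

```python
def observation_noise_covariance(n_channels, tb_var=0.25, tb_cov=0.01, qs_sigma=0.3):
    """Noise part of S_z for n_channels brightness temperatures plus q_s.

    tb_var   : radiometric noise variance per channel in K^2 (from the text)
    tb_cov   : covariance between different channels in K^2 (from the text)
    qs_sigma : surface mixing ratio uncertainty in g/kg (from the text)
    Assumption: no covariance between brightness temperatures and q_s.
    """
    size = n_channels + 1
    S = [[0.0] * size for _ in range(size)]
    for i in range(n_channels):
        for j in range(n_channels):
            S[i][j] = tb_var if i == j else tb_cov
    S[n_channels][n_channels] = qs_sigma ** 2
    return S

# Illustrative channel count, chosen only for this example
S_noise = observation_noise_covariance(6)
```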
First-guess profiles and errors are created for the HOPE campaign. Usually they are formed from a certain number of RS. Therefore the covariance matrix is sometimes called RS climatology. For the HOPE campaign, 211 RS that were launched during April and May 2013 were used to calculate a mean profile that serves as a first-guess profile and is used after a long measurement disruption. Additionally, the correlation and covariance matrices are determined (Fig. 3). Here, the humidity variable is interpolated to the state grid space (lidar height grid) and is transformed to the natural logarithm before calculating the matrices. Both clearly illustrate the correlations between water vapour at different heights in the atmosphere. Naturally, the correlation is close to 1 near the main diagonal and is smaller for off-diagonal terms. Due to well-mixed conditions the correlation in the lowest 1.5 km is higher. These matrices are similar to those from previous studies (Ebell et al., 2013; Barrera-Verdejo et al., 2016).
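How a mean profile, covariance matrix and correlation matrix follow from an ensemble of radiosonde profiles can be sketched as below. The function name and the three two-level profiles are hypothetical, and the sample covariance with an (N - 1) denominator is an assumption, since the text does not specify the estimator.

```python
import math

def climatology(log_profiles):
    """Mean, covariance and correlation of ln(mixing ratio) profiles.

    log_profiles: list of profiles, all on the same height grid.
    Assumption: unbiased sample covariance (division by N - 1).
    """
    n_prof = len(log_profiles)
    n_lev = len(log_profiles[0])
    mean = [sum(p[i] for p in log_profiles) / n_prof for i in range(n_lev)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in log_profiles) / (n_prof - 1)
            for j in range(n_lev)] for i in range(n_lev)]
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n_lev)]
            for i in range(n_lev)]
    return mean, cov, corr

# Three made-up sondes on two levels (g/kg); level 2 is always half of level 1
sondes = [[8.0, 4.0], [6.0, 3.0], [10.0, 5.0]]
log_sondes = [[math.log(w) for w in profile] for profile in sondes]
mean, cov, corr = climatology(log_sondes)
```

In this toy ensemble the upper level is a fixed fraction of the lower one, so the two levels are perfectly correlated in logarithmic space. Real profiles decorrelate with vertical distance, as described for Fig. 3.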

Kalman filter
In the presence of clouds, the lidar profile is truncated at the cloud base due to the strong attenuation within the cloud. We use the Kalman filter to expand the truncated lidar profile to the full height range using previous information. The Kalman filter is based on the following two equations:
$$y_k = H_k x_k + \epsilon_{y,k}, \tag{4}$$
$$x_{k+1} = M_k x_k + \epsilon_{t,k}. \tag{5}$$
The observation operator (i.e. the forward model) $H_k$ projects the state into measurement space (Eq. 4), with $\epsilon_{y,k}$ being the lidar measurement error. Since $x_k$ and $y_k$ use the same humidity variable, the forward model matrix $H_k$ equals the unity matrix with dimension $m_y \times n$. Equation (5) describes the transition of the state vector at time step $k$ to time step $k+1$. The transition matrix $M_k$ is assumed to be the unity matrix due to the lack of an atmospheric model. The square of the transition error $\epsilon_{t,k}$ forms the diagonal elements of the covariance matrix $S_{t,k}$. For the calculation of $S_{t,k}$ the Schneebeli method can be applied (Schneebeli, 2009). Schneebeli generated a time series of synthetic profiles from a combination of consecutive radiosondes and ground values. $S_{t,k}$ is finally calculated from an ensemble of these consecutive profiles. A similar approach is described by Han et al. (1997). After a large number of time steps, it might happen that the correlations between layers get lost, which can result in unrealistic profiles. Additionally, when using the transition error the retrieval tends to be unstable, with either unphysical solutions or no convergence at all. Another possibility is to start with the RS climatology covariance ($S_\mathrm{clim}$) as previous covariance matrix ($\hat{S}_{k-1}$) at every consecutive time step. Using this approach the addition of the transition covariance matrix ($S_{t,k}$) can be skipped. In this application the latter approach is used, which is much more stable. Using Eq. (5) and the assumptions explained above, the last analysed state $\hat{x}_{k-1}$ and its covariance matrix $\hat{S}_{k-1}$ are propagated as follows:
$$x^E_k = M_{k-1} \hat{x}_{k-1} = \hat{x}_{k-1},$$
$$S^E_k = M_{k-1} \hat{S}_{k-1} M_{k-1}^T = \hat{S}_{k-1}, \quad \text{with } \hat{S}_{k-1} = S_\mathrm{clim},$$
where $x^E_k$ and $S^E_k$ are the estimated state and its covariance matrix, respectively. These are then combined with the lidar measurement at time step $k$ to obtain the filtered state:
$$x^F_k = x^E_k + G^K_k \left( y_k - H_k x^E_k \right),$$
with $G^K_k$ being the Kalman gain matrix:
$$G^K_k = S^E_k H_k^T \left( H_k S^E_k H_k^T + S_{y,k} \right)^{-1}.$$
The covariance matrix of the filtered state is determined by
$$S^F_k = S^E_k - G^K_k H_k S^E_k.$$
Finally, $x^F_k$ and $S^F_k$ serve as input to the optimal estimation. The application of this technique for linear filtering and prediction problems was first described by Kalman (1960) and Kalman and Bucy (1961).
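The Kalman update can be illustrated with a small, self-contained sketch in standard textbook form. It is not the authors' implementation: the helper functions, the three-level state and all numbers are hypothetical, and the lidar is assumed to observe the lowest $m_y$ levels so that $H_k$ is a simple selector matrix.

```python
def matmul(A, B):
    """Matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-14:
            raise ValueError("singular matrix")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def kalman_update(x_est, S_est, y, S_y):
    """Combine the estimated state with a lidar profile truncated at cloud base.

    x_est: n values, S_est: n x n, y: m_y values on the lowest m_y levels,
    S_y: m_y x m_y. H selects the lowest m_y levels (assumption).
    """
    n, m = len(x_est), len(y)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(m)]
    Ht = transpose(H)
    HSHt = matmul(matmul(H, S_est), Ht)
    innovation_cov = [[HSHt[i][j] + S_y[i][j] for j in range(m)] for i in range(m)]
    G = matmul(matmul(S_est, Ht), inverse(innovation_cov))  # Kalman gain
    innovation = [[y[i] - x_est[i]] for i in range(m)]      # y - H x_est
    dx = matmul(G, innovation)
    x_f = [x_est[i] + dx[i][0] for i in range(n)]
    GHS = matmul(matmul(G, H), S_est)
    S_f = [[S_est[i][j] - GHS[i][j] for j in range(n)] for i in range(n)]
    return x_f, S_f

# Made-up example: three levels, only the lowest one is observed below cloud base
S_clim = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.8], [0.5, 0.8, 1.0]]
x_f, S_f = kalman_update([0.0, 0.0, 0.0], S_clim, [1.0], [[1.0]])
```

Although only the lowest level is observed in this example, the off-diagonal elements of the climatological covariance spread the correction to the levels above, which is how the filter extends a truncated profile to the full height range.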

Forward model
In the optimal estimation framework, microwave brightness temperatures ($T_B$) at given frequencies ($\nu$) are modelled from the a priori atmospheric profiles and are compared to those that are measured. In this work only zenith observations are used. Based on Simmer (1994), $F(x)$ models the non-scattering microwave radiative transfer using gas absorption by Rosenkranz and liquid water absorption by Liebe (Rosenkranz, 1998; Liebe et al., 1993) for each height level of the retrieval grid (90 m). The Rosenkranz gas absorption model is corrected for the water vapour continuum absorption according to Turner et al. (2009). The humidity information ($q$) of the a priori profile originates from the Kalman-filtered state, whereas the temperature profiles ($T$) are provided by statistical retrievals from MWR observations (Sect. 2.2). The pressure profiles ($p$) are calculated from surface pressure observations from the MWR and the barometric formula. Because the retrieval grid is limited to 10 km, the thermodynamic state between 10 and 30 km is taken from an RS climatology above Essen, which is in the vicinity of the HOPE area. The restriction to the troposphere up to 10 km would lead to errors of around 1 K in the calculation of the brightness temperatures. Assumptions about the liquid water content (LWC) and its determination are described in Sect. 3.4. The forward modelling of the surface mixing ratio is trivial. It is a one-to-one translation to the lowest level of the state vector $x$. In conclusion, $F(x)$ is of the following form:
$$F(x) = \left[ \mathrm{RTO}_{\nu_1}(x, T, p, \mathrm{LWC}), \ldots, \mathrm{RTO}_{\nu_{m_z-1}}(x, T, p, \mathrm{LWC}), x_1 \right]^T,$$
with RTO being the radiative transfer operator. The forward model error is calculated as the covariance of the difference between brightness temperatures modelled by two different absorption codes of Rosenkranz and Liebe (Rosenkranz, 1998; Liebe et al., 1991) applied to a long-term data set of radiosondes from Lindenberg, Germany. The diagonal elements of its covariance matrix are shown in Table 1. One has to consider that there are significant off-diagonal terms. This error is part of the combined observation and forward model covariance $S_z$. The uncertainties of the gas absorption models cause biased mixing ratio profiles (see Sect. 5).
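The pressure profile from a surface pressure and the barometric formula might be computed layer by layer as sketched below. Which variant of the barometric formula is used is not stated in the text; the hypsometric form with a layer-mean temperature and dry-air gas constant (no virtual temperature correction) is an assumption, as are the function and constant names.

```python
import math

G0 = 9.80665   # standard gravity in m s^-2
R_D = 287.05   # specific gas constant of dry air in J kg^-1 K^-1

def pressure_profile(p_surface, heights, temperatures):
    """Pressure on each height level from the hypsometric equation.

    p_surface    : surface pressure (any unit, the result uses the same unit)
    heights      : heights in m, increasing, starting at the surface
    temperatures : temperatures in K on the same levels
    Assumption: dry air and a linear temperature change within each layer.
    """
    p = [p_surface]
    for i in range(1, len(heights)):
        t_mean = 0.5 * (temperatures[i - 1] + temperatures[i])
        dz = heights[i] - heights[i - 1]
        p.append(p[-1] * math.exp(-G0 * dz / (R_D * t_mean)))
    return p

# Isothermal check: one scale height above the surface the pressure drops by 1/e
scale_height = R_D * 273.15 / G0
p = pressure_profile(1013.25, [0.0, scale_height], [273.15, 273.15])
```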

Liquid water assumption
Since liquid water strongly affects the absorption in the microwave spectrum, its amount and height have to be known. However, from MWR only the integral value can be derived, and not its vertical distribution. In order to determine LWC profiles, the cloud boundaries have to be determined. The cloud base of a liquid water cloud is identified by the gradient method based on the 1064 nm channel of the lidar (Baars et al., 2008), which has been shown to be a more robust method for the automatic detection of the cloud base than the wavelet covariance transform (Brooks, 2003; Baars et al., 2008). However, a threshold value has to be chosen carefully to distinguish between thin liquid water clouds and optically thick aerosol layers below liquid water clouds. Additionally, liquid water clouds are only detected if the LWP is larger than a threshold of 5 g m^-2.
The LWC is calculated from the modified adiabatic assumption (Karstens et al., 1994):
$$\mathrm{LWC} = \mathrm{LWC}_\mathrm{ad} \left( 1.239 - 0.145 \ln h \right),$$
where $h$ indicates the height above cloud base in m, valid for $h$ within the range of 1-5140 m. The adiabatic $\mathrm{LWC}_\mathrm{ad}$ is calculated using the temperature and pressure profiles, and the factor in parentheses corrects for effects of dry air entrainment, freezing drops or precipitation. The LWC is integrated over all layers until the calculated LWP equals the LWP measured with MWR. This height is finally defined as cloud top. However, any profile is treated as a single-layer cloud with this method.
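The cloud top search can be sketched as an upward integration of the modified adiabatic LWC. This is an illustration under simplifying assumptions: the adiabatic LWC is replaced by a constant, hypothetical gradient with height instead of being computed from temperature and pressure, the LWC is evaluated at layer midpoints, and the search stops at the first layer where the integrated LWP reaches the measured value.

```python
import math

def modified_adiabatic_lwc(lwc_ad, h):
    """LWC = LWC_ad * (1.239 - 0.145 ln h), h in m above cloud base."""
    if not 1.0 <= h <= 5140.0:
        raise ValueError("h must be within 1-5140 m above cloud base")
    return lwc_ad * (1.239 - 0.145 * math.log(h))

def find_cloud_top(cloud_base, lwp_measured, dz=90.0,
                   lwc_ad_gradient=2.0e-6, max_layers=57):
    """Integrate LWC upward until the measured LWP (kg m^-2) is reached.

    lwc_ad_gradient: hypothetical constant increase of the adiabatic LWC with
    height in kg m^-3 per m; a placeholder for the thermodynamic calculation.
    Returns the cloud top height in m and the LWC profile in kg m^-3.
    """
    lwp = 0.0
    lwc_profile = []
    for k in range(max_layers):
        h = (k + 0.5) * dz  # layer midpoint above cloud base
        lwc = modified_adiabatic_lwc(lwc_ad_gradient * h, h)
        lwc_profile.append(lwc)
        lwp += lwc * dz
        if lwp >= lwp_measured:
            return cloud_base + (k + 1) * dz, lwc_profile
    raise ValueError("measured LWP not reached within the valid height range")

# Illustrative case: cloud base at 1000 m and an LWP of 0.1 kg m^-2
cloud_top, lwc_profile = find_cloud_top(1000.0, 0.1)
```

Because the integration always starts at the lidar cloud base and runs through contiguous layers, the result is a single-layer cloud by construction, which is the limitation noted in the text.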
Usual approaches to diagnose LWC profiles from radiosondes are based on a threshold method (Wang et al., 1999). Cloud bases or tops are identified when the relative humidity exceeds or falls below 95 %, respectively. Within the cloud the LWC is calculated using the modified adiabatic assumption (Löhnert and Crewell, 2003). The uncertainty that results from the assumption of single-layer clouds is estimated by comparing both mentioned methods. This is done for a long-term data set of radiosondes from Lindenberg, Germany. For these radiosonde profiles, brightness temperatures are modelled at the HATPRO frequencies using both LWC profile assumptions. The brightness temperature difference as a function of LWP is illustrated in Fig. 4a. As can be seen, the means and standard deviations (coloured lines and error bars) increase with increasing LWP. In addition, the difference increases from 22.24 to 31.4 GHz. Naturally, there is no difference for single-layer clouds, indicated by the dots at 0 K. The number of occurrences decreases with increasing LWP (grey bars on the top). However, only clouds with an LWP larger than 0.02 kg m^-2 are considered. Figure 4b shows an exemplary covariance matrix for an LWP between 0.45 and 0.5 kg m^-2. These uncertainties contain significant off-diagonal terms and are larger for the channels that are more sensitive to liquid water (31.4 GHz). According to the observed LWP, the corresponding covariance is added to the combined observation and forward model covariance matrix $S_z$ to account for the assumption of single-layer liquid water clouds.
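The relative humidity threshold method for radiosonde profiles can be sketched as a simple scan through the profile. The function name and the sample sounding are hypothetical, and treating a value of exactly 95 % as in-cloud is an arbitrary choice for this sketch. Unlike the lidar-based approach, this method can return several cloud layers.

```python
def cloud_layers(heights, relative_humidity, threshold=95.0):
    """Return (base, top) pairs where relative humidity reaches the threshold.

    heights: levels in m, increasing; relative_humidity: values in %.
    A base is set at the first level at or above the threshold, a top at the
    first following level below it.
    """
    layers = []
    base = None
    for z, rh in zip(heights, relative_humidity):
        if rh >= threshold and base is None:
            base = z
        elif rh < threshold and base is not None:
            layers.append((base, z))
            base = None
    if base is not None:  # cloud extends to the top of the sounding
        layers.append((base, heights[-1]))
    return layers

# Made-up sounding with two separate cloud layers
z = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
rh = [60.0, 80.0, 96.0, 97.0, 70.0, 98.0, 50.0]
layers = cloud_layers(z, rh)
```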

Optimal estimation method (OEM)
A schematic overview of the optimal estimation is given in Fig. 6. In basic terms, the forward model simulates what the MWR would observe given an arbitrary state. The problem is that several different states may produce the same measurement. This is a so-called ill-posed problem. To constrain the state space, a priori information such as lidar profiles is needed. In the proposed retrieval the lidar profiles are Kalman filtered as mentioned above. Finally, the optimal estimation finds the most probable solution (mixing ratio profile) from a class of solutions. The theory of inverse modelling based on optimal estimation methods is briefly introduced in this section and described in more detail in Rodgers (2000).
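The optimal estimate of an atmospheric state for a given observation vector $z$ and an a priori state $x_a = x^F$ can be found by minimizing the cost function $J(\hat{x})$ of the form
$$J(\hat{x}) = J_a(\hat{x}) + J_z(\hat{x}) + J_\mathrm{sup}(\hat{x}). \tag{13}$$
Here $J_a(\hat{x})$ indicates the a priori costs, $J_z(\hat{x})$ the observation costs and $J_\mathrm{sup}(\hat{x})$ is a penalty term to avoid supersaturation. Since both liquid and ice phase can occur in clouds at temperatures between −38 and −5 °C (Heymsfield and Sabin, 1989; Koop et al., 2000; Ansmann et al., 2009; Kanitz et al., 2011), the saturation mixing ratio is defined as follows:
$$q^\mathrm{sat} = \begin{cases} q^\mathrm{sat}_\mathrm{liq} & : T > -5\,^\circ\mathrm{C} \\ q^\mathrm{sat}_\mathrm{lin} & : -38\,^\circ\mathrm{C} \le T \le -5\,^\circ\mathrm{C} \\ q^\mathrm{sat}_\mathrm{ice} & : T < -38\,^\circ\mathrm{C}, \end{cases}$$
where $q^\mathrm{sat}_\mathrm{liq}$ and $q^\mathrm{sat}_\mathrm{ice}$ are the saturation mixing ratios above liquid water and ice, respectively. $q^\mathrm{sat}_\mathrm{lin}$ denotes a linear function that describes the transition from $q^\mathrm{sat}_\mathrm{liq}$ to $q^\mathrm{sat}_\mathrm{ice}$. The related uncertainty is defined as the difference between $q^\mathrm{sat}_\mathrm{liq}$ and $q^\mathrm{sat}_\mathrm{lin}$ and between $q^\mathrm{sat}_\mathrm{lin}$ and $q^\mathrm{sat}_\mathrm{ice}$, respectively. It amounts to a maximum of 0.23 g kg^-1 at −8 °C and decreases with decreasing temperature, which usually means increasing height.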
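A possible realization of such a piecewise saturation mixing ratio is sketched below. Two parts are assumptions not taken from the text: the Magnus-type formulas with commonly used coefficients for the saturation vapour pressure over water and ice, and the exact shape of the transition, which is implemented here as a weight that changes linearly with temperature between −5 and −38 °C.

```python
import math

def saturation_vapour_pressure(t_c, over_ice=False):
    """Magnus-type saturation vapour pressure in hPa (t_c in deg C).

    The coefficients are a common choice and are not taken from the paper.
    """
    a, b = (22.46, 272.62) if over_ice else (17.62, 243.12)
    return 6.112 * math.exp(a * t_c / (b + t_c))

def mixing_ratio(e, p):
    """Water vapour mixing ratio in g/kg from vapour pressure e and pressure p (hPa)."""
    return 622.0 * e / (p - e)

def saturation_mixing_ratio(t_c, p, t_warm=-5.0, t_cold=-38.0):
    """Liquid above t_warm, ice below t_cold, linear blend in between (assumption)."""
    q_liq = mixing_ratio(saturation_vapour_pressure(t_c), p)
    q_ice = mixing_ratio(saturation_vapour_pressure(t_c, over_ice=True), p)
    if t_c > t_warm:
        return q_liq
    if t_c < t_cold:
        return q_ice
    weight = (t_warm - t_c) / (t_warm - t_cold)  # 0 at t_warm, 1 at t_cold
    return (1.0 - weight) * q_liq + weight * q_ice

q_sat_0 = saturation_mixing_ratio(0.0, 1000.0)     # liquid branch
q_sat_m20 = saturation_mixing_ratio(-20.0, 500.0)  # transition branch
q_sat_m40 = saturation_mixing_ratio(-40.0, 300.0)  # ice branch
```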
$J_\mathrm{sup}(\hat{x})$ adds a penalty if the retrieval produces supersaturation anywhere in the profile (Phalippou, 1996; Schneebeli, 2009). For each height level $j$ this function is defined by
$$J_\mathrm{sup}(x_j) = \begin{cases} 0 & : q_j \le q^\mathrm{sat}_j \\ \zeta \left( q_j - q^\mathrm{sat}_j \right)^3 & : q_j > q^\mathrm{sat}_j. \end{cases}$$
The constant $\zeta = 10^6$ drives the strictness of the constraint. The larger $\zeta$, the stricter the constraint. Here, a large value is set to avoid supersaturation anywhere in the profile. However, supersaturation is not completely avoided due to the uncertainties in the temperature profiles from the MWR that are the basis of the saturation mixing ratio $q^\mathrm{sat}$.
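The cubic penalty and its first and second derivatives, which a gradient-based minimization would need, can be written compactly as below. The function names and the sample values are hypothetical, and the units of the inputs are left open because the sketch only requires that both arguments use the same humidity variable.

```python
ZETA = 1.0e6  # strictness constant from the text

def supersaturation_penalty(q, q_sat, zeta=ZETA):
    """Return J_sup and its first and second derivative for one height level."""
    excess = q - q_sat
    if excess <= 0.0:
        return 0.0, 0.0, 0.0
    return zeta * excess ** 3, 3.0 * zeta * excess ** 2, 6.0 * zeta * excess

def total_penalty(q_profile, q_sat_profile, zeta=ZETA):
    """Sum of the level-wise penalties over the whole profile."""
    return sum(supersaturation_penalty(q, qs, zeta)[0]
               for q, qs in zip(q_profile, q_sat_profile))

# Made-up values: one subsaturated level and one supersaturated level
j, dj, d2j = supersaturation_penalty(0.02, 0.01)
j_total = total_penalty([0.005, 0.02], [0.01, 0.01])
```

The cubic form keeps the penalty and its first two derivatives continuous at saturation, so the cost function stays smooth where the constraint switches on.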
www.atmos-meas-tech.net/10/3325/2017/ Atmos. Meas. Tech., 10, 3325-3344, 2017

Figure 5 illustrates the benefit of the supersaturation constraint on 23 April 2013, 01:02 UTC. Disregarding the constraint results in too large mixing ratio values at altitudes above 5 km. This overestimation corresponds to relative humidity values of 200 to 300 %. The application of the constraint prevents the overestimation of humidity. The resulting values are in good agreement with the saturation mixing ratio, with relative humidity values not exceeding 115 %, which is more realistic.
The implementation of a constraint that prohibits subsaturation within clouds is not beneficial in this application. The assumption of single-layer liquid water clouds and the uncertainties in the temperature profile would result in uncertain saturation mixing ratio profiles and finally lead to wrong retrievals.
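With each term written out Eq. (13) becomes

$$J(\hat{x}) = \left[\hat{x} - x_a\right]^T S_a^{-1} \left[\hat{x} - x_a\right] + \left[z - F(\hat{x})\right]^T S_z^{-1} \left[z - F(\hat{x})\right] + J_{\mathrm{sup}}(\hat{x}).$$

For clarity the time index is omitted here. $\hat{x}$ is the optimal estimate of the atmospheric state. $S_a$ and $S_z$ denote the covariance matrices of the a priori state and the observation, respectively. The optimum solution can be found iteratively using the Levenberg-Marquardt method:

$$\hat{x}_{i+1} = \hat{x}_i + \left[(1+\gamma)\,S_a^{-1} + K_i^T S_z^{-1} K_i + \tfrac{1}{2}\ddot{J}_{\mathrm{sup}}(\hat{x}_i)\right]^{-1} \left[K_i^T S_z^{-1}\left(z - F(\hat{x}_i)\right) - S_a^{-1}\left(\hat{x}_i - x_a\right) - \tfrac{1}{2}\dot{J}_{\mathrm{sup}}(\hat{x}_i)\right] \tag{18}$$

with $i$ being the iteration index. The dots above $J$ indicate the first and the second derivative, respectively. The Levenberg-Marquardt parameter $\gamma$ is increased by a factor of 10 if $J(\hat{x}_{i+1}) \ge J(\hat{x}_i)$ and reduced by a factor of 2 if $J(\hat{x}_{i+1}) < J(\hat{x}_i)$. In this work the initial value is $\gamma = 2$. It was found that the Levenberg-Marquardt method does not reach convergence faster, but more reliably, than the Gauss-Newton approach ($\gamma = 0$) (Rodgers, 2000; Schneebeli, 2009). If $\gamma \to \infty$, the step tends towards the steepest descent of the cost function, which allows the iteration to leave a local minimum towards a global minimum (Hewison and Gaffard, 2006). $K_i$ denotes the weighting function matrix, also known as Jacobian or kernel (hence $K$); it is referred to as the Jacobian from now on.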
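The adaptation of $\gamma$ can be illustrated with a scalar toy problem. The sketch below is not the retrieval code: the exponential forward model, the numbers and the function names are hypothetical, the supersaturation term is omitted, and rejecting a step whose cost does not decrease is an assumption (the text only states how $\gamma$ is changed).

```python
import math


def forward(x):
    """Hypothetical scalar forward model (stands in for the radiative transfer)."""
    return math.exp(x)


def jacobian(x):
    """Derivative of the toy forward model with respect to the state."""
    return math.exp(x)


def cost(x, x_a, s_a, z, s_z):
    """A priori cost plus observation cost (no supersaturation term here)."""
    return (x - x_a) ** 2 / s_a + (z - forward(x)) ** 2 / s_z


def levenberg_marquardt(x_a, s_a, z, s_z, gamma=2.0, n_iter=50):
    """Scalar Levenberg-Marquardt iteration with the gamma rule from the text."""
    x = x_a
    j_old = cost(x, x_a, s_a, z, s_z)
    for _ in range(n_iter):
        k = jacobian(x)
        curvature = (1.0 + gamma) / s_a + k * k / s_z
        gradient = k * (z - forward(x)) / s_z - (x - x_a) / s_a
        x_new = x + gradient / curvature
        j_new = cost(x_new, x_a, s_a, z, s_z)
        if j_new < j_old:
            # Cost decreased: accept the step and relax gamma by a factor of 2.
            x, j_old = x_new, j_new
            gamma /= 2.0
        else:
            # Cost did not decrease: keep the old state (assumption) and
            # increase gamma by a factor of 10.
            gamma *= 10.0
    return x, j_old


x_opt, j_opt = levenberg_marquardt(x_a=0.0, s_a=1.0, z=2.0, s_z=0.01)
```

With a small observation variance the solution ends up close to ln 2, slightly pulled towards the a priori value of zero.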
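It is defined as

$$K_i = \frac{\partial F(\hat{x}_i)}{\partial \hat{x}_i}$$

and calculated by perturbing the state vector at each height level by ln(0.1 g kg$^{-1}$). Equation (18) is iterated until the following criterion is fulfilled:

$$\left[F(\hat{x}_{i+1}) - F(\hat{x}_i)\right]^T S_{\delta z}^{-1} \left[F(\hat{x}_{i+1}) - F(\hat{x}_i)\right] \ll m_z,$$

where $m_z$ is the dimension of the observation vector, with $S_{\delta z}$ being the covariance matrix between the measurement and $F(\hat{x})$:

$$S_{\delta z} = S_z \left(K_i S_a K_i^T + S_z\right)^{-1} S_z. \tag{21}$$

Finally, the covariance matrix of the resulting analysed state vector (a posteriori) is calculated as

$$S_{\mathrm{op}} = \left(K_i^T S_z^{-1} K_i + S_a^{-1}\right)^{-1}. \tag{22}$$

Since the retrieval might converge to a false minimum, it is necessary to check the retrieval for correct convergence. Therefore, the $\chi^2$ test for consistency of the optimal retrieval ($x_{\mathrm{op}}$) with the observation ($z_{\mathrm{obs}}$) is introduced:

$$\chi^2 = \left[F(x_{\mathrm{op}}) - z_{\mathrm{obs}}\right]^T S_{\delta z}^{-1} \left[F(x_{\mathrm{op}}) - z_{\mathrm{obs}}\right].$$

Here, the forward modelled state $F(x_{\mathrm{op}})$ and the observation vector $z_{\mathrm{obs}}$ are compared with the error covariance matrix $S_{\delta z}$. The test is usually used to look for outliers, i.e. cases where the $\chi^2$ value is larger than a threshold value ($\chi^2_{\mathrm{thr}}$). $\chi^2_{\mathrm{thr}}$ is calculated for a probability of 5 % that $\chi^2$ is greater than the threshold for a theoretical $\chi^2$ distribution with $m_z$ degrees of freedom. All retrieved profiles with a $\chi^2$ value that exceeds the threshold are marked as untrustworthy. The $\chi^2$ values of all retrieved profiles are analysed and discussed in Sect. 5.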
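A perturbation-based Jacobian of this kind can be sketched as a one-sided finite difference. The code below is a minimal illustration, not the authors' implementation: the two-channel, three-level forward model and its weights are hypothetical, the state is assumed to be the logarithm of the mixing ratio, and the perturbation size `delta` is left as a free parameter.

```python
import math


def finite_difference_jacobian(forward, x, delta):
    """Approximate K[i][j] = dF_i/dx_j by perturbing one level at a time."""
    f0 = forward(x)
    jac = [[0.0] * len(x) for _ in f0]
    for j in range(len(x)):
        x_pert = list(x)
        x_pert[j] += delta          # perturb only level j
        f1 = forward(x_pert)
        for i in range(len(f0)):
            jac[i][j] = (f1[i] - f0[i]) / delta
    return jac


# Hypothetical channel weights: 2 channels (rows) x 3 height levels (columns).
WEIGHTS = [[0.5, 0.3, 0.1],
           [0.2, 0.4, 0.3]]


def toy_forward(x):
    """Toy forward model: weighted sums of exp(x), with x assumed to be ln(q)."""
    return [sum(w * math.exp(xj) for w, xj in zip(row, x)) for row in WEIGHTS]


K = finite_difference_jacobian(toy_forward, [0.0, 0.5, 1.0], delta=1e-6)
```

Each column of `K` requires one additional forward model run, so the cost of this approach grows linearly with the number of height levels.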
The averaging kernel matrix $A$ gives the sensitivity of the retrieval to the true state:

$$A = \frac{\partial \hat{x}}{\partial x} = \left(K^T S_z^{-1} K + S_a^{-1}\right)^{-1} K^T S_z^{-1} K.$$

The rows $a_i^T$ of $A$ are the averaging kernels. In an ideal inverse method, $A$ would be an identity matrix. Generally the averaging kernels are peaked functions which indicate the smearing of information across multiple levels. In this work, the averaging kernels are not peaked functions, because the MWR observation does not provide enough vertical information. This issue is covered in detail in Sect. 4.1. The averaging kernel has an area $a_{\mathrm{area}}$, which is a measure of the fraction of the retrieval that comes from the observation rather than from the a priori. The area of $a_i$ is the sum of its elements and can be calculated as $Au$, where $u$ is a vector with unit elements.
The information content of a measurement can be expressed by the degree of freedom ($d$), which is the trace of $A$. $d$ is a measure of how many independent quantities are measured. One has to consider that the larger the a priori uncertainty, the larger $d$ and the larger the retrieved a posteriori uncertainty (Ebell et al., 2010).
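The diagnostics derived from the averaging kernel matrix are simple sums. The following sketch uses a hypothetical 3 x 3 matrix; treating the accumulated degree of freedom as a running sum of the diagonal, and the scalar one-level case, are assumptions made for illustration.

```python
def kernel_areas(A):
    """Area of each averaging kernel: row sums of A, i.e. A u with u = ones."""
    return [sum(row) for row in A]


def degrees_of_freedom(A):
    """Degree of freedom d: trace of the averaging kernel matrix."""
    return sum(A[i][i] for i in range(len(A)))


def accumulated_degrees_of_freedom(A):
    """Running sum of the diagonal of A (ASSUMED definition of d_acc)."""
    total, out = 0.0, []
    for i in range(len(A)):
        total += A[i][i]
        out.append(total)
    return out


def scalar_averaging_kernel(s_a, s_z, k=1.0):
    """Averaging kernel for a one-element state and a single observation."""
    return s_a * k * k / (s_a * k * k + s_z)


# Hypothetical averaging kernel matrix for three levels.
A_example = [[0.5, 0.2, 0.0],
             [0.2, 0.3, 0.1],
             [0.0, 0.1, 0.1]]

areas = kernel_areas(A_example)
d = degrees_of_freedom(A_example)
```

In the scalar case the averaging kernel approaches 1 as the a priori variance `s_a` grows, which mirrors the statement that a larger a priori uncertainty leads to a larger $d$.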
In summary, the retrieval is strongly driven by the a priori uncertainty, which constrains the subspace in which the retrieval must lie. The larger the off-diagonal elements of this covariance, the higher the correlations and the smaller the subspace. For that reason the a priori covariance has to be estimated very carefully. In the proposed retrieval the a priori covariance is strongly decreased by the application of the Kalman filter, which reduces the subspace of possible solutions.

Cloud-free conditions
In this section the general functionality of the retrieval of water vapour profiles and basic parameters such as averaging kernels and degree of freedom are introduced using a straightforward cloud-free case. Figure 7 gives an overview of a mostly cloud-free day (5 May 2013). It shows the LWP, the height-time display of the mixing ratio measured by the Raman lidar Polly$^{\mathrm{XT}}$ and the height-time display of the retrieved profiles after applying the two-step algorithm. The vertical and temporal resolution of the Raman lidar mixing ratio profiles is 90 m and 5 min, respectively. In the early morning up to 03:00 UTC the mixing ratio could be measured very well by the lidar (Fig. 7b). With the rising sun the profiles become increasingly noisy such that even the lowermost values are disturbed. For that reason the lidar profiles can no longer be used; they serve as an input to the OEM only if they are available. At 05:00 UTC the water vapour channel is automatically switched off and usually switched on again at 18:00 UTC. The noise decreases after sunset, allowing an undisturbed water vapour observation from 20:00 UTC on. An automated depolarization calibration produces a gap around 22:00 UTC. The cloud base height indicates the development of boundary layer clouds, which can also be seen in the LWP values during daytime (Fig. 7a). Although there are no lidar profiles during the day, a complete time series of mixing ratio profiles can be retrieved (Fig. 7c). In the following, the application of the retrieval is distinguished for two different conditions: with full-height mixing ratio profiles from lidar and without them.
Figure 8 illustrates the algorithm processing in the presence of full-height calibrated Raman lidar profiles on 5 May 2013, 23:02 UTC. The last analysed state (from 5 min earlier) is propagated in time to the estimated state (Fig. 8a). The propagation is a 1 : 1 translation. Its uncertainty is small because it originates in the last analysed state, which was also driven by a lidar profile. The plotted uncertainties are the square roots of the diagonal elements of the corresponding covariance matrix. The Kalman filter combines the current lidar measurement and the estimated state into the filtered state, which is driven more by the estimated state than by the lidar measurement (Fig. 8b). The filtered profile serves as input (a priori) to the optimal estimation (Fig. 8c). The small uncertainties of the a priori force the retrieval to resemble the filtered state with similar uncertainties. The ability of the lidar to perform precise water vapour measurements results in small differences to the reference RS. The comparison to RS is discussed in detail in the next paragraph. Figure 8d shows the averaging kernels for a subset of 10 levels. They demonstrate how the information in one retrieved bin is derived from an average of those around it. Ideally the averaging kernels are peaked functions. However, the vertical humidity information at the HATPRO frequencies is limited, which results in smooth functions that are similar to each other. The area of the averaging kernels $a_{\mathrm{area}}$ describes the sensitivity to a unit perturbation. It gives an indication of where the MWR observation is sensitive to the true state and where the final information originates. $a_{\mathrm{area}}$ values around unity indicate that the information originates in the observation ($z$), whereas values differing from unity indicate that it originates in the a priori. In Fig. 8e, $a_{\mathrm{area}}$ is close to zero up to 6 km and increases to values around 1.8 for higher altitudes. This means that the MWR observation is not sensitive to the true state, which is caused by the small a priori (Kalman-filtered) uncertainty. In this case the retrieved profile is driven by the accurate a priori state that originates in the lidar measurement. The information content that comes from the observation is given by the degree of freedom $d$. Figure 8e represents the accumulated degree of freedom $d_{\mathrm{acc}}$, which reaches a maximum of ∼ 0.4. That means that 0.4 independent pieces of information are added by the observation (MWR and surface value).
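The behaviour of the Kalman filter step in Fig. 8a and b, where the filtered state stays close to the estimated state, can be reproduced with a scalar update. This is a sketch only: the actual filter (Sect. 3.2) operates on full profiles and covariance matrices, and the function name, the identity observation operator and the numbers are assumptions.

```python
def kalman_update(x_est, p_est, y_obs, r_obs):
    """Scalar Kalman update for a direct measurement of the state.

    x_est, p_est: estimated (propagated) state and its variance
    y_obs, r_obs: measurement (e.g. lidar value) and its variance
    """
    gain = p_est / (p_est + r_obs)
    x_filt = x_est + gain * (y_obs - x_est)
    p_filt = (1.0 - gain) * p_est
    return x_filt, p_filt, gain


# Estimated state with small variance, measurement with larger variance:
# the filtered value moves only 10 % of the way towards the measurement.
x_filt, p_filt, gain = kalman_update(x_est=5.0, p_est=0.01, y_obs=6.0, r_obs=0.09)
```

The filtered variance is always smaller than the estimated variance, which is why the filtered profile enters the optimal estimation with a tight a priori uncertainty.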
As mentioned above, the retrieved optimal profile (OEM) fits well with the RS profile. A more detailed comparison is illustrated in Fig. 9a. Instead of feeding the retrieval with lidar data, one can also use the MWR data only. In this way, the improvement from applying Kalman-filtered lidar profiles as a priori is emphasized. In such cases (OEM$_{\mathrm{MWR}}$) the Kalman filter is completely skipped. The profile corresponding to $d = 2$ is added to Fig. 9a. The uncertainties are larger over the whole profile in comparison to the OEM. Both the OEM$_{\mathrm{MWR}}$ and the MWR profiles from the statistical retrieval (MWR$_{\mathrm{stat}}$) are unable to distinguish vertical structures as indicated by the OEM and RS. For that reason, their absolute differences to the RS are larger than those from the OEM (Fig. 9b). Furthermore, in this application the OEM$_{\mathrm{MWR}}$ clearly overestimates the humidity below 1 km.
The OEM profile fits best and the zero line (no difference) is within the error bars over nearly the whole profile. The OEM is slightly more accurate, especially near the surface, and has smaller uncertainties over the whole profile. The relative differences (to RS) are smaller below 4 km and large for altitudes where the mixing ratio from RS is small (Fig. 9c). In summary, the OEM profile fits best, with small uncertainties and small differences relative to the RS. However, in cases with full-height lidar profiles the optimal estimation is not necessary, because the Raman lidar profiles already contain nearly all information. But full-height lidar profiles are only available 18 % of the time during HOPE, and by applying the OEM the data set is extended to 60 % coverage (see Sect. 5).
In contrast to 23:02 UTC there is no mixing ratio profile from lidar available at 07:02 UTC (Fig. 10a). Due to the missing lidar profiles the estimated and the filtered profiles as well as their uncertainties are the same (Fig. 10b). The difference between the filtered and the optimally estimated profile is very small since the atmospheric changes within a 5 min step are quite small. However, the uncertainty decreases near the ground. This is caused not only by the MWR but also by the surface measurement, which is also part of the observation vector ($z$). The optimally estimated profile is very smooth, since the HATPRO frequencies do not provide enough information to distinguish fine vertical structures. This can be seen in the difference between the optimally estimated profile and the RS profile, which is used as reference. The corresponding averaging kernels (Fig. 10d) are smooth functions that are similar to each other, because the vertical humidity information at the HATPRO frequencies is limited. The area of the averaging kernels $a_{\mathrm{area}}$ is around unity (Fig. 10e). This means that the MWR observation is sensitive to the true state and nearly all information originates in the observation ($z$). The accumulated degree of freedom $d_{\mathrm{acc}}$ reaches a maximum of ∼ 1.9, meaning that 1.9 independent pieces of information can be retrieved. Löhnert et al. (2009) used RS climatology as a priori for different locations and found $d$ values around 2 for humidity profiling with HATPRO. In contrast, one has to consider that here the observation vector is supplemented by the surface humidity, which also adds information. The difference might be explained by different a priori covariance matrices $S_a$. In summary, the presence of a lidar measurement results in more accurate retrievals, whereas retrievals without water vapour profiles from lidar, for example during daytime, are mainly driven by the MWR observation. However, the two-step algorithm makes it possible to retain structures from lidar data with high vertical resolution and to use them for periods without lidar data.

Cloudy conditions
As introduced in Sect. 3.4, liquid water strongly affects the absorption in the microwave region. Therefore, the operation of the retrieval in the presence of clouds containing liquid water has to be treated separately. Figure 11 shows an overview of a cloudy day, 21 April 2013. In the course of the day the LWP increases to a maximum of 600 g m$^{-2}$ (Fig. 11a). Between 00:00 and 03:30 UTC the measured lidar profiles reach from the ground up to the cloud base between 2.5 and 3.5 km. Given the rather low LWP, the cloud seems to be an ice cloud. During the day, the mixing ratio is determined on the basis of the MWR observation only, disturbed by five short interruptions that are caused by missing cloud base detection by lidar. From 19:30 UTC on the lidar profiles are truncated at the cloud base at around 1.5 km. The LWP shows that these clouds contain liquid water. The possible content of ice water is not relevant for the radiative transfer in the considered spectrum. However, ice clouds as well as all other clouds disturb the precise determination of water vapour with Raman lidar. For that reason the profile is only considered up to cloud base. The problem of truncated profiles is solved by the application of the Kalman filter (Sect. 3.2). It extends the profiles up to 10 km by combining previous information and the respective truncated lidar profile, such that a full-height profile can serve as input to the optimal estimation.
A comparison between the retrieved profiles (OEM), the retrieved profiles based on climatology (OEM$_{\mathrm{MWR}}$), the MWR profiles from the statistical retrieval (MWR$_{\mathrm{stat}}$) and the RS is shown in Fig. 12a. There is a cloud with LWP = 242 g m$^{-2}$ between 1.3 and 2.4 km. Both OEM$_{\mathrm{MWR}}$ and MWR$_{\mathrm{stat}}$ are unable to distinguish the vertical structure inside the cloud given by the RS. Furthermore, they show large differences to the RS profile below and slightly above the cloud (Fig. 12b). The OEM profile shows a good agreement with the RS profile below the cloud, based on available lidar data. The associated uncertainties are small. Within the cloud the uncertainty increases, while the profile still approximates the RS. Above the cloud, the OEM uncertainties are in the same range as those of the OEM$_{\mathrm{MWR}}$ profile, whereas the difference to the RS profile is smaller. Over nearly the whole range the RS profile is within the uncertainty range of the OEM profile. However, for the most part, the RS profile is also within the OEM$_{\mathrm{MWR}}$ uncertainty. The corresponding relative differences with the RS profile are plotted in Fig. 12c. Up to 4 km the relative difference of the OEM profile is less than 25 %. Above this height the relative difference increases. The OEM$_{\mathrm{MWR}}$ and MWR$_{\mathrm{stat}}$ have larger relative differences to the RS. In summary, the OEM fits the RS best, with the lowest differences in and above the cloud.

Statistical analysis
In the previous section (Sect. 4) the functionality of the retrieval is introduced based on clear-sky and cloudy cases during HOPE. A statistical analysis of the retrieved water vapour profiles during the whole HOPE campaign is presented in the following section. Here, also profiles from RS and the OEM$_{\mathrm{MWR}}$ (without lidar) are used as reference.
First, an overview of the calibrated water vapour profiles observed by Polly$^{\mathrm{XT}}$ during HOPE is given in Fig. 13a. The grey area indicates regions without lidar data (up to 6 km) due to cloud attenuation (17 %) and during the day (65 %). The well-resolved vertical profiles enable the determination of distinct water vapour structures or inversions that can be seen, for example, at around 1 km during the night between 26 and 27 May 2013.
As introduced in the previous sections, one can use the covariance of the RS climatology as uncertainty from the previous state, instead of lidar data. However, the cloud base height determined by the lidar is necessary. This approach (OEM$_{\mathrm{MWR}}$) is only based on the observation with MWR and surface humidity and is similar to that proposed by Löhnert et al. (2009). The corresponding height-time display is illustrated in Fig. 13b. The gaps (40 %) are caused by rain, MWR malfunctions, flagged MWR data, the absence of cloud base height from lidar or cases in which no solution was found by the retrieval. Nevertheless, the profile availability is 60 %. Although the data availability for OEM$_{\mathrm{MWR}}$ is larger than for the Raman lidar (Fig. 13a), the vertical resolution is coarser. This can be seen clearly by comparing to the lidar profiles (Fig. 13a) of the night between 26 and 27 May 2013.
Figure 13c shows the retrieved mixing ratio profiles (OEM) based on the method that was described in the previous sections. The data coverage is nearly the same as for OEM$_{\mathrm{MWR}}$. However, the OEM is able to retrieve fine water vapour structures by means of the lidar profiles. The OEM enables not only the distinction between dry (e.g. beginning of April) and more humid (e.g. middle of April) periods but also the retrieval of the vertical distribution of water vapour, especially within and above a cloud. For a more comprehensive investigation of the quality of the profiles, a differentiation between three situations based on certain initial conditions is helpful. These situations are in accordance with the case studies presented in the previous section (Sect. 4). The first situation includes cases where a full-height lidar profile is available (at least up to 8 km). Such a case is presented in Sect. 4.1, especially in Fig. 8. In the statistical analysis these profiles are marked in blue unless stated otherwise. The second group includes cases with lidar profiles which are truncated between 0 and 8 km, mostly due to clouds. Such cases were introduced in Sect. 4.2 in Fig. 12 and are marked in green from now on. The last group contains all cases without lidar profiles as introduced in Fig. 10 and is shown in red. An overview is given in Table 2. The table also lists the sample size for all profiles and those that are used for comparisons with RS. These are also distinguished between profiles passing and failing the $\chi^2$ test that is discussed later in this section. Additionally, the OEM$_{\mathrm{MWR}}$ is used as reference and is marked in grey.
To assess the accuracy of a water vapour profile, reference profiles from RS and OEM$_{\mathrm{MWR}}$ profiles are used. In this work the bias and the root mean square error (RMSE) between the retrieved profiles and those from RS are applied to evaluate the quality of the retrieved profiles. For this comparison retrieved profiles that are between RS launch time and 1 h after launch time are used. This results in a maximum of 12 profiles for one sounding. Only cases which pass the $\chi^2$ test are considered for the comparison. Figure 14a shows the bias for the specified situations and for the OEM$_{\mathrm{MWR}}$. The blue line illustrates the retrieved profiles that are based on lidar profiles reaching at least up to 8 km (clear sky). It has a maximum value of 0.5 g kg$^{-1}$ near the surface and it decreases close to zero above 1.5 km. However, the bias is positive, which means that the retrieved profiles have larger values than the RS profiles. Above 6 km the retrieved profiles show higher values than the RS. This bias needs to be investigated in further studies and is beyond the scope of this study.

Table 2. Overview of the different situations depending on Raman lidar mixing ratio (RL MR) profile availability and truncation height ($h_{\mathrm{tr}}$) where the RL MR profile is truncated (due to clouds). The three columns on the right indicate the sample size used for the comparison with radiosonde (RS), to validate the retrieved profiles, and all cases. Furthermore, the profiles that are used for the comparison with RS are separated between those passing and failing the $\chi^2$ test based on a threshold $\chi^2_{\mathrm{thr}}$. The temporal resolution of the retrieved profiles amounts to 5 min.

… 2. The sample size is given by the numbers in the middle panel. Only profiles between RS launch time and 1 h after are considered.
The bias of the situations where the lidar profiles are truncated below 8 km is shown in green (Fig. 14a). The values reach a maximum of around 0.6 g kg$^{-1}$ and are largest in the planetary boundary layer. Above 2.5 km the bias is around zero. The bias of the situations where no lidar profiles are available and of the OEM$_{\mathrm{MWR}}$ show a similar behaviour to each other. Both curves show an overestimation of the retrieved mixing ratio within the boundary layer up to 2 km. Between 2 and 5 km the retrieval underestimates the mixing ratio by around −0.4 g kg$^{-1}$. Additionally, the small amount of vertical information that comes from the MWR observation might not be able to compensate for this misbehaviour and to resemble the profile given by the reference. This effect can also be seen in the presented clear-sky case study in Fig. 10. Nevertheless, situations where no lidar profiles are available show a bias closer to zero than the OEM$_{\mathrm{MWR}}$. These cases benefit from the night cases whose vertical structure is propagated into the day cases. The positive biases of all four curves seem to have a systematic difference that might be explained by some sources of uncertainty in the RS profiles. The different locations of the platform in Krauthausen and the RS launch station and drifts of the balloon might result in the observation of different air masses (Foth et al., 2015). Naturally, the forward model itself is a source of uncertainty. The modelled brightness temperatures strongly depend on the assumed absorption line shapes (Turner et al., 2009). Figure 15 illustrates a comparison of forward models using two different gas absorption models: Rosenkranz (1998, R98) and Liebe et al. (1993, L93). The differences are the line shape parameters of the 22.235 GHz water vapour line, as well as the water vapour continuum absorption. Both models are corrected for water vapour continuum absorption according to Turner et al. (2009). All other parameters, e.g. cloud absorption, are the same. Both forward models were run with two different a priori states, both without lidar. The first uses the a priori profile and the a priori covariance from RS climatology. It simulates the theoretical uncertainty (theor.) only induced by the different absorption models. In the other case the a priori profile is propagated (prop.) from the previous state as used in the original retrieval. Here, the a priori uncertainty is also taken from the RS climatology. The bias to RS in the second case is larger because the theoretical uncertainty is propagated from each previous state, resulting in an increase in uncertainty (Fig. 15a). It can be seen that the L93 model has a smaller bias below 1 km. Above 2.5 km the R98 model simulations better fit the RS, with a bias around −0.3 g kg$^{-1}$ and a bias close to 0 g kg$^{-1}$ above 5 km. The retrieved uncertainty, the so-called a posteriori uncertainty, of the R98 simulations is smaller than that of the L93 simulations. The uncertainty of the L93 runs is also largest at heights above 3 km. Finally, the R98 gas absorption model seems to be more suitable for the presented retrieval. Nevertheless, the forward model is a major source of uncertainty.

The RMSE between OEM and RS is illustrated in Fig. 14b. It gives an indication of the statistical error. The RMSE of all four curves decreases with height. In addition, the RMSE is smaller for cases with lidar profiles as a priori and larger for those without. The RMSE of the HOPE RS profiles, which is basically the variance of mixing ratio in the whole period, is larger than any RMSE of the retrieved profiles.
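The two scores used in Fig. 14a and b can be written down compactly for a single height level. The sketch below assumes that the retrieved and RS values have already been matched in time and height; the function name and the sample numbers are hypothetical. The sign convention follows the text, i.e. a positive bias means that the retrieved values are larger than the RS values.

```python
import math


def bias_and_rmse(retrieved, reference):
    """Bias (mean difference) and RMSE of retrieved minus reference values."""
    diffs = [r - s for r, s in zip(retrieved, reference)]
    n = len(diffs)
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return bias, rmse


# Hypothetical mixing ratios (g/kg) at one height level for three matched pairs.
bias, rmse = bias_and_rmse([5.5, 4.0, 3.5], [5.0, 4.0, 3.0])
```

Repeating the calculation for every height level yields profiles of bias and RMSE like those discussed above.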
Figure 14c illustrates the a posteriori uncertainty of the mixing ratio profiles (see Eq. 22). The black line indicates the uncertainty of the RS climatology, which is the square root of the diagonal elements of its covariance matrix. It can clearly be seen that the retrieved a posteriori uncertainty is smaller for all situations. The curves of the cases without lidar profiles and the OEM$_{\mathrm{MWR}}$ are nearly in agreement. In both cases the Kalman filter is skipped due to the absence of lidar profiles. Therefore, both use the same a priori uncertainty and their retrievals are solely driven by the MWR and surface humidity observation. The presence of lidar data (full height or truncated) results in much lower uncertainties. Their small a posteriori uncertainties underline the synergy improvement.
In summary, Fig. 14 clearly shows that the application of Kalman-filtered lidar profiles enormously improves the accuracy and quality of the retrieved mixing ratio profiles.
Another possibility to evaluate the accuracy of the retrieved profiles is to analyse the bias as a function of the mixing ratio (Fig. 16). The slope of the regression line is smaller than that of the one-to-one line. This means that larger differences occur for larger mixing ratios. Figure 16 also indicates the correlation between retrieved and RS mixing ratios. The squared coefficient of correlation $R^2$ is largest for those situations with full-height lidar profiles and amounts to 0.97 (Fig. 16a). The $R^2$ of the OEM based on truncated lidar profiles (panel b) is slightly smaller (0.96). The situations without lidar data and the OEM$_{\mathrm{MWR}}$ have still smaller values of 0.92 and 0.91, respectively. Nevertheless, all cases show a better agreement with RS than the OEM$_{\mathrm{MWR}}$. This illustration also demonstrates the synergy improvement by implementing the lidar data with a Kalman filter before applying the OEM.
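The regression slope and $R^2$ quoted for Fig. 16 follow from an ordinary least-squares fit. The sketch below assumes that the RS mixing ratio is on the x axis and the retrieved mixing ratio on the y axis; the function name and the sample values are hypothetical.

```python
def regression_slope_and_r2(x, y):
    """Least-squares slope, intercept and squared correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2


# Hypothetical RS (x) and retrieved (y) mixing ratios in g/kg.
slope, intercept, r2 = regression_slope_and_r2([1.0, 2.0, 3.0, 4.0],
                                               [1.0, 1.8, 2.6, 3.4])
```

A slope below one, as in this example, means that the difference between the regression line and the one-to-one line grows with the mixing ratio.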
To assess the quality of retrieved profiles a statistical test for correct convergence of the solution is applied. The modelled state $F(x_{\mathrm{op}})$ and the observation vector $z_{\mathrm{obs}}$ are compared with the error covariance matrix $S_{\delta z}$ (see Eq. 21) to check if the retrieval is consistent with the observation. Figure 17 shows the $\chi^2$ test statistics for all mentioned situations. The $\chi^2$ test was introduced in Sect. 3.5. It can be seen that 29 profiles are rejected in the situations with full-height lidar profiles because their $\chi^2$ value exceeds the 5 % threshold value of 14 (Fig. 17a). The amount of untrustworthy profiles is similar to the situations with truncated lidar profiles. In both cases the smaller a priori uncertainty prevents an adjustment of the modelled brightness temperatures to those measured by MWR. For that reason, their difference is larger, resulting in a larger $\chi^2$ value. The $\chi^2$ test rejects a smaller relative amount of profiles for the daytime cases (panel c) and for the OEM$_{\mathrm{MWR}}$ (panel d). Their larger a priori uncertainty enables a better match between the modelled and the measured brightness temperatures. However, all situations show a peak at small values that originates in a very good agreement between the forward modelled optimal state and the observation vector. Admittedly, the test is very strict and rejects all failing profiles although they might be realistic atmospheric states. Nevertheless, it enhances the confidence in the retrieved profiles.
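The threshold and the test statistic can be sketched with the Python standard library alone. Two simplifications are assumptions of this sketch: the covariance $S_{\delta z}$ is treated as diagonal, and the number of degrees of freedom is set to 7, which reproduces a 5 % threshold of about 14. All function names are hypothetical.

```python
import math


def chi2_survival(x, dof):
    """P(chi^2 > x) for an integer number of degrees of freedom."""
    y = 0.5 * x
    if dof % 2 == 0:
        a = 1.0
        q = math.exp(-y)                 # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))      # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < 0.5 * dof - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_threshold(dof, probability=0.05):
    """Value exceeded with the given probability, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_survival(mid, dof) > probability:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi_square(modelled, observed, variances):
    """Chi-square statistic for a DIAGONAL error covariance (simplification)."""
    return sum((f - z) ** 2 / s for f, z, s in zip(modelled, observed, variances))


threshold = chi2_threshold(7)   # about 14.07 for 7 degrees of freedom
```

A retrieved profile would then be flagged as untrustworthy whenever its `chi_square` value exceeds `threshold`.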
A good measure for the proportion of information that comes from the observation is given by the degree of freedom. It describes the number of independent pieces of information that is added by the retrieval and has already been introduced in Sects. 3.5 and 4. Figure 18a illustrates the degree of freedom as a function of truncation height. It clearly demonstrates that the lower the truncation height, the higher the degree of freedom. This is caused by the larger a priori uncertainty in cases with truncated or without lidar mixing ratio profiles. The sample size is much higher than in the comparisons above because here all profiles can be used and not only those around the RS launch time. Most of the grey crosses are not visible because they are covered by the red diamonds. The related frequency distributions are shown in Fig. 18b. The OEM$_{\mathrm{MWR}}$ and the daytime cases are very similar to each other. Even their mean values and standard deviations are nearly identical, with values of 1.9 ± 0.22. These values are in good agreement with those found by Löhnert et al. (2009) for a similar approach. The situations with the truncated lidar profiles show a wide range of values from 0.3 to 2.1. The green distribution also has the largest standard deviation, which amounts to 0.34. The situations with full-height lidar profiles have the smallest mean and standard deviation, with values of 0.45 ± 0.17. These cases are mostly driven by the a priori information and not by the observation. The variation within each situation is caused by different atmospheric conditions. Figure 19 illustrates the degree of freedom as a function of IWV. It shows an increase in $d$ with increasing IWV caused by a stronger emission of water vapour. For higher IWV, the MWR is able to add more information to the retrieval. Finally, the behaviour of the degree of freedom, and especially its dependence on truncation height and hence a priori uncertainty, agrees well with similar studies (Löhnert et al., 2009; Ebell et al., 2013).

Conclusions
A good knowledge of the water vapour distribution is essential for the description of the thermodynamic state of the troposphere. Since the continuous observation of water vapour profiles with a single instrument is challenging, the synergy of complementary information from active and passive remote sensing has become more important in recent years.
In this study we present a two-step retrieval combining the Raman lidar water vapour profiles with the MWR brightness temperatures. The Kalman-filtered water vapour profiles serve as input (a priori) to the one-dimensional variational approach, also known as optimal estimation. In addition to the water vapour profile, its uncertainty is retrieved.
The retrieval enables the observation of a continuous time series of water vapour profiles with known uncertainties. During HOPE, the availability of full-height water vapour profiles from lidar amounts to 17 %, excluding all cloudy and daytime cases. By applying the retrieval, the availability of water vapour profiles can be increased to 60 %. The bias with respect to RS and the retrieved a posteriori uncertainty of the retrieved profiles clearly show that the application of the Kalman filter considerably improves the accuracy and quality of the retrieved mixing ratio profiles. In the presence of full-height Raman lidar profiles, the MWR does not add much information to the retrieved profiles. However, cases without Raman lidar profiles are dominated by the MWR information with a larger degree of freedom. The lower the truncation height of the lidar profiles, the higher the importance of the MWR.
Furthermore, the retrieval can be applied to raw data (photon counts) using the calibration method based on Foth et al. (2015) or using already calibrated profiles.
In future steps, the precipitation evaporation can be assessed by means of observed or retrieved temperature and humidity profiles. This information can be used to improve model parametrization of physical processes with water vapour participation and finally to improve weather and climate predictions.
The retrieval will be implemented into the Cloudnet processing. A better knowledge of the water vapour distribution and the collocated and simultaneous monitoring of cloud microphysics within Cloudnet might improve the understanding of cloud formation, precipitation, evaporation and entrainment rates. The application of this algorithm might help to decrease uncertainties in the area of cloud and precipitation formation as well as cloud dissipation, as mentioned in the latest IPCC report (Boucher et al., 2013).
Data availability.The quality-controlled MWR, RS and tower data used in this work are archived in a common format to the HD(CP) 2 data archive centre for Standardized Atmospheric Measurement Data (SAMD).All the data are publicly available at https://icdc.cen.uni-hamburg.de/index.php?id=samd.
For more information on data availability and on data policy of the Polly XT raw measurement data please contact the Polly team via the website (http://polly.tropos.de).
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "HD(CP)2 Observational Prototype Experiment (ACP/AMT interjournal SI)". It is not associated with a conference.

Figure 1. Absolute humidity weighting function for the HATPRO frequencies for a cloud-free model atmosphere.

Figure 2. Sketch of the retrieval scheme. Details are given in the text. This figure is adapted from Schneebeli (2009).

Figure 3. Correlation (a) and covariance matrix (b) derived from 211 radiosondes for HOPE. Both are shown for the natural logarithm of the mixing ratio (ln(MR)) as a function of height with a resolution of 90 m.
Figure 4. (a) Brightness temperature difference as a function of LWP (dots) using two different LWC assumptions. The colours indicate the corresponding frequencies (top right). The mean and the standard deviation per bin are indicated by coloured lines and error bars, respectively. The bin size amounts to 0.05 kg m⁻². The number of occurrences is given in grey bars at the top. (b) Exemplary covariance matrix for an LWP between 0.45 and 0.5 kg m⁻². The channel numbers correspond to the HATPRO frequencies given in (a), which means, for example, that 1 refers to 22.24 GHz.

Figure 5. Retrieved mixing ratio profiles both with (red) and without (green) supersaturation constraint on 23 April 2013, 01:02 UTC. Cloud base is indicated by the dashed line. The saturation mixing ratio is illustrated by the dotted grey line. The a priori profile (blue) is the same for both scenarios.

Figure 6. Illustration of the optimal estimation method. Details are given in the text.

Figure 7. Overview of a mostly cloud-free case on 5 May 2013. (a) Liquid water path (LWP). (b) Height-time display of the mixing ratio measured by the Raman lidar. (c) Height-time display of the retrieved optimal estimated mixing ratio. The solid line indicates the height where the Raman lidar profiles are truncated. The dotted line defines the cloud base height determined by the lidar.

Figure 8. Overview of a cloud-free scene on 5 May 2013, 23:02 UTC. Mixing ratio (MR) profiles from the Raman lidar and the estimated (a), the Kalman-filtered (b) and the optimally estimated state (c). Additionally, the mixing ratio of the radiosonde (RS) is shown (c). Error bars are added to the profiles at the different states of the processing. (d) Averaging kernel for a subset of 10 levels indicated by the coloured numbers. (e) Accumulated degree of freedom d_acc (solid) and the area of the averaging kernel A_area (dotted).

Figure 9. (a) Comparison of mixing ratio profiles on 5 May 2013 around 23:00 UTC: retrieved profile (OEM, red), retrieved profile with RS climatology as a priori (OEM_MWR, blue), profile from the MWR statistical retrieval (green), the Raman lidar measurement (grey) and RS (black) as reference. Error bars are added to the optimally estimated profiles (red, blue, grey). Absolute (b) and relative (c) difference from the reference RS.

Figure 12. As Fig. 9 but on 21 April 2013 around 23:00 UTC. The grey area indicates the cloud with an LWP of 242 g m⁻².

Figure 13. Three different height-time displays of mixing ratio profiles during HOPE: (a) calibrated Raman lidar profiles, (b) optimal estimated profiles based only on MWR (and surface humidity) without any Raman lidar mixing ratio profile (OEM_MWR) and (c) optimal estimated profiles based on Kalman-filtered Raman lidar mixing ratio a priori profiles (OEM).

Figure 14. Statistical analysis of the synergy improvement: mean difference (bias) between the retrieval and the RS (a), root mean square error (RMSE) with respect to RS (b) and a posteriori uncertainty (c). Four situations are distinguished according to Table 2. The sample size is given by the numbers in the middle panel. Only profiles between RS launch time and 1 h after launch are considered.

Figure 15. Mean difference (bias) between the retrieval and the RS (a) and a posteriori uncertainty (b) for two different absorption codes: Rosenkranz (R98, grey) and Liebe (L93, orange). The retrievals shown are based only on MWR but with different a priori states. On the one hand, both the a priori profile and the a priori uncertainty are taken from the RS climatology (theor.); on the other hand, the a priori profile is propagated (prop.) from the previous step while the uncertainty is taken from the RS climatology (red cases in the figures above). The sample size is given by the numbers. Only profiles between RS launch time and 1 h after launch time are considered.

Figure 16. Comparison of optimal estimated (OEM) and radiosonde (RS) mixing ratio profiles for the four situations (panels a–d) given in Table 2. The black solid line indicates the regression line.

Figure 17. Histograms of the χ² test for the four situations given in Table 2. The dotted lines indicate the theoretical χ² distribution with m_y degrees of freedom. Dashed lines indicate the 5 % threshold value of 14. The absolute number of cases below and above the threshold value is given to the left and to the right of the dashed line, respectively.

Figure 18. (a) Degree of freedom as a function of truncation height for the different situations introduced in Table 2. (b) Frequency distribution of the degree of freedom. The symbols and error bars correspond to the related mean and standard deviation, respectively. The numbers indicate the sample size of the considered profiles: full height (blue), truncated (green), no lidar (red) and OEM_MWR (grey).

Figure 19. Degree of freedom as a function of IWV for the situations introduced in Table 2. The lines indicate the corresponding regression lines.

Table 1. Forward model error for each frequency due to different absorption codes. Uncertainties are given as the square root of the diagonal elements of the covariance matrix.