Accuracy assessment of water vapour measurements from in situ and remote sensing techniques during the DEMEVAP 2011 campaign at OHP

The Development of Methodologies for Water Vapour Measurement (DEMEVAP) project aims at assessing and improving humidity sounding techniques and establishing a reference system based on the combination of Raman lidars, ground-based sensors and GPS. Such a system may be used for climate monitoring, radiosonde bias detection and correction, satellite measurement calibration/validation, and mm-level geodetic positioning with Global Navigation Satellite Systems. A field experiment was conducted in September–October 2011 at Observatoire de Haute-Provence (OHP). Two Raman lidars (IGN mobile lidar and OHP NDACC lidar), a stellar spectrometer (SOPHIE), a differential absorption spectrometer (SAOZ), a sun photometer (AERONET), 5 GPS receivers and 4 types of radiosondes (Vaisala RS92, MODEM M2K2-DC and M10, and Meteolabor Snow White) participated in the campaign. A total of 26 balloons with multiple radiosondes were flown during 16 clear nights. This paper presents preliminary findings from the analysis of all these data sets. Several classical Raman lidar calibration methods are evaluated, which use either Vaisala RS92 measurements, point capacitive humidity measurements, or GPS integrated water vapour (IWV) measurements. A novel method proposed by Bosser et al. (2010), which consists of calibrating the lidar measurements during the GPS data processing, is also tested. The methods achieve a repeatability of 4–5 %. Changes in the calibration factor of the IGN Raman lidar are found and are attributed to frequent optical re-alignments. When these changes are modelled and corrected as a linear function of time, the precision of the calibration factors improves to 2–3 %. However, the variations in the calibration factor, and hence the absolute accuracy, between methods and types of reference data remain at the level of 7 %. The intercomparison of radiosonde measurements shows good agreement between RS92 and Snow White measurements up to 12 km.
An overall dry bias is found in the measurements from both MODEM radiosondes. Investigation of situations with low RH values (< 10 %RH) in the lower and middle troposphere reveals, on occasion, a lower RH detection limit in the Snow White measurements compared to RS92 due to a saturation of the Peltier device. On other occasions, however, a dry bias is found in RS92 instead. On average, both RS92 and Snow White measurements show a slight moist bias at night-time compared to GPS IWV, while the MODEM measurements show a large dry bias. The IWV measurements from the SOPHIE (night-time) and SAOZ (daytime) spectrometers, the AERONET photometer (daytime) and the calibrated Raman lidar (night-time) show excellent agreement with the GPS IWV measurements. Published by Copernicus Publications on behalf of the European Geosciences Union. O. Bock et al.: Accuracy assessment of water vapour measurements


Introduction
Water vapour participates in many atmospheric processes and plays an important role in the climate. However, its accurate measurement is a challenging task due to its rapid decrease with height, by several orders of magnitude, and its high horizontal and temporal variability. Several different techniques are commonly used to measure either humidity profiles or total column integrated water vapour (IWV) throughout the troposphere for both operational and research applications. In this work we focus on the accuracy assessment and improvement of some frequently used in situ (radiosondes and ground-based sensors) and remote sensing techniques (Raman lidars, Global Positioning System (GPS), and spectrometers) to serve applications such as (1) weather forecasting, (2) climate monitoring (or atmospheric composition change), and (3) calibration/validation of satellite measurements. Weather forecasting is especially focused on the lower and middle troposphere, where most of the water vapour is located. Climate monitoring is more focused on the upper troposphere/lower stratosphere, where H2O molecules strongly impact the global energy balance through cloud formation and absorption of infrared radiation. As satellite measurements are becoming the dominant source of data in geosciences, a careful calibration and validation of these measurements is required. This needs support from both in situ and ground-based observing systems. Present applications of satellite data cover a broad range of fields such as meteorology, climatology, geodesy (e.g., monitoring of solid earth deformations), and altimetry (e.g., monitoring of sea level change).
Upper air measurements of water vapour have traditionally been made using radiosondes (balloon-borne sensors providing pressure, temperature, humidity and wind profile data). They have been used operationally for weather forecasting but are receiving increasing interest for climate applications thanks to the extended data record (more than 50 yr). Unfortunately, most operational radiosondes exhibit dry and wet humidity biases depending on the type of sensor, the operating conditions (day/night), and the temperature of the sensor (which introduces a dependence on height), among other factors. The uncertainty of raw measurements is roughly in the range from ±10 % in the lower-middle troposphere to ±50 % or more in the upper troposphere-lower stratosphere compared to a reference-quality standard such as a Cryogenic Frost-point Hygrometer (CFH) or a chilled-mirror (CM) dew/frost-point hygrometer (Vömel et al., 2003, 2007a). Empirical models have been developed to correct the operational radiosonde humidity biases, mostly for research applications (e.g., Wang et al., 2002; Vömel et al., 2007b; Cady-Pereira et al., 2008; Nuret et al., 2008; Miloshevich et al., 2009; Ciesielski et al., 2010) and more recently for operational numerical weather prediction (Faccani et al., 2009). The continuous improvement of sensor technology and data analysis software requires frequent assessment of the quality of sounding systems. Such systems are provided by various manufacturers and used by many weather services and researchers worldwide (Nash et al., 2005, 2011; Sapucci et al., 2005; Miloshevich et al., 2006; Suortti et al., 2008).
Raman lidars are an alternative technique for retrieving high-resolution water vapour mixing ratio (WVMR) profiles in the troposphere, but they operate only in clear sky conditions and, for most Raman lidar systems, during night-time (Vaughan et al., 1988; Whiteman et al., 1992). Their accuracy is limited by calibration uncertainties (systematic errors) and by photon-counting noise (random errors, which increase rapidly with altitude). Special instrumental designs allow for sounding both the lower and the upper troposphere, during daytime and night-time (Goldsmith et al., 1998). Raman lidar calibration can be performed by various approaches. One early approach relied on the determination of the optical transmission characteristics of the system, but its accuracy is limited to 10 % at best due to uncertainties in the Raman cross sections of water vapour and nitrogen (Vaughan et al., 1988; Sherlock et al., 1999a). Calibration by comparison with other collocated sensors, such as ground sensors, radiosondes, or microwave radiometers and GPS, has thus become the standard (Ferrare et al., 1995; Turner et al., 2002; Whiteman et al., 2006a). While these methods are pertinent for process studies, strong limitations arise for climate monitoring from the lack of time continuity of the sensors used for lidar calibration. Changes in the lidar calibration can be monitored using independent methods such as stable, dedicated light sources (lamps) or zenith clear-sky observations (Sherlock et al., 1999a; Leblanc and McDermid, 2008; Hoareau et al., 2009). The repeatability of Raman lidar calibration reaches the 5 % level at best and depends strongly on the accuracy of the reference sensor and the stability of the lidar optical design.
Many experiments have been conducted over the past 15 yr to assess the accuracy of Raman lidars and radiosondes, among other techniques, and have allowed significant instrumental improvements. Some of these experiments focused on the lower troposphere (Revercomb et al., 2003; Whiteman et al., 2006a; Behrendt et al., 2007; Bhawar et al., 2011) while others focused on the upper troposphere (Ferrare et al., 2004; Soden et al., 2004; Whiteman et al., 2006b; Leblanc et al., 2011). Most of them also considered total column IWV measurements such as those provided by solar photometers, stellar spectrometers, GPS and microwave radiometers. These measurements are a good complement to lidar measurements, though they provide only total column information. Comparisons between these different column measurements are therefore fully relevant for lidar calibration.
This paper reports on the results from a field experiment that we conducted at Observatoire de Haute-Provence (OHP), France, during fall 2011, to serve at least three objectives: (1) assess and compare standard and new Raman lidar calibration methods with two different systems, one designed for sounding the lower troposphere (Tarniewicz et al., 2001; Bock et al., 2003) and one for the upper troposphere (Hoareau et al., 2009); (2) evaluate the quality of the operational radiosondes used by Meteo-France and other national weather services for weather forecasting; (3) evaluate the quality of IWV measurements from GPS and optical spectrometers. This work is part of a collaborative project called DEMEVAP (Development of Methodologies for Water Vapour Measurement), which aims at developing Raman lidar-based systems capable of achieving water vapour measurements in both the lower and upper troposphere, with high accuracy and controlled long-term calibration at the level of 3 % or better. The system concept refers to the combined use of lidar and other measurements, especially from GPS and easily deployable ground-based sensors. An accuracy of 3 % is targeted for climate monitoring, radiosonde bias detection and correction, satellite measurement calibration/validation and mm-level geodetic positioning with GNSS (Global Navigation Satellite System) receivers. The long-term monitoring of the water vapour profile from ground instruments, as performed within the NDACC (Network for the Detection of Atmospheric Composition Change), was initiated with preliminary prototypes that need to be improved to reach their goal with adequate accuracy. Such a campaign is helpful for designing the 2nd generation NDACC lidars. Atmos. Meas. Tech., 6, 2777-2802, 2013 www.atmos-meas-tech.net/6/2777/2013/
The Global Climate Observing System (GCOS) has identified the water vapour vertical distribution as one of the essential climate variables to monitor (Seidel et al., 2009) within the GRUAN network (GCOS Reference Upper-Air Network). In Sect. 2, we briefly describe the instruments that participated in the DEMEVAP 2011 field campaign and give an overview of the experiment. In Sect. 3 we provide more insight into the data processing methods. Section 4 presents the results of the comparisons and tests of methods. More specifically, we compare four Raman lidar calibration methods using upper-air radiosonde data, ground-based capacitive or chilled-mirror measurements, IWV measurements and a novel GPS-lidar coupled data processing method (Bosser et al., 2010). Tests of calibration using ground-based measurements (actually, 10 m a.g.l.) were possible thanks to the recently introduced scanning capability of the Institut National de l'Information Géographique et Forestière (IGN) Raman lidar. The Raman lidar WVMR measurements as well as the operational radiosonde humidity profiles are compared to the measurements from chilled-mirror hygrometers (Meteolabor, "Snow White"). We also assess the accuracy of GPS IWV estimates by comparing measurements from 5 collocated receivers/antennas of two different types, and test the impact of microwave absorbing material placed beneath 2 of the 5 antennas (this material is intended to reduce scattering of GPS signals, which is one of the major error sources in GPS meteorology; Elosegui et al., 1995; Ning et al., 2011).
Note that in this manuscript %RH is used to refer to the commonly used percent unit of relative humidity measurements, whereas % refers to a general definition of a percentage, independent of the physical unit of the discussed variable.

Campaign and instruments
The DEMEVAP 2011 campaign was conducted at the Gérard Mégie lidar station of OHP, which is part of the NDACC. This site was chosen because it offers permanent instrumentation (including a water vapour Raman lidar), could easily host additional instruments for the campaign, and provides interesting measurement conditions (low aerosol burden thanks to a 680 m elevation above sea level in a rural environment). The location of the instruments is shown in Fig. 1 and their technical specifications are described in the subsections below. Figure 1 shows that the lidars, the GPS receivers and the radiosonde launch pad are all collocated within a few tens of metres. The two 10 m masts equipped with capacitive humidity sensors were located 90 and 180 m away from the scanning Raman lidar to provide calibration measurements beyond the lidar overlap distance.

The NDACC OHP water vapour Raman lidar
The Observatoire de Haute-Provence was designed as a primary station of the NDACC network, including lidars for ozone, temperature and stratospheric aerosol monitoring. More recently, NDACC has identified water vapour as a key parameter for Upper Troposphere-Lower Stratosphere (UTLS) issues and has promoted the improvement of instruments for measuring water vapour, including Raman lidars. A nitrogen Raman channel was implemented in 1989 (Keckhut et al., 1990) on the lidar emitting at 532 nm, which is dedicated to elastic backscattering signals for temperature measurements. Another Raman channel, centred at 660 nm for water vapour Raman scattering, was implemented in 1994 (Sherlock et al., 1999b). After several tests, the lidar design has remained the same since 1999, except for the detectors. The pulsed emission is the second harmonic of a Nd:YAG laser providing 350 mJ at 50 Hz. An afocal telescope expands the laser beam cross-section by a factor of 20. The Raman backscattering signals are collected by a dedicated 80 cm diameter Cassegrain telescope coupled to an OH-rich optical fibre of 0.9 mm diameter (to reduce fluorescence effects). A bistatic configuration is used, with the emitter axis located 60 cm from the collector axis, so that the full overlap between the laser beam and the field of view is reached at an altitude of 3-4 km. The water vapour mixing ratio is deduced from the ratio of the two Raman signals. Since the overlap function is the same for both Raman signals, measurements are possible down to 2 km. The Raman signals are separated by a dichroic mirror and isolated from the background and residual Rayleigh signals using a succession of several low-pass and band-pass interference filters (Barr Associates) giving a final half-width of 0.7 nm (Sherlock et al., 1999b; Hoareau et al., 2009). The detection is performed with different photomultipliers and avalanche photodiodes operating in photon-counting mode.
During the campaign, the detection was performed with cooled Hamamatsu R1477 photomultipliers. An aerosol channel with a dedicated 20 cm diameter telescope is used to detect cirrus cloud occurrence simultaneously. Photon counting is performed by in-house electronics interfaced with a PC, providing a vertical resolution of 75 m and an integration over 8000 shots. This system experienced numerous failures with random emission time tagging. These problems hampered the use of measurements from this lidar during the September period of the DEMEVAP campaign. They were resolved in October 2011, allowing only 3 nominal profiles in coincidence with other measurements. Measurements were analysed using the standard method developed by Hoareau et al. (2009) and used for NDACC water vapour monitoring. This method is based on signals averaged over a period longer than 20 min during which the water vapour and cloud scene is quasi-stationary. The calibration is usually performed by extending the profile downward with data from the nearby radiosonde station in Nîmes, France, and by using the on-site total water vapour column obtained from astronomical observations with the SOPHIE spectrometer.
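As a rough illustration of the ratio-based retrieval described above, the following Python sketch forms a mixing ratio profile from two background-subtracted photon-count profiles and scales it by a calibration constant. The function name, the background estimate (mean of the last bins), the calibration value and all counts are assumptions made for illustration; this is not the NDACC processing chain of Hoareau et al. (2009).

```python
def wvmr_profile(h2o_counts, n2_counts, calib, n_bg_bins=2):
    """Return calib * (S_H2O - B_H2O) / (S_N2 - B_N2) for each range bin.

    Assumption: the sky background B of each channel is estimated as the
    mean of the last n_bg_bins bins, where no laser return is expected.
    """
    def subtract_background(counts):
        bg = sum(counts[-n_bg_bins:]) / n_bg_bins
        return [c - bg for c in counts[:-n_bg_bins]]

    h2o = subtract_background(h2o_counts)
    n2 = subtract_background(n2_counts)
    # Bins without usable N2 signal are flagged with None.
    return [calib * h / n if n > 0 else None for h, n in zip(h2o, n2)]


# Integration time implied by 8000 shots at a 50 Hz repetition rate.
integration_time_s = 8000 / 50

# Hypothetical counts: three signal bins followed by two background bins.
h2o = [110.0, 60.0, 20.0, 10.0, 10.0]
n2 = [10100.0, 5100.0, 2100.0, 100.0, 100.0]
profile = wvmr_profile(h2o, n2, calib=100.0)  # hypothetical constant, g/kg
```

Because both channels share the same overlap function, the ratio cancels it, which is why a single constant is applied to every bin in this sketch.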

IGN-LATMOS scanning water vapour Raman lidar
This is a mobile research instrument mounted in a small (5 m long) van and equipped with a motorized periscope (two rotating square aluminized mirrors of 40 cm × 40 cm) allowing measurements at azimuths of 0-360° and elevations of 0-90° with a precision of 1°. The lidar is based on a tripled Nd:YAG laser (355 nm) transmitting 4 ns pulses of 70 mJ at 20 Hz. The receiving optics consist of a 30 cm diameter Cassegrain telescope, fibre-coupled to a filtering and detection stage composed of interference filters and photomultiplier tubes (Tarniewicz et al., 2001). For the DEMEVAP experiment, the system used a 0.8 mm diameter fibre at the beginning and a 0.4 mm diameter fibre after 28 September, together with 0.4 nm bandwidth interference filters (Barr Associates) centred on 354.7 nm (elastic backscatter), 386.7 nm (nitrogen Raman-shifted wavelength), and 407.6 nm (water vapour Raman-shifted wavelength). The two Raman signals are detected in photon-counting mode using Licel GmbH transient recorders.
The raw measurements are sampled in 7.5 m bins and integrated over 20 s (400 shots). The laser beam is expanded to a 45 mm diameter before being transmitted co-axially with the receiving telescope. The optical alignment of transmitter and receiver is controlled manually several times each night by maximizing the return signal at a 2-3 km range. The optimal position of the optical fibre is close to the telescope focal plane. It is usually adjusted at the beginning of the experiment and then checked from time to time. During the DEMEVAP experiment, two events significantly perturbed the optical alignment and hence the continuity of the measurements. One occurred on 26 September, when the operators lost the signal and could not realign the system. This required a complete check of the system, including the dismounting of the optical fibre and subsequent re-alignment. Another one occurred on 28 September, when the operators damaged the optical fibre with an accidental full-power laser return during an alignment check using a retroreflector. This required the replacement of the optical fibre and again a complete re-alignment of the system. The replacement of the optical fibre (from 0.8 to 0.4 mm) and the subsequent small changes in the position of the N2 and H2O beams on the respective photomultipliers can induce changes in the lidar calibration constant because of the inhomogeneity of the active surfaces of the photomultipliers (Dinoev et al., 2013). During the experiment, we also frequently changed the position of the optical fibre near the telescope focus in order to optimize the short-range signal-to-noise ratio for the low-elevation calibration measurements to the 10 m masts (see below). Because optical fibres do not perform complete angular scrambling, changing the relative position of the beam at the input of the fibre slightly affects the geometry of the beam at the output, and hence the H2O/N2 calibration factor. Though we believed that this effect would be negligible, laboratory experiments made after the field campaign showed that it can produce changes of ±5 to 10 % in the calibration factor. The main reason is again the inhomogeneity of the active surfaces of the photomultipliers.
The changes in the lidar calibration factor during the DEMEVAP experiment were monitored with the ratio of two N2 signals detected in the H2O and N2 channels using a common N2 filter. This procedure was described originally by Whiteman et al. (1992), and is used routinely during the IGN lidar operations to detect calibration changes (Tarniewicz et al., 2001). Figure 4b shows the time series of the N2 signal ratio. To first order, the ratio shows a linear drift of ∼ 15 % over the duration of the experiment (45 days), with small night-to-night changes superimposed. The drift is explained by cumulative displacements of the optical fibre near the focus of the telescope (up to 2 mm over the 45 days) as the operators tried to optimize the short-range signal-to-noise ratio. The small changes may be due to small changes in the geometry of the beam at the output of the optical fibre (e.g. due to steering of the transmitted beam or the position of the optical fibre) or to small changes in the high voltage of the photomultipliers (laboratory experiments showed that the calibration factor can change by 0.5-1 % per volt around the nominal voltage of 850 V). Drifts of similar magnitude (4.4 to 7.4 % per month) have been reported by Brocard et al. (2012). They attributed these drifts to a rapid decrease in the sensitivity of their photomultipliers due to intense daytime background irradiation.
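The effect of modelling such a drift as a linear function of time can be sketched as follows. This is a minimal Python illustration, assuming an ordinary least-squares fit and entirely hypothetical nightly calibration factors that drift by roughly 15 % over 45 days; the function names and data are not from the campaign.

```python
def fit_line(t, y):
    """Ordinary least-squares fit y = a + b * t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def relative_scatter(values):
    """Sample standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return var ** 0.5 / mean


# Hypothetical calibration factors (arbitrary units) versus day of campaign.
days = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
factors = [1.000, 1.022, 1.028, 1.055, 1.062, 1.090, 1.095, 1.122, 1.130, 1.150]

a, b = fit_line(days, factors)
# Remove the fitted drift so that every night is referenced to day 0.
corrected = [f - b * d for f, d in zip(factors, days)]

scatter_raw = relative_scatter(factors)
scatter_corrected = relative_scatter(corrected)
drift_percent = 100 * b * (days[-1] - days[0]) / a
```

With these made-up numbers the night-to-night scatter of the corrected factors is much smaller than that of the raw factors, which is the behaviour a linear drift correction is meant to produce.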
In total, the lidar worked for 15 nights during the main campaign period (12 September to 21 October) and two pre-campaign nights (6-7 September), providing 35 vertical WVMR profiles lasting 20 to 90 min, of which 25 were coincident with radiosonde flights (Table 1). In between the zenith measurements, calibration measurements were performed at low elevation in the directions of the two 10 m masts equipped with capacitive humidity sensors (Fig. 1). The procedure consisted of pointing the lidar to calibration point 1 (CAL1), acquiring measurements for 5 min, then pointing the lidar to calibration point 2 (CAL2) and acquiring measurements for 5 min. The procedure was repeated twice each time to assess the repeatability uncertainty. A total of 134 low-elevation profiles were collected (67 to each calibration point).

Radiosondes
In addition to the MODEM radiosonde station operated routinely by the Observatoire Astronomique Marseille-Provence (OAMP) team of the Gérard Mégie station within the frame of NDACC, three other radiosonde systems were installed by Meteo-France for the intercomparison: a second MODEM station, a Vaisala Digicora III station and a Meteolabor station. Four types of radiosondes were launched: MODEM M2K2-DC (used routinely at OHP), MODEM M10 (the new sonde used operationally by Meteo-France, Direction des Services d'Observation (DSO), since the end of 2011), Vaisala RS92 (mostly operated by Meteo-France, Centre National de Recherches Météorologiques (CNRM), in research experiments), and the Meteolabor Snow White (used by Meteo-France DSO as a reference sonde for the quality assessment of operational sondes). Most balloons launched during DEMEVAP carried 3 sondes and a few carried 4 sondes (Table 1). The light-weight sondes (M2K2-DC, M10 and RS92) were attached to a wooden stick, while the Snow White sonde was attached separately to the balloon with an extra wire. The motivation for operating all these systems was to assess the quality of the sondes used in operations by OAMP and Meteo-France and to compare them with a CM reference sonde and lidar measurements. Specifically, we wanted to assess the potential of Raman lidars as a reference and as an alternative to operational sondes in clear sky conditions, as proposed by Revercomb et al. (2003) or implemented by Dinoev et al. (2013). Note that the potential of Raman lidars for transferring calibration between in situ measurements (e.g. dew point sensors) and upper air measurements (e.g. from satellites) is based on the assumption that, thanks to proper instrument design, the calibration function can be reduced to a function of time only (i.e. it is not range-dependent). Once the calibration constant is fitted to surface measurements, it can be transferred throughout the profile. The signal-to-noise ratio, which usually decreases rapidly with altitude, is then the primary source of random errors.
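The idea of fitting a range-independent calibration constant to surface measurements and transferring it to the whole profile can be sketched in a few lines of Python. The least-squares-through-the-origin estimator, the variable names and all values below are assumptions chosen for illustration, not the estimator actually used in this study.

```python
def fit_calibration_constant(lidar_ratios, reference_wvmr):
    """Least-squares constant C minimising sum((q - C * r) ** 2)."""
    num = sum(r * q for r, q in zip(lidar_ratios, reference_wvmr))
    den = sum(r * r for r in lidar_ratios)
    return num / den


# Hypothetical low-elevation shots towards a mast: lidar H2O/N2 signal ratios
# and simultaneous reference mixing ratios (g/kg) from the mast sensor.
ratios_at_mast = [0.050, 0.060, 0.055, 0.045]
reference_g_per_kg = [5.0, 6.0, 5.5, 4.5]

C = fit_calibration_constant(ratios_at_mast, reference_g_per_kg)

# Range-independent calibration: the same constant scales every bin
# of a (hypothetical) zenith ratio profile.
zenith_ratios = [0.040, 0.020, 0.005]
zenith_wvmr = [C * r for r in zenith_ratios]
```

If the calibration did depend on range, a single constant fitted near the surface would bias the upper part of the profile, which is why the design assumption stated above matters.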
The Vaisala RS92 radiosondes use thin-film capacitance RH sensors, where a thin hydrophilic polymer layer on a glass substrate acts as the dielectric of a capacitor. The sensor calibration relates the measured capacitance to RH with respect to liquid water and corrects for the effect of the sensor temperature using a look-up table established by the manufacturer. The RS92 instrument is composed of two RH sensors that are alternately heated to eliminate the problem of sensor icing in clouds (Miloshevich et al., 2009). These radiosondes were tested during many field campaigns, which made it possible to establish a list of error sources. These are of three kinds: biases (systematic errors), random errors and sensor time-lag errors. The main uncertainties include calibration bias, random production variability, time-lag error, solar radiation error (during daytime only), ground-check uncertainty, and round-off error (when RH is reported as an integer). Miloshevich et al. (2009) provide an empirical correction model for the mean bias error and time-lag error, yielding an accuracy of ±4 % of measured RH for night-time up to the lower stratosphere. Without these corrections, the accuracy of the RS92 radiosondes should be in the range of ±15 % for altitudes below 10 km. Comparisons to CFH measurements showed that RS92 radiosondes have a small moist bias in the lower troposphere and a dry bias in the upper troposphere at night-time (Miloshevich et al., 2009; Nash et al., 2011). Several studies also revealed that the night-time RS92 IWV measurements have a moist bias of ∼ 3 % (Cady-Pereira et al., 2008; Bock and Nuret, 2009; Miloshevich et al., 2009). More recent radiosonde intercomparisons indicate that the RS92 radiosonde accuracy is at the 3-5 %RH level (Nash et al., 2011; Vaisala, 2011a). The Vaisala Digicora III software used during DEMEVAP was version 3.64.
This software includes many of the bias corrections identified in earlier work (Vaisala, 2011b) and, hence, no additional bias correction was applied to the RS92 data used in this study. The MODEM M2K2-DC and M10 radiosondes use a capacitance polymer as an RH sensor. Unlike the Vaisala RS92 radiosonde, the M2K2-DC has a dedicated temperature sensor to measure the temperature of the humidity sensor. This dedicated measurement is intended to provide a more accurate conversion of the measured capacitance into RH. The MODEM M2K2-DC participated in the WMO intercomparison in Yangjiang, China, in 2010, where it showed large moist biases at night (5-10 %), which were presumed to be due to the application of the solar dry bias correction at night (Nash et al., 2011). Comparisons of M2K2-DC measurements to GPS IWV performed at Nîmes and Ajaccio, France, also showed a moist bias at night, typically of 5 to 10 %. It was also shown that the difference between the night-time and daytime biases is about 10 % (if the moist bias at night is 5 %, the dry bias during daytime will be 5 %). To our knowledge, no other study has analysed the performance of the MODEM M2K2-DC radiosondes. The M10 radiosonde is a very recent product and no intercomparison results have been published so far. DEMEVAP is the first experiment that evaluates the performance of this radiosonde.
The Meteolabor Snow White (SW) is a chilled-mirror hygrometer that uses a thermoelectric Peltier device to cool a mirror down to the dew-/frost-point temperature. An opto-electrical device monitors the thickness of the condensate on the mirror and a feedback loop controls the Peltier device to maintain a constant layer of condensate (dew or frost). Earlier studies revealed that the maximum frost-point depression attainable by the Peltier cooler under operational conditions was between 25 °C and 36 °C (Vömel et al., 2003). Comparisons between the SW and NOAA Frost Point Hygrometer (FPH) measurements revealed several limitations of the SW (Vömel et al., 2003): (1) the lower RH detection limit is about 6 %RH; (2) extended dry layers can cause the SW to lose condensate when RH is below this limit, and it can take some time for the mirror to regain condensate above the dry layer (sometimes this cannot be achieved); (3) dry and moist biases are observed in the upper troposphere, depending on the flight conditions. During DEMEVAP, some dry conditions were encountered in the lower/middle troposphere, for which these limitations have to be kept in mind. The SW instrument is available in two models, called "daytime" and "night", which differ in the design of the sampling inlet and housing. In the daytime model, the sensor is protected from solar light by a Styrofoam housing and the air is led to the sensor via a curved path, which is prone to water vapour contamination during transit through clouds. In the night model, the sensor is exposed in order to minimize this problem and to cool the hot side of the Peltier device more efficiently. The Meteolabor radiosondes used during DEMEVAP were PTU-GPS sondes (MRS-SRS-C34) with Snow White chilled-mirror hygrometers.
All the SW measurements were made at night; 6 out of 7 instruments were night models (026 and 059 versions) and 1 was a daytime model (040 version). All the radiosonde height measurements are expressed as geometric heights for consistency with the lidar measurements. The pressure, temperature, and humidity measurements at the surface are compared to ground-based sensors below (Sect. 2.5.3).

GPS receivers
Ground-based GPS receivers have been used in many field experiments and atmospheric process studies as remote sensors of IWV (Bevis et al., 1992; Rocken et al., 1993; Bock et al., 2004, 2005; Koulali Idrissi et al., 2011). GPS receivers operate continuously and in all weather conditions. Recent studies attest that the accuracy of GPS IWV ranges at best between 0.5 and 2.5 kg m−2 (Schneider et al., 2010; Thomas et al., 2011; Buehler et al., 2012; Ning et al., 2012). Whether these numbers reflect an absolute level of accuracy is not clearly established, because the absolute accuracy of the references is not well known either. The comparison of IWV measurements from GPS and other techniques also depends on the measuring conditions and thus on the local climate (the mean IWV can range from 1-2 kg m−2 in polar atmospheres to 50-60 kg m−2 in the tropics).
GPS IWV measurements are limited by three main error sources. The first one is contained in the raw phase measurements and results mainly from the interference between the direct signal transmitted by a satellite and the signals scattered and reflected from the environment around the ground-based antenna. This interference of direct and scattered signals is usually referred to as multipath and can result in phase errors of ∼ 1 cm (Elosegui et al., 1995). It can be mitigated by the use of microwave absorbing material. However, multipath has long been dominated by the two other effects. The second effect is also contained in the raw phase measurements and consists of anomalous phase offsets and variations in the electrical response of the antenna with the angle of incidence of the electromagnetic waves (Niell et al., 2001). These are referred to as antenna phase centre offsets (PCO) and phase centre variations (PCV). The third effect results from errors in the modelling of the tropospheric delay during data processing. The latter effect has long been the dominant one in GPS and other geodetic techniques (Davis et al., 1985). Recent modelling improvements have significantly reduced the antenna PCO and PCV errors (Schmid et al., 2007) and the tropospheric modelling errors (Boehm et al., 2006a, b). Hence, the mitigation of multipath has recently regained attention (Ning et al., 2011).
As part of the DEMEVAP instrumental deployment, we installed 5 GPS stations. The antennas were mounted on the roof of the main building of the Gérard Mégie station at OHP (i.e. about 10 m from each of the two Raman lidars and the radiosonde launch pad). Four of the antennas were placed at the corners of a 4 m × 4 m square, and the fifth was placed at its centre. The four GPS receivers/antennas on the corners (referred to as GPS No. 1 to 4) each consisted of a Topcon GB1000 receiver and a Trimble Zephyr GPS antenna (TRM 41249). The fifth receiver/antenna (GPS No. 5) was a Trimble NetR9 GNSS receiver with a Trimble Zephyr GNSS antenna (TRM 55971). All five stations were installed on 7 September 2011. A 1.8 m by 1.8 m sheet of 77 mm thick microwave absorbing material (Eccosorb AN-77 polyurethane foam) was placed under the antennas of GPS No. 1 and 2. The absorbing material was installed only on 21 September 2011 due to late delivery from the manufacturer. All the receivers recorded phase and code measurements from the GPS satellites on the two GPS carrier frequencies (1227 and 1575 MHz) at 30 s intervals. The elevation mask was set to 5°. Details of data processing are given in Sect. 3.3 below.
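The antenna layout described above determines how closely the five receivers are collocated. A minimal Python sketch, assuming a local coordinate frame in metres and an arbitrary assignment of GPS No. 1 to 4 to the corners (the text does not state which antenna occupied which corner), lists the ten receiver pairs available for intercomparison and their separations:

```python
from itertools import combinations
from math import dist

# Hypothetical local coordinates (metres): four antennas at the corners of
# the 4 m x 4 m square and the fifth at its centre.
antennas = {
    "GPS1": (0.0, 0.0),
    "GPS2": (4.0, 0.0),
    "GPS3": (4.0, 4.0),
    "GPS4": (0.0, 4.0),
    "GPS5": (2.0, 2.0),
}

# Horizontal separation of every receiver pair.
baselines = {
    (a, b): dist(antennas[a], antennas[b])
    for a, b in combinations(antennas, 2)
}

longest = max(baselines.values())   # diagonal of the square
shortest = min(baselines.values())  # corner to centre
```

All separations are below 6 m, so differences in IWV between receivers can be attributed to instrumental and processing effects rather than to atmospheric variability.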

SOPHIE stellar spectrometer
SOPHIE is a spectrometer operating on the 193 cm telescope of OHP. It was designed for the detection of exoplanets by the radial velocity method (Bouchy et al., 2013). The observation of astronomical objects (stars, galaxies, etc.) is made in the visible spectral range from 387.2 nm to 694.3 nm with a very high spectral resolution of 0.008 nm at 592 nm throughout the night when the sky is clear. Spectrometric observations are still possible with thin cloud cover (e.g. when cirrus clouds are present). Total column IWV is obtained using an absolute optical absorption spectroscopic method, as described in Sarkissian and Slusser (2009), adapted to this new spectrometer. The H$_2$O absorption cross section ($\sigma_{\mathrm{H_2O}}$), adapted to the instrument spectral resolution and to atmospheric profiles (pressure and temperature) specific to the OHP location, is correlated with the logarithm of the measured spectral intensity (note that the intensity measured by the spectrometer is not calibrated in terms of absolute irradiance); the result is an optical thickness, $\tau_{\mathrm{H_2O}}$. The amount of water vapour molecules $N_{\mathrm{H_2O}}$ contained on the line-of-sight of the instrument then follows directly from the definition of the optical thickness:

$$\tau_{\mathrm{H_2O}} = \sigma_{\mathrm{H_2O}} \, N_{\mathrm{H_2O}}$$

Because of the very high resolution and sensitivity of SOPHIE, line-by-line analysis is possible, but here the analysis is made simultaneously on the two triplets of the H$_2$O absorption cross section from 591.7 nm to 592.7 nm. Line-of-sight amounts of H$_2$O are measured on individual spectra obtained when the source is close to the meridian (i.e. at the highest possible elevation above the horizon, as is common practice among astronomers). The air mass of the observation is then between 1, when the source is at zenith by definition, and 2, when the source is 30° above the horizon.

Atmos. Meas. Tech., 6, 2777-2802, 2013, www.atmos-meas-tech.net/6/2777/2013/
The total vertical column amount of H$_2$O (IWV) is then deduced by dividing the line-of-sight amount by the air mass. The very high spectral resolution of the SOPHIE instrument requires that the H$_2$O absorption cross section be computed with special care (in an initial computation, the cross section was underestimated by 25 %). For the analysis of the DEMEVAP measurements, we recomputed the H$_2$O absorption cross sections using several methods (NASA online service, HITRAN online service, in-house computations using HITRAN recommendations). The shape and broadening of individual structures in the cross section did not change significantly in the spectral range of interest. Finally, we used the in-house cross sections computed using HITRAN recommendations. They achieved very good agreement with the other IWV measurements (Sect. 4.3).
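The chain from optical thickness to IWV described above can be sketched as follows. This is a minimal illustration, not the SOPHIE processing code: the function names are hypothetical, the air mass is approximated by the plane-parallel expression 1/sin(elevation) (which gives 1 at zenith and 2 at 30° elevation, as stated in the text), and the conversion from molecules cm$^{-2}$ to kg m$^{-2}$ uses standard physical constants.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
M_H2O = 18.01528e-3        # molar mass of water, kg per mole

def air_mass(elevation_deg):
    """Plane-parallel air mass (assumed approximation): 1 / sin(elevation)."""
    return 1.0 / math.sin(math.radians(elevation_deg))

def slant_column(tau_h2o, sigma_h2o_cm2):
    """Line-of-sight column (molecules cm^-2) from tau = sigma * N."""
    return tau_h2o / sigma_h2o_cm2

def iwv_from_slant_column(n_slant_cm2, elevation_deg):
    """Vertical column in kg m^-2 from a slant column in molecules cm^-2."""
    n_vertical_cm2 = n_slant_cm2 / air_mass(elevation_deg)
    # 1e4 converts cm^-2 to m^-2; M_H2O / AVOGADRO is the mass of one molecule.
    return n_vertical_cm2 * 1.0e4 * M_H2O / AVOGADRO
```

For orientation, 1 kg m$^{-2}$ of water vapour corresponds to roughly 3.3 × 10$^{21}$ molecules cm$^{-2}$, so a slant column of about 10$^{23}$ molecules cm$^{-2}$ observed at 30° elevation maps to an IWV of about 15 kg m$^{-2}$.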

SAOZ
SAOZ (Pommereau and Goutail, 1988) is a ground-based UV-visible spectrometer that measures the sunlight scattered from the zenith sky. It was specially designed to allow observations of stratospheric O$_3$ and NO$_2$ vertical columns using the differential optical absorption spectroscopy (DOAS) method. The SAOZ instrument has been operating continuously at OHP since 1992. Line-of-sight values of individual observations are divided by the air-mass factor, the coefficient needed to obtain vertical column amounts. For geometrical reasons, the most precise measurements of stratospheric constituents are performed twice a day during twilight (sunrise and sunset). Nevertheless, SAOZ observations during the day are suitable for measuring tropospheric constituents such as water vapour. Slant density columns (SDC) of water vapour are obtained from differential analysis in the 555-610 nm spectral band averaged over 1 min. This band is selected to avoid interference with other constituents. Five iterations are done to obtain the H$_2$O SDC. Multiple scattering due to the presence of clouds enhances the differential signal and thus must be taken into account. The correction is done using the ratio between the O$_4$ SDC of each observation and an O$_4$ climatology for a clear day.
Since the spectrum is first divided by a reference spectrum and then differentiated, the amount of H$_2$O present in the reference spectrum is added to the computed H$_2$O SDC to obtain the final H$_2$O SDC. This SDC is then divided by the H$_2$O air mass factor (AMF) from the model of Sarkissian et al. (1995) to obtain the final H$_2$O vertical columns.
A first version of the data contained a large bias in the retrieved IWV values compared to GPS. It appeared that this bias was due to limitations in the HITRAN 2000 data and to the use of a single AMF independently of the measurement conditions (clear or cloudy skies). A new version, used in this paper, was thus produced. New water vapour cross sections were first computed using HITRAN 2012, which reduced the relative amplitude of the cross sections in the two bands of interest (570 and 590 nm) by ∼ 5 % and also reduced the error bars. As a second improvement, a correction factor depending on the cloud fraction was determined and applied to the reference AMF. As a measure of the cloud fraction, we used a colour index (CI) computed from the ratio of the irradiance at 550 and 350 nm for each measured spectrum. Four groups of CI values were considered: (1) CI < 1.1 (clear sky), (2) 1.1 < CI < 1.9 (limited broken clouds), (3) 1.9 < CI < 2.35 (broken clouds), and (4) CI > 2.35 (overcast sky). The AMF correction factors were determined as the ratio between GPS IWV and uncorrected SAOZ IWV data in each of the four scenarios from a subset of data of the DEMEVAP campaign. We obtained the following correction factors. In clear sky, the reference AMF is used (no correction factor is applied). In limited broken cloud conditions, the reference AMF is reduced by 16 %. In the case of broken clouds, the reference AMF is reduced by 30 %. Finally, in the case of overcast sky, all the data are rejected because the H$_2$O profile is usually perturbed by clouds located at various altitudes. The measurements are used up to 80° solar zenith angle when CI < 0.8 and 64° in all other cases. The reference AMF was computed using data from a radiosonde profile measured on 14 September 2011.
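The colour-index classification and AMF correction can be illustrated with the short sketch below. It is only an illustration of the rule stated in the text, with hypothetical function names; the handling of values falling exactly on a class boundary (1.1, 1.9, 2.35) is an assumption, since the text gives strict inequalities only, and the solar zenith angle limits are not included.

```python
def amf_correction_factor(ci):
    """Multiplicative factor applied to the reference AMF for colour index ci.

    Returns None when the observation is rejected (overcast sky).
    Boundary values are assigned to the cloudier class (assumption).
    """
    if ci < 1.1:        # clear sky: reference AMF used as is
        return 1.0
    if ci < 1.9:        # limited broken clouds: AMF reduced by 16 %
        return 0.84
    if ci < 2.35:       # broken clouds: AMF reduced by 30 %
        return 0.70
    return None         # overcast sky: data rejected

def h2o_vertical_column(sdc, reference_amf, ci):
    """Vertical column from a slant density column, or None if rejected."""
    factor = amf_correction_factor(ci)
    if factor is None:
        return None
    return sdc / (reference_amf * factor)
```

Because the AMF appears in the denominator, reducing it by 16 % or 30 % increases the retrieved vertical column by about 19 % or 43 %, respectively.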

AERONET sun photometer
The permanent instrumentation of the Gérard Mégie station at OHP includes a sun photometer from the AErosol RObotic NETwork (AERONET). This instrument observes solar radiation at various wavelengths in the visible and near infrared, including a water vapour line at 936 nm (Holben et al., 1998). It works during daytime only and data are screened to retain mainly clear-sky conditions. IWV is obtained using a differential absorption technique from the 936 nm line and nearby window wavelengths. The accuracy of the IWV measurements from this instrument was estimated at 5 % by comparison with microwave radiometer measurements (Halthore et al., 1997). Level 1.5 (cloud-screened), version 2, AERONET IWV data have been retrieved from the NASA archive (http://aeronet.gsfc.nasa.gov/). The IWV data are reported every 15 min for elevation angles above 15°.

Capacitive humidity sensors on 10 m masts
Two 10 m masts were installed by Meteo-France, CNRM, for the duration of the DEMEVAP campaign. They were equipped with identical temperature sensors (Vaisala PT1000 class A Atexis) and capacitive humidity sensors (Vaisala HMP45). Both sensors were mounted at the top of the masts in Socrima shields to protect them from direct sunlight. Pressure sensors (Vaisala PTB210) were located in a housing at the foot of the masts. The pressure, temperature and humidity measurements were logged at 1 min intervals. The two masts were located at 90 m (PTU1) and 180 m (PTU2) from the IGN-LATMOS Raman lidar with a difference in azimuth angle of nearly 90° (Fig. 1).
The calibration of these sensors is done annually by the Laboratoire national de métrologie et d'essais (LNE), as a standard procedure followed for all meteorological sensors in use at Meteo-France. The accuracy of the measurements is ±0.15 hPa for pressure, ±0.2 °C for temperature, and ±4 %RH for humidity.

Dew-point humidity sensors
Two dew-point hygrometers were also operated during DEMEVAP. The first one was a Meteolabor VTP-6 instrument, designed for outdoor operations, and the other one was a General Eastern Optica instrument designed for laboratory calibration. The VTP-6 was installed close to the pressure sensor of PTU2, and provided continuous measurements of dew point temperature (TD) and air temperature (T) at 10 min intervals. Relative humidity was derived afterwards from these two measurements in a similar way as for the Snow White hygrometer. The accuracy of this sensor is 0.15 °C for both temperatures, according to the manufacturer. The accuracy of the Optica system is ±0.20 °C in dew point and ±0.15 °C in air temperature. During the lidar and radiosonde operations, the Optica system was moved between PTU1 and PTU2 to check the consistency of the different TD and RH measurements, which was usually at the 0.1-0.2 °C and 1-2 %RH level.
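As an illustration of how relative humidity follows from dew point and air temperature, the sketch below uses a Magnus-type saturation vapour pressure formula over liquid water. The coefficients (6.112 hPa, 17.62, 243.12 °C) are those we recall from the WMO CIMO Guide and should be treated as an assumption here; the function names are hypothetical and this is not the processing code used for the VTP-6 or Snow White data.

```python
import math

def saturation_vapour_pressure_water(t_celsius):
    """Saturation vapour pressure over liquid water in hPa (Magnus-type form)."""
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

def relative_humidity(t_celsius, td_celsius):
    """RH in % with respect to liquid water from air and dew point temperature."""
    e = saturation_vapour_pressure_water(td_celsius)   # actual vapour pressure
    es = saturation_vapour_pressure_water(t_celsius)   # saturation value at T
    return 100.0 * e / es
```

With T = 20 °C and TD = 10 °C this gives an RH of about 52-53 %. When the mirror condensate is frost rather than dew, the measured temperature is a frost point and a formula over ice would be needed for the first step.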

Comparison of surface meteorological measurements
The measurements from the 45 days of the campaign were compared for the different ground-based sensors. The mean and standard deviation of differences between PTU and VTP-6 measurements were < ±0.2 °C and < 1.1 °C both for T and TD, and < ±0.6 %RH and 2 %RH for relative humidity. With this level of accuracy, capacitive humidity sensors are expected to perform as well as dew-point hygrometers for the purpose of lidar calibration, for example. The near-surface measurements from the radiosondes were also compared to the VTP-6 measurements. The mean (resp., standard deviation) of the RH difference between the Vaisala RS92 radiosonde, Meteolabor Snow White, MODEM M2K2DC and M10 measurements at the surface and the VTP-6 RH measurements was 2.0 %RH (3.3 %RH), 0.3 %RH (6.6 %RH), 0.4 %RH (6.2 %RH), 1.1 %RH (4.4 %RH), respectively. For temperature, the differences between the four radiosonde measurements at surface and VTP-6 measurements were −0.23

Campaign overview
The DEMEVAP 2011 campaign covered a period of 45 days, between 6 September and 21 October 2011. During this period, the GPS receivers, the ground-based sensors, the SAOZ spectrometer and the AERONET sun photometer operated continuously. The lidars and radiosondes were operated only during night-time in clear sky conditions for 12 nights in September (12 September to 29 September) and 3 nights in October (17-21 October), plus two pre-campaign nights (6-7 September). A standard night of operation consisted of a sequence of at least two 60 min zenith-pointing observations with the IGN-LATMOS Raman lidar, at the beginning of which a balloon carrying multiple radiosondes was launched by the Meteo-France and OAMP staff. In total, 25 balloons were launched at night-time and one during the day (pre-campaign test), which carried 80 sondes, among which 25 Vaisala RS92, 24 MODEM M2K2DC, 24 MODEM M10, and 7 Meteolabor Snow White (only the night-time measurements are reported in Table 1). The campaign was interrupted on 30 September because the OHP had planned a practical training course for Master students for one week. Afterwards, the weather was not clear until 17 October. The variety of atmospheric conditions encountered during the period is reflected in the time series of IWV and surface parameters (Fig. 2). The measurements were interrupted at the passage of large-scale perturbations. Such situations were usually associated with peak values in IWV of about 25 kg m$^{-2}$ and large drops in pressure and temperature (Fig. 2). One exception was found on 25 September when humid air advection was not associated with a depression. On average, the atmospheric conditions were warmer and moister during the September period of observations than during October.

Raman lidar water vapour retrieval and calibration
Water vapour mixing ratio is traditionally determined from the ratio of Raman signals measured in the water vapour and nitrogen channels, according to the following equation (Whiteman et al., 1992):

$$r_{\mathrm{H_2O}}(z) = C_{\mathrm{lidar}}(z)\,\frac{S_{\mathrm{H_2O}}(z) - B_{\mathrm{H_2O}}}{S_{\mathrm{N_2}}(z) - B_{\mathrm{N_2}}} \qquad (1)$$

where $C_{\mathrm{lidar}}(z)$ is the overall lidar calibration function, and $S_x(z)$ and $B_x$ are the measured signal and background, respectively, for species $x$ (water vapour or nitrogen) in number of photons/bin/shot. The standard procedure with the IGN-LATMOS Raman lidar consists of summing the signals into space-time bins of increasing size as a function of altitude in order to maintain an SNR of at least 15 in the N$_2$ measurements. This summing is intended to minimize biases induced by fluctuations in the denominator of Eq. (1) (Bosser et al., 2007). The minimal time bin is 5 min and the WVMR profiles are retrieved at 5 min intervals (hence introducing some correlation between profiles at upper levels). In the case of the OHP NDACC Raman lidar, a slightly different method for data selection is used, while a similar compromise is expected. An a priori analysis of the stationarity of the water vapour mixing ratio and cirrus clouds is performed to optimize the averaging period, while the profile retrieval is not performed for integration periods smaller than 20 min (Hoareau et al., 2009). The minimal spatial resolution is 7.5 m for the IGN-LATMOS lidar while the minimum is 75 m for the OHP NDACC lidar. For both systems the vertical resolution degrades rapidly with altitude up to a maximum of 750 m in the upper troposphere. The background estimates are computed from bins beyond the maximum range of the lidar (20-60 km) and summed over 5 min.
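The idea of summing bins until the nitrogen signal reaches a target SNR, and only then forming the ratio of background-subtracted signals, can be sketched as follows. This is a simplified illustration and not the operational IGN-LATMOS algorithm: it sums range bins only (no time dimension), assumes Poisson counting noise for the SNR, and uses a constant calibration value; all names are hypothetical.

```python
import math

def poisson_snr(summed_signal, summed_background):
    """SNR of a background-subtracted photon count, assuming Poisson noise."""
    if summed_signal <= 0:
        return 0.0
    return (summed_signal - summed_background) / math.sqrt(summed_signal)

def wvmr_adaptive_bins(s_h2o, s_n2, b_h2o, b_n2, c_lidar, min_snr=15.0):
    """Sum adjacent range bins until the N2 SNR reaches min_snr, then apply
    the ratio of background-subtracted signals scaled by c_lidar.

    s_h2o, s_n2: photon counts per range bin; b_h2o, b_n2: background per bin.
    Returns a list of (first_bin, last_bin, mixing_ratio). Trailing bins that
    never reach min_snr are discarded.
    """
    profile = []
    acc_h2o = acc_n2 = 0.0
    n_bins = 0
    first = 0
    for i in range(len(s_n2)):
        acc_h2o += s_h2o[i]
        acc_n2 += s_n2[i]
        n_bins += 1
        if poisson_snr(acc_n2, n_bins * b_n2) >= min_snr:
            ratio = (acc_h2o - n_bins * b_h2o) / (acc_n2 - n_bins * b_n2)
            profile.append((first, i, c_lidar * ratio))
            acc_h2o = acc_n2 = 0.0
            n_bins = 0
            first = i + 1
    return profile
```

Forming the ratio only after summing keeps the denominator well away from zero, which is the motivation given above for requiring a minimum SNR in the N$_2$ channel.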
The lidar calibration function is computed from the following equation:

$$C_{\mathrm{lidar}}(z) = r_{\mathrm{N_2}}\,\frac{M_{\mathrm{H_2O}}}{M_{\mathrm{N_2}}}\,\frac{C_{\mathrm{N_2}}}{C_{\mathrm{H_2O}}}\,\frac{T(z,\lambda_{\mathrm{N_2}})}{T(z,\lambda_{\mathrm{H_2O}})}\,\frac{\mathrm{d}\sigma_{\mathrm{N_2}}(z,\lambda_{\mathrm{N_2}})/\mathrm{d}\Omega}{\mathrm{d}\sigma_{\mathrm{H_2O}}(z,\lambda_{\mathrm{H_2O}})/\mathrm{d}\Omega} \qquad (2)$$

where $r_{\mathrm{N_2}}$ is the mass mixing ratio of nitrogen, $M_x$ the molar weight for the given molecules (water vapour or nitrogen), $C_x$ the instrumental efficiency (comprising the transmittance of all the optical components on the path and the quantum efficiency of detectors at the given wavelength), $T(z,\lambda_x)$ the atmospheric transmittance from ground to distance $z$ at wavelength $\lambda_x$, and $\mathrm{d}\sigma_x(z,\lambda_x)/\mathrm{d}\Omega$ the given Raman scattering cross section. Note that both the OHP and the IGN-LATMOS Raman lidars are fiber-coupled and are not limited by differences in the overlap functions applying to the H$_2$O and N$_2$ channels. The altitude dependence of the Raman cross sections accounts for the temperature dependence of the intensity of the individual Raman spectrum lines. When narrow bandwidth interference filters are used, the change in Raman shift with air temperature produces a change in the measured signal intensity which can be misinterpreted as a change in air composition (Whiteman et al., 2006a). The net effect depends on the spectral shape of the filter and was estimated at +3 % (+13 %) between the surface and an altitude of 5 km (12 km) in the case of the IGN-LATMOS Raman lidar. This effect was corrected before calibration. In the case of the NDACC-OHP lidar system, this error was estimated to be smaller than 2 % (Sherlock et al., 1999b) and it is not corrected, but there are options available for implementing a routine correction (e.g. use of ECMWF model temperature profile). As seen from Eq. (2), the lidar calibration function has three uncertain contributors: instrumental efficiency, Raman backscattering cross sections, and atmospheric transmittance.
The uncertainty associated with these three terms is typically about 10 % for instrumental efficiency (Vaughan et al., 1988; Sherlock et al., 1999a; Tarniewicz et al., 2002), 10 % for Raman cross sections (Penney and Lapp, 1976), and 2-5 % for the differential atmospheric transmission depending on the aerosol content (Shettle and Fenn, 1979). Due to the large total uncertainty, it is standard to fit the calibration function by comparison with collocated/coincident water vapour measurements. In our case, we split the calibration function into an a priori estimate, $C_{\mathrm{lidar}}^{\mathrm{apriori}}(z)$, which is computed from laboratory calibration of the optical components and standard atmosphere profiles, and a scale factor, $f$, such that $C_{\mathrm{lidar}}^{\mathrm{fitted}}(z) = C_{\mathrm{lidar}}^{\mathrm{apriori}}(z) \times f$. During DEMEVAP, four Raman lidar calibration methods were compared:
1. Radiosonde profile matching: a scale factor $f_{\mathrm{RS}}$ is estimated by a weighted least-squares method that minimizes the difference between the lidar WVMR profile (zenith pointing) and a reference radiosonde profile over a small layer. In the case of the IGN-LATMOS lidar, one calibration factor is estimated from each radiosonde and 20 min of lidar data (i.e. using four 5 min profiles) beginning just after the radiosonde launch. Several layers are tested.
2. Point humidity measurement matching from capacitive sensors: similar to the previous one except that the scale factor $f_{\mathrm{PTU}}$ is determined from slant measurements with the lidar pointing to each of the two 10 m masts. This method could only be tested with the IGN-LATMOS scanning Raman lidar. One calibration factor is estimated for each slant lidar profile (integrated over 5 min) and the nearest-in-time PTU measurement (averaged over 10 min). Because of the presence of electrical interference in the short-range raw lidar measurements (up to ∼ 120 m), the lidar data at the exact range of PTU1 (90 m) could not be used. Instead, the median measurement over 32 bins was used. The same procedure was adopted for PTU2 though less interference was observed at the exact distance (180 m).
3. IWV matching: a scale factor $f_{\mathrm{IWV}}$ is estimated by comparing IWV estimates from GPS or radiosondes (IGN-LATMOS lidar) or the SOPHIE spectrometer (NDACC-OHP lidar) to the Raman lidar WVMR profiles integrated over the portion of atmosphere where these measurements are reliable and completed below with radiosonde measurements and above with a standard atmospheric profile. This is the standard method used for the NDACC-OHP lidar; it was also applied during this campaign using radiosondes from Nîmes (Fig. 3).

4. GPS-lidar coupling: a scale factor $f_{\mathrm{GPS}}$ is estimated during GPS data processing where zenith wet delay (ZWD) estimates computed from the zenith pointing Raman lidar measurements are used instead of a priori ZWD values and no ZWDs are estimated from the GPS data, contrary to the standard procedure (Sect. 3.3). This method has been shown to improve the GPS vertical positioning thanks to the use of local atmospheric profile measurements instead of a smooth mapping function extracted from a global meteorological model or climatology (Bosser et al., 2010). The fitted scale factor is thus expected to be more accurate (in terms of % error) than the ZWD or IWV values determined from standard GPS processing. However, only zenith pointing water vapour profiles from the lidar are used in this method. During DEMEVAP, zenith pointing was not performed continuously over the nights because of the slant pointing needed to test method #2. Hence the results from this method were slightly noisier than during the VAPIC campaign (Bosser et al., 2010).
The expected accuracy of these methods is limited by the accuracy of the calibration data (5 % for radiosonde data, 2 % for PTU data, and 2-5 % for GPS data) and by systematic and random errors in the lidar measurements.
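To make the first and third methods concrete, the sketch below estimates a scale factor in two ways: by weighted least squares over a set of matched WVMR values, and by matching a reference IWV. It is a minimal illustration under stated assumptions, not the campaign software. The function names are hypothetical, the choice of weights is left to the user (the text does not specify them), and the IWV formulation assumes that only the lidar part of the column is scaled, the parts below and above being taken from radiosonde and standard-atmosphere data.

```python
def fit_scale_factor(lidar, reference, weights=None):
    """Weighted least-squares f minimizing sum w * (reference - f * lidar)^2."""
    if weights is None:
        weights = [1.0] * len(lidar)
    num = sum(w * l * r for w, l, r in zip(weights, lidar, reference))
    den = sum(w * l * l for w, l in zip(weights, lidar))
    return num / den

def iwv_scale_factor(iwv_reference, iwv_lidar_part, iwv_below, iwv_above):
    """f such that iwv_below + f * iwv_lidar_part + iwv_above = iwv_reference."""
    return (iwv_reference - iwv_below - iwv_above) / iwv_lidar_part
```

In both cases the lidar values are those obtained with the a priori calibration function, so a result close to 1 indicates that the a priori estimate was realistic.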

Radiosonde data processing and humidity conversions
In this study we used Vaisala RS92 data files delivered in "fledt" format and MODEM M2K2DC and M10 data in ".cor" format. The Vaisala measurements were available at 2 s (∼ 10 m) resolution and the MODEM measurements at 1 s (∼ 5 m). The P, T and RH measurements contained in these data files were taken as the best estimates as computed with the standard manufacturer's software. Only the Snow White data were reprocessed because the conversion of mirror temperature to RH depends on whether the condensate on the mirror is water or ice. Detection of the change from dew to frost is not automatic and requires a careful visual inspection of the measurements as described, e.g. by Fujiwara et al. (2003). We used the Snow White housekeeping data (Peltier current and phototransistor voltage) to detect layers of discontinuity in condensate reflectivity and we compared the SW RH values for dew and frost to the RH values reported by the RS92 radiosonde onboard the same balloon. This procedure allowed us to determine quite unambiguously the altitude of change from dew to frost on the SW mirror for each of the 7 flights. The RH values were computed with respect to liquid water since this is the standard in meteorology. We used saturation vapour pressure formulas recommended in the WMO CIMO Guide (2008). These are slightly different from those used by Vaisala and Meteolabor in their station software, but the largest differences, which would be observed at −60 °C, are smaller than 2 % (Nash et al., 2011). MODEM uses a formula that closely matches the CIMO 2008 formula. The radiosonde measurements were screened automatically to retain only points with strictly increasing altitudes. This procedure rejected less than 1 % of the MODEM measurements and 5 % of the SW measurements (this difference is probably due to the fact that the SW had a longer wire which could oscillate).
Afterwards, all the profiles were checked manually by visual inspection and the upper portions of some profiles were removed when anomalous drifts or noise were detected.
The radiosonde IWV contents were computed after the RH measurements were converted into specific humidity and integrated over pressure. Since all the radiosonde measurements were available at high resolution and had valid data from the surface up to the tropopause (10-12 km), no correction for missing data at the bottom or top of the profile had to be applied. Here, the numerical error in the IWV computation and the representativeness error in the IWV intercomparison from distant instruments encountered in previous studies (Bock et al., 2005) are expected to be negligible.
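The conversion of RH to specific humidity and its integration over pressure can be sketched as follows. This is an illustration only: the function names are hypothetical, the saturation vapour pressure uses a Magnus-type formula over liquid water with coefficients we recall from the WMO CIMO Guide (an assumption), gravity is taken as a constant, and the integral is a simple trapezoidal sum over the reported levels.

```python
import math

EPSILON = 0.622    # ratio of the molar masses of water vapour and dry air
GRAVITY = 9.80665  # m s^-2, taken as constant with height (assumption)

def saturation_vapour_pressure_water(t_celsius):
    """Saturation vapour pressure over liquid water in hPa (Magnus-type form)."""
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

def specific_humidity(p_hpa, t_celsius, rh_percent):
    """Specific humidity (kg/kg) from pressure, temperature and RH over water."""
    e = rh_percent / 100.0 * saturation_vapour_pressure_water(t_celsius)
    return EPSILON * e / (p_hpa - (1.0 - EPSILON) * e)

def integrated_water_vapour(p_hpa, t_celsius, rh_percent):
    """IWV in kg m^-2: (1/g) * integral of q dp, trapezoidal rule.

    Inputs are lists ordered from the surface upwards (decreasing pressure).
    """
    q = [specific_humidity(p, t, rh)
         for p, t, rh in zip(p_hpa, t_celsius, rh_percent)]
    total = 0.0
    for k in range(len(q) - 1):
        dp_pa = (p_hpa[k] - p_hpa[k + 1]) * 100.0   # hPa to Pa
        total += 0.5 * (q[k] + q[k + 1]) * dp_pa
    return total / GRAVITY
```

As a rough check, a saturated layer at 20 °C between 1000 and 900 hPa contributes a little under 16 kg m$^{-2}$ with these formulas.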

GPS data processing
The GPS data were processed with the GIPSY/OASIS II v 6.0.2 software in Precise Point Positioning (PPP) mode (Zumberge et al., 1997). Phase measurements are decimated to a 5 min interval and data are analysed in a 30 h window centred on each day from which the 00:00-24:00 UTC parameters are extracted to avoid edge effects. We apply the IERS2010 recommendations for the solid earth tides model (Petit and Luzum, 2010) and the FES2004 model for the ocean tide loading effect (Lyard et al., 2006). Absolute antenna models (from igs.atx) are used for transmitters (satellites) and receivers. Phase ambiguities are fixed using the "ambigon" algorithm (Bertiger et al., 2010). The cut-off angle is fixed to 7° without down-weighting of low-elevation observations and the uncertainty in phase observations is fixed to 10 mm. One set of station coordinates is estimated for each session day. The ZWD parameters and horizontal gradients are modelled as random walk processes with a 5 min time resolution. In our standard solution, a priori zenith delays for hydrostatic and wet components are computed from surface measurements (PTU1 sensor) and mapping functions from VMF1 are used (Boehm et al., 2006b). The random walk parameters for stochastic estimation of ZWD and gradient parameters are fixed to 5 mm h$^{-1/2}$ and 0.5 mm h$^{-1/2}$, respectively. Several other choices of a priori models, mapping functions and random walk parameters were tested. None of these tests produced changes in estimated ZWD parameters exceeding ±2.5 mm. The mean and standard deviations of differences with respect to the standard solution, computed over the 45 days of the campaign and for the five stations, did not exceed ±0.5 mm and 1 mm, respectively. These tests thus give an estimate of the uncertainty associated with the processing procedure of GPS IWV measurements of < ±0.1 kg m$^{-2}$ for the bias and < ±0.2 kg m$^{-2}$ for the standard deviation, or a 2.5-σ interval of ±0.5 kg m$^{-2}$.
The validity of the estimated ZWD parameters for each GPS receiver was checked by visually inspecting the time series of formal error provided by the software and by comparing the ZWD estimates between receivers. The mean formal errors were 1.6 mm for the four GB1000 stations and 1.8 mm for the NetR9 station. No suspect data could be detected in any of the five GPS series. The pairwise comparisons showed that ZWD estimates from the GB1000 receivers agreed with a mean difference smaller than 1 mm and a standard deviation smaller than 1.6 mm. The scatter between the two receiver/antenna types was slightly larger, with a standard deviation of 2.9 mm (∼ 0.5 kg m$^{-2}$). The comparison of ZWD estimates from the stations with the microwave absorbers and stations without the absorbers did not reveal any significant impact (< 0.3 kg m$^{-2}$). The mean ZWD differences before and after installation did not show a consistent effect between the two GPS stations compared to the other GPS stations. However, some differences were found in the post-fit phase residuals, consistent with results from Elosegui et al. (1995). This point needs further investigation to draw clear conclusions.
The estimated ZWD parameters were finally combined with their a priori zenith hydrostatic delays (ZHDs) to form the total zenith tropospheric delay (ZTD), which was afterwards converted into the "true ZWD" by subtracting the "true ZHD" computed from surface pressure measurements and the formula of Davis et al. (1985). These ZWD estimates were converted into IWV using an estimate for the mean tropospheric temperature, $T_m$ (Bevis et al., 1992). Following the methodology described by Bock et al. (2008), we tested several estimates for $T_m$: (1) the formula of Bevis et al. (1992) combined with surface temperature measurements from the PTU1 sensor; (2) the operational product from the Technical University of Vienna using ECMWF analyses (http://ggosatm.hg.tuwien.ac.at/DELAY/ETC/TMEAN/); (3) the direct computation using temperature and humidity profiles from RS92 measurements as a reference. The root mean square difference between the Bevis formula and radiosonde integration was 4.4 °C, whereas it was 1.8 °C between ECMWF analyses and radiosonde integration. As already reported by Bock et al. (2008) for Africa, the mean diurnal cycle of the Bevis formula is strongly influenced by surface temperature. For these reasons, we used the $T_m$ estimates produced from ECMWF analyses.
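The ZTD to IWV conversion chain can be sketched as below. This is a hedged illustration, not the processing actually run for the campaign: the function names are hypothetical, the hydrostatic delay uses the commonly quoted Saastamoinen-type expression associated with Davis et al. (1985), the surface-temperature relation is the linear form usually attributed to Bevis et al. (1992), and the refractivity constants are commonly cited values; all numerical constants should be treated as assumptions.

```python
import math

def zenith_hydrostatic_delay(p_hpa, lat_deg, height_km):
    """ZHD in metres from surface pressure (hPa), latitude and height (km)."""
    f = 1.0 - 0.00266 * math.cos(2.0 * math.radians(lat_deg)) - 0.00028 * height_km
    return 0.0022768 * p_hpa / f

def mean_temperature_bevis(ts_kelvin):
    """Mean tropospheric temperature Tm (K) from surface temperature (K)."""
    return 70.2 + 0.72 * ts_kelvin

def zwd_to_iwv(zwd_m, tm_kelvin):
    """IWV in kg m^-2 from ZWD in metres and mean temperature Tm in kelvin."""
    k2_prime = 0.221   # K Pa^-1   (22.1 K hPa^-1, assumed value)
    k3 = 3739.0        # K^2 Pa^-1 (3.739e5 K^2 hPa^-1, assumed value)
    r_v = 461.5        # J kg^-1 K^-1, specific gas constant of water vapour
    kappa = 1.0e6 / (r_v * (k3 / tm_kelvin + k2_prime))
    return kappa * zwd_m

def ztd_to_iwv(ztd_m, p_hpa, lat_deg, height_km, tm_kelvin):
    """Subtract the hydrostatic delay from ZTD, then convert ZWD to IWV."""
    zwd_m = ztd_m - zenith_hydrostatic_delay(p_hpa, lat_deg, height_km)
    return zwd_to_iwv(zwd_m, tm_kelvin)
```

With these constants, 1 mm of ZWD corresponds to roughly 0.16 kg m$^{-2}$ of IWV for $T_m$ near 275 K, which is consistent with the 2.9 mm and ∼ 0.5 kg m$^{-2}$ equivalence quoted in the text.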

Comparison of calibration methods with the IGN-LATMOS Raman lidar
The four methods for calibrating WVMR lidar profiles (see Sect. 3.1) are intercompared here using data from the IGN-LATMOS Raman lidar. Only Vaisala RS92 measurements are used for the methods relying on radiosonde data. The measurements are limited to the period 12 September-20 October: they exclude the first and last days because not all instruments were fully operational (e.g. GPS receivers). Figure 4a shows a summary of results, with one example for each method except method #3 where two IWV references are tested (GPS and radiosonde). All five comparisons yield calibration factors around 1, revealing that the a priori estimate for the calibration coefficient is very realistic. However, there is a fairly large degree of scatter in the results, both for a given method and between methods. The mean values range between 0.942 (GPS-lidar coupling) and 1.010 (PTU2): disagreement is about 7 % on the calibration coefficient. The standard deviations of the values range between 0.040 (∼ 4.0 %) for IWV-GPS and 0.057 (∼ 5.7 %) for RS. The RS results for a layer and for IWV are fairly consistent (mean values of 1.000 and 0.977 and standard deviations of 5.3 and 5.7 %). This is true also for the GPS results (mean values differ by < 1 % and standard deviations are 4.7 and 4.8 %). However, the standard deviations found from these data sets are quite large. This is partly due to a drift that is clearly visible in Fig. 4a. All five comparisons reveal this drift, which is actually contained in the lidar measurements. However, there is some uncertainty in the magnitude of the drift between methods (see the linear fit equations in Fig. 4a). The values over the 45 days of the campaign range between −11 % and −18 % (slope value × 45). The N$_2$ calibration measurements in Fig. 4b show a similar overall drift as well as short-term fluctuations that are highly correlated with the calibration factor variations. As discussed in Sect. 2.1.2, most of the variations in the calibration factor result from changes in the optical alignments and are of similar magnitude as seen in other Raman lidars (Brocard et al., 2012). Once the estimated calibration factors are corrected for a linear trend, the standard deviation is drastically decreased (by a factor of ∼ 2). The best fit is then obtained for IWV-GPS (2.5 %) and the worst for PTU2 (3.9 %). The calibration using IWV-GPS data is now proposed as the standard method for deriving accurate lidar profiles independent from radiosondes.
Table 2 details the results for several variants of the four methods. The radiosonde matching processed in a 1 km layer above the former one shows a decrease of 2.3 % in the mean calibration factor and nearly a doubling of the standard deviation. This is explained by the larger errors in the lidar data at higher altitudes. The PTU matching using PTU1 instead of PTU2 shows a small change in mean value and a slightly larger standard deviation because of the interference in the lidar measurements at shorter range mentioned in Sect. 3.1. The two IWV-RS solutions (IWV RS "daily" and IWV RS "all") differ in the number of points. The first solution provides a mean calibration factor for each night (i.e. combining the measurements from the one or two soundings each night). The second solution provides the calibration factors for all individual soundings. They lead to 13 and 22 calibration factor estimates but the results are very consistent. In the last two methods, results are presented for all five GPS instruments and an additional row gives the mean values. Evidently, there is some scatter between the GPS solutions. With method #3, the values estimated from GPS No. 5 are slightly noisier than from the other four GPS instruments. However, with method #4 it is the reverse. Overall, method #3 (IWV-GPS) is slightly more robust than method #4 (GPS-lidar coupling) with a 2.0 % post-fit standard deviation compared to 3.1 %.
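The effect of removing a linear trend from a series of calibration factors can be illustrated with the sketch below. It is a generic ordinary least-squares fit with hypothetical function names, not the fitting procedure used for Fig. 4a. The standard deviation uses n − 1 in the denominator; using n − 2 for the residuals of a two-parameter fit would be an equally defensible choice.

```python
import math

def linear_fit(t, y):
    """Ordinary least-squares slope and intercept of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean

def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def detrended_std(t_days, factors):
    """Return (slope per day, std before detrending, std of residuals)."""
    slope, intercept = linear_fit(t_days, factors)
    residuals = [f - (slope * t + intercept) for t, f in zip(t_days, factors)]
    return slope, sample_std(factors), sample_std(residuals)
```

Multiplying the fitted slope by the campaign length (45 days) gives the total drift over the campaign, which is how the −11 % to −18 % range quoted above is expressed.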
In conclusion, this comparison shows that a lidar calibration factor with a relative accuracy of 2-3 % (1-standard deviation) can be achieved with three methods: radiosonde matching, either on a low layer or over the total column (IWV); IWV GPS matching; and the GPS-lidar coupling method developed by Bosser et al. (2010). However, the uncertainty in the calibration factor remains undetermined in a 7 % range (comparing, e.g., the mean values achieved by method #4 and method #2). Figure 5 shows the mean WVMR profiles and the temporal variability observed from IGN-LATMOS lidar and Vaisala RS92 measurements during the DEMEVAP campaign as well as their differences. The lidar profiles used in this comparison were integrated over 20 min which is roughly the time the sounding balloons needs to rise up to 7 km. The mean profiles and temporal variability agree very well in the Atmos. Meas. Tech., 6, 2777-2802, 2013 www.atmos-meas-tech.net/6/2777/2013/  lower troposphere. This was expected since the lidar measurements are here calibrated on the RS92 measurements between 1.0 and 2.0 km (these altitudes correspond to the 0.3-1.3 km layer above ground level in Table 2). The mean (fractional) WVMR differences lay in the range −0.15 to +0.25 g kg −1 (±5 %) from 1 to 3 km. Between 3 and 8 km, the WVMR differences remain smaller than ±0.1 g kg −1 but the fractional differences increase up to 60 %. Above 8 km, the lidar measurements become unreliable. The verticalmean bias is very small as attested by Table 3 (0.04 g kg −1 or 2.2 %) but the vertical-mean standard deviation is quite high (13 %) due to the rapid increase of lidar errors with altitude. Figure 6 shows a similar comparison with Snow White measurements. This is an independent comparison since the SW measurements are not used in the lidar calibration. The results are very consistent with the RS92 comparison. 
There is very good agreement between the mean profiles, though the lidar seems to have a small dry bias in the lowest 1 km and a moist bias between 2.5 and 7.0 km. These biases cancel each other in the vertical mean (−0.01 g kg−1 or −0.6 %). As with the RS92 comparison, the vertical-mean standard deviation is quite high (12 %). In terms of IWV, the agreement between the lidar measurements and the two radiosonde types is very good (Table 3). The mean difference is about ±0.2 kg m−2 (< ±2 %) and the standard deviation of differences is about 0.3-0.4 kg m−2 (2-3 %).
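For readers who want to reproduce an IWV value from a WVMR profile, the following is a minimal sketch of one standard approach: hydrostatic integration of specific humidity over pressure with the trapezoidal rule. The function name, the constant-gravity assumption and the sample profile are illustrative assumptions; the text does not specify how the profiles were integrated in this study.

```python
G0 = 9.80665  # standard gravity in m s-2 (assumed constant with height)


def iwv_from_profile(pressure_hpa, wvmr_g_per_kg):
    """Integrated water vapour (kg m-2) between the first and last level.

    pressure_hpa  : pressures in hPa, ordered from bottom to top (decreasing)
    wvmr_g_per_kg : water vapour mixing ratio at the same levels, in g kg-1
    """
    if len(pressure_hpa) != len(wvmr_g_per_kg) or len(pressure_hpa) < 2:
        raise ValueError("need at least two matching levels")
    # Mixing ratio r (kg per kg of dry air) -> specific humidity q (kg per kg of moist air).
    q = [(r / 1000.0) / (1.0 + r / 1000.0) for r in wvmr_g_per_kg]
    total = 0.0
    for i in range(len(q) - 1):
        dp_pa = (pressure_hpa[i] - pressure_hpa[i + 1]) * 100.0  # hPa -> Pa
        total += 0.5 * (q[i] + q[i + 1]) * dp_pa  # trapezoidal rule
    return total / G0


# Made-up profile (illustrative only): moist boundary layer, dry layer aloft.
pressure = [920.0, 850.0, 700.0, 600.0, 500.0, 400.0, 300.0]
wvmr = [7.0, 5.5, 1.0, 0.8, 0.6, 0.3, 0.1]
print(f"IWV = {iwv_from_profile(pressure, wvmr):.2f} kg m-2")
```

Restricting both the lidar and the radiosonde profile to the same levels before integrating gives the partial-column IWV over their common altitude range.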

Comparison of WVMR profiles
The small and consistent biases for both radiosonde comparisons indicate that the lidar calibration is efficient and may provide IWV measurements with an accuracy of ±3 % with a single profile (20 min of lidar measurements). The same comparison done with uncalibrated lidar measurements yields IWV biases quite similar to those reported in Table 3 (this is because the mean calibration factor used here is f RS = 1.000, see Table 2), but with a marked drift, and the standard deviations of differences with respect to RS92 and SW are nearly doubled.

Table 3. Statistical results of comparison of WVMR measurement between IGN-LATMOS lidar and radiosondes (RS92 and Snow White). Mean difference (bias) and standard deviation of difference (std. diff.) are temporal statistics computed over NP profiles. The IWV values for this comparison are computed over the common lidar and radiosonde altitudes (0.85-10 km).

| | Vertical-mean WVMR bias | Vertical-mean WVMR std. diff. | IWV bias | IWV std. diff. | NP |
|---|---|---|---|---|---|
| IGN lidar - RS92 | 0.04 g kg−1 (2.2 %) | 0.21 g kg−1 (13 %) | 0.21 kg m−2 (1.6 %) | 0.30 kg m−2 (2.2 %) | 24 |
| IGN lidar - SW | −0.01 g kg−1 (−0.6 %) | 0.19 g kg−1 (12 %) | −0.15 kg m−2 (−1.1 %) | 0.41 kg m−2 (3.1 %) | 7 |

Figure 7 shows the WVMR profile comparison between both Raman lidars and radiosondes on 21 October 2011. This is the only case where we had valuable measurements from both lidars. Unfortunately, the atmospheric conditions were not very clear and clouds arrived around 20:30 UTC. The NDACC-OHP lidar measurements were integrated between 20:12 and 20:44 UTC, the IGN-LATMOS measurements between 20:13 and 20:46 UTC, and the sounding balloon was launched at 20:24 UTC. Two profile reconstruction methods are compared for the OHP lidar: RP1 uses Nîmes radiosonde data only to fit the lidar constant, and RP2 corrects the RP1 solution to fit the IWV measurements from the SOPHIE spectrometer at the time of lidar measurement. It is seen from Fig. 7 that the instruments yield consistent measurements of the moist atmospheric boundary layer (0.7-3.0 km altitudes) but disagree somewhat in the dry layer just above (3-4 km). The OHP lidar measurements follow the radiosonde measurements quite closely over the full altitude range (the RP2 profile is within ±20 % of Vaisala RS92), but the IGN lidar measurements are very noisy above 3 km. A bias between the two soundings is also evident, with the M2K2DC measurements being moister than the RS92, especially in the upper troposphere. Table 4 provides more quantitative results. The comparison of WVMR measurements from the lidars to the RS92 shows that the best agreement is found with the OHP lidar and the RP2 calibration method (mean difference of 0.3 % and standard deviation of difference of 11 %). In terms of IWV, the results are similar: the OHP/RP2 method shows a better agreement with RS92 measurements than OHP/RP1 on the total column IWV (−2.6 % vs. 12.4 %). The small bias in the IWV measurement from OHP/RP2 (−2.6 % on total column or −6.1 % over lidar range) might be due to the downward extension of the profiles with the Nîmes (M2K2DC) radiosonde, which is located 80 km away from OHP. The contribution to IWV is important in the first kilometres, where we can notice the largest differences with the RS92 measurements. The results with the IGN lidar might not be very accurate for this comparison due to the large noise observed in this particular profile (Fig. 7).

Figure 8 shows the mean temperature and humidity profiles from M2K2DC, M10, and RS92 observed during the DEMEVAP campaign (22 flights) and Fig. 9 shows a similar comparison including the Snow White measurements (7 flights). It is seen that the atmosphere was relatively dry on average during the radiosonde flights. Between the surface and 2 km, the air was moderately moist, with a mean RH of about 55 %. Between 3.0 and 6.0 km, a very dry air layer is observed, with a mean RH below 20 %. It is topped with a slightly moister layer between 7.0 and 12 km, where RH was about 30 % (note that when RH is expressed with respect to ice, the latter layer has an RH of about 50 %).
Above 13 km altitude, the RH from capacitive sensors drops to zero for M2K2DC and M10 and to 1 %RH for RS92 (the latter is not zero, probably because of a software offset). The capacitive humidity sensors actually no longer respond at these low temperatures and relative humidities. The SW measurements, on the other hand, show a nearly constant WVMR of 0.006-0.008 g kg−1 (10-13 ppmv). These values overestimate by a factor of 3-4 the typical WVMR profile observed at OHP with a tunable diode laser spectrometer (Durry and Pouchet, 2001).

Figure 10 shows the mean differences compared to RS92. The temperature profiles from RS92, M10, and SW agree within ±0.3 °C throughout the troposphere, but the M2K2DC shows a 0.4-0.6 °C bias. The humidity measurements show quite large biases, with both MODEM radiosondes too dry compared to the Vaisala RS92 throughout the whole troposphere and into the tropopause. The M2K2DC has a dry bias of −5 %RH up to 6 km which decreases above, and the M10 has a dry bias of −6.5 %RH up to 9 km which increases to 10 %RH at 10 km altitude and decreases above. In terms of relative difference, the values are even larger, with average biases of −19 % and −33 % between the surface and 12 km for M2K2DC and M10, respectively. The SW and RS92 humidity measurements agree fairly well up to 7 km altitude, with a small bias (2 %RH) between the surface and 3 km. Above 7 km, the sensitivity of the RS92 humidity sensor decays (Miloshevich et al., 2009), which produces a slight dry bias of 10 % in these measurements between 8 and 12 km. This bias increases rapidly above, where the capacitive sensor no longer responds to the actual humidity variations.
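The two unit conversions used in this discussion (WVMR in g kg−1 to ppmv, and RH over liquid water to RH over ice) can be checked with a short sketch. It assumes standard molar masses and Magnus-type approximations of the saturation vapour pressure with commonly used coefficients. The function names are hypothetical, the layer temperature in the example is an assumed value since the text does not state it, and the saturation formulas applied in the radiosonde processing may differ.

```python
import math

M_DRY_AIR = 28.9647  # molar mass of dry air, g mol-1
M_WATER = 18.01528   # molar mass of water, g mol-1


def wvmr_to_ppmv(wvmr_g_per_kg):
    """Mass mixing ratio (g kg-1 of dry air) -> volume mixing ratio (ppmv)."""
    return wvmr_g_per_kg * 1e-3 * (M_DRY_AIR / M_WATER) * 1e6


def e_sat_water(t_c):
    """Saturation vapour pressure over liquid water (hPa), Magnus form (assumed)."""
    return 6.1094 * math.exp(17.625 * t_c / (t_c + 243.04))


def e_sat_ice(t_c):
    """Saturation vapour pressure over ice (hPa), Magnus form (assumed)."""
    return 6.1121 * math.exp(22.587 * t_c / (t_c + 273.86))


def rh_water_to_ice(rh_water_pct, t_c):
    """Convert RH defined over liquid water to RH defined over ice."""
    return rh_water_pct * e_sat_water(t_c) / e_sat_ice(t_c)


for r in (0.006, 0.008):
    print(f"{r} g kg-1 -> {wvmr_to_ppmv(r):.1f} ppmv")

# Assumed layer temperature (not given in the text).
t_layer = -50.0
print(f"30 %RH over water at {t_layer} C -> {rh_water_to_ice(30.0, t_layer):.0f} %RH over ice")
```

Because the saturation pressure over ice is lower than over supercooled water, the same vapour content always gives a higher RH with respect to ice below 0 °C, and the gap widens as the temperature decreases.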

Lower RH limit of radiosonde measurements
A number of studies have documented the lower detection limit of the Vaisala RS92 capacitive humidity sensor and the Snow White chilled-mirror hygrometer (Vömel et al., 2003; Fujiwara et al., 2003; Vaughan et al., 2005; Verver et al., 2006). These studies were mainly focused on the upper troposphere. The data from the DEMEVAP campaign reveal similar problems in the middle and lower troposphere. Figure 11 shows [...] measurements and 14 % according to SW measurements. This sudden drop in humidity is also reflected in a large dew point depression of 26 °C in the SW measurements. Inspection of the SW Peltier current reveals that it reaches its maximum when the sonde crosses this dry layer. This confirms that the SW is responding properly to the drop in humidity, but it may not reach a temperature low enough to maintain the condensate on the mirror (note that at these mirror temperatures, the condensate is likely dew, not frost). The phototransistor voltage actually shows a peak in mirror reflectivity, which confirms the loss of condensate. Above the dry layer, the RH increases abruptly to 30 % and peaks at 50 % at 4.0 km. This rapid increase in available water vapour seems to allow the SW to regain normal functioning, as attested by the good agreement with the RS92 measurements, except between 6.0 and 6.3 km, where the RS92 measurements reveal a spike in RH topping a shallow dry layer (between 5.7 and 6.0 km). Surprisingly, the SW measurements are consistent with the RS92 within this dry layer but miss the spike in RH. Saturation of the Peltier current and a peak in phototransistor voltage suggest that the humidity in the spike may have been used to rebuild the layer of condensate on the mirror, at constant mirror temperature, instead of leading to an increase of mirror temperature at constant condensate thickness. Saturation of the Peltier current over extended atmospheric layers was also observed in the SW measurements from 21 September and 17 October.
These phenomena are very similar to those reported by Vömel et al. (2003) and Vaughan et al. (2005). Figure 12 presents the case from 21 September 2011. The vertical profile of humidity resembles that of 15 September but the dry layer is drier and deeper, and extends from 2.0 to 3.7 km. Humidity drops to 3 %RH according to RS92 and 6 %RH according to SW. The lower troposphere is also drier (< 45 %RH). Again, the Peltier current saturates and the phototransistor voltage peaks (note that the maximum values are different from those of 15 September, possibly because the sonde version was different). The dew point depression reaches 36 °C around 2.7 km, which is about the maximum depression reported by Vömel et al. (2003) for the Snow White chilled mirror. Contrary to what we observed in the preceding case, the SW measurements exhibit a moist bias compared to RS92 throughout the whole troposphere, and not just in the dry layer, except in a moist layer between 3.7 and 4.5 km. Here, we cannot simply suspect the SW but also need to consider a possible dry bias in the RS92 measurements.

Figures 13 and 14 compare the WVMR measurements from all the instruments available for these two cases. On 15 September 2011, the moist bias in the SW measurements in the 2.4-3.4 km dry layer and the dry bias of the SW measurements around 6.0 km are confirmed by the IGN-LATMOS lidar measurements. On 21 September 2011, the result is different and the lidar profile closely matches the SW profile above 2.0 km. In the dry layer between 2.0 and 3.7 km, although the Peltier current and the phototransistor voltage suggested that the SW had reached its lower RH limit (∼6 %), the WVMR measurements seem reliable. Compared to the SW and Raman lidar, the RS92 measurements thus show a mean dry bias of 5 %RH throughout the troposphere for this particular sounding. The origin of this bias is not clear for the moment but it is consistent with the dry bias reported by Yoneyama et al. (2008) at night. For both soundings, the measurements from the two MODEM radiosondes show dry biases, especially in the dry layers discussed above. However, the more recent M10 sonde behaves slightly better than the older M2K2DC.
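The link between the dew point depressions and the RH values quoted for the two dry layers can be illustrated with a short sketch. It assumes a Magnus-type saturation vapour pressure over liquid water (dew rather than frost on the mirror) and an ambient temperature of 5 °C, which is a hypothetical value chosen for illustration because the text does not give the air temperature in these layers. The function names are likewise hypothetical.

```python
import math


def e_sat_water(t_c):
    """Saturation vapour pressure over liquid water (hPa), Magnus form (assumed)."""
    return 6.1094 * math.exp(17.625 * t_c / (t_c + 243.04))


def rh_from_dew_point_depression(t_air_c, depression_c):
    """RH (%) over liquid water for a given air temperature and
    dew point depression (air temperature minus dew point)."""
    dew_point_c = t_air_c - depression_c
    return 100.0 * e_sat_water(dew_point_c) / e_sat_water(t_air_c)


# Assumed air temperature in the dry layer (illustrative, not from the text).
t_air = 5.0
for depression in (26.0, 36.0):
    rh = rh_from_dew_point_depression(t_air, depression)
    print(f"depression {depression:.0f} C at {t_air:.0f} C -> RH = {rh:.1f} %")
```

Under this assumed temperature, depressions of 26 °C and 36 °C correspond to RH values of the same order as those quoted for the SW in the two dry layers, which shows how a ceiling on the achievable depression translates into a floor on the measurable RH.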
Our results show that, in these two cases, the Raman lidar measurements are more reliable than radiosondes in detecting dry layers in the lower and middle troposphere (0-6 km a.g.l.), consistent with the lidar's vertical resolution, which degrades with altitude.

Figure 15 shows the time series of IWV measured by all the instruments. The IWV shows large variations over the period of the experiment, with values ranging between 3 kg m−2 and 30 kg m−2. GPS is the only technique considered here that provides measurements during both daytime and night, and it is taken as a reference. The radiosondes were operated during night only, jointly with the Raman lidars, and IWV for these instruments was determined from profile integration. Night-time comparisons also include SOPHIE spectrometer data. Daytime comparisons include the AERONET sun photometer and SAOZ. Overall, all the instruments agree fairly well in depicting the time variations of IWV during the campaign. We should emphasize that some differences might be observed because these instruments do not sample the same volume of atmosphere: GPS measures continuously and integrates over nearly the whole hemisphere, RS measures along the path of the balloon, AERONET takes daytime measurements in the direction of the sun (i.e. varying from east to west during the day), SAOZ takes scattered sunlight measurements at the zenith, and SOPHIE measures at night-time in the direction of the selected stars, which is usually toward the south.

IWV intercomparison
To investigate the differences in more detail, Fig. 16 shows pairwise comparisons where GPS IWV from station OHP1 is taken as a common reference, and Table 5 reports summary statistics of these comparisons. First, it should be noted that the nights when the lidars and radiosondes were operated were relatively dry, with IWV ranging between 7 and 24 kg m−2, and the subset of measurements with SOPHIE was even drier, with IWV ranging between 5 and 15 kg m−2 (mean IWV = 9.53 kg m−2). Overall, we find a very good degree of correlation between GPS IWV and the other instruments, with 7 out of 8 correlation coefficients > 0.93 and 5 out of 8 correlation coefficients > 0.98. The scale factors for 7 out of 8 comparisons and the slope parameters for 6 out of 8 comparisons are within 1.00 ± 0.07. The biases for 5 out of 8 comparisons are smaller than ±4 %. The standard deviations for 6 out of 8 comparisons are smaller than 7 %.
The RS92 measurements correlate to better than 0.99 with GPS, but exhibit a small moist bias during night of 0.56 kg m−2 (3.4 %), consistent with the findings of Cady-Pereira et al. (2008) and Bock and Nuret (2009). However, the origin of this bias has not been explained so far and its existence is not yet unanimously recognized. The Snow White measurements present the best correlation with GPS data (better than 0.995), but show a slight moist bias, comparable to RS92 and consistent with the lower RH limitation problem discussed in Sect. 4.2. The two MODEM radiosondes show large dry biases consistent with the dry bias seen in the mean profiles (Fig. 8). However, this bias is not consistent with the results found for the nearby radiosonde station of Nîmes, France, from an independent GPS-radiosonde comparison study based on M2K2DC measurements (Poujol, personal communication, 2011). This point needs further investigation with the operators and with the manufacturer. The IGN-LATMOS Raman lidar IWV measurements show a very small bias (1.2 %), a moderate standard deviation (5.5 %), and a quite high correlation (0.98) compared to the GPS measurements. This comparison was made with 5 min sampling (i.e. with a relatively large noise in the lidar measurements). The differences also show some signal at the scale of the observing sessions (see the sine-like undulations around the linear fit line in Fig. 16). These spurious fluctuations may be due to small drifts in the lidar calibration during the observing sessions or to the rescaling of the fractional IWV measured by the lidar (0.2-8 km) to the total column. However, in the end, the IGN-LATMOS lidar-GPS IWV comparison achieves the smallest RMS difference (0.83 kg m−2 or 5.6 %). The measurements from SOPHIE (night-time) are in very good agreement with GPS in terms of mean IWV (mean difference of 1 % and scale factor of 0.995) but show a relatively large scatter (standard deviation of difference of 11 %).
However, the period of comparison is very short (4 nights) and these results should not be taken as general conclusions on the accuracy of this technique. The AERONET daytime measurements are in excellent agreement with the GPS measurements. This finding is consistent with other comparisons performed in contrasting climates (Bokoye et al., 2003; Schneider et al., 2010). The SAOZ daytime measurements are also in very good agreement with GPS IWV (mean difference < 3 % and standard deviation of difference of 6.5 %), except for the scale factor of 0.959. To our knowledge, this is the first time IWV measurements derived from a ground-based DOAS instrument have been compared to GPS, and this technique looks very promising.

Finally, the significance of the results obtained for each instrument by comparison with one particular GPS solution is investigated with respect to the dispersion between the GPS measurements. Figure 17 shows the main statistical parameters of the comparisons with each of the five GPS solutions. The mean bias variations lie within 0.3 kg m−2 (3 %) and the standard deviations of differences are all within 0.2 kg m−2 (2 %). These variations are thus small enough compared to the mean values to conclude that the results are significant. In a similar way, the correlation coefficients and scale factors change by less than 0.01 and 0.02, respectively, except for the comparison between SOPHIE and GPS station OHP5, where the numbers are 0.04 and 0.03, respectively. This GPS station shows slightly different values in all the comparisons, compared to the four other GPS stations. As already mentioned in Sect. 3.3, the two types of GPS receivers and antennas seem to behave slightly differently, though the scatter between the IWV measurements remains at a very acceptable level. The ratio of the RMS difference from OHP5 over the mean of OHP1 to OHP4 is 0.[...] from the GPS-RS92 comparison (night-time), 1.14 from the IGN-LATMOS lidar comparison (night-time) and 1.033 from the GPS-AERONET comparison (daytime). There is also no indication of a day-night bias in the GPS measurements from either of the two receiver/antenna types.
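The comparison statistics quoted in this section (bias, standard deviation and RMS of differences, correlation, slope and scale factor) can be computed along the following lines. This is a minimal sketch, not the processing used for Table 5. In particular, it assumes that the slope comes from an ordinary least-squares fit with an offset and that the scale factor is the slope of a fit forced through the origin. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def compare_to_reference(ref, other):
    """Statistics of `other` against the reference series `ref`
    (e.g. GPS IWV), both given as time-matched values in kg m-2."""
    diffs = [o - r for r, o in zip(ref, other)]
    ref_mean, other_mean = mean(ref), mean(other)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    syy = sum((o - other_mean) ** 2 for o in other)
    sxy = sum((r - ref_mean) * (o - other_mean) for r, o in zip(ref, other))
    slope = sxy / sxx  # least-squares fit with an offset
    return {
        "bias": mean(diffs),
        "bias_pct": 100.0 * mean(diffs) / ref_mean,
        "std_diff": stdev(diffs),
        "rms_diff": math.sqrt(mean([d * d for d in diffs])),
        "correlation": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "offset": other_mean - slope * ref_mean,
        # Assumed definition: slope of a fit with zero offset.
        "scale_factor": sum(r * o for r, o in zip(ref, other)) / sum(r * r for r in ref),
    }


# Made-up IWV pairs (illustrative only), in kg m-2.
gps = [7.2, 9.8, 12.5, 15.1, 18.4, 23.9]
sonde = [7.6, 10.1, 13.2, 15.4, 19.3, 24.5]
for name, value in compare_to_reference(gps, sonde).items():
    print(f"{name:>12s}: {value:.3f}")
```

Reporting both the slope and the zero-offset scale factor helps separate a constant offset between two instruments from a difference that grows with IWV.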

Discussion
Several environmental research fields rely either on the monitoring of water vapour in the atmosphere (e.g. climate research, atmospheric process studies) or on the calibration of the effect of water vapour molecules on the propagation of satellite signals in the atmosphere (e.g. satellite altimetry, geodesy and astronomy in the microwave frequency domain). Calibration of operational meteorological observing systems is also an important task for national weather services in order to guarantee that high quality observations are assimilated into numerical weather prediction models. Besides the traditional use of operational radiosondes, Raman lidars and GPS are two techniques that have been particularly developed in recent years to address the measurement needs in these fields. However, the choice between the two techniques depends on the application. Also, the ultimate requirements the techniques should meet are not clearly established. For instance, high long-term stability would be a primary requirement for climate research, whereas high accuracy in instantaneous measurements would be more important for calibration/validation purposes. Likewise, whether high vertical resolution information is required or integrated contents are sufficient may direct the choice toward Raman lidar or GPS measurements, respectively. It is thus of crucial importance to carefully characterize both the long-term stability and the short-term accuracy of these two techniques. This study addressed these questions.
Calibration issues of Raman lidar measurements have motivated a lot of research since the technique was invented (e.g., Vaughan et al., 1988; Sherlock et al., 1999a; Leblanc and McDermid, 2008; Whiteman et al., 2012). The use of external humidity reference measurements is a traditional approach which is inherently limited by the accuracy of the reference measurements. Hence it is important to simultaneously assess the quality of various candidate reference techniques and improve the calibration algorithms and methods. In this study, we compared four calibration methods and used three types of reference data (radiosonde data, ground-based capacitive humidity measurements, and GPS IWV and phase measurements) with redundancy in the data (four radiosonde systems, two capacitive humidity sensors and five GPS stations). Ground-based dew point measurements and upper air dew/frost point measurements provided by chilled-mirror hygrometers were used for validation. This type of hygrometer is classified as a field calibration standard by WMO (CIMO Guide, 2008). We used these measurements here to assess the absolute accuracy of our calibrated lidar measurements. Lidar calibration coefficients determined from Vaisala RS92 radiosonde measurements achieved a repeatability (short-term stability or variability) of 2-3 % after a drift over the period of the experiment (1.5 months) was subtracted. A consistent repeatability was also found using GPS IWV as a calibration measurement or the GPS-lidar coupled data processing method developed by Bosser et al. (2010). Thanks to the pointing capability of the IGN-LATMOS scanning lidar, calibration from capacitive humidity measurements provided by ground-based sensors located at 90 and 180 m from the lidar could also be tested. Unfortunately, interference in the short-range lidar signals limited the quality of the results achievable with this method to 4-5 % only.
We observed that the mean calibration coefficient changed by up to 7 % depending on the method and reference data. However, the comparison to Snow White chilled-mirror measurements demonstrated an RMS difference of 3-4 % IWV. We expect that even higher accuracy could be demonstrated from the calibration with ground-based capacitive and dew point humidity measurements if the short-range interference problems in the lidar signals are mitigated. We thus fully agree with the concluding statements made by Revercomb et al. (2003) that a scanning Raman lidar might be an efficient instrument to transfer calibration between a ground-based reference and upper air observing techniques, primarily at night. Ongoing research in this field is expected to further improve the technique at the level of both the instrumentation and the data analysis algorithms (e.g., Hoareau et al., 2012). For example, several lidar systems used in the NDACC have been shown to possess a wet bias which was attributed to fluorescence signals of several origins (deposits of insects, airborne pollens, degradation of hardware, etc.; Whiteman et al., 2012).
Many applications rely on the use of IWV measurements as a reference for calibration or validation. This is the case for the calibration of many Raman lidars in the NDACC network (e.g. the NDACC-OHP Raman lidar uses SOPHIE spectrometer measurements; Hoareau et al., 2009) or for the validation of operational satellite measurements, radiosondes and NWP models (e.g., Bock and Nuret, 2009). GPS is evidently the most convenient and most widely used technique since it is easy to deploy and maintain, and operates both day and night. However, the accuracy of GPS IWV is not well known.
In this study we investigated some of the error sources inherent in the GPS measurements. Tropospheric modelling and multipath did not appear to be major error sources in the data that we analysed (Sect. 3.3). The use of microwave absorbers did not significantly change (< 0.3 kg m−2) the 5 min IWV estimates. The uncertainty associated with the use of different types of instruments and the variability from instrument to instrument of similar type were assessed by GPS to GPS comparisons and by comparing GPS to other instruments. The GPS to GPS differences were about ±0.5 kg m−2, which is very likely due to the difference in the antenna types and antenna models used for the raw GPS data processing. The accurate calibration of GPS antennas is probably the main source of systematic errors in the GPS measurements at present. The comparisons of GPS to other instruments show much larger scatter. There is clearly a difference in the bias uncertainty between GPS and radiosonde measurements (−12.3 % to +6.6 %) and GPS and other remote sensing techniques (−2.9 % to +1.2 %), as attested by Fig. 17 and Table 5. This study shows that GPS is in very good agreement with the other remote sensing techniques (Raman lidar, sun photometer, DOAS and stellar spectrometer). It was shown that the Snow White chilled-mirror measurements can exhibit a moist bias in very dry air which is at least partly attributable to a limitation in the Peltier device. To overcome this kind of limitation, cryogenic frost-point hygrometers were recommended from past experiments (Vömel et al., 2007a; Leblanc et al., 2011). The comparison with GPS measurements at daytime and night also raises the question of the diurnal variations of GPS errors. This study shows that mean biases in the range −0.4 to 0 kg m−2 (−2.9 to 0 %) could be achieved from daytime comparisons (GPS vs. AERONET and SAOZ) and −0.1 to +0.18 kg m−2 (−1.1 to +1.2 %) from night comparisons (IGN lidar and SOPHIE).
These results are consistent with those reported by Guerova et al. (2005), who showed that GPS and microwave radiometers usually have constant bias through day and night.
The main perspective of this work is to reprocess and homogenize the IWV estimates from those remote sensing techniques that possess long-term databases for the study of climate trends and variability. Among these techniques, the astronomical spectrometer databases allow historical values of H2O to be retrieved, starting at the beginning of the 20th century.