Retrieval of an ice water path over the ocean from ISMAR and MARSS millimeter and submillimeter brightness temperatures

A neural-network-based retrieval method to determine the snow ice water path (SIWP), liquid water path (LWP), and integrated water vapor (IWV) from millimeter and submillimeter brightness temperatures, measured by using airborne radiometers (ISMAR and MARSS), is presented. The neural networks were trained by using atmospheric profiles from the ICON numerical weather prediction (NWP) model and by radiative transfer simulations using the Atmospheric Radiative Transfer Simulator (ARTS). The basic performance of the retrieval method was analyzed in terms of offset (bias) and the median fractional error (MFE), and the benefit of using submillimeter channels was studied in comparison to pure microwave retrievals. The retrieval is offset-free for SIWP > 0.01 kg m−2, LWP > 0.1 kg m−2, and IWV > 3 kg m−2. The MFE of SIWP decreases from 100 % at SIWP = 0.01 kg m−2 to 20 % at SIWP = 1 kg m−2 and the MFE of LWP from 100 % at LWP = 0.05 kg m−2 to 30 % at LWP = 1 kg m−2. The MFE of IWV for IWV > 3 kg m−2 is 5 to 8 %. The SIWP retrieval strongly benefits from submillimeter channels, which reduce the MFE by a factor of 2, compared to pure microwave retrievals. The IWV and the LWP retrievals also benefit from submillimeter channels, albeit to a lesser degree. The retrieval was applied to ISMAR and MARSS brightness temperatures from FAAM flight B897 on 18 March 2015 of a precipitating frontal system west of the coast of Iceland. Considering the given uncertainties, the retrieval is in reasonable agreement with the SIWP, LWP, and IWV values simulated by the ICON NWP model for that flight. A comparison of the retrieved IWV with IWV from 12 dropsonde measurements shows an offset of 0.5 kg m−2 and an RMS difference of 0.8 kg m−2, showing that the retrieval of IWV is highly effective even under cloudy conditions.


Introduction
Ice clouds are an ongoing focus of atmospheric remote sensing because they play an important role in atmospheric radiation through their reflection of sunlight and their trapping of infrared radiation. The amount of ice in the atmosphere is typically described by the column-integrated bulk mass of atmospheric ice, known as the ice water path (IWP). Measuring the IWP remains a challenging task and is an important gap in the current global climate observation system. Buehler et al. (2012b) and Holl et al. (2014) argued that this gap is one of the reasons why there are large differences in the IWP estimates in climate models. In general, the term IWP is defined for the whole integrated ice bulk mass, for example in the work of Evans et al. (2012) and Holl et al. (2014). However, in this paper, we distinguish between cloud ice, which consists mainly of ice particles with diameters < 100 µm, and snow, which consists mainly of ice particles with diameters > 100 µm. This threshold results from the particle size distribution used (see Sect. 3.2). This distinction between small and large ice particles is similar to that in atmospheric models such as the Icosahedral Nonhydrostatic (ICON) model (Zängl et al., 2015) or in the IFS-137 model of the European Centre for Medium-Range Weather Forecasts (ECMWF) (Eresmaa and McNally, 2014). Hereinafter, we define the CIWP as the column-integrated bulk mass of cloud ice and we define the snow ice water path (SIWP) as the column-integrated bulk mass of snow. Note that snow defined in this way can and does occur at high altitudes; typical cirrus clouds in the model fields used contained about two-thirds of their mass in the form of snow, and only the remainder in the form of cloud ice.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Existing methods to estimate the IWP use passive sensors within the microwave, infrared, and visible ranges of the electromagnetic spectrum, active sensors such as radar or lidar, or combinations of different sensors. Comprehensive overviews of existing methods can be found in Eliasson et al. (2013) and Holl et al. (2014). According to Holl et al. (2014), active sensors, especially combined radar-lidar, are probably capable of estimating IWP with a higher accuracy than any existing passive sensor. Furthermore, because of the principle on which their measurements are based, active sensors such as lidar and radar are much better suited to retrieving the vertical structure as well. The problem with active sensors is that they lack horizontal coverage, because they only sample the atmosphere directly below the satellite.
Existing passive sensors are problematic in that their sensitivity is highly selective. Passive microwave sensors for example lack sensitivity for thin ice clouds but are capable of sensing the whole column, whereas infrared and optical sensors are capable of sensing thin ice clouds but cannot sense the whole column because high clouds obscure lower clouds. Submillimeter waves are much more sensitive to ice clouds compared to microwaves, as we show in Sect. 3, but passive submillimeter waves are still capable of sensing the whole column in contrast to infrared or visible waves. The use of submillimeter waves therefore ensures that the retrieval of the IWP based on combined microwave and submillimeter wave measurements is more effective than when using infrared or visible waves. This approach also obviates the need for collocating data from different sensors, for example when using the SPARE-ICE product (Holl et al., 2014). However, regardless of the technique that is used, remote sensing of ice clouds is a difficult task because of the many factors that can influence the measurement (Evans et al., 2012).
The launch of the Meteorological Operational Satellite Second Generation B (MetOp-SG B) is planned for the early 2020s. Among other sensors, this satellite will be equipped with an Ice Cloud Imager (ICI) and Microwave Imager (MWI). ICI will be the first operational spaceborne downlooking sensor with the ability to measure in the submillimeter range of the electromagnetic spectrum. The main purpose of ICI, as indicated by its name, is to sense ice clouds. Even though the studies of Buehler et al. (2012b), Buehler et al. (2007), and Jiménez et al. (2007) were not explicitly carried out for ICI, they provide a useful overview of the fundamentals of ICI. The International Submillimetre Airborne Radiometer (ISMAR) is an airborne radiometer that measures at several frequencies between 118 and 664 GHz of the electromagnetic spectrum. One of the main tasks of ISMAR is to serve as a satellite demonstrator for ICI (Fox et al., 2017). Apart from ISMAR, another airborne radiometer that measures in a similar region of the electromagnetic spectrum is the Compact Scanning Submillimeter-wave Imaging Radiometer (CoSSIR). Evans et al. (2012) used measurements from CoSSIR on board the ER-2 aircraft to estimate the IWP.
In March 2015, COSMICS (Cold-air Outbreak and sub-Millimetre Ice Cloud Study) was carried out around the northern part of the United Kingdom and Iceland. Among other measurements, COSMICS recorded airborne radiometer measurements with ISMAR and the Microwave Airborne Radiometer Scanning System (MARSS). These measurements were conducted using the BAe-146 aircraft from the Facility for Airborne Atmospheric Measurements (FAAM). ISMAR and MARSS together cover most of the ICI and MWI channels ≥ 89 GHz, which makes ISMAR and MARSS very useful in view of MetOp-SG B.
The main purpose of this work is to develop a method to retrieve the paths of ice and snow in the atmosphere, known as frozen hydrometeors, from data recorded by airborne millimeter and submillimeter radiometers and to apply the retrieval to real observations. The retrieval method is based on artificial neural networks (NNs). The artificial NNs are trained using a database of atmospheric profiles taken from a numerical weather prediction (NWP) model and associated brightness temperatures calculated using a radiative transfer model. The model profiles are broadly representative of the conditions during the flight, but they span a much greater range of atmospheric conditions. As the simulations need information about cloud liquid water, precipitating water, and water vapor, we additionally investigated retrieval for column-integrated cloud liquid water, which we term the liquid water path (LWP), the column-integrated precipitating water, which we term the rainwater path (RWP), and the column-integrated bulk mass of water vapor, which we term integrated water vapor (IWV). Our retrieval approach is similar to a previous approach of Jiménez et al. (2007). However, our study differs from theirs in three main respects: first, we apply the retrieval method to real measurements; second, we are not only interested in frozen hydrometeors; and, third, our system can be employed over the ocean, whereas the approach of Jiménez et al. (2007) worked only over land. The performance of the NN retrieval is evaluated using an independent set of atmospheric profiles and simulated brightness temperatures to get an error estimate of the retrieval. Furthermore, the retrieval is applied to the observations and the retrieved quantities are compared to NWP model values as a consistency check. Although Wang et al. (2016) followed a similar approach to estimate hydrometeor paths, they did not apply their approach to measured data. They only used measurement data up to 200 GHz to validate their simulations. In contrast to our study, the retrieval system they developed was intended for retrieval over land and ocean.
Atmos. Meas. Tech., 11, 611-632, 2018 www.atmos-meas-tech.net/11/611/2018/
The text is structured as follows: in Sect. 2 we provide an overview of ISMAR and MARSS. In Sect. 3 we describe the retrieval method. This includes the basic assumptions of the method, the structure and training of the artificial NNs we used, the approach we followed to conduct the simulations, and the approach we used to construct the dataset on which to train the NN and to check the consistency of the simulations. Section 4 contains the results of our test of the retrieval system under ideal conditions to obtain the limits of the procedure and a discussion of the results. In Sect. 5 we present the results of applying the retrieval method to ISMAR and MARSS measurements and discuss the results. In Sect. 6 we summarize the results.

ISMAR
ISMAR is an along-track scanning heterodyne radiometer, which measures between 118 and 664 GHz (Table 1). ISMAR is jointly funded by the UK Met Office and the European Space Agency (ESA). One task of ISMAR is to serve as an airborne demonstrator for the upcoming ICI on MetOp-SG B. ISMAR measures at similar frequencies to ICI except for the channels at approximately 118 GHz, which instead form part of MWI on board the same satellite. ISMAR measures radiation as Rayleigh-Jeans calibrated brightness temperatures; that is, within ISMAR the received radiation power is converted to brightness temperatures using the Rayleigh-Jeans approximation for a blackbody. Except for the window channels at 243.2 and 664.0 GHz, ISMAR measures single linear polarization. The window channels measure dual orthogonal linear polarization. ISMAR is mounted on the left side of the aircraft, allowing both upward and downward views. Downward views with nominal nadir incidence angles between +50 and −10° are possible, where positive angles indicate directions towards the front of the aircraft. Zenith observations can be made in the +10 to −40° range. The nadir +50° view is designed to give a close match in incidence angle to conically scanning imagers such as ICI. However, in this work we use only the near-vertical nadir view in order to eliminate any polarization differences. Polarization differences are not expected in the vertical view as both polarizations are orthogonal to both the surface and the clouds, and the sensed medium is likely to be random in the azimuth direction. Therefore, the two polarizations of the window channels were averaged. For further details on ISMAR see Fox et al. (2017).
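The Rayleigh-Jeans calibration described above can be illustrated with a minimal sketch. It assumes only the standard Planck law and the standard Rayleigh-Jeans relation I = 2 k ν² T / c²; the function names are hypothetical and are not part of any ISMAR or MARSS processing software.

```python
import math

PLANCK = 6.62607015e-34        # J s
BOLTZMANN = 1.380649e-23       # J K^-1
SPEED_OF_LIGHT = 299792458.0   # m s^-1


def planck_radiance(temperature, frequency):
    """Blackbody spectral radiance in W m^-2 Hz^-1 sr^-1."""
    x = PLANCK * frequency / (BOLTZMANN * temperature)
    return 2.0 * PLANCK * frequency**3 / SPEED_OF_LIGHT**2 / math.expm1(x)


def rayleigh_jeans_tb(radiance, frequency):
    """Invert the Rayleigh-Jeans relation I = 2 k nu^2 T / c^2 for T (in K)."""
    return radiance * SPEED_OF_LIGHT**2 / (2.0 * BOLTZMANN * frequency**2)


# Rayleigh-Jeans brightness temperature of a 250 K blackbody at the lowest
# and highest ISMAR frequencies quoted in the text (118 and 664 GHz).
tb_118 = rayleigh_jeans_tb(planck_radiance(250.0, 118.0e9), 118.0e9)
tb_664 = rayleigh_jeans_tb(planck_radiance(250.0, 664.0e9), 664.0e9)
```

In this sketch the Rayleigh-Jeans brightness temperature of the same blackbody is lower than its physical temperature, and noticeably more so at 664 GHz than at 118 GHz, which is why simulations and measurements need to share the same calibration convention.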

MARSS
MARSS is an along-track scanning heterodyne radiometer, which measures between 89 and 183 GHz (Table 1). The viewing directions of MARSS are 40 to −40° nadir and 40 to −40° zenith. MARSS is an airborne version of AMSU-B (McGrath and Hewison, 2001). MARSS is also mounted on the side of the aircraft, allowing similar upward and downward views. MARSS measures single linear polarization and measures the radiation as Rayleigh-Jeans calibrated brightness temperatures. Further details on MARSS can be found in McGrath and Hewison (2001) and the articles cited therein. In this work, we use only the nadir-viewing direction.

Retrieval method
Retrieving hydrometeor paths from brightness temperatures, or in general from the radiance, is an inverse problem with the generic form (Rodgers, 2000)
$$Y = f(X) + \varepsilon, \tag{1}$$
where Y is the vector of the measured brightness temperatures, X is the vector of quantities to retrieve, f (the forward model) is the radiative transfer and sensor model that can simulate brightness temperatures for a given atmospheric state, and ε is the measurement noise. The typical inverse problem in remote sensing is an ill-posed problem. Many different ways have been reported in the literature to overcome this problem, for example optimal estimation (Rodgers and Connor, 2003), Monte Carlo integration in combination with Bayesian inference (Evans et al., 2012), or artificial NNs (Defer et al., 2008; Jiménez et al., 2007). We followed the latter approach and used NNs to retrieve the desired quantities. For a detailed introduction to NNs, see for example Rojas (2013). Before it can be used, a NN requires training data to set up the network. Construction of the training dataset is explained in the next subsection. Details of the NN follow in Sect. 3.4.

Training database
The training database plays a crucial role in NN-based retrieval. All the assumptions on which the retrieval method is based are condensed in the database. For example, the database needs to cover the actual measurement space (the full range of Y); otherwise the retrieval would be unsuccessful for some measurements. This would imply that the assumptions about the atmosphere and the interaction with electromagnetic radiation were inadequate. Therefore, it is important to make reasonable assumptions. The two main assumptions in terms of retrieval are that the atmospheric profiles from a NWP model are sufficient to describe the possible states of the atmosphere and that the interaction of the atmosphere with the electromagnetic radiation can be described by a radiative transfer model. We use atmospheric profiles from simulations of a regional version of the ICON model, details of which can be found in Zängl et al. (2015) and Reinert et al. (2016). The atmospheric profiles were taken from three ICON forecast runs on 11, 13, and 18 March 2015 of the region between 50 and 75° N and  Fig. 1. The selected profiles cover a much wider range of atmospheric conditions than the actual conditions during the flight. The flight took about 3 h west of Iceland, whereas the selected profiles span in total a time range of 72 h over a much larger area than the actual flight. Because the atmospheric profiles were from the same season and they cover a wide range of atmospheric conditions including the conditions during the flight, these profiles are expected to sufficiently cover the situations encountered during the measurement flight without being optimized for this specific flight. Although the database covers a wide range of atmospheric conditions, it is constrained to a similar season and similar latitude over ocean. A retrieval based on this database is likely to provide unsatisfactory results when applied to different seasons, different latitudes, or even over land.
Simulated brightness temperature measurements for each atmospheric profile were generated for the database using the Atmospheric Radiative Transfer Simulator (ARTS) (Eriksson et al., 2011, and Buehler et al., 2005).

Radiative transfer simulations
ARTS, which is a radiative transfer model for thermal radiation, can process fully polarized radiative transfer calculations with scattering. This is important as microwave and submillimeter radiation mostly interacts with ice particles by scattering. We used ARTS version 2.3. The discrete ordinate iterative (DOIT; Emde, 2004) method was used as scattering solver within ARTS. The Rayleigh-Jeans brightness temperatures were simulated for each randomly selected atmospheric profile. No explicit spectral response function was used to simulate the ISMAR and MARSS channels; instead, we conducted monochromatic radiative transfer simulations for the center frequencies of the two side bands of each channel and obtained their average. Tests with spectrally highly resolved clear-sky simulations showed that the error caused by using only the center frequency of each pass band is < 1 K. Possible effects due to different footprint sizes and beam filling are neglected, as the footprints of MARSS and ISMAR are much smaller than those of similar satellite instruments or an ICON model grid cell. The footprint size at ground level is approximately the same for all the ISMAR channels and is of the order of 700 m for a flight altitude of 10 km. The MARSS footprint size at the surface varies with channel; for a flight altitude of 10 km it varies between 700 and 1500 m. Within ARTS, gas absorption was taken into account by using the HITRAN database (Rothman et al., 2013) and the MT_CKD model for the continuum absorption of water vapor and molecular nitrogen in version 2.52 (Mlawer et al., 2012). The gas absorption of molecular oxygen was processed by using the full absorption model of Rosenkranz (1998) modified by the values from Tretyakov et al. (2005). The surface emissivity was calculated using the FAST microwave Emissivity Model (FASTEM; Liu et al., 2011) implementation within ARTS 2.3 using the surface wind speed and surface temperature from the ICON model dataset.
Although FASTEM was originally developed for low microwave frequencies, further development extended its valid frequency range to higher frequencies. Liu et al. (2011) tested FASTEM up to 150 GHz. Prigent et al. (2016) compared FASTEM with the Tool to Estimate Sea-Surface Emissivity from Microwaves to sub-Millimeter waves (TESSEM2) and with ISMAR measurements from two low-level flights at low wind speeds. They showed that FASTEM tends to underestimate the emissivity at 243.3 GHz, leading to errors of the order of 5 K in the upwelling brightness temperature close to the surface (flight altitudes < 300 m). The emissivity using FASTEM at 243 GHz is roughly between 0.7 and 0.8 for the nadir-viewing direction and the atmospheric conditions during FAAM flight B897. At surface level and for a surface temperature of 276 K, which is the surface temperature in the ICON model for the beginning of the first transect of FAAM flight B897 (see also Sect. 5), these emissivities result in upwelling brightness temperatures of 193 and 221 K, i.e., a difference of 28 K. Clear-sky simulations using ARTS for conditions similar to the driest conditions during FAAM flight B897 show, for an IWV of 6 kg m−2 at a flight level of 10 km, an upwelling brightness temperature of 233 K for a surface emissivity of 0.7 and 243 K for a surface emissivity of 0.8. The difference in upwelling brightness temperatures is thus reduced to 10 K at a flight level of 10 km, which is roughly one-third of the difference at surface level. So, a 5 K error in the upwelling brightness temperature at the surface will result in a worst-case error of approximately 1.8 K at 10 km. For greater IWV the error is even smaller. Therefore, considering the strong scattering signal at 243.3 GHz (see Fig. 3), we do not consider this problematic. For the higher-frequency ISMAR channels (325 GHz and higher, ch. 12-18) the effect of surface emissivity errors will be smaller due to the strong water vapor absorption at these frequencies.
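The worst-case estimate above can be retraced with a few lines of arithmetic. This is only a sketch: it assumes that the surface-level brightness temperature is approximated by emissivity times surface temperature (which appears to reproduce the 193 and 221 K quoted in the text), and the variable names are illustrative.

```python
SURFACE_TEMPERATURE = 276.0          # K, ICON value quoted in the text
EMISSIVITIES = (0.7, 0.8)            # FASTEM range at 243 GHz quoted in the text

# Assumption: surface-level brightness temperature ~ emissivity * temperature.
tb_surface = [e * SURFACE_TEMPERATURE for e in EMISSIVITIES]
delta_surface = tb_surface[1] - tb_surface[0]

# Clear-sky ARTS results at 10 km flight level quoted in the text.
tb_flight_level = (233.0, 243.0)
delta_flight_level = tb_flight_level[1] - tb_flight_level[0]

# Fraction of a surface-level difference that is still seen at flight level.
damping = delta_flight_level / delta_surface

# Propagate the 5 K surface-level error reported for FASTEM to flight level.
surface_error = 5.0
flight_level_error = surface_error * damping
```

With the unrounded surface difference the propagated error stays close to the approximately 1.8 K stated in the text.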
Each atmospheric profile consists of the following quantities on 90 pressure levels between 0.02 hPa and the surface pressure: atmospheric temperature in K, altitude in m, atmospheric humidity as volume mixing ratio (vmr), cloud liquid water in kg m−3, cloud ice water in kg m−3, rain in kg m−2 s−1, and snow in kg m−3. Oxygen and nitrogen levels were assumed to be constant with volume mixing ratios of 0.2095 and 0.7808, respectively.
The ICON runs used a one-moment microphysics scheme with four distinct hydrometeor types, namely liquid cloud water, cloud ice, rain, and snow. Assumptions on particle size distributions and shape are necessary in order to simulate brightness temperatures. Our assumptions are similar to those of Geer and Baordo (2014) with one exception: Geer and Baordo (2014) use sector-like snowflakes from the Liu (2008) database to simulate the scattering of snow. The Liu database is valid only for frequencies up to 340 GHz, which is insufficient for our simulations. Instead, we use aggregates from the database of Hong et al. (2009) to simulate the scattering of snow, because the Hong aggregate is the only aggregate habit for which there exist publicly available data above 340 GHz. According to Eriksson et al. (2015), Hong aggregates reasonably represent the average scattering properties of snow. However, in some respects the Hong database is also problematic. Firstly, the effective density of the Hong aggregates is constant, whereas the effective density of snow changes with the particle size. Secondly, the data are based on the old Warren (1984) refractive index data, which do not include the temperature dependencies. We therefore used a corrected version of the Hong et al. (2009) database in which the absorption is rescaled using the Mätzler (2006) parameterization for the refractive index of ice. Rescaling is achieved by multiplication with $\mathrm{Im}(n)/\mathrm{Im}(n_0)$, where $n_0$ and $n$ are the refractive indices from Warren (1984) and Mätzler (2006), respectively. The rescaling is used to obtain data for 183, 213, 243, and 266 K. The scattering extinction and all six of the phase matrix values are kept constant, which means that only the absorption is rescaled. Our assumptions about the microphysics are the same in terms of the basic hydrometeor types but differ from the internal microphysics of the ICON model in terms of size, shape, and density. However, this is not considered an issue, because the function of the ICON model for the database is simply to deliver physically realistic profiles that span the range of conditions that may be encountered. For this purpose, it is not necessary to be fully consistent with the ICON model. If the interest were in the ICON microphysics itself, consistency would be needed.
Explicitly, we used the following four hydrometeors for the radiative transfer simulations.
1. Liquid cloud water: the scattering properties were calculated under the assumption of a spherical shape using the Mie theory. The size distribution was calculated using a modified gamma distribution
$$N(D) = N_0 D^{\mu} \exp\left(-\Lambda D^{\gamma}\right), \tag{2}$$
where D is the diameter of the spheres, using the coefficients of Geer and Baordo (2014). The parameters µ, γ, and Λ are provided in Table 2. The scale parameter $N_0$ is set according to the mass concentration using the expression for the third moment of a modified gamma distribution (Petty and Huang, 2011).
2. Cloud ice: the scattering properties were calculated under the assumption of a soft sphere with a density of 900 kg m−3 using the Mie theory as in Geer and Baordo (2014). The size distribution was calculated using a modified gamma distribution (Eq. 2). The parameters µ, γ, and Λ are listed in Table 2. The scale parameter $N_0$ is set according to the mass concentration using the expression for the third moment of a modified gamma distribution (Petty and Huang, 2011).
3. Rain: the scattering properties were calculated under the assumption of a spherical shape using the Mie theory. The size distribution was calculated using the Marshall-Palmer size distribution (Marshall and Palmer, 1948), for which the mass flux was converted to the rain rate by assuming a constant density of 1000 kg m−3.
4. Snow: we assume that snowflakes behave similarly to the aggregates from the Hong DDA database (Hong et al., 2009). The size distribution was calculated using the midlatitude version of the distribution from Field et al. (2007). The mass-dimension relationship we used is
$$m = \alpha \left(\frac{D}{D_0}\right)^{\beta},$$
where α = 65.4 kg and β = 3 are the shape parameters, D is the maximum diameter, and $D_0$ is the unit maximum diameter. The shape parameters α and β were calculated from the shape dimensions.
The selected size distributions define the size range covered by the different hydrometeor habits. These choices result in cloud ice mainly consisting of particles < 100 µm, whereas snow mainly consists of particles > 100 µm.
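The way a scale parameter can be fixed from a mass concentration, as described for liquid cloud water and cloud ice above, may be sketched as follows. The sketch assumes the modified gamma form N(D) = N0 D^µ exp(−Λ D^γ) and spherical particles of constant density, so that the mass is proportional to the moment of order µ + 3 of the exponential factor. The numerical parameter values are illustrative placeholders and are not taken from Table 2; the unit diameter of 1 m in the snow relation is likewise an assumption based on α being given in kg. All function names are hypothetical.

```python
import math


def modified_gamma(d, n0, mu, lam, gamma):
    """Number density N(D) = n0 * D**mu * exp(-lam * D**gamma)."""
    return n0 * d**mu * math.exp(-lam * d**gamma)


def scale_parameter_n0(mass_concentration, density, mu, lam, gamma):
    """N0 such that spheres of the given density carry the given mass per volume.

    Uses the closed form of the moment integral
    int_0^inf D**(mu + 3) * exp(-lam * D**gamma) dD
        = Gamma((mu + 4) / gamma) / (gamma * lam**((mu + 4) / gamma)).
    """
    exponent = (mu + 4.0) / gamma
    moment = math.gamma(exponent) / (gamma * lam**exponent)
    return mass_concentration / (math.pi / 6.0 * density * moment)


def snow_particle_mass(d_max, alpha=65.4, beta=3.0, d_unit=1.0):
    """Mass-dimension relation m = alpha * (D / D0)**beta, alpha in kg."""
    return alpha * (d_max / d_unit) ** beta


# Illustrative placeholder parameters (NOT read from Table 2).
MU, LAM, GAMMA = 2.0, 2.0e5, 1.0   # lam in m^-1 because gamma = 1
RHO_WATER = 1000.0                  # kg m^-3
LWC = 1.0e-4                        # kg m^-3, i.e. 0.1 g m^-3

n0 = scale_parameter_n0(LWC, RHO_WATER, MU, LAM, GAMMA)
mass_1mm_snowflake = snow_particle_mass(1.0e-3)   # maximum diameter of 1 mm
```

Integrating the mass of all spheres over the resulting distribution numerically should return the prescribed mass concentration, which is a convenient consistency check for any chosen parameter set.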

Comparison of simulations and measurements
Before we can start with the retrieval, we have to verify whether the data in our training database cover the measurements. If the simulations do not cover the full range of measurements or only partially cover this range, the retrieval is likely to provide insufficient results. In Fig. 2 the brightness temperature of each channel at a flight altitude of 10 500 m is plotted against that of all the other channels, such that the plot consists of 18 times 18 subplots. The diagonal is empty by definition. The channels stated above the plots correspond to the brightness temperatures on the x axis and the channels stated on the right-hand side correspond to the brightness temperatures on the y axis. The plot in Fig. 2 shows how each channel is correlated with every other channel. First, let us consider the upper right half of the plot, where the measurements are plotted over the simulations. Although the measured values cover a smaller area than the simulated values, the former of these values are mostly surrounded by the latter values. This means that the variability of our simulations is higher than the variability of the measurements. As we chose the profiles randomly we do not expect to obtain an exact match between each measurement and its simulation. Actually, this is not necessary and is not our intention. The ICON profiles only have to be physically realistic and span the possible range of conditions. The important point is that the set of measurements is contained within the set of simulations.
In the lower left half, where the simulations are plotted over the measurements, we can easily determine whether the set of measurements is within the set of simulations. Mostly, the red dots are covered by the blue dots, meaning the measurements are within the set of simulated values. The simulated brightness temperatures of the 183.31 ± 1 GHz channel, the 325.15 ± 1.5 GHz channel, the three 448 GHz channels, and the 664 GHz channel are slightly higher than the measured brightness temperatures. One reason could be the presence of an insufficient amount of water vapor in the upper troposphere of the randomly selected atmospheric profiles from the ICON model dataset, because these channels are sensitive to the upper troposphere. Another reason could be the spectroscopy used within ARTS.
Of course this comparison cannot prove the sufficiency of our training database for retrieval purposes; however, thus far the behavior seems to be reasonable and understandable. We therefore expect the training database to be adequate for the retrieval.
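A coarse numerical counterpart to the visual comparison in Fig. 2 can be sketched as follows. It only checks, channel by channel, what fraction of the measured brightness temperatures falls inside the range spanned by the simulations. This is a one-dimensional simplification of the pairwise scatter plots and is not the procedure used in this study; the function name and the example numbers are hypothetical.

```python
def channel_coverage(simulated, measured):
    """Fraction of measurements inside the simulated range, per channel.

    Both arguments are lists of rows, one row per observation and one
    column per channel (brightness temperatures in K).
    """
    n_channels = len(simulated[0])
    fractions = []
    for channel in range(n_channels):
        sim_values = [row[channel] for row in simulated]
        lower, upper = min(sim_values), max(sim_values)
        inside = sum(1 for row in measured if lower <= row[channel] <= upper)
        fractions.append(inside / len(measured))
    return fractions


# Made-up two-channel example: three simulations, two measurements.
simulated = [[200.0, 250.0], [240.0, 270.0], [220.0, 260.0]]
measured = [[210.0, 255.0], [245.0, 265.0]]
coverage = channel_coverage(simulated, measured)
```

A fraction below 1 for a channel flags measurements that the training database cannot represent in that channel, although a fraction of 1 in every channel does not by itself guarantee coverage of the joint multi-channel space.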
Before we discuss the NN, we investigate the influence of frozen hydrometeors on the brightness temperatures. The liquid particles interact with the electromagnetic radiation by absorption and in the case of rain also by scattering, whereas the frozen particles interact with the electromagnetic radiation mainly by scattering. Furthermore, the absorption is mostly related to the total mass of the particles and is less dependent on the particle size, whereas scattering strongly depends on the particle size. Buehler et al. (2007) showed, for frequencies similar to those of ISMAR, that the frozen particles must have an effective diameter > 100 µm to have a significant influence on the brightness temperatures. Figure 3a shows the difference in brightness temperatures between a subset of 450 simulations without cloud ice and with cloud ice as a function of the CIWP. The maximum difference is < 0.5 K, which is mostly smaller than the noise of ISMAR and MARSS. This means, by using ISMAR and MARSS, there is no possibility to physically sense CIWP, bearing in mind that within this study CIWP is the column-integrated bulk mass of ice particles mostly smaller than 100 µm. In this respect, our work contrasts with that of Wang et al. (2016), who stated that they can estimate CIWP. The reason for this difference is that they assume a different particle size distribution for cloud ice, which results in larger cloud ice particles. Figure 3b shows the difference in brightness temperatures between a subset of 450 simulations without snow and with snow as a function of the SIWP. A clear relationship between the SIWP and the difference in brightness temperature can be seen. The difference in brightness temperature is up to 50 K. For the 243 GHz channel, it is even up to 80 K (outside the y-axis range of Fig. 3).

Neural network
Before the NN is set up, the retrieval quantities have to be defined. The main interest of this study is to retrieve SIWP as well as to investigate the retrieval of IWV, LWP, and RWP. Except for IWV, these quantities have a high dynamic range and all four quantities are always greater than or equal to 0 kg m−2. Therefore, we retrieve the logarithm of the ratio of the desired quantity and the unit path, for example for SIWP:
$$\mathrm{siwp} = \log_{10}\left(\frac{\mathrm{SIWP}}{\mathrm{SIWP}_0}\right),$$
with $\mathrm{SIWP}_0$ = 1 kg m−2 the snow ice water unit path. As the logarithm is not defined for zero, every zero value of the four quantities is assigned the value of $10^{-9}$ kg m−2, which was the order of the smallest values above zero, before computing the logarithm to avoid infinite values. Thus, the smallest value of a retrieval quantity is −9. Henceforth, writing the SIWP or one of the other three quantities in lowercase means that the decadic logarithm of the quantity was used. Our state vector X refers to
$$X = (\mathrm{siwp}, \mathrm{lwp}, \mathrm{rwp}, \mathrm{iwv})^{T}.$$
The measurement vector Y consists of 18 components. Each component is the measured brightness temperature $T_b$ of one of the 18 combined channels of ISMAR and MARSS:
$$Y = (T_{b,1}, T_{b,2}, \ldots, T_{b,18})^{T},$$
with the channels defined as in Table 1. Instead of using one NN for the retrieval, we use an ensemble of NNs. According to Heskes (1997), an ensemble of NNs is expected to provide a more accurate estimate of the true regression than would be possible with only one NN. The retrieved state vector X is then the average over the estimated state vectors of each NN of the ensemble:
$$X = \frac{1}{N} \sum_{n=1}^{N} X_n,$$
where N is the number of neural networks and $X_n$ is the estimated state vector of the nth neural network. An ensemble of 20 NNs is used for the retrieval. Each NN consists of one input, one hidden, and one output layer with 18, 12, and 4 neurons, respectively. The input neurons are the components of the measurement vector Y, i.e., the measured brightness temperatures.
The output neurons are the components of the state vector X, i.e., the logarithms of the path of the three hydrometeors and the logarithm of the IWV. Each NN is trained with simulated measurement vectors from the training database and the corresponding state vectors. The noise behavior of the measured brightness temperatures is included by adding a Gaussian distributed error to every simulated brightness temperature with a standard deviation of the noise of each channel (see also  (Hagan and Menhaj, 1994).
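The data handling around the ensemble (logarithmic targets, noise added to the simulated inputs, and averaging of the ensemble members) can be sketched as below. The networks themselves are replaced by hypothetical toy functions, because the activation functions and trained weights are not given here; only the zero replacement value, the layer sizes of 18 inputs and 4 outputs, and the ensemble size of 20 come from the text. The noise level of 0.5 K is a made-up placeholder.

```python
import math
import random

ZERO_REPLACEMENT = 1.0e-9  # kg m^-2, value assigned to zero paths (from the text)


def to_log_path(path, unit_path=1.0):
    """Decadic logarithm of path / unit_path, with zeros replaced first."""
    if path <= 0.0:
        path = ZERO_REPLACEMENT
    return math.log10(path / unit_path)


def add_channel_noise(brightness_temperatures, noise_std, rng):
    """Add zero-mean Gaussian noise with a per-channel standard deviation."""
    return [tb + rng.gauss(0.0, std)
            for tb, std in zip(brightness_temperatures, noise_std)]


def ensemble_retrieval(networks, measurement):
    """Average the state vectors predicted by all ensemble members."""
    predictions = [network(measurement) for network in networks]
    return [sum(values) / len(predictions) for values in zip(*predictions)]


def make_toy_network(offset):
    """Hypothetical stand-in for a trained NN: 18 inputs, 4 log outputs."""
    def network(measurement):
        mean_tb = sum(measurement) / len(measurement)
        return [offset + 0.001 * mean_tb] * 4
    return network


rng = random.Random(0)                                # fixed seed for repeatability
noisy = add_channel_noise([250.0] * 18, [0.5] * 18, rng)
ensemble = [make_toy_network(0.1 * n) for n in range(20)]
state = ensemble_retrieval(ensemble, noisy)           # 4 logarithmic quantities
```

Averaging in logarithmic space, as done here, corresponds to a geometric mean of the linear paths, which is one consequence of retrieving logarithmic quantities.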
As simple to use and as powerful as NNs are, these networks have a downside. As soon as one part of the measurement setup is changed, a new NN must be trained. If the number of channels or even simply the position of one channel is changed, it is necessary to train a new NN. This has the implication that for airborne measurements, different NNs are required for different flight altitudes. Nonetheless, the computational burden is not high. Once the NNs are trained, which takes some hours, they are very fast. For satellites such as MetOp-SG B, which will carry MWI and ICI, this is less of an issue because observation will always be from above the top of the atmosphere. The main issue for a satellite application is that the training database must cover the global range of atmospheric conditions. Our retrieval is limited to seasons and latitudes similar to those of the database used, but there is no fundamental limit to the usage of a NN for global retrieval applications, as long as the database covers the wide range of globally possible atmospheric conditions and the NN can capture this variability. For example, Holl et al. (2014) applied their trained NN globally to retrieve IWP. By using, for example, similar ICON model runs for several globally distributed regions and different seasons, our retrieval can be expanded to global applications.

Basic retrieval performance
Retrieval simulations for a flight altitude of 10.5 km are used to test the basic retrieval performance. We applied the NN, which was trained with one part of the training database, to the other part of the training database. This means that the retrieval procedure was applied to approximately 7000 measurement vectors with simulated brightness temperatures. For each of these 7000 measurement vectors the corresponding state vector is known. Thus, the results of the retrieval can be compared directly with the true state vectors. This is a test under ideal conditions, as retrieval and test data are based on the same assumptions. Possible errors due to the radiative transfer simulation or errors of the model profiles are excluded in this test. In that case, the retrieval performance is limited by the errors of the artificial NNs and by the radiometer noise of MARSS and ISMAR in combination with the limited interaction between the electromagnetic radiation and the atmosphere. We excluded the error of the radiative transfer simulation and the error of the atmospheric model because these modeling errors are difficult to estimate, as there are no data to compare with. Therefore, the errors from the direct comparison are an estimate of the physical limits of our retrieval approach. The retrieval error when applying the retrieval to measured brightness temperatures is likely to be larger, as the a priori assumptions will never be completely fulfilled.

Offset
In Fig. 4 the difference between the retrieved state vector and the true state vector is shown as a two-dimensional histogram. The x and y axes show the component of the retrieved state vector and the corresponding component of the difference between the retrieved and the true state vector, respectively. On the x and y axes, 45 equally sized bins between −9 and 2 and 121 bins between −5 and 12 are used, respectively. Because of the different value range of IWV, 121 equally sized bins between −1 and 2 and 161 bins between −1 and 1 are used on the x and y axes, respectively. The histograms are normalized with respect to the number of state vectors. The relative frequency of occurrence is coded as different grey shadings. Recall that the components of the state vectors are logarithmic quantities, as mentioned in the beginning of Sect. 3. The difference in the logarithmic quantities is the same as the logarithm of the ratio of the linear quantities. For example, a y-axis value of 1 in Fig. 4 corresponds to a factor 10 error, and a value of 0.1 corresponds to a relative error of about 26 % ($10^{0.1} \approx 1.26$). To look for systematic errors of each component, the offset $O_i$ as a function of the jth bin of the binned ith component of the retrieved state vector is calculated as the weighted mean

$$O_i(j) = \frac{\sum_k w_{ijk}\,\Delta x_{ijk}}{\sum_k w_{ijk}},$$

where $w_{ijk}$ is the number of occurrences of bin (k, j) of component i and $\Delta x_{ijk}$ is the binned difference $\Delta x_i = x_{\mathrm{ret},i} - x_{\mathrm{true},i}$ between the component i of the retrieved and the true state vector of bin (k, j). The standard deviation $\sigma_{O,i}$ is calculated to consider the random error. It is shown by red dashed lines on either side of the offset. The offset and the standard deviation were calculated for each jth binned component of the estimated state vector, but only if the summed number of occurrences in the jth bin is at least 1 % of the number of state vectors, to avoid statistical fluctuations due to small numbers.
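The offset statistics described above can be sketched as follows. The sketch computes the mean and standard deviation of the log differences directly from the samples in each bin of the retrieved quantity, which corresponds to a count-weighted mean over a histogram of differences up to the binning of the difference axis. The function names and the example call are hypothetical.

```python
import math
from collections import defaultdict


def log_diff_to_relative_error(delta):
    """Relative error implied by a decadic-log difference: 10**|delta| - 1."""
    return 10.0 ** abs(delta) - 1.0


def binned_offset(x_ret, x_true, lo, hi, n_bins, min_fraction=0.01):
    """Offset (mean) and standard deviation of x_ret - x_true per bin of x_ret.

    Bins holding fewer than min_fraction of all state vectors are skipped
    to avoid statistical fluctuations due to small numbers.
    """
    width = (hi - lo) / n_bins
    groups = defaultdict(list)
    for r, t in zip(x_ret, x_true):
        if lo <= r < hi:
            groups[int((r - lo) / width)].append(r - t)
    result = {}
    for j, diffs in groups.items():
        if len(diffs) < min_fraction * len(x_ret):
            continue
        mean = sum(diffs) / len(diffs)
        var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
        result[j] = (mean, math.sqrt(var))
    return result


# Hypothetical log10 values, binned into 2 bins between 0 and 2
stats = binned_offset([0.1, 0.2, 0.3, 1.5], [0.0, 0.2, 0.4, 1.0], 0.0, 2.0, 2)
```

With the binning used in the text, the call for a hydrometeor path would use `lo=-9`, `hi=2`, and `n_bins=45`.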
Strikingly, there is a straight line in the upper half of the plots of the retrieved hydrometeors, indicating a bimodal distribution for small values. For values > −3 this second mode vanishes. These lines depict cases of overestimation of the specific hydrometeor. All cases on these lines are cases where we set the specific component of the state vector to −9 to avoid infinite values, because for these cases the actual hydrometeor path was zero.
The SIWP histogram has a bell-mouthed shape, from which we can infer that the error decreases with increasing amount. The offset of the retrieved siwp is 0 for siwp > −2 (SIWP > 0.01 kg m−2). In addition, the standard deviation is symmetric around zero for siwp > −2. For siwp < −2, the offset oscillates around zero with increasing amplitude for decreasing siwp. Up to this point, we can record that for siwp > −2 the retrieval has no offset and the standard deviation decreases from 0.6 to 0.2 with increasing siwp. The standard deviation for siwp > −2 is of the same order of magnitude as the error for the retrieved IWP within the work of Evans et al. (2012). These authors used combined passive microwave and submillimeter radiometers to retrieve the IWP, among other quantities. The IWP of Evans et al. (2012) corresponds to the column-integrated bulk mass of atmospheric ice, whereas SIWP is the column-integrated bulk mass of snow. However, as the column-integrated bulk mass of cloud ice, which is our definition for CIWP, is typically an order of magnitude smaller, the IWP of Evans et al. (2012) corresponds mostly to the SIWP in our retrieval. A detailed comparison with the work of Evans et al. (2012) is difficult since there is no distinct information about the error as a function of the IWP as there is in the work of Holl et al. (2014), for example.
The LWP histogram differs from the SIWP histogram. For lwp < −1 the LWP histogram consists mainly of a straight line in the upper half and a wider strip in the lower half. The many values in the lower half mean that many estimated values are underestimated. Because cases with no LWP are strongly overestimated, the offset jumps strongly around zero. For lwp > −2 the offset and the standard deviation become smoother with increasing lwp. For lwp > −1 the offset changes only slightly with increasing lwp and the standard deviation decreases from 0.8 to 0.4 at lwp > 0. The RWP histogram is similar to the LWP histogram for rwp < −1. For rwp > −1, the size of the standard deviation is still similar to the standard deviation of lwp, but in contrast to lwp the offset of rwp changes strongly with increasing rwp, indicating a significant non-zero offset.
The IWV histogram differs strongly from the histograms of the three estimated hydrometeor paths. It has a rectangular shape and the differences are at least 1 order of magnitude smaller. Except for iwv > 1.3 (IWV > 20 kg m−2), the offset over the whole range of values is practically zero and the standard deviation is almost constant with a value of 0.04. This means the IWV retrieval is offset-free over that range of values. For iwv > 1.3, there is a small offset of 0.02.

Median fractional error (MFE)
Thus far, we know which quantities can be retrieved offset-free. We next address the retrieval error, which is described using the MFE, which was also used by Holl et al. (2014) to estimate the error of IWP of the SPARE-ICE product. The MFE is defined as follows:

$$\mathrm{MFE} = \left(10^{\,\mathrm{median}\left(\left|x_{\mathrm{ret}} - x_{\mathrm{true}}\right|\right)} - 1\right) \cdot 100\,\%.$$

For example, an MFE of 100 % on SIWP means that for half of the considered cases the retrieved value is within the interval $[\mathrm{SIWP}_{\mathrm{true}}/2,\ 2 \cdot \mathrm{SIWP}_{\mathrm{true}}]$. For MFE < 30 % it is approximately equal to the relative error. The MFE for each component of the state vector as a function of the corresponding component of the estimated state vector is shown as blue lines in Fig. 5. To compute the MFE, the components of the state vector were binned on a logarithmic grid with 45 bins starting from $10^{-9}$ kg m−2 and ending at $10^{2}$ kg m−2. Because of the different value range of IWV, a logarithmic grid with 121 bins starting from $10^{-1}$ kg m−2 and ending at $10^{2}$ kg m−2 was used for IWV. The MFE is shown only for bins that include at least 1 % of the total number of state vectors to avoid statistical fluctuations. The MFE of SIWP decreases with increasing SIWP. Whereas the MFE of SIWP is more than 600 % for SIWP < $10^{-3}$ kg m−2, it decreases to 20 % at SIWP = 1 kg m−2. For SIWP > 0.01 kg m−2 the MFE is less than 100 % and for SIWP > 0.1 kg m−2 the MFE is less than 50 %, which is in good agreement with the relative error of SIWP over the ocean of Wang et al. (2016) for combined simulated MWI-ICI measurements. They used an approach similar to ours but with an additional frozen hydrometeor, different assumptions about the particle size distributions, and an additional NN-based classification before the retrieval. For snow they also assumed slightly different scattering properties. Jiménez et al. (2007) conducted a simulated retrieval of IWP using channels similar to ISMAR and NNs but, in contrast to our retrieval, they carried out the retrieval over land and for different meteorological situations. These authors defined the column-integrated bulk mass of atmospheric ice as IWP, which, as written in the previous subsection, corresponds mostly to the SIWP of our retrieval. Comparing the MFE of SIWP with the retrieval error of IWP by Jiménez et al. (2007) shows that their retrieval error is approximately half as large as the MFE of SIWP. One has to be cautious when comparing these errors, because the exact error definition in Jiménez et al. (2007) is not clear. Because the datasets and assumptions in Jiménez et al. (2007) differ from ours, the errors cannot be expected to be the same, but they should be of the same order, which they are.
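The MFE can be computed from the logarithmic state-vector components in a few lines. The sketch below assumes the form MFE = (10^median(|x_ret − x_true|) − 1) · 100 %, which is consistent with the factor-of-2 interpretation of an MFE of 100 % given in the text; the function name and the example values are hypothetical.

```python
import math
import statistics


def median_fractional_error(x_ret, x_true):
    """MFE in percent from decadic-log retrieved and true values."""
    abs_diff = [abs(r - t) for r, t in zip(x_ret, x_true)]
    return (10.0 ** statistics.median(abs_diff) - 1.0) * 100.0


# Hypothetical case: every retrieved value is off by a factor of 2
# (too high or too low), so the MFE is 100 % up to rounding.
factor_two = math.log10(2.0)
mfe_example = median_fractional_error(
    [factor_two, -factor_two, factor_two], [0.0, 0.0, 0.0]
)
```

Applied per bin of the retrieved quantity, this yields curves of the kind shown in Fig. 5.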
A comparison with the error estimation of the SPARE-ICE product (Holl et al., 2014), which combines the results that were obtained with the current operational microwave and infrared sensors, shows that the MFE of SIWP for SIWP = 0.01 kg m−2 is of similar size to the MFE of IWP of the SPARE-ICE product and that with increasing SIWP the MFE of SIWP decreases to about half of the MFE of IWP of the SPARE-ICE product. The IWP of the SPARE-ICE product is defined as the column-integrated bulk mass of atmospheric ice but should be comparable to SIWP in our retrieval. For SIWP < 0.01 kg m−2 the MFE of SIWP is larger than the MFE of IWP of the SPARE-ICE product. The SPARE-ICE product is a good benchmark because it provides a good estimate of the performance of the latest operational passive sensors, but there are two caveats in the comparison. Firstly, our MFE is based on model simulations under ideal conditions, whereas the MFE of SPARE-ICE is based on the 2C-ICE product (Deng et al., 2010), which is derived from lidar and radar measurements. Secondly, our error estimation is obtained from the perspective of the retrieval results, whereas that of Holl et al. (2014) is from the perspective of the reference data, but as long as the retrieval is offset-free this should not make a significant difference. For SIWP < 0.01 kg m−2 it is more effective to use a retrieval that includes thermal infrared channels as in SPARE-ICE, because the interaction between atmospheric ice and microwaves and submillimeter waves is too weak for such a low amount of SIWP (see Fig. 3). For now, we can keep in mind that our retrieval is capable of estimating SIWP with an MFE lower than 100 % for SIWP > 0.01 kg m−2 and that the MFE of SIWP is reduced to about 20 % for high SIWP.
English (1995) estimated an error of 0.03 to 0.05 kg m−2 for LWP < 0.5 kg m−2 using a retrieval based on measurements of the 89 GHz channel and the 157 GHz channel of MARSS, which is of the same order as our retrieval. For LWP > 0.5 kg m−2, English (1995) argued that the retrieval is unreliable, estimating an error of 0.85 kg m−2 for an LWP of 1 kg m−2. However, that LWP retrieval was performed on low liquid clouds, apparently without any ice. Compared to those results the error of our retrieval is almost one-third of the error of that retrieval, although the meteorological conditions in our retrieval are much more complicated. Horvath and Davies (2007) compared the retrieval of LWP of warm nonprecipitating clouds from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) and from the Moderate Resolution Imaging Spectroradiometer (MODIS). They found an RMS difference of 0.025 kg m−2 between the two LWP retrievals for a mean LWP of 0.1 kg m−2. Care needs to be exercised when comparing the errors of English (1995) and Horvath and Davies (2007) with our error estimate, because our error definition differs. Nonetheless, considering that the meteorological conditions in our retrieval are more complex because of coexisting frozen and liquid hydrometeors and because we do not focus on a specific cloud form, the estimated MFE of LWP is reasonable. Our previous consideration of RWP indicated that the retrieval of RWP using MARSS and ISMAR is difficult. Except for the section around RWP = 0.1 kg m−2, where the MFE of RWP is about 50 %, the MFE of RWP is much larger than 100 %. Interestingly, the MFE of RWP decreases with increasing RWP for RWP < 0.1 kg m−2 and increases with increasing RWP thereafter. If we compare the MFE of RWP with the offset of RWP, then we can identify the regions with the lowest MFE as the regions where the offset of RWP is zero.
The MFE of RWP increases for RWP > 0.1 kg m−2 with increasing RWP, because the offset of RWP increases with increasing RWP even though the standard deviation of RWP changes little with increasing RWP. This is in contrast to the findings of Wang et al. (2016), who estimated a relative error of < 40 % for RWP > 0.1 kg m−2. The reason for this is that their training database includes more cases with higher RWP than our training database, so that their training database is more suitable for estimating RWP. If our database included more cases with higher RWP, it is likely that our retrieval would provide a result similar to that of Wang et al. (2016).
Let us now consider the MFE of IWV, which is also shown in Fig. 5. As for the offset of IWV, the MFE of IWV differs strongly from the results for the hydrometeor paths. The MFE of IWV is 1 order of magnitude smaller than the MFE of the hydrometeor paths and almost constant over the whole range of values, changing little between 5 and 8 %. Converted to an absolute value, this corresponds to an error of 0.2 kg m−2 for low IWV and to an error of 2 kg m−2 for high IWV. This error range of IWV corresponds to the range of differences between several different IWV retrievals (microwave, infrared, radiosonde) and GPS-retrieved IWV within the work of Buehler et al. (2012a). Note that, as we did not place any restriction on IWV, the retrieval for IWV is effective for cloudy conditions as well as for clear-sky conditions.

Benefit of the high-frequency channels of ISMAR
It is interesting to explore the benefit of the new high-frequency channels of ISMAR. We address this question by comparing the retrieval, which we name the "ISMAR-MARSS" retrieval hereafter, with two additional retrievals: one retrieval using all channels up to 183 GHz (Table 1, ch. 1-10) and another retrieval using the 89 GHz, the 157 GHz, and the 183 GHz channels (Table 1, ch. 1, 7-10), which are the same five channels at which AMSU-B measures (Saunders et al., 1995); see also Sect. 2.2. We name the former and latter retrievals LF and AMSU-B, respectively. Except for the number of channels used and the number of hidden layer neurons, the setup is exactly as for the ISMAR-MARSS retrieval. Compared to the ISMAR-MARSS retrieval, the number of hidden layer neurons of the LF retrieval and of the AMSU-B retrieval was reduced to lower the chance of overfitting; tests showed that the reduced number is still adequate. The LF and AMSU-B retrievals use seven and five hidden layer neurons, respectively.
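The three retrieval configurations can be summarized compactly. The following sketch encodes the channel numbers (as in Table 1) and hidden layer sizes stated in the text; the dictionary layout and the helper `select_channels` are hypothetical and only illustrate how a reduced measurement vector could be formed.

```python
# Channel numbers refer to Table 1; hidden layer sizes as stated in the text.
RETRIEVAL_CONFIGS = {
    "ISMAR-MARSS": {"channels": list(range(1, 19)), "hidden_neurons": 12},
    "LF": {"channels": list(range(1, 11)), "hidden_neurons": 7},
    "AMSU-B": {"channels": [1, 7, 8, 9, 10], "hidden_neurons": 5},
}


def select_channels(measurement, config_name):
    """Pick the brightness temperatures used by one retrieval configuration.

    `measurement` is the full 18-component measurement vector, ordered by
    channel number starting at channel 1.
    """
    channels = RETRIEVAL_CONFIGS[config_name]["channels"]
    return [measurement[c - 1] for c in channels]


# Hypothetical measurement vector: the value encodes the channel number
full_vector = [float(c) for c in range(1, 19)]
amsu_b_vector = select_channels(full_vector, "AMSU-B")
```

Each configuration then needs its own ensemble of NNs, because the number of input neurons changes with the number of channels.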
The MFE for each component of the state vector as a function of the corresponding component of the estimated state vector is shown in Fig. 5. The MFEs for RWP of the LF retrieval and of the AMSU-B retrieval are shown only for the sake of completeness, because we already know from our above considerations that the retrieval is insufficient for RWP. Therefore, we concentrate on SIWP, LWP, and IWV. For SIWP, the MFE of the ISMAR-MARSS retrieval drops below 100 % at SIWP ≈ 0.01 kg m−2, whereas the MFEs of the LF retrieval and of the AMSU-B retrieval drop below 100 % only at SIWP = 0.06 kg m−2 and SIWP = 0.1 kg m−2, respectively. At SIWP = 0.06 kg m−2 the MFE of the ISMAR-MARSS retrieval is already at 50 %. For SIWP, the MFEs of the LF retrieval and of the AMSU-B retrieval are consistently higher than the MFE of the ISMAR-MARSS retrieval, but with increasing SIWP the difference between the MFEs decreases. Because of the higher frequencies of the ISMAR channels (ch. 11-18) the MFE of SIWP can be reduced by a factor of as much as 2 with respect to the AMSU-B configuration. The 118 GHz channels are less important for the retrieval of SIWP because the difference between the LF retrieval and the AMSU-B retrieval is smaller.
For LWP, the MFE of the ISMAR-MARSS retrieval decreases monotonically, whereas the MFEs of the LF retrieval and of the AMSU-B retrieval decrease only up to LWP = 0.1 kg m−2 and then increase with increasing LWP. This increase is strong for the AMSU-B retrieval and only slight for the LF retrieval. The reason for the strong increase of the MFE of the AMSU-B retrieval is a strong underestimation of LWP > 0.1 kg m−2: the AMSU-B retrieval estimates almost no LWP > 0.2 kg m−2. The reason for the increase of the MFE of the LF retrieval is an increase of the offset with increasing LWP, which results in an overestimation of the LWP. Therefore, the higher-frequency ISMAR channels (ch. 11-18) deliver valuable information for the retrieval of LWP.
For IWV, the MFE of the AMSU-B retrieval is on average about 10 % below an IWV of 12 kg m−2 and higher than 10 % above an IWV of 12 kg m−2, whereas the MFE of the LF retrieval is on average approximately 8 % and the MFE of the ISMAR-MARSS retrieval is approximately 6 % on average. Hence, both the lower-frequency ISMAR channels (ch. 2-6) and the higher-frequency ISMAR channels (ch. 11-18) are valuable for the retrieval of IWV.
Thus, we can say that compared to an AMSU-B type sensor, the ISMAR channels deliver very valuable information for the retrieval, especially for SIWP, but also for a more accurate IWV retrieval and for a LWP retrieval under complex meteorological conditions.

Basic performance summary
This section describes the retrieval tests under ideal conditions. This means that retrieval and test data are based on the same assumptions. By doing so, the error of the radiative transfer simulation and the error of the atmospheric model were excluded from this investigation. The investigated errors result from the artificial NNs and from the physical limits of the retrieval, which are, on the one hand, the limited interaction between the electromagnetic radiation and the atmosphere and, on the other hand, the noise of the radiometers ISMAR and MARSS. Therefore, the investigated errors are an estimate of the limits of our retrieval approach. The retrieval error when applying the retrieval on measured brightness temperatures is likely to be larger, as the a priori assumptions will never be completely fulfilled.
One basic requirement of a retrieval is, in general, that it should be bias-free or, in our terms, that it should have no offset. The retrieval fulfills this requirement for SIWP > 0.01 kg m−2, LWP > 0.1 kg m−2, and IWV > 3 kg m−2. We cannot say whether the retrieval also has an offset of zero for IWV < 3 kg m−2 because there were almost no states with IWV < 3 kg m−2. We can say that the requirement is not fulfilled for RWP.
In summary, a comparison with the simulated retrieval of Jiménez et al. (2007) showed that the performance of our SIWP retrieval is of the same order. The performance of our SIWP retrieval is also in good agreement with the performance of the SIWP retrieval of Wang et al. (2016). When the SIWP is not excessively small, i.e., above $10^{-2}$ kg m−2, ISMAR has the potential to perform more effectively than the SPARE-ICE product (Holl et al., 2014). For smaller SIWP, SPARE-ICE performs more effectively, because it uses infrared channels, which are more sensitive to very thin clouds than millimeter and submillimeter waves. A comparison with the retrieval of English (1995) and the study of Horvath and Davies (2007) showed that the results of the LWP retrieval are reasonable. The LWP retrieval method is capable of retrieving LWP in situations with coexisting frozen and liquid hydrometeors. Furthermore, our retrieval is capable of retrieving IWV under cloudy and clear-sky conditions with an error that is comparable to that of existing clear-sky IWV retrievals.
A comparison of our retrieval with retrievals using only the channels up to 183 GHz enables us to conclude that the retrieval of SIWP strongly benefits from the higher-frequency ISMAR channels (ch. 11-18; see Table 1). The MFE of SIWP is reduced by a factor of 2 compared to retrievals using only channels up to 183 GHz. The IWV and LWP retrievals also benefit from the higher-frequency ISMAR channels.

Flight B897: measurements on 18 March 2015
In this subsection, we describe the application of the retrieval to brightness temperatures measured during the FAAM flight B897 on 18 March 2015 as part of COSMICS. On that day, the FAAM BAe-146 aircraft measured a precipitating frontal system west of the coast of Iceland. The aircraft had several instruments on board to measure the size of ice particles, among which were in situ probes, ISMAR, and MARSS. We focus on the measurements of these two radiometers. Details about FAAM BAe-146 and the other instruments on board can be found on the website of FAAM (http://www.faam.ac.uk). Figure 6 shows the flight track, overlaid on MODIS images from 18 March 2015. The flight consisted of three north-south transects across the frontal structure starting in the north. The transects were flown along a straight line starting at 66° N and 25° W and ending at 62° N and 25° W. The airplane required 2.5 h for the three transects. During these transects a total of 12 dropsondes (Vaisala Dropsonde RD94) were dropped. The altitude time series is also shown in Fig. 6. The airplane was above the clouds most of the time. During the flight the clouds varied from thin, broken clouds in the north to full-depth precipitating clouds in the south. The frontal structure moved slightly northwards during the flight.
Every time step at which the aircraft was not in stable, straight, and level flight was excluded from the brightness temperature time series to ensure that the retrieval is only applied to measurements recorded when the aircraft was at constant altitude with its wings level. In stable, straight, and level flight, the aircraft actually has a pitch of 5°, resulting in a slightly off-nadir incidence angle for ISMAR and MARSS, but this slight change in the incidence angle has no significant effect on the retrieval. The sampling period of the brightness temperature time series is 3.6 s. The time series is smoothed by a 3.5 min running mean to improve the compatibility of the measurements with those of the ICON model and to reduce the amount of noise. A 3.5 min running mean corresponds to a path length of ≈ 23 km. This is on the order of the smallest horizontal size of features that can be resolved within the ICON model, which is twice the grid resolution of ICON. As stated in Sect. 3.4, different NNs need to be trained for different flight altitudes. Thus, we divided the flight into nine discrete pressure levels, for which NNs, as described in Sect. 3.4, were trained using 6000 randomly selected profiles from the database. These NNs were applied to the measured brightness temperature time series, which is shown in Fig. 7. The flight consisted of three crossings of a frontal system. The brightness temperature time series starts at 12.3 h in the north; the aircraft then flew southward until 13.4 h, crossing the frontal system, flew back northward until 14.1 h, and finally flew southward again. The brightness temperature time series itself reflects the flight pattern, as it is symmetric around the turning points (13.4, 14.1 h).
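The smoothing step can be sketched as a centered running mean. With a sampling period of 3.6 s, a 3.5 min window spans about 58 samples, and a path length of 23 km in 3.5 min implies a speed of roughly 110 m s−1. The treatment of the series edges (shrinking window) is an assumption, as the text does not specify it, and all names are hypothetical.

```python
def running_mean(series, window):
    """Centered running mean; the window shrinks at the edges of the series.

    For an even `window` the average covers window + 1 samples.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        chunk = series[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


SAMPLING_PERIOD_S = 3.6          # s, from the text
WINDOW_S = 3.5 * 60.0            # 3.5 min running mean, from the text
PATH_LENGTH_M = 23_000.0         # approx. 23 km, from the text

window_samples = round(WINDOW_S / SAMPLING_PERIOD_S)   # number of samples per window
implied_speed_m_s = PATH_LENGTH_M / WINDOW_S            # implied speed along the track

# Hypothetical short series, smoothed with a 3-sample window
example = running_mean([0.0, 3.0, 6.0], 3)
```

In practice the function would be applied per channel, after the time steps outside stable, straight, and level flight have been removed.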
From the symmetry within the brightness temperature time series, it is to be expected that the meteorological conditions are also symmetric with respect to the turning points. This symmetric pattern is a good test for the consistency of the retrieval procedure, because the retrieved hydrometeor path and IWV time series should reflect this pattern. In the 89 GHz channel, we can clearly see the crossing of the frontal system. At the beginning the brightness temperatures are about 190 K, and this low brightness temperature indicates that the sensed radiation was emitted from the ocean surface. At 89 GHz the emissivity of the ocean surface is approximately 0.7, resulting in a brightness temperature of about 190 K for a surface temperature of about 273 K. Over the ocean an increase in the amount of liquid water in the atmosphere leads to an increase in the brightness temperature at 89 GHz. When the aircraft moved towards the frontal system, the 89 GHz brightness temperature increased up to a maximum of 250 K around the turning point at 13.4 h. This increase in the brightness temperature enables us to conclude that there must be a strong increase in the amount of liquid water in the atmosphere, because the high brightness temperature indicates strong absorption, which means that the sensed radiation is emitted not by the ocean surface but from somewhere in the lower troposphere.
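The reasoning about the 89 GHz channel can be illustrated with a strongly simplified calculation. For a transparent atmosphere the brightness temperature is the product of surface emissivity and surface temperature, 0.7 × 273 K ≈ 191 K. To show the effect of absorption, the sketch below additionally assumes a single isothermal, non-scattering layer over a specularly reflecting surface and neglects the cosmic background; the layer temperature of 265 K and the function name are hypothetical, and this is not the radiative transfer model used for the retrieval.

```python
EMISSIVITY_89GHZ = 0.7       # approximate ocean emissivity at 89 GHz (from the text)
SURFACE_TEMPERATURE = 273.0  # K (from the text)
LAYER_TEMPERATURE = 265.0    # K, hypothetical lower-troposphere temperature


def brightness_temperature(transmittance):
    """Nadir Tb above an isothermal absorbing layer over the ocean.

    Sum of the attenuated surface emission, the upwelling layer emission,
    and the downwelling layer emission reflected at the surface.
    """
    surface = EMISSIVITY_89GHZ * SURFACE_TEMPERATURE * transmittance
    layer_up = LAYER_TEMPERATURE * (1.0 - transmittance)
    layer_reflected = ((1.0 - EMISSIVITY_89GHZ) * transmittance
                       * LAYER_TEMPERATURE * (1.0 - transmittance))
    return surface + layer_up + layer_reflected


tb_clear = brightness_temperature(1.0)    # no absorption: emissivity * surface temperature
tb_cloudy = brightness_temperature(0.5)   # more liquid water: warmer than tb_clear
```

The more liquid water absorbs, the lower the transmittance and the closer the brightness temperature moves from the radiometrically cold ocean value towards the physical temperature of the absorbing layer.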

Retrieval applied to flight B897
Time series of the retrieved SIWP, LWP, and IWV are shown as blue lines in Fig. 8. In the absence of in situ data except for the 12 water vapor profiles from dropsonde measurements, the retrieval is compared with the ICON model. The red lines indicate the value of the corresponding component of the ICON model state vectors interpolated to the time and location of the aircraft measurement. Of course, the ICON model itself is far from being perfect due to internal assumptions as well as limited temporal and spatial resolution. Therefore we cannot expect that the model is accurate in terms of retrieval quantities, time, and location. To get an estimate for the uncertainty of the ICON model, we produced histograms of the corresponding components of the ICON state vectors within a 50 km radius and within ±1 h of the time of measurement for every time step and location of the flight. The histograms are plotted as grey shades underneath the ICON time series. The model data are not considered as truth but they serve as a consistency check within this analysis.
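The selection of model values for these histograms can be sketched as a filter on distance and time. The sketch assumes that the model output is available as a list of points with latitude, longitude, time, and value, and that distances are great-circle distances on a sphere with a radius of 6371 km; the data layout and all names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbourhood_values(model_points, lat, lon, time_h, radius_km=50.0, dt_h=1.0):
    """Model values within radius_km and within +/- dt_h of a measurement."""
    return [p["value"] for p in model_points
            if abs(p["time_h"] - time_h) <= dt_h
            and great_circle_km(lat, lon, p["lat"], p["lon"]) <= radius_km]


# Hypothetical model points around an aircraft position at 64 N, 25 W, 13.0 h
points = [
    {"lat": 64.0, "lon": -25.0, "time_h": 13.0, "value": 0.1},   # same place and time
    {"lat": 65.0, "lon": -25.0, "time_h": 13.0, "value": 0.2},   # about 111 km away
    {"lat": 64.0, "lon": -25.0, "time_h": 15.0, "value": 0.3},   # 2 h later
    {"lat": 64.2, "lon": -25.0, "time_h": 13.5, "value": 0.4},   # about 22 km away
]
selected = neighbourhood_values(points, 64.0, -25.0, 13.0)
```

The histogram for one time step is then built from the selected values, and the procedure is repeated for every time step and location of the flight.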
Comparing the retrieval to the model state is not a true validation for several reasons, notably the dependence of the training data on the same model, and the fact that the model hydrometeors may be quite far from the true hydrometeors at the time and location of measurement. Nevertheless, testing whether the ICON simulations and ISMAR or MARSS measurements are comparable is important to ensure consistency, given our assumptions in representing the model hydrometeors in the radiative transfer simulations. Large errors in these assumptions would mean that the simulated and observed brightness temperatures for a given profile would be very different. This implies that the result from the retrieval applied to the actual observation would be very different from the model.
In general, the time series of the retrieved state vectors in Fig. 8 are within the given uncertainties and in reasonable agreement with the time series of the ICON model. The blue lines are mostly within the grey-shaded area. The retrieved SIWP, LWP, and IWV time series are symmetric with respect to the turning points (13.4, 14.1 h), which is consistent with the abovementioned expectation. Although the agreement is good in general, there are substantial differences between the retrieval and the model, for example in the time period between 13 and 13.5 h of the SIWP time series. Possible sources for the difference between the retrieved time series and the modeled time series are as follows:
1. The limit of the retrieval itself, namely the combined error from the NN approach and the radiometer noise.
2. The assumptions for the radiative transfer simulations, namely the assumptions about particle size distributions and hydrometeor types and their shape.
Time series of SIWP, LWP, and IWV retrieved from simulated brightness temperatures of the flight are shown in Fig. 9 in a similar way as in Fig. 8 in order to illustrate the performance of the retrieval in idealized conditions. Under these ideal conditions, as simulation and retrieval are based on the same assumptions, the agreement between the retrieved time series and the model time series is very good, and differences are within the range expected from the analysis in Sect. 4. In Fig. 7 both the observed and simulated brightness temperature time series are shown. The observed brightness temperatures of the 89 GHz and the 118.75 ± 5 GHz channels show, for example, a steady increase between 12.5 and 13.3 h flight time, whereas the increase of the simulated brightness temperature is rather discontinuous, being flatter at the beginning and steeper after 13.2 h. As described at the beginning of Sect. 5, an increase of the 89 GHz brightness temperature over ocean indicates an increase of liquid water within the atmosphere.
The same holds for the 118.75 ± 5 GHz channel. The conclusion from this comparison of brightness temperatures is that in the model the increase of liquid water is delayed compared to reality. This implies that the model predicts the front further south, with a more rapid increase in liquid water. These behaviors are also reflected in the retrieved LWP time series from the observation (Fig. 8) and from the simulation (Fig. 9). LWP retrieved from the observation shows a steadier increase, whereas LWP retrieved from the simulation shows a more discontinuous increase, with a strong increase at 13.2 h. Therefore, it is unlikely that the differences arise mainly from NN and noise-related uncertainties; their effect is less important, because the retrieval shows a coherent behavior for observation and simulation in terms of brightness temperature and LWP. Furthermore, the brightness temperature time series were smoothed to reduce the noise. It is more likely that the differences mainly arise from the inaccuracies of the ICON model in the spatial, temporal, and structural representation of the front, because the difference between LWP retrieved from the observation and LWP retrieved from the simulation corresponds to the difference between observed and simulated brightness temperatures. Nonetheless, unresolved features in the ICON model cannot be excluded as a possible source of the difference either. The errors made by the radiative transfer simulations and the assumptions therein also influence the retrieval, but this reflects the general agreement between retrieval and model. A quantitative error estimate is difficult, as there are no in situ data to compare with and the errors of the ICON model and of the radiative transfer simulations are unknown.
For IWV we can compare the retrieval with the in situ data from the dropsondes. The dropsonde IWV is shown as orange crosses in Fig. 8. The retrieved IWV captures the trend of the dropsonde IWV, but compared to the dropsonde IWV the retrieved IWV is shifted to slightly higher values. The offset (mean difference) between the 12 dropsonde IWV values and the retrieved IWV value at the time of the start of the dropsonde measurements is 0.5 kg m−2. This offset could be due to a dry bias of the dropsondes or due to a wet bias within the retrieval. Nonetheless, for an IWV value of > 5 kg m−2 this offset results in an error of less than 10 %. The RMS difference between the 12 dropsonde IWV values and the corresponding retrieved IWV values is 0.8 kg m−2. This corresponds to a relative error of 16 % for an IWV value of 5 kg m−2 and of about 4 % for an IWV value of 18 kg m−2. When removing the offset, the RMS difference is 0.6 kg m−2, which is similar to the random error of 0.66 kg m−2 between the radiosonde measurements and the GPS measurements of IWV in Buehler et al. (2012a). The IWV error is in the expected range of Sect. 4.2. Although 12 comparison points do not permit a detailed statistical analysis, this comparison is encouraging, showing that the retrieval of IWV, in general, is effective under both cloudy and clear-sky conditions.
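The comparison statistics can be sketched as follows. The sketch computes the offset, the RMS difference, and the RMS difference after removing the offset for paired values, and it checks that the reported numbers are mutually consistent, since the squared RMS difference equals the squared offset plus the squared offset-free RMS difference. The function name and the short example series are hypothetical; the dropsonde data themselves are not reproduced here.

```python
import math


def offset_and_rms(retrieved, reference):
    """Mean difference, RMS difference, and RMS difference without the offset."""
    diffs = [r - s for r, s in zip(retrieved, reference)]
    n = len(diffs)
    offset = sum(diffs) / n
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    rms_without_offset = math.sqrt(sum((d - offset) ** 2 for d in diffs) / n)
    return offset, rms, rms_without_offset


# Consistency of the reported values (in kg m^-2)
REPORTED_OFFSET = 0.5
REPORTED_RMS = 0.8
implied_rms_without_offset = math.sqrt(REPORTED_RMS ** 2 - REPORTED_OFFSET ** 2)

# Relative size of the RMS difference at low and high IWV
relative_at_5 = REPORTED_RMS / 5.0      # 0.16
relative_at_18 = REPORTED_RMS / 18.0    # about 0.044

# Hypothetical paired values to exercise the function
example_stats = offset_and_rms([1.0, 2.0, 3.0], [0.0, 2.0, 2.0])
```

The implied offset-free RMS difference of about 0.62 kg m−2 agrees with the rounded value of 0.6 kg m−2 given in the text.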
We know from Sect. 4 that the retrieval is insufficient for RWP. Nonetheless, we apply the retrieval for RWP out of curiosity. Figure 10a shows the time series of the retrieved RWP, which seems to represent the general structure of the modeled time series. The retrieved RWP time series is symmetric with respect to the turning points (13.4, 14.1 h), which is consistent with the stated expectations. It shows a strong increase between 12.5 and 13.4 h with a maximum RWP at approximately 13.4 h, which is consistent with our conclusion from the brightness temperature time series. Thus, at first glance the retrieval of RWP seems to be effective according to Fig. 10, contrary to our conclusion in Sect. 4. We tested this by applying the retrieval to the simulated brightness temperature time series: if the retrieval of RWP were effective, the retrieved RWP should be similar to the ICON RWP. The time series of the RWP retrieved from the simulated brightness temperatures is shown in Fig. 10b. For RWP the blue and red lines are not in agreement. Therefore, our conclusion from Sect. 4 still holds. Even though the RWP retrieval is unreliable, it can still deliver some useful information, such as an approximate classification that indicates whether there is rain or not.

Summary of flight analysis
We applied the retrieval method to the brightness temperatures measured during flight B897. As a consistency check we compared the retrieved state vectors with the ICON model state vectors, which we interpolated to the time and location of the aircraft measurements. Considering the given uncertainties, the agreement between the estimated SIWP, LWP, and IWV and the SIWP, LWP, and IWV from ICON is reasonable. There are strong local differences due to the misplacement of spatial features in the ICON model and due to small-scale variability. Compared to SIWP, LWP, and IWV, the RWP retrieval is less satisfactory, which is consistent with the results from Sect. 4. Furthermore, we compared the retrieved IWV with IWV from 12 dropsonde measurements.

Summary
This study investigated strategies for hydrometeor path retrieval from airborne radiometer measurements. We distinguish between cloud ice, which consists mainly of ice particles < 100 µm, and snow, which consists mainly of ice particles > 100 µm. This distinction between small and large ice particles is similar to the distinction in atmospheric models. We defined the CIWP as the column-integrated bulk mass of cloud ice and the SIWP as the column-integrated bulk mass of snow. As the use of ISMAR and MARSS makes it possible to sense SIWP but not CIWP, we developed an NN-based retrieval method that uses nadir-viewing brightness temperature measurements, with the main purpose of estimating SIWP. We also tried to estimate LWP, RWP, and IWV with the retrieval. The NNs were trained with simulated brightness temperatures and atmospheric profiles from the ICON model. The brightness temperatures were simulated by ARTS with the atmospheric profiles from the ICON model as input. The hydrometeors were assumed to scatter as Mie spheres, except for snow particles, which were assumed to scatter like the aggregates from the Hong et al. (2009) database. We tested the retrieval with simulated measurements of which the true state is known. This test enabled us to estimate the physical limits of this retrieval process:
- If SIWP > 0.01 kg m−2, then the MFE of our retrieval is lower than 100 % and decreases to about 20 % for high SIWP, and the retrieval has an offset of zero.
- If LWP > 0.05 kg m−2, then the MFE of our retrieval is lower than 100 % and decreases to about 30 % for high LWP, and the retrieval has an offset of zero.
- If IWV > 3 kg m−2, then the MFE is 5 to 8 %. Converted to an absolute value, this corresponds to an error of 0.2 kg m−2 for low IWV values and to an error of 2 kg m−2 for high IWV values.
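The MFE figures listed above are summary statistics over simulated test cases whose true state is known. The sketch below shows one way such statistics could be computed per bin of the true value. It assumes that the MFE is the median of |retrieved − true| / true; this definition, the function names, and the example numbers are illustrative assumptions and are not taken from the retrieval code.

```python
import numpy as np

def median_fractional_error(retrieved, truth):
    """Median of |retrieved - truth| / truth (assumed MFE definition)."""
    retrieved = np.asarray(retrieved, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.median(np.abs(retrieved - truth) / truth)

def binned_mfe(retrieved, truth, bin_edges):
    """MFE evaluated separately in each bin of the true value."""
    retrieved = np.asarray(retrieved, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # np.digitize returns i for values with bin_edges[i-1] <= value < bin_edges[i]
    idx = np.digitize(truth, bin_edges)
    result = []
    for i in range(1, len(bin_edges)):
        mask = idx == i
        if mask.any():
            result.append(median_fractional_error(retrieved[mask], truth[mask]))
        else:
            result.append(np.nan)  # empty bin
    return np.array(result)

def absolute_error(mfe, value):
    """Convert a fractional error into an absolute error at a given value."""
    return mfe * value

# Hypothetical SIWP-like test cases (kg m-2), split into two decades.
truth = [0.02, 0.05, 0.5, 0.8]
retrieved = [0.04, 0.05, 0.6, 0.8]
print(binned_mfe(retrieved, truth, [0.01, 0.1, 1.0]))

# Fractional-to-absolute conversion at the lower IWV limit of 3 kg m-2:
print(absolute_error(0.05, 3.0), absolute_error(0.08, 3.0))  # about 0.15 and 0.24
```

Binning by the true value is what allows statements such as "the MFE decreases from 100 % to about 20 %", because the fractional error depends strongly on the magnitude of the retrieved quantity.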
The retrieval is insufficient for RWP determination because it is not bias-free and the MFE is mostly higher than 100 %. Furthermore, we showed that the magnitude of the error in the SIWP determination of the retrieval using ISMAR and MARSS measurements is only half of that of the retrieval using only AMSU-B channel combinations. This shows that estimating SIWP strongly benefits from submillimeter wave measurements but also that estimating LWP and IWV benefits from the higher-frequency ISMAR channels.
We applied the retrieval method to brightness temperature measurements recorded during flight B897. As a consistency check we compared the estimated SIWP, LWP, and IWV values with the SIWP, LWP, and IWV values from the ICON model, which were interpolated to the time and location of flight B897. Considering the stated uncertainties, the agreement between the estimated SIWP, LWP, and IWV values and those obtained with ICON is reasonable. A comparison of the retrieved IWV values with those from the 12 dropsonde measurements shows that the mean difference between them is 0.5 kg m−2 and the RMS difference is 0.8 kg m−2. We showed thereby that we can use brightness temperature measurements obtained using ISMAR in combination with MARSS to estimate SIWP, LWP, and IWV. This is especially interesting in view of the upcoming MetOp-SG mission, where ICI together with MWI will provide brightness temperature measurements with a similar combination of channels. Although our retrieval is limited in season and latitude range, there is no fundamental limit in using NNs for global retrievals. The main requirement for global application is that the training database covers the wide range of atmospheric conditions that can occur globally. Having established that the retrieval of SIWP, LWP, and IWV is effective, the next steps would be, firstly, to proceed beyond estimating integrated quantities and retrieve profiles, because the combination of the ISMAR and MARSS channels has considerable potential that we did not exploit in our actual retrieval. Secondly, the scattering properties of snow have to be investigated, especially in the submillimeter range, because data for the scattering properties in this range of the electromagnetic spectrum are rare and partially inconsistent with measurements.
The mass of the Hong aggregates used here is proportional to the third power of their maximum dimension (see also Sect. 3.2), whereas measurements show that the mass is approximately proportional to the second power of the maximum dimension (Cotton et al., 2013). This is especially important in view of retrievals for the upcoming ICI sensor, because the retrieval results will strongly depend on the quality of the scattering properties. Therefore, a more thorough validation is clearly needed, for example against in situ measurements. Setting up such validation experiments will be logistically challenging, ideally using at least two different aircraft, one with the radiometer and one with the in situ probes. A colocated airborne cloud radar would also be very helpful.
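To illustrate why the exponent of the mass-dimension relation matters, the following sketch compares two power laws of the form m = a · D^b with b = 3 and b = 2. The function names and the reference size at which both laws are forced to agree are hypothetical; the prefactors of the actual Hong aggregates and of the Cotton et al. (2013) measurements are not given here, so only the ratio between the two laws is evaluated.

```python
def particle_mass(d_max, a, b):
    """Power-law mass-dimension relation m = a * d_max**b."""
    return a * d_max ** b

def mass_ratio(d_max, d_ref, b_model=3.0, b_obs=2.0):
    """Ratio of cubic-law mass to quadratic-law mass.

    Both laws are assumed to give the same mass at the reference
    size d_ref (an illustrative normalization, not from the paper).
    """
    return (d_max / d_ref) ** (b_model - b_obs)

d_ref = 1e-4  # reference maximum dimension in m (assumed)
for d_max in (1e-4, 5e-4, 1e-3, 5e-3):
    # The ratio grows linearly with d_max when the exponents differ by one.
    print(f"D = {d_max:.0e} m: mass ratio = {mass_ratio(d_max, d_ref):.1f}")
```

Under these assumptions a particle ten times larger than the reference size carries ten times more mass in the cubic law than in the quadratic law, which is why the choice of particle model can strongly affect the retrieved SIWP.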