Journal cover Journal topic
Atmospheric Measurement Techniques An interactive open-access journal of the European Geosciences Union
Journal topic
Atmos. Meas. Tech., 12, 2967–2977, 2019
https://doi.org/10.5194/amt-12-2967-2019
Atmos. Meas. Tech., 12, 2967–2977, 2019
https://doi.org/10.5194/amt-12-2967-2019

Research article 29 May 2019

Research article | 29 May 2019

# The cost function of the data fusion process and its application

The cost function of the data fusion process and its application
Simone Ceccherini, Nicola Zoppetti, Bruno Carli, Ugo Cortesi, Samuele Del Bianco, and Cecilia Tirelli Simone Ceccherini et al.
• Istituto di Fisica Applicata “Nello Carrara” del Consiglio Nazionale delle Ricerche, Via Madonna del Piano 10, 50019 Sesto Fiorentino, Italy

Correspondence: Simone Ceccherini (s.ceccherini@ifac.cnr.it)

Abstract

When the complete data fusion method is used to fuse inconsistent measurements, it is necessary to add to the measurement covariance matrix of each fusing profile a covariance matrix that takes into account the inconsistencies. A realistic estimate of these inconsistency covariance matrices is required for effectual fused products. We evaluate the possibility of assisting the estimate of the inconsistency covariance matrices using the value of the cost function minimized in the complete data fusion. The analytical expressions of expected value and variance of the cost function are derived. Modelling the inconsistency covariance matrix with one parameter, we determine the value of the parameter that makes the reduced cost function equal to its expected value and use the variance to assign an error to this determination. The quality of the inconsistency covariance matrix determined in this way is tested for simulated measurements of ozone profiles obtained in the thermal infrared in the framework of the Sentinel-4 mission of the Copernicus programme. As expected, the method requires sufficient statistics and poor results are obtained when a small number of profiles are being fused together, but very good results are obtained when the fusion involves a large number of profiles.

1 Introduction

Vertical profiles of atmospheric variables are often obtained with the inversion of remote-sensing observations performed by instruments operating on space-borne and airborne platforms, as well as from ground-based stations. When the same portion (or nearby portions) of atmosphere is measured more times by the same instrument or by different instruments the measurements can be combined in order to obtain a single vertical profile of improved quality with respect to that of the profiles retrieved from the single observation. The simultaneous retrieval from several observations is considered the most comprehensive way to combine different measurements of the same quantity (Aires et al., 2012); however, recently a new method, referred to as complete data fusion (CDF) (Ceccherini et al., 2015), was proposed that, with simpler implementation requirements, provides products of quality equivalent to that of the simultaneous retrieval products. The equivalence is exact if the linear approximation holds in the range of the retrieved products.

The CDF method was proved (Ceccherini, 2016) to be equivalent to the measurement space solution data fusion method (Ceccherini et al., 2009) and the latter was successfully applied to the data fusion of MIPAS-ENVISAT and IASI-METOP measurements (Ceccherini et al., 2010a, b) and of MIPAS-STR and MARSCHALS measurements (Cortesi et al., 2016).

Conversely, as highlighted in Ceccherini et al. (2018), the CDF provides poor results when applied to inconsistent measurements. Three causes of inconsistency are possible:

• i.

The profiles to be fused (in the following referred to as fusing profiles) are represented on different vertical grids.

• ii.

A variability is present in the observed species and the fusing profiles refer to different times and space locations.

• iii.

The fusing profiles are affected by different forward model errors.

These inconsistencies were addressed in Ceccherini et al. (2018), but some problems remain open. The inconsistency of case (i) was solved by adding to the measurement covariance matrix (CM) of each fusing profile an interpolation CM, which is built using the grids of the fusing profiles and by the a priori CM. The inconsistency of case (ii) was solved by adding to the measurement CM of each fusing profile a coincidence CM, which describes the variability of the observed species in the field of the observations. Regarding the inconsistency of case (iii), it was suggested to follow an approach similar to cases (i) and (ii) by adding to the measurement CM of each fusing profile a CM describing the forward model errors due, for example, to approximations in the model and uncertainties in atmospheric and instrumental parameters. The problem that remains open is the realistic estimate of these inconsistency CMs, which are otherwise determined on the basis of some educated guesses.

The value of the cost function, which is minimized in the fusion process, depends on the inconsistency CMs and can be used to establish some constraints on their amplitude. The goal of this paper is to determine the expected value of the cost function and to use this expectation to build a procedure for the improvement of our educated guess of the inconsistency CMs.

In order to assess its advantages we apply this procedure to simulated measurements of ozone profiles obtained in the thermal infrared in the framework of the Sentinel-4 mission (ESA, 2017) of the Copernicus programme (https://sentinel.esa.int/web/sentinel/missions, last access: 20 May 2019).

The paper is organized as follows: in Sect. 2, we recall the formulas of the CDF method in order to establish the formalism used in the subsequent sections. In Sect. 3, we describe the properties of the cost function and in particular determine the expected value and the variance of the cost function distribution. In Sect. 4, we describe the method that estimates the inconsistency CMs using the expected value of the cost function, apply it to the determination of the coincidence CM and assess the cases in which it provides useful information. Conclusions are drawn in Sect. 5.

2 The CDF method

Let us assume to have N independent measurements of the vertical profile of an atmospheric target referring to the same space-time location. Performing the retrieval of the N measurements with the optimal estimation method (Rodgers, 2000), we obtain N vectors ${\stackrel{\mathrm{^}}{\mathbit{x}}}_{i}$ ($i=\mathrm{1},\mathrm{2},\mathrm{\dots },N$), here assumed to be estimates of the same profile made on a common vertical grid. The vectors ${\stackrel{\mathrm{^}}{\mathbit{x}}}_{i}$ are characterized by the CMs Si and the averaging kernel matrices (AKMs) Ai (Ceccherini et al., 2003; Ceccherini and Ridolfi, 2010; Rodgers, 2000). The CMs Si are each defined as $〈{\mathbit{\sigma }}_{i}{\mathbit{\sigma }}_{i}^{\mathrm{T}}〉$, where the vector σi contains the error due to the propagation of the observation noise through the retrieval process (and differs from the total error, equal to the difference between true and retrieved vertical profiles, by the smoothing error due to the use of a constraint in the retrieval), the superscript T indicates the transpose of the vector and the symbol 〈…〉 indicates the statistical expected value.

The fused profile xf of these N measurements, as provided by the CDF method, is obtained minimizing the following cost function (see Ceccherini et al., 2015):

$\begin{array}{ll}& c\left(\mathbit{x}\right)=\sum _{i=\mathrm{1}}^{N}{\left({\mathbit{\alpha }}_{i}-{\mathbf{A}}_{i}\mathbit{x}\right)}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}\left({\mathbit{\alpha }}_{i}-{\mathbf{A}}_{i}\mathbit{x}\right)+{\left(\mathbit{x}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}\\ \text{(1)}& & {\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\left(\mathbit{x}-{\mathbit{x}}_{\mathrm{a}}\right),\end{array}$

where xa and Sa are the a priori profile and its CM that are used to constrain the data fusion, and

$\begin{array}{}\text{(2)}& {\mathbit{\alpha }}_{i}\equiv {\stackrel{\mathrm{^}}{\mathbit{x}}}_{i}-\left(\mathbf{I}-{\mathbf{A}}_{i}\right){\mathbit{x}}_{\mathrm{a}i}\end{array}$

is a modified fusing profile with xai the a priori profile used in the ith retrieval and I the identity matrix.

It is possible to verify that the modified fusing profile of Eq. (2) is also a measurement of the true profile xt obtained using the averaging kernels:

$\begin{array}{}\text{(3)}& {\mathbit{\alpha }}_{i}={\mathbf{A}}_{i}{\mathbit{x}}_{\mathrm{t}}+{\mathbit{\sigma }}_{i}.\end{array}$

This measurement does not depend on the a priori profile and has the same CM as the fusing profile.

The CDF solution xf is the profile that corresponds to the minimum of c(x) and is obtained with the following equation:

$\begin{array}{}\text{(4)}& {\mathbit{x}}_{\mathrm{f}}={\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}\left(\sum _{i=\mathrm{1}}^{N}{\mathbf{A}}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\alpha }}_{i}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbit{x}}_{\mathrm{a}}\right),\end{array}$

where we have introduced the matrix

$\begin{array}{}\text{(5)}& \mathbf{F}=\sum _{i=\mathrm{1}}^{N}{\mathbf{A}}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{A}}_{i},\end{array}$

which is the Fisher information matrix (Ceccherini et al., 2012; Fisher, 1935) of the fused profile, equal to the sum of the Fisher information matrices of the fusing measurements. As indicated by the name, this matrix fully characterizes the information content of each measurement.

The fused profile is characterized by the following CM and AKM:

$\begin{array}{}\text{(6)}& & {\mathbf{S}}_{\mathrm{f}}={\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}\mathbf{F}{\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}},\text{(7)}& & {\mathbf{A}}_{\mathrm{f}}={\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}\mathbf{F}.\end{array}$

When the fusing profiles ${\stackrel{\mathrm{^}}{\mathbit{x}}}_{i}$ are represented on different vertical grids, it is necessary to perform a resampling of the AKMs (Calisesi et al., 2005), which defines new ${\mathbf{A}}_{i}^{\prime }$ matrices with their second index equal to that of the common fusion grid. Following Ceccherini et al. (2016), we define such a transformation as follows:

$\begin{array}{}\text{(8)}& {\mathbf{A}}_{i}^{\prime }={\mathbf{A}}_{i}{\mathbf{R}}_{i},\end{array}$

where Ri represent the generalized inverse matrices of the interpolation matrices Hi, which interpolate the fusing profiles on the fusion grid.

In general, in order to account for interpolation, coincidence and forward model errors, the CDF formula can be modified (Ceccherini et al., 2018) by replacing αi with

$\begin{array}{}\text{(9)}& {\stackrel{\mathrm{̃}}{\mathbit{\alpha }}}_{i}={\mathbit{\alpha }}_{i}-{\mathbf{A}}_{i}\left({\mathbf{C}}^{\left(i\right)}-{\mathbf{R}}_{i}{\mathbf{C}}^{\left(f\right)}\right){\mathbit{x}}_{\mathrm{a}},\end{array}$

where C(i) and C(f) are the sampling matrices that select the grids (i) and the grid (f), respectively, from a fine grid that includes all the levels of the fusion grid (f) and of the N grids (i), and replacing Si with

$\begin{array}{}\text{(10)}& {\stackrel{\mathrm{̃}}{\mathbit{S}}}_{i}={\mathbf{S}}_{i}+{\mathbf{S}}_{i,\mathrm{int}}+{\mathbf{S}}_{i,\mathrm{coin}}+{\mathbf{S}}_{i,\mathrm{other}},\end{array}$

where Si,int, Si,coin and Si,other are the CMs associated with the interpolation error, with the coincidence error and with the forward model errors.

The CM associated with the interpolation error is given by

$\begin{array}{}\text{(11)}& {\mathbf{S}}_{i,\mathrm{int}}={\mathbf{A}}_{i}\left({\mathbf{C}}^{\left(i\right)}-{\mathbf{R}}_{i}{\mathbf{C}}^{\left(f\right)}\right){\mathbf{S}}_{\mathrm{a}}{\left({\mathbf{C}}^{\left(i\right)}-{\mathbf{R}}_{i}{\mathbf{C}}^{\left(f\right)}\right)}^{\mathrm{T}}{\mathbf{A}}_{i}^{\mathrm{T}},\end{array}$

where here Sa is the a priori CM represented on the fine grid. The CM associated with the coincidence error is given by

$\begin{array}{}\text{(12)}& {\mathbf{S}}_{i,\mathrm{coin}}={\mathbf{A}}_{i}{\mathbf{C}}^{\left(i\right)}{\mathbf{S}}_{\mathrm{coin}}{\mathbf{C}}^{\left(i\right)T}{\mathbf{A}}_{i}^{\mathrm{T}},\end{array}$

where Scoin is the CM describing the variability of the true profiles of the fusing measurements. The CM associated with the forward model errors is given by (Rodgers, 2000)

$\begin{array}{}\text{(13)}& {\mathbf{S}}_{i,\mathrm{other}}={\mathbf{G}}_{i}{\mathbf{S}}_{i,\mathrm{FM}}{\mathbf{G}}_{i}^{\mathrm{T}},\end{array}$

where Gi is the gain matrix, which includes the derivatives of the retrieved profile with respect to the observations and Si,FM is the CM describing the forward model errors due, for example, to approximations in the model and uncertainties in atmospheric and instrumental parameters.

3 The cost function

In this section, the expected value and the variance of the cost function are derived. In order to keep the formalism as simple as possible we deal with the cost function given in Eq. (1), where the treatment of inconsistency errors is not included. However, since the inconsistency errors only modify the CMs and the vectors αi and do not affect the fusion formula, the results obtained in this Section are valid in the general case.

Once the fused profile xf is calculated from Eq. (4) we can substitute it in Eq. (1) in order to obtain c(xf)≡cmin, which is the minimum value of the cost function. Because of measurement errors, cmin does not have a definite value, but assumes values according to a probability distribution. The properties of this probability distribution (in the following referred to as cost function distribution) are considered and in particular we determine the expected value and the variance of the distribution. In order to calculate these quantities we have to make explicit the errors σi in the expression of cmin; see next section. We assume that the errors σi are normally distributed with expected values equal to zero, have CMs equal to Si and are uncorrelated for different measurements.

## 3.1 The dependence of the cost function on the measurement errors

Substituting in Eq. (4) the expression of αi given by Eq. (3) and using Eq. (7), we obtain the following expression for xf:

$\begin{array}{}\text{(14)}& {\mathbit{x}}_{\mathrm{f}}={\mathbf{A}}_{\mathrm{f}}{\mathbit{x}}_{\mathrm{t}}+\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right){\mathbit{x}}_{\mathrm{a}}+{\mathbit{\sigma }}_{\mathrm{f}},\end{array}$

where σf is the error on xf given by

$\begin{array}{}\text{(15)}& {\mathbit{\sigma }}_{\mathrm{f}}={\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}\sum _{i=\mathrm{1}}^{N}{\mathbf{A}}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}\end{array}$

and characterized by the CM ${\mathbf{S}}_{\mathrm{f}}=〈{\mathbit{\sigma }}_{\mathrm{f}}{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}〉$ given in Eq. (6).

Substituting in Eq. (1) the expression of αi given by Eq. (3) and x with the expression of xf given by Eq. (14), we obtain the expression of cmin(σi) as a function of the measurement errors:

$\begin{array}{ll}& {c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)=\sum _{i=\mathrm{1}}^{N}{\left[{\mathbit{\sigma }}_{i}-{\mathbf{A}}_{i}{\mathbit{\sigma }}_{\mathrm{f}}+{\mathbf{A}}_{i}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]}^{\mathrm{T}}\\ & {\mathbf{S}}_{i}^{-\mathrm{1}}\left[{\mathbit{\sigma }}_{i}-{\mathbf{A}}_{i}{\mathbit{\sigma }}_{\mathrm{f}}+{\mathbf{A}}_{i}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]\\ \text{(16)}& & +{\left[{\mathbit{\sigma }}_{\mathrm{f}}+{\mathbf{A}}_{\mathrm{f}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\left[{\mathbit{\sigma }}_{\mathrm{f}}+{\mathbf{A}}_{\mathrm{f}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right],\end{array}$

where σf is a linear function of σi expressed by Eq. (15).

Equation (16) contains several matrix products, which produce several terms; we can rearrange these terms in the following way:

$\begin{array}{}\text{(17)}& {c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)={c}_{\mathrm{0}}^{\mathrm{min}}+{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)+{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right),\end{array}$

where ${c}_{\mathrm{0}}^{\mathrm{min}}$ is independent of the errors, ${c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)$ is linear in the errors and ${c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)$ is quadratic in the errors.

In the case of the term independent of the errors, performing algebraic operations and using Eqs. (5) and (7), we obtain

$\begin{array}{ll}& {c}_{\mathrm{0}}^{\mathrm{min}}=\sum _{i=\mathrm{1}}^{N}{\left[{\mathbf{A}}_{i}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}\left[{\mathbf{A}}_{i}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]\\ & +{\left[{\mathbf{A}}_{\mathrm{f}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\left[{\mathbf{A}}_{\mathrm{f}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]={\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}\\ \text{(18)}& & {\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)=\mathrm{tr}\left[\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\right],\end{array}$

where tr[] identifies the trace of the matrix and we have used the relation for the trace of a product of two matrices tr[CD]=tr[DC] when D and CT have the same shape.

In the case of the term linear in the errors, performing algebraic operations and using Eqs. (5), (7) and (15), we obtain

$\begin{array}{ll}& {c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)=\mathrm{2}\sum _{i=\mathrm{1}}^{N}{\left[{\mathbf{A}}_{i}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}\left[{\mathbit{\sigma }}_{i}-{\mathbf{A}}_{i}{\mathbit{\sigma }}_{\mathrm{f}}\right]\\ \text{(19)}& & +\mathrm{2}{\left[{\mathbf{A}}_{\mathrm{f}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)\right]}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbit{\sigma }}_{\mathrm{f}}=\mathrm{2}{\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbit{\sigma }}_{\mathrm{f}}.\end{array}$

In the case of the term quadratic in the errors, performing algebraic operations and using Eqs. (5) and (15), we obtain

$\begin{array}{ll}& {c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)=\sum _{i=\mathrm{1}}^{N}{\left({\mathbit{\sigma }}_{i}-{\mathbf{A}}_{i}{\mathbit{\sigma }}_{\mathrm{f}}\right)}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}\left({\mathbit{\sigma }}_{i}-{\mathbf{A}}_{i}{\mathbit{\sigma }}_{\mathrm{f}}\right)\\ \text{(20)}& & +{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbit{\sigma }}_{\mathrm{f}}=\sum _{i=\mathrm{1}}^{N}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}-{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}}.\end{array}$

From Eqs. (17)–(20) we obtain that the full expression of cmin(σi), arranged as a function of the errors, is

$\begin{array}{ll}& {c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)=\mathrm{tr}\left[\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\right]+\mathrm{2}{\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}\\ \text{(21)}& & {\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbit{\sigma }}_{\mathrm{f}}+\sum _{i=\mathrm{1}}^{N}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}-{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}},\end{array}$

where σf is a function of σi according to Eq. (15).

## 3.2 Expected value of the cost function

The expected value of the cost function is equal to the summation of the expected values of its three terms. Since ${c}_{\mathrm{0}}^{\mathrm{min}}$ is independent of the errors, its expected value coincides with its constant value. The expected value of ${c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)$ is zero because this term is linear in σi, and the expected values of σi are equal to zero. Therefore, we need to calculate only the expected value of ${c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)$:

$\begin{array}{ll}& 〈{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉=\sum _{i=\mathrm{1}}^{N}〈{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}〉-〈{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}}〉\\ & =\sum _{i=\mathrm{1}}^{N}\mathrm{tr}\left(〈{\mathbit{\sigma }}_{i}{\mathbit{\sigma }}_{i}^{\mathrm{T}}〉{\mathbf{S}}_{i}^{-\mathrm{1}}\right)-\mathrm{tr}\left(〈{\mathbit{\sigma }}_{\mathrm{f}}{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}〉\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)\right)\\ \text{(22)}& & =\sum _{i=\mathrm{1}}^{N}\mathrm{tr}\left({\mathbf{I}}_{i}\right)-\mathrm{tr}\left({\mathbf{S}}_{\mathrm{f}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)\right)=\sum _{i=\mathrm{1}}^{N}{n}_{i}-\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}\right),\end{array}$

where Eqs. (6) and (7) have been used and ni is the number of eigenvalues different from zero of ${\mathbf{S}}_{i}^{-\mathrm{1}}$ rather than the number of its diagonal elements. When Si is singular (or near singular) the inversion is performed by means of the generalized inverse (Kalman, 1976), and therefore ${\mathbf{S}}_{i}^{-\mathrm{1}}$ may have some eigenvalues equal to zero.

Finally, the expected value of the cost function is given by

$\begin{array}{}\text{(23)}& 〈{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉=\sum _{i=\mathrm{1}}^{N}{n}_{i}-\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}\right)+\mathrm{tr}\left[\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\right].\end{array}$

Recalling that the trace of the AKM represents the number of degrees of freedom (DOFs), which is the number of independent parameters actually determined by the analysis (Rodgers, 2000), we see that the expected value of the cost function is equal to the following: a first term that counts the number of available measurements minus a second term that is the number of DOFs plus a third term that depends on the difference between the a priori profile and the true profile.

## 3.3 Variance of the cost function

Using Eq. (21) it is possible to calculate the expression of the variance of the cost function. For those interested, the lengthy calculation is reported in Appendix A. The result is

$\begin{array}{ll}& \mathrm{var}\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]=\mathrm{2}\sum _{i=\mathrm{1}}^{N}{n}_{i}-\mathrm{4}\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}\right)+\mathrm{2}\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}^{\mathrm{2}}\right)\\ \text{(24)}& & +\mathrm{4}\mathrm{tr}\left[\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\right].\end{array}$

Equations (23) and (24) provide new relationships that make it possible to calculate the expected value and the variance of the cost function minimized in the CDF.

A particular case is that in which we take a priori errors going to infinity (unconstrained case). In that case ${\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}$ tends to the null matrix and Af coincides with the identity matrix, therefore, we obtain

$\begin{array}{}\text{(25)}& & {〈{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉}_{{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\to \mathrm{0}}=\sum _{i=\mathrm{1}}^{N}{n}_{i}-n,\text{(26)}& & \mathrm{var}{\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]}_{{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\to \mathrm{0}}=\mathrm{2}\left(\sum _{i=\mathrm{1}}^{N}{n}_{i}-n\right),\end{array}$

where n is the number of levels of the fused profile. As expected, Eqs. (25)–(26) are equal to the expected value and the variance of the chi-square distribution.

More generally, we notice that the third term of Eq. (23) and the fourth term of Eq. (24), which are only present when a constraint is used for the calculation of the fused profile, are a very small correction whenever mild constraints are used.

## 3.4 Reduced cost function

It is useful to introduce the reduced cost function defined as the ratio between the cost function and the expected value of the cost function:

$\begin{array}{}\text{(27)}& {c}_{\mathrm{r}}\left(\mathbit{x}\right)=\frac{c\left(\mathbit{x}\right)}{〈{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉},\end{array}$

with an expected value equal to 1.

Accordingly, the variance of the reduced cost function is equal to

$\begin{array}{}\text{(28)}& \mathrm{var}\left[{c}_{\mathrm{r}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]=\frac{\mathrm{var}\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]}{{〈{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉}^{\mathrm{2}}}.\end{array}$
4 Application

## 4.1 Method to estimate the inconsistency CMs

When the correct CMs are used, the reduced cost function is bound to be equal to 1 within the variability determined by its variance. In turn using the expected value of the reduced cost function as a constraint, we can tune the values of the CMs that characterize the inconsistencies of the fusing profiles, in particular either the CM Scoin describing the variability of the true profiles of the fusing measurements or the CMs Si,FM describing the forward model errors. Of course the reduced cost function is a single constraint, furthermore limited by the uncertainty introduced by its variance, and can only be used to determine one parameter of the inconsistency CMs. However, if the same unknown CM is involved in several fusion processes, a more elaborate determination of the CM may also be considered. In the following we consider the simple case in which the inconsistency CM is parametrized with a single parameter.

### 4.1.1 Estimate of the k parameter

If the inconsistency CM is written as kΣ, where k is a multiplicative parameter and Σ is an assumed CM that describes the inconsistency error, the value of the k parameter can be determined imposing that the reduced cost function is equal to 1:

$\begin{array}{}\text{(29)}& {c}_{\mathrm{r}}\left[{\mathbit{x}}_{\mathrm{f}}\left(k\right),k\right]=\mathrm{1}.\end{array}$

Since cr[xf(k),k] is a monotonic decreasing function of k, the value of k satisfying Eq. (29) can be easily found numerically.

### 4.1.2 Estimate of the error of the k parameter

The variance of the reduced cost function determines an error Δk on the value of the parameter k that is given by the following expression:

$\begin{array}{}\text{(30)}& \mathrm{\Delta }k=\frac{\sqrt{\mathrm{var}\left\{{c}_{\mathrm{r}}\left[{\mathbit{x}}_{\mathrm{f}}\left(k\right),k\right]\right\}}}{\frac{\mathrm{d}{c}_{\mathrm{r}}\left[{\mathbit{x}}_{\mathrm{f}}\left(k\right),k\right]}{\mathrm{d}k}}.\end{array}$

The determination of k and Δk by means of Eqs. (29) and (30) requires the calculation of cmin(σi)〉 and var[cmin(σi)] by means of Eqs. (23) and (24), which depend on the true profile. Since the true profile is unknown, in the following analysis we replace the true profile with the fused profile, which is its best estimate.

## 4.2 The determination of the coincidence CM in the case of simulated ozone data

### 4.2.1 Simulated data

The use of CDF will be particularly relevant for the analysis of the future atmospheric Sentinel missions of the Copernicus programme (https://sentinel.esa.int/web/sentinel/missions, last access: 20 May 2019). The number of data that will be available from these missions will pose technical challenges to most applications and the CDF can be used to reduce the number of products while maintaining the information content of the full datasets. In general, we have a good understanding of the average geographical variability of the observed products and a reasonable assumption can be made of the Scoin that is used for the data fusion, but local fluctuations may also have significant effects. Therefore, the possibility of using a scalar k, which takes into account the local fluctuations, may provide an important improvement for these data. For this reason, simulated data of the Sentinel-4 are a good opportunity for the test of the method described in Sect. 4.1.

In the framework of the AURORA project (Cortesi et al., 2018) we simulated Sentinel-4 ozone vertical profile measurements as they could be obtained by the infrared sounder operating in the thermal infrared on board the Meteosat Third Generation satellite (http://www.eumetsat.int/website/home/Satellites/FutureSatellites/MeteosatThirdGeneration/MTGDesign/, last access: 20 May 2019). The Sentinel-4 and the Sentinel-5P observations will improve our ozone composition knowledge (Quesada-Ruiz et al., 2019) and the AURORA project assesses the advantages offered by CDF in the exploitation of the data. The atmosphere used for the simulations is taken from the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) reanalysis (Gelaro et al., 2017). The MERRA-2 data are provided by the Global Modelling and Assimilation Office (GMAO) at NASA Goddard Space Flight Center. This reanalysis covers the modern era of remotely sensed data, from 1979 through the present. The data of a geostationary image, acquired on 1 April 2012 in about 1 h, were considered, and of the available 423 719 measurements only the 35 594 measurements in clear sky have been simulated. A coincidence cell of 0.5 step of latitude and 0.625 step of longitude was chosen for the data fusion and a total of 1296 cells, where there are at least two measurements that can be fused, are obtained. The time coincidence is in our case very short and is practically negligible.

The a priori profiles provided by the McPeters and Labow climatology (McPeters and Labow, 2012) are used for all fusing and fused profiles. The a priori CMs are obtained using the standard deviation of the McPeters and Labow climatology when its value is larger than 20 % of the a priori profile and a value of 20 % of the a priori profile in the other cases. The off-diagonal elements are calculated considering a correlation length of 6 km. The correlation length provides an effective regularization that reduces oscillations in the retrieved profiles and the value of 6 km is typically used for nadir ozone profile retrieval (Liu et al., 2010; Kroon et al., 2011; Miles et al., 2015).

The method described in Sect. 4.1 to determine the coincidence CMs for the fusion of these simulated data is used. We model Scoin as kSa; that is we make the hypothesis that the variability of the true profiles is a fraction of that represented by the a priori CM, which describes the climatological variability of the true profiles on geographical regions that are larger than the fusion cells.

In Fig. 1, we report the k values given by Eq. (29) as a function of the number of fusing profiles; the Δk errors are given by the colour scale. Figure 1b provides an enlargement of Fig. 1a for small values of k. From Fig. 1 we see that large values of k are obtained when the number of fusing profiles is small and large errors are present. Since k is a positive parameter, the uncertainty in its determination manifests itself mainly with large positive values and sufficient statistics are needed for a useful determination of k and in our case the number of fusing profiles must be greater than 10.

Figure 1(a) Values of the parameter k as a function of the number of fusing profiles. (b) Enlargement of panel (a) for small values of k. The Δk errors are reported in the colour scale.

Increasing the number of fusing profiles the errors decrease and smaller values of k are obtained, although the Δk uncertainty, together with differences in the geographical variability, is still responsible for some dispersion of the k values. When the number of fusing profiles is sufficient to produce reliable k values, we obtain k values that are a fraction of the unity, confirming that a small geographical variability, a fraction of the climatological variability, occurs within the cell chosen for the data fusion. In order to assess the entity of the obtained values, it is important to notice that k multiplies the CM and, accordingly, is proportional to the square of the geographical variability.

### 4.2.2 Results for a single cell with a large number of fusing profiles

As an example, we analyse the behaviour of a cell with a large number of fusing profiles for which the k value is determined to be significantly larger than the error Δk. We deal with a cell with 80 fusing profiles, for which, applying the method described in Sect. 4.1, we obtain k=0.068 and Δk=0.014. In this case, Δk is about one-fifth of the k value.

The use of simulated data makes it possible to compare the results with the true quantities that we want to measure. In Fig. 2, we report the differences between three fused profiles, obtained with k=0.068, with k=0 and with the method used in the previous paper on the importance of coincidence errors (Ceccherini et al., 2018), and the true profile of the fusion, calculated as the mean of the true profiles corresponding to the fusing profiles. In the previous paper, an educated guess was made of the coincidence error and Scoin equal to a matrix with the square of the 5 % of the a priori profile on the diagonal elements, and a correlation length of 6 km for the off-diagonal elements was used. In the figure, the errors and the numbers of DOFs of the three fused profiles are also reported.

Figure 2Differences between the fused profiles obtained with k=0.068 (black line), k=0 (red line) and the method used in Ceccherini et al. (2018) (green line) (* in this case k is applied to a CM built in a slightly different way; see text) and the mean of the true profiles related to the fusing profiles as a function of altitude (and pressure). The errors (dotted lines) and the numbers of DOFs of the three fused profiles are reported as well.

We see that the fused profile with k=0 has large differences with respect to the true profile of the fusion, while the other two fused profiles have smaller and comparable differences. The errors are basically the same for all three fused profiles and the numbers of DOFs are about equal for k=0.068 and for the method used in the previous paper and slightly larger for k=0. The importance of using a coincidence CM is confirmed because it provides a significant reduction of the differences with the true profile at the cost of a negligible reduction of the number of DOFs. The difference between the results obtained with the two coincidence CMs is small, but the method described in Sect. 4.1 provides a slightly better compromise between reproduction of the true profile and number of DOFs and, more important, is an objective determination based on a mathematical constraint.

In Fig. 3, we report the square root of the diagonal elements of Scoin estimated by the method described in Sect. 4.1 (with its errors) and by the method used in the previous paper as a function of altitude (and pressure) and compare them with the standard deviation of the true profiles corresponding to the fusing profiles.

We see that the method described in Sect. 4.1 is able to reproduce the standard deviation of the true profiles well up to about 30 km of altitude. Above 30 km this method overestimates the spread of the true profiles, likely because we assume Scoin proportional to the a priori CM, which includes the day–night variability of ozone. This variability is instead absent in the fusing profiles because they belong to a single geostationary image that is acquired in 1 h. The educated guess of Scoin significantly overestimates the standard deviation of the true profiles below 8 km and above 15 km of altitude.

Figure 3Square root of the diagonal elements of Scoin estimated by the method described in Sect. 4.1 (black line), with its errors (grey band around the black line), and by the educated guess used in Ceccherini et al. (2018) (green line) (* in this case k is applied to a CM built in a slightly different way; see text) as a function of altitude (and pressure) compared with the standard deviation of the true profiles corresponding to the fusing profiles (blue line).

Within the limits posed by the fact that a single parameter is used for the estimate of a CM, the coincidence error determined with the constraint of the cost function is a very good representation of the real geographical variability, much better than that obtained with the educated guess (please note the logarithmic scale in Fig. 3), although the effect of this difference on the fusion process is very small, given the negligible consequences of overestimates of the coincidence error.

### 4.2.3 Analysis of all fusion cells

In order to evaluate the performances of the method described in Sect. 4.1 we introduce a quantifier β equal to the root of the square sum of the relative differences between the fused profile and its true profile:

$\begin{array}{}\text{(31)}& \mathit{\beta }=\sqrt{\sum _{i=\mathrm{1}}^{n}{\left(\frac{{\mathbit{x}}_{\mathrm{f}i}-{\mathbit{x}}_{\mathrm{t}i}}{{\mathbit{x}}_{\mathrm{t}i}}\right)}^{\mathrm{2}}},\end{array}$

where xfi is the ith component of the fused profile, xti is the ith component of the true profile of the fusion and n is the number of levels of the fused profile. We calculated this quantifier for all the fusion cells.

In Fig. 4, we show the scatter plots of β and of the number of DOFs (panel a and panel b, respectively) of the fused profiles obtained with the k values determined by the method described in Sect. 4.1 as a function of the same quantities obtained with k equal to zero. The number of fusing profiles is reported in the colour scale. In the case of β, small values are preferred; in the case of number of DOFs, large values are preferred.

From Fig. 4 we see that for large values of the number of fusing profiles in general the method described in Sect. 4.1 determines a significant reduction of β with respect to the case of k equal to zero, while the effect on the number of DOFs is negligible. In some cases for small values of the number of fusing profiles, we see that the use of the large value of k, erroneously determined by the method for the insufficient statistics, causes a significant reduction of the number of DOFs and sometimes also an increase in β. A worse value of β is also obtained in a few cases for cells that do not have a very small number of fusing profiles; however the loss observed in these cases is much smaller than the gain obtained in the much more numerous cells for which a reduction of β is observed. The distribution of the colours in Fig. 4b clearly shows that the number of DOFs increases when the number of fusing profiles increases, confirming the improvement of information obtained with the fusion of many profiles.

Figure 4(a) Scatter plot of βk as a function of β0. The quantifier βk corresponds to the k value determined by the method described in Sect. 4.1 and is reported on the y axis, while β0 corresponds to k equal to zero and is reported on the x axis. (b) Scatter plot of DOFsk as a function of DOFs0. DOFsk is the number of DOFs of the fused profile obtained when k is determined by the method described in Sect. 4.1 and is reported on the y axis, while DOFs0 is the number of DOFs of the fused profile obtained when k is equal to zero and is reported on the x axis. The number of fusing profiles is reported in the colour scale.

A complete evaluation of the performances of the method has to take into account both the ability to reproduce the true profile (represented by β) and the number of DOFs. For this reason, we define a new quantifier γ, equal to the ratio between β and the number of DOFs, which takes into account both aspects:

$\begin{array}{}\text{(32)}& \mathit{\gamma }=\frac{\mathit{\beta }}{\mathrm{DOFs}}.\end{array}$

The quality of the fused profile improves when the value of γ is reduced. In Fig. 5, we show the scatter plot of γ of the fused profiles obtained with the k values determined by the method described in Sect. 4.1 as a function of γ of the fused profiles obtained with k equal to zero. If the number of fusing profiles is smaller than 10 the points are reported in red, otherwise they are in blue.

From Fig. 5 we see that when the number of fusing profiles is larger than 10 in general the method described in Sect. 4.1 determines a reduction of γ, improving the quality of the fused profiles. Occasionally a worsening is observed, but improvements, in number and in entity, are overwhelmingly larger than worsenings. When the number of fusing profiles is smaller than 10 the k values determined by the method are affected by large errors (see Fig. 1) and values that are much larger than a reasonable expectation may be obtained. Therefore, it is not a surprise that in these cases the dominant effect is a degradation of the quality of the fused profiles with respect to the case of k equal to zero. For this reason, the method described in Sect. 4.1 can only be used when either k is determined with a small error or, similarly, the number of fusing profiles is sufficiently large. In the other cases, an educated guess should be used, possibly supported by the indications provided by the results obtained in the cells with a large number of measurements.

Figure 5Scatter plot of γk as a function of γ0. The quantifier γk corresponds to the k value determined by the method described in Sect. 4.1 and is reported on the y axis, while γ0 corresponds to k equal to zero and is reported on the x axis. If the number of fusing profiles is smaller than 10 the points are reported in red, otherwise they are in blue.

5 Conclusions

The measurements that we wish to fuse often have some inconsistencies due to representations on different vertical grids, imperfect time and space coincidence, and different forward model errors. In order to apply the CDF method to inconsistent measurements it is necessary to add to the measurement CM of each fusing profile a CM that qualifies these inconsistencies as errors and prevents their use as erroneous features of the profile. Therefore, a realistic estimate of the inconsistency CM is required for effectual fused products. In this paper, we propose using the statistical properties of the cost function distribution to improve the estimate of the inconsistency CM.

The expected value and the variance of the cost function distribution of the data fusion have been analytically determined for the first time. This allowed us to calculate the reduced cost function, which is bound to be equal to 1 within the variability determined by its variance. Modelling the inconsistency CM with one parameter, we used the expected value of the reduced cost function as a constraint to tune the value of this parameter and the variance of the reduced cost function to assign an error to this value.

We applied this method to simulated measurements of ozone profiles obtained in the thermal infrared in the framework of the Sentinel-4 mission of the Copernicus programme. The results show that when the number of fusing profiles is small the values of the parameter are affected by large errors; in particular they are almost completely undetermined if the number of fusing profiles is smaller than 10. For such small values of the number of fusing profiles, the method is not able to provide reliable values of the parameter and it is better to use an educated guess for the estimate of the inconsistency CM. Conversely, when the number of fusing profiles is large enough the values of the parameter provided by the method are affected by small errors and the estimated coincidence CMs generally improve the performances of the CDF method, providing a significant reduction of the differences between retrieved profile and true profile, with a negligible reduction of the number of DOFs.

Data availability
Data availability.

The data of the simulations presented in the paper are available from the authors upon request.

Appendix A

In this appendix we make the calculation of the variance of the cost function given in Eq. (24).

The variance is equal to

$\begin{array}{}\text{(A1)}& \mathrm{var}\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]=〈{\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)-〈{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉\right]}^{\mathrm{2}}〉.\end{array}$

Substituting in Eq. (A1) the expression of cmin(σi) given by Eq. (17), we obtain

$\begin{array}{ll}& \mathrm{var}\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]=〈\left[{c}_{\mathrm{0}}^{\mathrm{min}}+{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)+{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right\\ & {-〈{c}_{\mathrm{0}}^{\mathrm{min}}〉-〈{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉-〈{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉]}^{\mathrm{2}}〉\\ & =〈{\left[{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)+{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)-〈{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉\right]}^{\mathrm{2}}〉\\ \text{(A2)}& & =〈{\left[{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]}^{\mathrm{2}}〉+〈{\left[{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]}^{\mathrm{2}}〉-{〈{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉}^{\mathrm{2}},\end{array}$

where we used $〈{c}_{\mathrm{0}}^{\mathrm{min}}〉={c}_{\mathrm{0}}^{\mathrm{min}}$, $〈{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉=\mathrm{0}$ and $〈{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right){c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)〉=\mathrm{0}$ because the product ${c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right){c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)$ is cubic in the errors and, therefore, its expected value is zero as a consequence of the symmetry of the normal distribution.

Using Eq. (19), for the first term of Eq. (A2) we obtain

$\begin{array}{ll}& 〈{\left[{c}_{\mathrm{1}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]}^{\mathrm{2}}〉=\mathrm{4}〈{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbit{\sigma }}_{\mathrm{f}}〉\\ & =\mathrm{4}\mathrm{tr}\left[{\mathbf{S}}_{\mathrm{f}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right]\\ \text{(A3)}& & =\mathrm{4}\mathrm{tr}\left[\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\right],\end{array}$

where we have used the relation ${\mathbf{S}}_{\mathrm{f}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}={\mathbf{A}}_{\mathrm{f}}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)$ that comes from Eqs. (6) and (7).

Using Eq. (20), for the second term of Eq. (A2) we obtain

$\begin{array}{ll}& 〈{\left[{c}_{\mathrm{2}}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]}^{\mathrm{2}}〉=〈{\left[\sum _{i=\mathrm{1}}^{N}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}\right]}^{\mathrm{2}}〉\\ & +〈{\left[{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}}\right]}^{\mathrm{2}}〉\\ \text{(A4)}& & -\mathrm{2}〈{\left[\sum _{i=\mathrm{1}}^{N}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}}\right]}^{\mathrm{2}}〉.\end{array}$

Some further elaboration is needed to evaluate these three terms:

$\begin{array}{ll}& 〈{\left[\sum _{i=\mathrm{1}}^{N}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}\right]}^{\mathrm{2}}〉=\sum _{i=\mathrm{1}}^{N}〈{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}〉\\ & +\sum _{i,k=\mathrm{1},\phantom{\rule{0.25em}{0ex}}i\ne k}^{N}〈{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}〉〈{\mathbit{\sigma }}_{k}^{\mathrm{T}}{\mathbf{S}}_{k}^{-\mathrm{1}}{\mathbit{\sigma }}_{k}〉=\mathrm{2}\sum _{i=\mathrm{1}}^{N}\mathrm{tr}\left({\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{S}}_{i}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{S}}_{i}\right)\\ & +\sum _{i=\mathrm{1}}^{N}{\left[\mathrm{tr}\left({\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{S}}_{i}\right)\right]}^{\mathrm{2}}+\sum _{i,k=\mathrm{1},i\ne k}^{N}\mathrm{tr}\left({\mathbf{S}}_{i}{\mathbf{S}}_{i}^{-\mathrm{1}}\right)\mathrm{tr}\left({\mathbf{S}}_{k}{\mathbf{S}}_{k}^{-\mathrm{1}}\right)\\ \text{(A5)}& & =\mathrm{2}\sum _{i=\mathrm{1}}^{N}{n}_{i}+\sum _{i,k=\mathrm{1}}^{N}{n}_{i}{n}_{k}=\sum _{i=\mathrm{1}}^{N}{n}_{i}\left(\mathrm{2}+\sum _{k=\mathrm{1}}^{N}{n}_{k}\right),& 〈{\left[{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}}\right]}^{\mathrm{2}}〉=\mathrm{2}\mathrm{tr}\left[\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbf{S}}_{\mathrm{f}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbf{S}}_{\mathrm{f}}\right]\\ \text{(A6)}& & +{\left\{\mathrm{tr}\left[\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbf{S}}_{\mathrm{f}}\right]\right\}}^{\mathrm{2}}=\mathrm{2}\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}^{\mathrm{2}}\right)+{\left[\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}\right)\right]}^{\mathrm{2}}\end{array}$

and

$\begin{array}{ll}& -\mathrm{2}〈{\left[\sum _{i=\mathrm{1}}^{N}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}{\mathbit{\sigma }}_{\mathrm{f}}^{\mathrm{T}}\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right){\mathbit{\sigma }}_{\mathrm{f}}\right]}^{\mathrm{2}}〉\\ & =-\mathrm{2}\sum _{i=\mathrm{1}}^{N}〈{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{A}}_{i}{\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}{\mathbf{A}}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}〉\\ & -\mathrm{2}\sum _{i,k=\mathrm{1},\phantom{\rule{0.25em}{0ex}}i\ne k}^{N}〈{\mathbit{\sigma }}_{i}^{\mathrm{T}}{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbit{\sigma }}_{i}〉〈{\mathbit{\sigma }}_{k}^{\mathrm{T}}{\mathbf{S}}_{k}^{-\mathrm{1}}{\mathbf{A}}_{k}{\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}{\mathbf{A}}_{k}^{\mathrm{T}}{\mathbf{S}}_{k}^{-\mathrm{1}}{\mathbit{\sigma }}_{k}〉\\ & =-\mathrm{2}\sum _{i=\mathrm{1}}^{N}\mathrm{2}\mathrm{tr}\left[{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{A}}_{i}{\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}{\mathbf{A}}_{i}^{\mathrm{T}}\right]\\ & -\mathrm{2}\sum _{i=\mathrm{1}}^{N}\mathrm{tr}\left({\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{S}}_{i}\right)\mathrm{tr}\left[{\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{A}}_{i}{\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}{\mathbf{A}}_{i}^{\mathrm{T}}\right]\\ & -\mathrm{2}\sum _{i,k=\mathrm{1},\phantom{\rule{0.25em}{0ex}}i\ne k}^{N}\mathrm{tr}\left({\mathbf{S}}_{i}^{-\mathrm{1}}{\mathbf{S}}_{i}\right)\mathrm{tr}\left[{\mathbf{S}}_{k}^{-\mathrm{1}}{\mathbf{A}}_{k}{\left(\mathbf{F}+{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}\right)}^{-\mathrm{1}}{\mathbf{A}}_{k}^{\mathrm{T}}\right]\\ \text{(A7)}& & =-\mathrm{2}\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}\right)\left(\mathrm{2}+\sum _{i=\mathrm{1}}^{N}{n}_{i}\right),\end{array}$

where we have used the formula for the expected value of the quartic form given in Petersen and Pedersen (2012), Eqs. (5)–(7) and Eq. (15).

The third term of Eq. (A2) is given by Eq. (22).

From Eq. (A2), using Eq. (22) and Eqs. (A3)–(A7), we obtain the expression of the variance of the cost function:

$\begin{array}{ll}& \mathrm{var}\left[{c}^{\mathrm{min}}\left({\mathbit{\sigma }}_{i}\right)\right]=\mathrm{2}\sum _{i=\mathrm{1}}^{N}{n}_{i}-\mathrm{4}\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}\right)+\mathrm{2}\mathrm{tr}\left({\mathbf{A}}_{\mathrm{f}}^{\mathrm{2}}\right)\\ \text{(A8)}& & +\mathrm{4}\mathrm{tr}\left[\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right){\left({\mathbit{x}}_{\mathrm{t}}-{\mathbit{x}}_{\mathrm{a}}\right)}^{\mathrm{T}}{\mathbf{S}}_{\mathrm{a}}^{-\mathrm{1}}{\mathbf{A}}_{\mathrm{f}}\left(\mathbf{I}-{\mathbf{A}}_{\mathrm{f}}\right)\right].\end{array}$
Author contributions
Author contributions.

SC calculated the expected value and the variance of the cost function and wrote the draft version of the paper. NZ wrote the Python code of the complete data fusion and applied the procedure to determine the coincidence covariance matrix to the ozone simulated measurements. BC suggested the idea to use the cost function to determine the inconsistency covariance matrices and contributed to the interpretation of the results. SDB performed the simulation of the ozone measurements. UC and CT are, respectively, the principal investigator and the project manager of the AURORA project and coordinated the activity of the project. All the authors revised the paper.

Competing interests
Competing interests.

The authors declare that they have no conflict of interest.

Acknowledgements
Acknowledgements.

The results presented in this paper arise from research activities conducted in the framework of the AURORA project (http://www.aurora-copernicus.eu/, last access: 21 May 2019) supported by the Horizon 2020 research and innovation programme of the European Union (call: H2020-EO-2015; topic: EO-2-2015) under grant agreement no. 687428.

Financial support
Financial support.

This research has been supported by the European Commission, H2020 (AURORA, grant no. 687428).

Review statement
Review statement.

This paper was edited by Andre Butz and reviewed by two anonymous referees.

References

Aires, F., Aznay, O., Prigent, C., Paul, M., and Bernardo, F.: Synergistic multi-wavelength remote sensing versus a posteriori combination of retrieved products: Application for the retrieval of atmospheric profiles using MetOp-A, J. Geophys. Res., 117, D18304, https://doi.org/10.1029/2011JD017188, 2012.

Calisesi, Y., Soebijanta, V. T., and Oss, R. V.: Regridding of remote soundings: formulation and application to ozone profile comparison, J. Geophys. Res., 110, D23306, https://doi.org/10.1029/2005JD006122, 2005.

Ceccherini, S.: Equivalence of measurement space solution data fusion and complete fusion, J. Quant. Spectrosc. Ra., 182, 71–74, 2016.

Ceccherini, S. and Ridolfi, M.: Technical Note: Variance-covariance matrix and averaging kernels for the Levenberg-Marquardt solution of the retrieval of atmospheric vertical profiles, Atmos. Chem. Phys., 10, 3131–3139, https://doi.org/10.5194/acp-10-3131-2010, 2010.

Ceccherini, S., Carli, B., Pascale, E., Prosperi, M., Raspollini, P., and Dinelli, B. M.: Comparison of measurements made with two different instruments of the same atmospheric vertical profile, Appl. Opt., 42, 6465–6473, 2003.

Ceccherini, S., Raspollini, P., and Carli, B.: Optimal use of the information provided by indirect measurements of atmospheric vertical profiles, Opt. Express., 17, 4944–4958, 2009.

Ceccherini, S., Carli, B., Cortesi, U., Del Bianco, S., and Raspollini, P.: Retrieval of the vertical column of an atmospheric constituent from data fusion of remote sensing measurements, J. Quant. Spectrosc. Ra., 111, 507–514, 2010a.

Ceccherini, S., Cortesi, U., Del Bianco, S., Raspollini, P., and Carli, B.: IASI-METOP and MIPAS-ENVISAT data fusion, Atmos. Chem. Phys., 10, 4689–4698, https://doi.org/10.5194/acp-10-4689-2010, 2010b.

Ceccherini, S., Carli, B., and Raspollini, P.: Quality quantifier of indirect measurements, Opt. Express, 20, 5151–5167, 2012.

Ceccherini, S., Carli, B., and Raspollini, P.: Equivalence of data fusion and simultaneous retrieval, Opt. Express, 23, 8476–8488, 2015.

Ceccherini, S., Carli, B., and Raspollini, P.: Vertical grid of retrieved atmospheric profiles, J. Quant. Spectrosc. Ra., 174, 7–13, 2016.

Ceccherini, S., Carli, B., Tirelli, C., Zoppetti, N., Del Bianco, S., Cortesi, U., Kujanpää, J., and Dragani, R.: Importance of interpolation and coincidence errors in data fusion, Atmos. Meas. Tech., 11, 1009–1017, https://doi.org/10.5194/amt-11-1009-2018, 2018.

Cortesi, U., Del Bianco, S., Ceccherini, S., Gai, M., Dinelli, B. M., Castelli, E., Oelhaf, H., Woiwode, W., Höpfner, M., and Gerber, D.: Synergy between middle infrared and millimeter-wave limb sounding of atmospheric temperature and minor constituents, Atmos. Meas. Tech., 9, 2267–2289, https://doi.org/10.5194/amt-9-2267-2016, 2016.

Cortesi, U., Ceccherini, S., Del Bianco, S., Gai, M., Tirelli, C., Zoppetti, N., Barbara, F., Bonazountas, M., Argyridis, A., Bós, A., Loenen, E., Arola, A., Kujanpää, J., Lipponen, A., Nyamsi, W. W., van der A, R., van Peet, J., Tuinder, O., Farruggia, V., Masini, A., Simeone, E., Dragani, R., Keppens, A., Lambert, J.-C., van Roozendael, M., Lerot, C., Yu, H., and Verberne, K.: Advanced Ultraviolet Radiation and Ozone Retrieval for Applications (AURORA): A Project Overview, Atmosphere, 9, 454, https://doi.org/10.3390/atmos9110454, 2018.

ESA: Sentinel-4: ESA's Geostationary Atmospheric Mission for Copernicus Operational Services, SP1334, April 2017, available at: http://esamultimedia.esa.int/multimedia/publications/SP-1334/SP-1334.pdf (last access: 21 May 2019), 2017.

Fisher, R. A.: The logic of inductive inference, J. Roy. Stat. Soc., 98, 39–54, 1935.

Gelaro, R., McCarty, W., Max, J., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G. K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1, 2017.

Kalman, R. E.: Algebraic aspects of the generalized inverse of a rectangular matrix, Proceedings of Advanced Seminar on Generalized Inverse and Applications, M. Z. Nashed, Academic, San Diego, 111–124, 1976.

Kroon, M., de Haan, J. F., Veefkind, J. P., Froidevaux, L., Wang, R., Kivi, R., and Hakkarainen, J. J.: Validation of operational ozone profiles from the Ozone Monitoring Instrument, J. Geophys. Res., 116, D18305, https://doi.org/10.1029/2010JD015100, 2011.

Liu, X., Bhartia, P. K., Chance, K., Spurr, R. J. D., and Kurosu, T. P.: Ozone profile retrievals from the Ozone Monitoring Instrument, Atmos. Chem. Phys., 10, 2521–2537, https://doi.org/10.5194/acp-10-2521-2010, 2010.

McPeters, R. D. and Labow, G. J.: Climatology 2011: An MLS and sonde derived ozone climatology for satellite retrieval algorithms, J. Geophys. Res., 117, D10303, https://doi.org/10.1029/2011JD017006, 2012.

Miles, G. M., Siddans, R., Kerridge, B. J., Latter, B. G., and Richards, N. A. D.: Tropospheric ozone and ozone profiles retrieved from GOME-2 and their validation, Atmos. Meas. Tech., 8, 385–398, https://doi.org/10.5194/amt-8-385-2015, 2015.

Petersen, K. B. and Pedersen, M. S.: The matrix cookbook, available at: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf (last access: 21 May 2019), 2012.

Quesada-Ruiz, S., Attié, J.-L., Lahoz, W. A., Abida, R., Ricaud, P., El Amraoui, L., Zbinden, R., Piacentini, A., Joly, M., Eskes, H., Segers, A., Curier, L., de Haan, J., Kujanpää, J., Oude-Nijhuis, A., Tamminen, J., Timmermans, R., and Veefkind, P.: Benefit of ozone observations from Sentinel-5P and future Sentinel-4 missions on tropospheric composition, Atmos. Meas. Tech. Discuss., https://doi.org/10.5194/amt-2018-456, in review, 2019.

Rodgers, C. D.: Inverse Methods for Atmospheric Sounding: Theory and Practice, Vol. 2 of Series on Atmospheric, Oceanic and Planetary Physics, World Scientific, Singapore, 2000.