Progress in turbulence detection via GNSS occultation data

The increased availability of radio occultation (RO) data offers the ability to detect and study turbulence in the Earth’s atmosphere. An analysis of how RO data can be used to determine the strength and location of turbulent regions is presented. This includes the derivation of a model for the power spectrum of the log-amplitude and phase fluctuations of the permittivity (or index of refraction) field. The bulk of the paper is then concerned with the estimation of the model parameters. Parameter estimators are introduced and some of their statistical properties are studied. These estimators are then applied to simulated log-amplitude RO signals. This includes the analysis of global statistics derived from a large number of realizations, as well as case studies that illustrate various specific aspects of the problem. Improvements to the basic estimation methods are discussed, and their beneficial properties are illustrated. The estimation techniques are then applied to real occultation data. Only two cases are presented, but they illustrate some of the salient features inherent in real data.


I Introduction
There is a long and distinguished history in the study of electromagnetic (EM) wave propagation through random media (cf. Tatarskii, 1971;Yeh and Liu, 1982;Ishimaru, 1997). These works have provided a firm theoretical foundation for estimating statistical properties of the neutral atmosphere and ionosphere via the statistical properties of the received EM signals. That is, the characteristics of the turbulent atmosphere can be deduced from, for example, correlation and/or spectral analysis of the phase and/or amplitude of the received signal. In the past, the bulk of the experimental analysis in this area has been performed with ground-based transmitters and receivers (e.g., radars and lidars), as well as with ground-based receivers and space-based transmitters. With the advent of Global Navigation Satellite System (GNSS) constellations, a new avenue has become available to investigate the turbulence properties of the earth's atmosphere. The deployment of an ever-increasing number of Low Earth Orbiting (LEO) satellites -with high quality, high-sampling rate receivers -provides a very valuable new source of turbulence measurements: GNSS-LEO occultations.
Previous efforts to study turbulence with RO signals have been focused on the study of the middle to upper stratosphere and the ionosphere (e.g., Gurvich and Brekhovskikh, 2001;Gurvich and Kan, 2003;Gurvich and Chunchuzov, 2005). On the other hand, very few works have addressed the characterization of turbulence in the upper troposphere and lower stratosphere. Numerous other practical applications can benefit by having turbulence measurements (and resultant climatologies) from GNSS-aircraft occultations. For example: space weather (Hajj et al., 2000), trans ionospheric communication links (Secan et al., 1997), aviation safety and navigation systems (Cornman et al., 2004;Conker et al., 2003), the accuracy of ground-based GNSS receivers (Gangulv^et al.?, 2004), as well as the effect of turbulence on measuring atmospheric state variables from GNSS-LEO occultations (Gorbunov and Kirchengast, 2005).
In this paper, a detailed analysis of the problem is presented, including the derivation of a model for the power spectrum of the lo g-amplitude and phase fluctuations of the permittivity (or index of refraction) field. The bulk of the paper is then concerned with the estimation of model parameters, such as the intensity and location of the turbulence along the line of sight. A thorough study of the properties of the parameter estimators is presented using simulated data. This includes global statistics over a large number of realizations, as well as case studies that illustrate various specific aspects of the problem. The application of the methodologies is then made with real occultation data. As the focus in this paper was the development of methods, the purpose of the real data analysis is to show how the theoretical and practical concepts carry over into the real data. Hence, only two cases are presented, but they illustrate some of the salient features inherent in real data. A more thorough analysis of real data is left to follow-on efforts.
2 Wave propagation theory

Assumptions
In order to make the problem tractable, a number of simplifying assumptions will be made. Most of these are routinely used in the literature for optical wavelengths (cf. Tatarskii, 1971;Ishimaru, 1997). Very little work has been done to justify many of these assumptions for the decimeter wavelengths in the GNSS problem; however, a few excellent papers exist, including Strohbehn and Clifford (1967), Clifford and Strohbehn (1970), Clifford and Lataitis (1985), and Lee and Harp (1969). We will list all of the assumptions here, but their use will be introduced below in the context of the development.
1. Polarization effects are ignored. This allows for the wave equation to be written in terms of a single scalar component of the electric field vector. This is a reasonable assumption for the decimeter wavelengths consider for the GNSS problem (cf. Clifford and Strohbehn, 1970).
Weak and single scattering is assumed. This is reasonable for the GNSS transmitter wavelengths and typical path lengths through turbulence (Clifford and Strohbehn, 1970).
The assumption that the transmitter wavelength, ;1, is much smaller than the smallest fluctuations in the permittivity field, 10, (; «l0), allows for a number of simplifications in the development (including the first two items above). While this is not necessarily true for decimeter wavelengths, Clifford and Strohbehn (1970) show that the practical effects of the assumption X «l0, can be extended into regime,), > 1 0 , but not ),» 10.
4. Taylor's Hypothesis is valid. That is, for short time periods advection dominates over internal fluid stresses (Monin and Yaglom, 1972). This is important for a medium that is moving, which in general is the case for the atmosphere. The practical effect of this assumption is that we can assume that the permittivity for a small scattering region is constant as it moves during a short time interval.
The Markov Approximation is valid (Tatarskii, 1971, Chapter 5, Section B). This means that there are minimal effects on the received signal due to correlation in the permittivity field along the line-of-sight. Therefore, one can assume that this correlation is represented by a delta-function.
6. The transmitter is a point source (i.e., we do not consider the radiation pattern), and the receiver acts as point aperture along the line-of-sight (i.e., we do not consider the gain pattern). Due to the typical large distances between the transmitter and the scattering medium, the first assumption is quite reasonable. If the distance between the scattering volume and the receiver is also large, and a high-gain antenna is used, the second assumption is also a reasonable one. For an omnidirectional receiver antenna and/or the receiver very close to the scattering volume (e.g., for an airborne receiver in the atmosphere), then these effects do need to be accommodated (cf. Lee and Harp, 1969;Tatarskii, 1971, Section 53).
7. Atmospheric attenuation due to absorption is assumed to be smooth and varying slowly during an occultation, and so that its effects can be accommodated by trend removal of the amplitude or phase signal in the time domain. This also means that we can consider the permittivity field to be real-valued.
S. The turbulence patch has compact support. This is required to ensure the existence of various integrals that arise in the calculations.
9. The effects due to an inhomogeneous background (e.g., refraction) are minimal. This means that we can assume straight-line propagation.
10. The medium is purely random. There are no deterministic structures, such as layered phenomena.
Most of the assumptions presented above are reasonable for a first-order analysis of the turbulence problem. Nevertheless, there are two aspects of the problem that are not addressed in this initial study, the effects due to a background or turbulent permittivity field characterized by inhomogeneity and/or anisotropy. These are obviously real-world situations, but as we are trying to keep to a simple analysis at this point, they are not included in what follows.

The first Rytov approximation
If polarization effects can be ignored, the vector wave equation for monochromatic signals, derived from Maxwell's equations can be written in the scalar form, where U is a component of the electric field (a transverse component for our case), k is the (constant) wavenumber for the transmitted frequency, k = 27r f/c = 27r/;^, and E is the permittivity of the medium. The index of refraction, n, is given by F = n 2 , and free space is characterized by F = 1. Since we have assumed weak scattering, a perturbation approached is used. Re-write Eq. (37) as where E _ (E) +E', and E' << (E). It is assumed that (E) = 1, its free-space value. The angled brackets refer to an ensemble average. Standard approaches to solving this equation are provided in the literature (e.g., Tatarskii, 1971;Ishimaru, 1997) -one of which is the first-order Rytov approach. We return to the Helmholtz wave equation and let where the complex phase, Ali, is given by and X is the log-amplitude function, and S is the phase function. Plugging Eq. (39) into Eq. (37) and solving for Uo i/r1 leads to the first-order term (cf. Tatarskii, 1971;Ishimaru, 1997), Where ik r r' Go(r -r') = e  47c Ijr -r'l = differential operator? is the free-space Green's function for the problem (the double vertical lines indicate the vector magnitude). Equation (40) is known as the first Rytov approximation.

Correlation function and frequency spectrum
We depart from the standard development as presented by, for example, Tatarskii (1971) or Ishimaru (1997), going forward. We consider the situation where the transmitter, receiver, and media are all moving relative to each other, as well as with respect to a fixed coordinate system. Typically, the origin of the coordinate system is set at the transmitter or at a point within the scattering volume. However, in what follows it is assumed that the origin of the coordinate system is at an arbitrary point. For the GNSS-LEO geometry, we will want the origin of the coordinate system to be at the (local) center of the earth. In the arbitrary coordinate system, the receiver is at the point r R , the transmitter is at rT, and a given scatterer is at rs (see Fig. 1) (we are considering a continuum field of permittivity, so the term scatterer is used loosely). We assume that multiple scattering does not occur, and that the transmitter acts as a point source, (i.e., an outgoing spherical wave), which leads to II r Rr Sll Ilrs-rTII In order to ensure the existence of the integral, we assume that the permittivity fluctuations have compact support, and E'(rs) is non-zero (or has a minimal effect on Vi I (r R )) in a region close to the line of sight, r R -r T -If E'(rs) is arandom field, we assume that Eq. (42) exists in the sense that it is a member of an ensemble of random permittivity fluctuations. We will assume straight-line propagation. Return to Eq. (42). If the scattering angles are small -a reasonable assumption if the ratio )./lo is small -we can make the so-called parabolic approximation in the integral equation (Tatarskii, 1971, Section 45) (lo is a length scale on the order of the smallest fluctuations in the medium). In order to use this approximation, we will consider basis vectors that are parallel or perpendicular to the vector r R -rT. We will denote the component of vectors parallel to r R -r T , "x" and the perpendicular parts, p (i.e., the projection of a given vector onto the plane perpendicular to r R -r T ). Note that PR -PT = 0. Consider the quantity, II r Rr sll = [(xRx s)' + 1PR -Ps111 1/ . We will assume that XT < XS < xR . If xR -XS, the distance along the x axis from the scatterer to the receiver, is much greater than the transverse distances (valid for small-angle scatter- This is the parabolic approximation. A similar expansion can be made for II r R -rT IL, which is simply X R -XT. Since kII r R -rSII = 2nll r R -r sll/^^» 1, we will use the full expression in Eq. (43) for the exponential terms in Eq. (42); whereas, we only need to keep the leading term for the non-exponential terms. Plugging all these factors back into Eq. (42), we have for the non-exponential terms II r R -r TII _
which we will denote as b (XR ,XS ,XT ). It can be shown that the exponential term from Eq. (42) is given by Plugging Eqs. (44) and (45) Ps -pill e (rs)drs (10) If we move the origin to r T , this is equivalent to setting XT = 0 and P T = 0, and since we have assumed that PR -PT = 0, this means that P R = 0. Equation.
(46) then describes the field, k2 xR which is equivalent to Eq. (49)-(5) in Tatarskii (1971). Therefore, we see that Eq. (46) has the correct limiting form when the origin is placed at the transmitter. Next, consider the transmitter, receiver and atmosphere moving relative to each other -and all with respect to a fixed origin, e.g., the center of the Earth. Since the longitudinal motions do not chance the value of the correlation of the field in an appreciable manner (Klvatskin, 1970), we assume that the "x"-positions are not changing, and so the velocity vectors described below are the projection of the three-dimensional velocity vectors onto the plane perpendicular to r R -r T . We consider a short enough time interval, T = t2t i , such that P; ( t2 ) = P i ( t I ) + V; T, (where "i" stands for R, T, or S). For this work we will assume that the velocities are constant over all (short) time intervals (i.e., Vi (T, ) = Vi ); that they are the same for all the scatterers (i.e., Vs, = Vs,); and that they are deterministic quantities (i.e., (V;) = Vi ). The same procedure used to derive Eq. (46) can be applied to calculating Vi i (r R (t2 )), except that we now use basis vectors parallel and perpendicular to r R -rT + (V R -V T ) T (see Fig. 2). Unfortunately, this will complicate matters later on when we calculate the correlation functions; which in turn contain products of Vf I (rR(tt)) and VfI ( r R (12)). That is, we will have to transform the components of vectors in one basis into the other. However, if the two vectors rR -rT and rR-rT+(VR -VT) Tare close to parallel we can then choose the basis vectors perpendicular to them to be identical. This assumption is valid to first order if IIVRTII «IIrR -rTII and IIVTTII «IIrR -rTII. In this case, the components of a vector in each basis system will be the same -up to the level of approximation used. Given the distances involved in typical GNSS-LEO occultations, these conditions will be met; hence, we can safely assume that the same basis can be used for calculations involving the components of both r Rr T + (V R -VT ) r and rR-rT.
where the subscripts "1" and "2" on xs and ps have been used to indicate that we are considering scatterers at the points rs, and rs" respectively. The important distinction between deterministic and random quantities is that the latter should be quantified via their statistical properties. For our purposes, we will consider second-order correlation functions, and subsequently the associated Fourier frequency spectrum. Since *1 = X1 +iS1, it can be seen that X1 = ReVi l = (V f l +** )/2 and S 1 = IMVfI = (r/rl -,P ) /2i.ElEvhat do Im stands for? It is straightforward to show that the correlation function of the lo;-amplitude field is g iven by B X, (PR ( t ] ), PR ( 12)) = 2 [Re(VGl (PO O) V/1 (PR(t2))) +Re(*l (PR(h))'4 (PR(t2)))] For the correlation function of the phase, the first term in Eq. (16) changes sign; and for the cross-correlation of the log-amplitude and phase, the real parts are replaced by imaginary parts and the second term in Eq. (16) changes sign. In order to obtain simplified expressions for the double integrals in Eq. (16), two further approximations are invoked. First, we apply Taylor's Hypothesis (Monin and Yaglom, 1972) to the field E' (rs 2 (t2 )) = E' (rs2 (t l ) + V S2 t). This hypothesis assumes that: due to pure advection, -( r S2 (t2)) = E' (r S2 (t l )). That is, the permittivity fluctuation due to a scatterer at r S2 (tl ) will have the same numerical value after it moves to rs 2 (t2 ). This means that the correlation function of the permittivity field can be written as Br, (r S, (h ), r s2 (t2)) = (E' (r S, ( t l )) E' ( r s2 (t2))) _ (E ' (r s, (tl )) E ' (r S, ( t 1 ))) (it is assumed that the permittivity field is a real-valued one). This allows for Eqs. (51) and (52) to be written solely as a function of t i , so we can drop that notation except for where it is needed explicitly. Next, we invoke the Markov approximation (Tatarskii, 1971, Section 64 For a homogeneous field, the correlation function can only be a function of ps, -ps,and xs, -xs" and furthermore if the field is also isotropic it can only be a function of 11 P s, -Ps2 Î and X S , -xs, I. Under the Markov approximation it is assumed that there is no correlation of the permittivity field in the longitudinal direction, and hence where AF, is the correlation function in the transverse coordinates. Using Eq. (19), the two quantities, (*1 (PR( t l))VI I (PR(t2))) and (V 1 1 (PR( t 1))'P1 (POD)) can be evaluated as and, (21)  2 In Eqs. (20) and (21), ^o,, (O, IIgIU is the three-dimensional spectrum of the (assumed) isotropic permittivity fluctuations with q = (q,,,qz ), tt = (XS ,+xs 2 )/2, and b(q) =

R
where R = XR -xT. We need the real and imab inary parts of Eqs. (20) and (21) to calculate the correlations. Equation (21) is real-valued, so its imaginary part is zero. The real and imaginary parts of Eq. (20) will result in a cosine and a sine term, respectively, in the integral. Except for these trigonometric terms, the integrands for Eqs. (20) and (21) are the same. Therefore, using the trigonometric identity, 2sin2 0 = 1 -cos20, the correlation function of the log-amplitude fluctuations is given by where we have used the spectrum of the index of refraction field, (p" , instead of that for the permittivity field. (For weak scattering, cp,, ti 4rp",.) We have also indicated the explicit ranges of the integrals. If we set V R = V T = 0, this means that V ,, ^ -V S , and as above, if we move the origin to the transmitter we recover Eq. 09)--(11) in Ishimaru (1997). So www.atmos-meas-tech . net/5/ l/2012/ Atmos. Meas. Tech., 5, 1-20, 2012 again, we see that our results have the correct limiting form as presented in the literature. Using 2cos 2 0 = 1 +cos20, the correlation function of the phase fluctuations is the same as that for the log-amplitude case, excepting that the sinesquared term becomes a cosine-squared term. The crosscorrelation function is of the same form excepting that the sine-squared term becomes a sine term -without the factor of two in the denominator of the argument -and the coefficient in front is given by 7r 2 k2 /2. (After we performed the derivation described above, a reference was found that provides a very similar result: Strohbehn, 1974) To compute the frequency spectrum, we take the cosine transform of Eq. (22). We define the cosine transform of a function g(t) as After performing the integration over rand 11411^'e get (Comman, et al., 2009) 111-113,iay]] (171 (25) 51'(5/6) where we have used the isotropic von Karman form for the index of refraction spectrum 2 _11/6 cP",(0.114112)=ACn,(r))I119112+(2nIL) ] where A is a constant, C 2 ,(11) is proportional to the intensity of the turbulence ("structure constant") at the point r), R is the distance along the line of sight from the transmitter to the receiver, k is the transmitter wavenumber, and L is a length scale related to the correlation function for the index of refraction field. Also in Eq. (25), we are assuming a patch of turbulence centered at q = ri i with extent 077; and we have denoted a = q(R-rl)/kR, z =2nf/V7,, y= X2 +2n/L' Tis the gamma function, and U is the confluent hypergeometric function of the second kind. The frequency spectrum of the phase is given by an expression similar to Eq. (25), excepting that the minus sign in the curly brackets is replaced by a plus sign. If the turbulence patch is narrow enough along the line of sight, then the integral over > can be approximated Amplitude Spectrum for Different q Values by a simple mid-point formula, Since it is a multiplicative factor, C2,(i7 j )Ar7 simply changes the overall amplitude of the spectrum, and changes in the turbulence length scale essentially raises or lowers the spectral amplitudes at the lowest frequencies. The changes in q, are more complicated, affecting both the amplitude and the frequency of the oscillations -as can be seen in Fig. 3 between one and two Hz. The variations in these parameters are not distinct, i.e., changes in one parameter can look like changes in another one. This fact makes parameter estimation difficult, as will be discussed subsequently.

Parameter estimation
Assuming that the received signal is a Gaussian random process, its spectral values at each frequency will have an exponential distribution. This result comes from the fact that the real and imaginary parts of the Fourier transform of the signal (divided by the variance) are both standard and independent Gaussian random variables. Hence, the spectrum, which is proportional to the sum of the squares of these quantities, is a sample from a Chi-Squared distribution with two degrees of freedom (cf. Theorem 8.8 in Freund, 2004). Furthermore, a Chi-Squared distribution with two degrees of freedom is an exponential distribution with mean and variance equal to two (cf. Definitions 6.3 and 6.4 in Freund. 2004). Therefore, the spectrum is distributed exponentially with mean and variance equal to twice the variance of the signal. It is important to note that this randomness is due to the random permittivity Atmos. Meas. Tech., 5, 1-20, 2012 www.atmos-meas-tech.
Amplitude Spectrum (ML) method (Freund, 2004). Consider a model of the form, field -not the additive receiver and processing noise. Using the fact that the spectral points are exponentially distributed, numerous realizations of the spectra can be generated by replacing each spectral point with a sample from an exponentially distributed random number sequence. An example of this is shown in Fig. 4. The exponential probability density function is given by, where P is the mean value for the distribution. The probability distribution function is given by P(x < b) = 1exp(-b/,B), so that P(x < 0) ti 0.63; and hence, more often than not, samples from this distribution will be less than the mean, as can be seen in Fig. 4. For an exponential distribution, the standard deviation is equal to the mean. Hence, for the simulation, the spectral level at a given frequency is given by replacing that point by a sample from an exponential distribution with its mean value being the theoretical spectral value at that point (i.e., from Eq. 27). The simulation is intended to represent real-world situations; hence, it was assumed that the received signal would be available at 50 Hz, and approximately 10s worth of data would go into a single spectrum. The parameters used in the simulation are: Cn, O 17 = 4 x 10-9 , L = 5 km, and 17 1 = 20 200 km. Furthermore, a noise floor, Q = 3.5 x 10 -5 , is added to the data. The magnitude of this level is derived from COSMIC GNSS-LEO data.
Referring back to Eq. (27), the parameters to be estimated are C 2 (g l )O17, L, >71 and Q: the turbulence intensity, the turbulence length scale, the location of the center of the turbulence patch from the transmitter, and the noise level. As the functional form of Eq. (27) is highly non-linear in L and 1 71, a simultaneous solution for all four parameters is complicated. The preferred solution is via the maximum likelihood where "i" refers to the frequency, f;, c = Cn,(r1 1 )A17, and (pi is the right-hand side of Eq. (27), divided by c. Consider a measured spectrum, denoted X i . It is assumed that the Xi are independent exponential random variables with expected values, (X i ) _ g i . The likelihood function is then given by where N is the number of frequency values (it is assumed that by context, the reader will not confuse the use of L for the likelihood function as well as the turbulence length scale). The negative log-likelihood function is The ML estimates for the parameters are found by solving the coupled set of non-linear equations, a(-lnL)18p j = 0, where the p j are the parameters, c, Q, L and '71 . Instead of dealing with this formidable problem, a simpler twodimensional problem will be discussed to motivate the techniques. It will be assumed that the length scale L is known. Since the model, µ i , is linear in c and Q, a linear weighted least squares approach can be emplo yed to estimate those two parameters. Once that is done, these values can then be used in a one-dimensional ML solution for 17 l . Some variations on this approach can also be used, and will be discussed below. Consider the least squares problem for c and Q. It will be the simultaneous solution to the coupled equations, 8Rldp j = 0, N (pj = {c, Q)), where R = (µ; -X;) 2 w;, and the weights i=1 w i are independent of c and Q. If the weights are chosen such that w; _ ^, I , then this is similar to aChi-Squared minimization problem. The solution for the parameters can then be solved via Cramer's method for a 2 x 2 s ystem of equations, giving An alternative is to calculate Q directly from the highfrequency parts of X i . Denoting this estimate, Q, a one-dimensional weighted least squares solution then gives A further alternative -the one that will be employed in the following -is to calculate c via a "pseudo" ML method. That is, we assume that the data X; is replaced by X; = X; -Q, and then solve for c via the ML formula, Equating this to zero, gives the desired ML-like estimate for c, Plugging this back into Eq. (31) gives Where the explicit dependence on 11 has been indicated (recall that we are not including dependence on the length scale L at this point). Note that the last term is independent of the data and can thereby be computed a priori. Equation (36) is now a one-dimensional function and the solution for r7 1 is given by minimizing it with respect to >. For the results presented below, we merely take the minimum value of Eq. (36) as a function of q as the estimate for > 1 . Once the estimate for 17 1 is determined, this value is plugged back into Eq. (35) to estimate c. Different values of c will shift the negative log-likelihood curves up or down (for simplicity in terminology, the phrase "likelihood" will mean "negative log-likelihood."). Figures 5  and 6 show the results of Eq. (36) when varying q j , and L. Changes in r7 i and L change the likelihood function in more complicated fashions, not only varying the level of the curves, but also their shape. The flatness of the likelihood function at the smaller > poses a problem for the >7 i estimation -one would like a steep bowl leading down to the correct minimum. The flatness means that variations in the likelihood function -due to the exponentially distributed random errors in the spectral values -can cause a false minimum to be smaller than one for the true minimum. A simple method was developed to deal with this problem: using a weighting function in the denominator of Eq. (35). This function was chosen to be solely dependent on q. The only portions of the line of sight that are of interest are those that go through the atmosphere or at least portions of it where turbulence can be stron c, enough to disturb the signal significantly. For example, excluding outer regions of the ionosphere where the permittivity perturbations due to fluctuations in the electron density field can be small. Hence, the weighting function can be chosen such that it is close to one for those portions of interest along the line of sight, and drops smoothly to zero away from those regions. The effect of the weighting function on the likelihood function is illustrated in Fig. 7. It can be seen that the likelihood function is unchanged essentially, between approximately 18 000 to 22 000 km. This is important, as the purpose here is not to bias the estimates, but to try to eliminate non-physical estimates.
In order to examine the statistical properties of the c and r11 estimators, 1000 realizations of the simulated spectra were generated. Before discussing these results, however, some preliminary concepts are presented.

Theoretical error analysis
Since the random errors in r7 i are not easily described theoretically, these will be analyzed in the context of the simulation results. However, the error statistics of c can be studied analytically. Consider then the estimator for c as given by Eq. (36). Recall that this was referred to as a pseudo-ML estimator. This is because the correct ML estimate is given by t.ikelin000 runction: wei ntl which is now non-linear in c, thus requiring a numerical solution. Therefore, we restrict our analysis to the "pseudo-ML" estimate. First, consider the expected value of c, tPi ( rll ) Which shows that this is an unbiased estimator of c (where it has been assumed that Q is an unbiased estimator of Q.and q , is the true value for the parameter >). Next, consider the variance of c -with the same assumptions made above, Since it has been assumed that the X i are independent, exponentially distributed random variables, their variance is equal to the square of their mean, and Eq. (42) becomes N (c^Pi (7) Therefore, it can be seen that the bias is a function of the sum of the ratios of the spectrum evaluated at the true and estimated q 1 values, respectively. Note that if r1 i = q 1, then the bias will be zero, as expected.
The variance of cis given by Eq. (43), but with f11 in the denominator. Note that if Q = 0, the variance of c is Since the X; are exponentially distributed random variables, the random error is significant -as can be seen in Fig. 4. In order to reduce the random error, averaging spectra is helpful. Consider the statistics of the estimator of c with averaged spectra (denoted c). The averaged spectral values are where j indexes the elements that go into the average -of which there are M. The estimator, c, is then given by Interchanging the order of summation and summing over i Var(c)=Var E It can be shown that the variance for a linear combination of independent random variables, y; , is given by (Freund. 2004)

J=1
Var^a;y; = Ta?Var(y;) where it was assumed that the (cj) are identical. This shows N` i=1 [(P, ( 111)  where it was assumed that the variances of cj are identical. This shows that averaging the spectra reduces the variance of c by a factor of I/ M. As seen in Eqs. (44) and (46), the ratio coj (rf l )/rp; (f I ) is important in determining the error properties of the estimator c. In order to study this further, we will need the follcming concepts. The probability density function for a gamma distribution is given by If X is gamma-distributed with parameters a and P, and v is a positive constant, then v X is gamma-distributed with parameters a and vp. Since the spectral values X; are assumed to be independent exponentially distributed random variablesor gamma-distributed with a = I and P = µ iit follows that the ratio X,/µ; will be exponentially distributed with mean one (use the property just discussed, with v = µ; ). If the µ; are evaluated with the estimates for Q, c and il l , then the ratio should be approximately distributed as an exponential distribution with mean one. The same holds for averaging the spectra, but here the ratio X; /µ; will be gamma-distributed with a = M and = 1 /M (Freund, 2004) (recall that X; is given by Eq. 47). These properties can then be used to test the quality of the estimation methods via a goodness-of-fit analysis.
No Weighting Weighted No Weighting, Weighted, Averaged Averaged

Simulation studies
As mentioned above, 1000 independent realizations of the amplitude spectrum were generated. Equation (37) was used to calculate estimates of r1 i , and then those are in tarn are used to estimate c via Eq. (36), with q = t) 1 . Recall that the true values for these parameters are rf l = 20 200 and c = 4 x 10 -9 . There are four different estimation methodologies that were employed: the pseudo-ML method, one in which a weighting function is used, and then the same methods applied to averaged spectra. Five spectra were used in the averaged-spectrum analysis. A discussion of the overall estimator performance is presented first, followed by a set of case study analyzes that deal with more specific aspects.

Overall statistical performance
Figures 8 and 9 illustrate the overall simulation results. These are hybrid plots, with both the coloring and widths being related to the number of samples at that specific parameter value. Furthermore, the smaller purple boxes indicate outliers. Tables 1 and 2 present some descriptive statistics for the results. Most of the statistics used in the tables are order statistics, i.e., quantiles and inter-quantile ranges. These metrics are more robust in the presence of outliers, and so they are more appropriate here than other dispersion metrics such as skewness and kurtosis. It can be seen that the median values are all quite close to the expected value, reflect-ing the comments above about how the estimators are unhiased. Those comments referred to the estimator for c, but that fact is also represented in the median value of the estimator for 1 1t . The use of the weighting function has the desired effect: it eliminates the most egregious outliers. However, from the tables and the figures, it can be seen that the decrease in outliers comes at the price of increasing the errors in the "middle" statistics (e.g., the 40-60 and 30-70 quartile ranges). It is clear that spectral averaging is a powerful method for reducing the random errors -even with averaging only five spectra. The beneficial effects of spectral averaging are far greater than using a weighting function. A definitive bias towards larger outliers in c can be seen in Fig. 8. This is more clearly exemplified in Fig. 10, a contour-scatterplot of the c estimates versus the q I estimates. The horizontal and vertical lines indicate the true value of those parameters. The solid black curve comes from the stem in Eq. (44) and subsequently embodied in the equation for the bias, Eq. (45). This plot show how the bias is a fundamental aspect of the problem, not an artifact of the processing methodology. Recall that the bias comes from the sum of the ratios of the true amplitude spectrum (i.e., with the true value of q, ) divided by the amplitude spectrum evaluated at a different qt. As can be seen from Fig. 3, excepting for the lowest frequencies, the spectrum for the rl t = 19200 curve is always lower than that for q t =20200; hence, the sum will be greater than one, thus producing an overestimate of c. Alternatively, if the estimated value of q t is greater than the true value, the sum of the ratios will be less than one and will give an underestimate of c. This explains the biased structure for the estimates of c, as seen in Figs. 8 and 10. A significant concern in any estimation algorithm is the identification and elimination (or mitigation) of outliers. In the results just presented, it could be seen that the overall statistical performance of the estimators is quite good, but in a real world application one wants to know if a given parameter estimate is accurate or not, This is a difficult problem in general and even more so in this application because the pa rameters do not have independent effects on the estimators. As mentioned above, the ratios of the true (in this case, simulated) spectra to the model spectra with the estimated parameters should be approximately distributed as an exponential random variable with mean one. The collection of ratios in question is taken over the set of frequencies samples for the given realization. An investigation was made into whether fitting an exponential distribution to the ratios could be used to determine the accuracy of the parameter estimates. This is a standard technique in the category of "goodness-of-fit" tests. A number of tests were employed: Anderson-Darlinig, Cramer-von Mises, Kolmogorov-Smirnov, Kuiper, Pearson chi-squared, and the Watson U' test (Lindgren, 1968). Each test produces a so-called test statistic, which then leads to the probability (or "p-value") that the given data comes from the specified distribution. For each realization, the ratios were calculated and then tested as to how likely they were to come from a mean-one exponential distribution. Figure I 1 illustrates the results using the chi-squared test, giving the p-value as a function of the c estimate for each of the 1000 realizations. The color-coding is proportional to the number of counts at that (c, p-value) grid cell. An optimal result would produce something like an upside-down "V" pattern (though not uniform in the number of counts), with the base of the two legs at the zero p-value axis, and the apex at the true c value and a p-value of one. That is, the higher the p-value (i.e., closer to one), the better the estimate -and vice-versa. As can be seen from the figure, the chi-squared test is clearly not optimal. There is skill, but there are too many mid to low p-values associated with good c estimates, as well as some poorer c estimates with high p-values. In other words, the p-value for this test could not be used as a single criterion for identifying outliers. The Pearson chi-squared test is used so frequently that a comment is warranted as to one reason that it does not perform well in this case. As discussed in Lindgren (1968, p. 486), the Pearson test is not well-suited to continuous distributions, the difficulty laying in choosing the appropriate binning strategy. The other tests, such as the Kolmogorov-Smirnov and Anderson-Darling, which use the distribution functions directly, do not have this limitation. However, it turned out that none of the individual tests had sufficient skill to be used in outlier detection. Thus, a simple hybrid method was used as the goodness of fit metricthe maximum p-value over all the tests. The results of this method for the c parameter estimation are shown in Fig. 12. While certainly not optimal, this approach is a significant im-8. X 10-9 7. X 10-9 6. X 30 -9 ro 5.X10-1 N w 4.X10-9 3. X 10-9 2. X 10-9 Average provement over the individual tests. For the rl l estimates, there was more skill than in the c estimation scenario, but as a stand-alone outlier detection method, this goodness-of-fit method is still not adequate. In the next section, a number of case studies are discussed, which leads to a better understanding of the difficulties in the estimation and outlier identification problem.

Simulation case studies
Five cases out of the 1000 realizations are analyzed in the subsequent material: realizations 64, 177, 306, 356, and 127. The first four were chosen to study the relationship between p-values and parameter estimates. Specifically, the four cases illustrate the combinations of high and low pvalues with both good and bad parameter estimates, (i.e., high-good, high-bad, low-good and low-bad). Realization 127 is used to show how statistical comparisons between the log-likelihood functions derived from the simulated data and parameter estimates can be misleading. The first case studied is realization number 64. Figure 13 through Fig. 15 show the spectra, likelihood function and dis-tribution of ratios for this realization, respectively. Figure 13 shows the trite spectrum (black), simulated spectrum (red), and the spectrum with the estimated c and nt values (green). Another spectrum function is plotted, (labeled "fit," in blue), NN here the parameters were set "by-eye" to get a better fit to the likelihood function (more on this below). Figure 14 illustrates the likelihood functions derived from the true spectrum (black); the simulated spectrum (red); the spectrum from the estimated parameters (green); and the spectrum from the fitted parameters (blue). Figure 15 shows a histogram of the spectral ratios using the spectrum derived from the estimated parameters, a carve representing a smoothed histogram (red), and a curve which is a mean-one exponential density function (black). Table 3 gives the estimated values of c and q a "low" and "high" rating of the p-values, and a chi-squared type of error function, computed as

EE-P ( -InL D( r1r) -( -InL E( rlt))]' (52)
, where, the sum is over all q values (15 000 to 25 000 km with a resolution of 10 km, i.e., P = 1000), -InL D is the negative C'hi-Square Test Max over all Tests a a 1.x10-9 2.x10-9 3.x10-9 4.x10-9 5.x10-9 6.x10 -9 7.x10-9 8.x10 -9 Estimated c" Values log-likelihood calculated from the data, and -InL E is that derived from the estimated parameters. Another statistic, EF, is calculated using the log-likelihood function from the fitted parameters instead of that from the estimated ones. Note that these statistics are not goodness of fit metrics for the fit to the spectrum, but rather for the fit to the likelihood function. The reason for this choice is that the parameter estimates come from the likelihood function -not the spectrum. From Table 3, it can be seen that the parameter estimates are quite poor for realization 64, and the error statistics are moderately large (relative to the other realizations). This impl ies that in this case, neither the E E and EF statistics, nor the difference between them would indicate that the parameters are significantly in error. The p-values for this realization are very high, which indicates that they are also not helpful in determining whether the parameters are outliers. This is reinforced by how well the distribution of the ratios matches the exponential curve -as seen in Fig. 15. As mentioned previously, the random errors for an exponential distribution are large, and hence the log-likelihood function can have undulations, which can result in a global minimum offset from s  the "true" minimum. It was also mentioned that this issue is especially problematic for smaller q values, where the likelihood function has a shalloNN incline. This is exactly what occurs with this realization, as clearly seen in Fig. 14. The model parameters chosen "by eye" produce a somewhat better fit to the empirical loo-likelihood function. This was a surrogate for a more sophisticated algorithm which might find the best fit to the empirical log-likelihood by varying the c and q, (as well as L) parameters as part of a threedimensional minimization algorithm. This gives hope for more accurate parameter estimation by using a more sophisticated minimization method, e.g., estimating L in addition to c and q, . Since the values for c and rj l are more important for practical applications, the error induced by lowering the value for L, in exchange for more accurate c and q, estimates is acceptable. However, as seen in Figs. 5, 6, and Eq. (37), the effect on the log-likelihood function due to variations in these parameters are not independent. This is quite apparent in the effect that changing q (cf. Fig. 5) has on the log-likelihood function. That is, both the level and minimum value change with varying q.
Realizations 356 and 177, both have low p-values with poor and excellent parameter estimates, respectively. EE is quite large for realization 356; however, adjusting c in a Ini-nor fashion (to 4.3 x 10 -9 ) and L in a more significant way (to 9), produced an excellent fit to the empirical likelihood function (EF = 10). Another interesting aspect of this case is that the fitted log-likelihood function, which produced an excellent fit to the empirical one, is still off in level to the true one. This is due to the random errors that produce a simulated log-likelihood function that is off in level. This means that minimizing the E F function can still produce parameters that are in error. Realization 177 resulted in excellent parameter estimates, as well as a very good EE value. The fit did not change the parameter estimates appreciably, although EF is better than E E . This case is interesting in that the estimated log-likelihood function is well matched to the true one -as seen in Fig. 16 -whereas the fit log-likelihood function is well matched to the simulated one. Of course, in a real world scenario, the true lob likelihood function would not be available so one would have to depend on the fit to give all the information.
This case also brings up an interesting point, one that is even more apparent in realization 127 (cf. Fig. 17): that good parameters estimates only require that the estimates of the level (c) and location of the minimum (r71 ) match those of the true data, i.e., at a single point in the (q j , -In L D ) domain. On the other hand, getting a good fit to the log-likelihood function requires reasonable matches at many points. No combination of parameters might produce a good fit to the "measured" (real or simulated) log-likelihood function; however, if the fit and true curve matched at just the right point, the estimates would be perfect. This implies that the fitting method should be more robust in parameter estimation -or at a minimum it could provide information regarding data quality. The hope that p-values could provide some measure of confidence in the parameter estimation was not fulfilled. Nevertheless, it was found that distributions with numerous outliers (i.e., large ratios) tended to produce lower p-values. This leads to two possible augmentations for the future: recalculating the p-values with the outliers removed, and using that as a revised confidence measure; and removing the outlier points from the parameter estimation process.

Weighting and spectral averaging
As mentioned above, other techniques that can improve parameter estimation entails the use of a weighting function in calculating the log-likelihood function, and spectral averaging. Figure 18 shows the benefit in using a weighting function. The red and brown curves are the un-weighted empirical and estimated likelihood curves, respectively, for realization 64. Recall that the parameter estimates for this case were poor (c = 8.4 x 10 -9 and 71 1 = 17210). Since the q, estimate is on the low end of the 71-range, the use of a weighting function should improve the estimates. As can be seen from the figure (blue and green curves for the weighted empirical and estimated likelihood curves, respectively), the new estimates are superior to the un-weighted method: c = 3.98 x 10 -9 and 711 = 20150. Tables I and 2 give the statistical results over 1000 realizations. Next, we turn to spectral averaging. A small number of realizations (5) are averaged to show the benefits of this method. Of course, for a GNSS-LEO occultation geometry, the vertical extent of the atmosphere covered over the period required to average five spectra can be large. Running averages could be used, but then the samples would not be independent and the effect of averaging on the variance -as given by Eq. (50)would be lessened. Figure 19 illustrates the benefit of averaging on reducing the random errors. The red curve is the averaged spectrum and the black curve is the true spectrum. Comparing this case with, for example, the case shown in Fig. 4, it can be seen that the averaged spectrum is matched better to the true spectrum, and hence the parameter estimation should be improved similarly. Figure 20 shows the log-likelihood functions from the true and averaged spectrum. Again, the beneficial effects of averaging can be seen, i.e., the log-likelihood function generated from the averaged spectrum is better matched to the true log-likelihood function. Tables I and 2 give the statistical results over 1000 realizations. It can be seen that by reducing the random error, spectral averaging improves the parameter estimation mach more significantly than the weighting scheme. It was shown above that when averaging spectra, the distribution function for the ratios becomes a gamma distribution with parameters a = M and = I /M; where M is the number of spectra going into the average (here, five). This can be seen in Fig. 21 with the simulated data.

GPS-COSMIC occultation case studies
The theory, techniques, and observations presented in the previous sections would be inadequate if they did not pertain directly to real data. In this section, an analysis of two case studies from GPS-COSMIC occultations is presented. The purpose of these case studies is to show how the methods and results from above apply to the real world scenario -not as an in-depth analysis of the real data. These cases were selected since they were from a region in which there Fig. 20. Log-likelihood function, as a function of q1 in k1n, derived from averaged and theoretical amplitude frequency spectra. Five spectra, beginning at realization 25, were averaged.
was vigorous convective activity, as well as turbulence reports from pilots in the vicinity. In each case, a 500-sample analysis window was selected and processed using the same methodologies as used with the simulated data. There is one difference, however. A trend was removed from the amplitude time series data prior to the analysis. The trend was computed via a running median filter with a width of 2001 points, or approximately 40 s. The appropriateness of the filter window was checked by verifying that the residuals had a mean that was close to zero. The noise level was computed by taking the median value of the log-amplitude spectrum over the last 50 points (i.e., the high-frequency region). Figure 22 shows the amplitude time series over the course Of occultation 2 (arbitrary nomenclature). The black curve is the raw data, the red curve shows the trend, and the two ver- Fig. 22. SNR data for COSMIC occultation number 2 (black). Trend curve is in red and the vertical, dashed blue lines indicate the analysis window.
tical dashed blue lines delineate the analysis window. This window was chosen because it had large-amplitude fluctuations, and could be used to test subjectively whether some of the assumptions presented in Sect. 2.1 were violated; specifically whether inhomogeneity and/or layered structures would affect the results. Figure 23 shows the spectra of the data (red), one using the estimated parameters (in green, with c=5.1 x 10 -10 , q l = 19010, L=5.0 and Q=7.Ox 10-6), and one using the fitted parameters (in blue, with c=5.19 x 10 -10 , q l = 19010, L=3.1 and Q = 6.O x 10 -6 ). The likelihood functions for the data (red), the estimated parameters (green) and fitted parameters (blue) are presented in Fig. 24. For this case, E E = 31.6 and EF = 2.1. Figure 25 illustrates the distribution of ratios. From these three figures it can be seen that the characteristics of the real data are similar to those of the simulated cases (e.g., realization 64).  Fig. 23. Amplitude frequency spectrum (in Hz) for occultation 2 (red), model spectrum using estimated parameters (green), and "byeye" fitted parameters (blue). Fig. 24. Log-likelihood functions, as a function of r7 l in km, from data (red), estimated parameters (green), and fitted parameters (blue) for occultation 1.
The p-values for this case are mid-level to high, as might be expected by how well the distribution from the real data matches an exponential one.
The amplitude time series for occultation I is shown in Fig. 26. It can be seen that there are significant fluctuations within the window. This window was chosen because of that fact -to see how well the theory applies to such a dynamic case. Figure 27 shows the spectra for the data (red), the spectra derived from the estimated parameters (green), and that using a set of fitted parameters (blue). The spectrum of the data shows features that were not seen in the simulated data. For example, the "hump" from around 3 to 6 Hz. In fact, if we look at the amplitude (red) and phase (black) time series over the analysis window (Fig. 28), a significant ramp can be seen in the amplitude data at around 54s, with what appears to be a damped oscillation in the subsequent 1.5 s. The phase data shows a step-like structure during the same 1.5s, with the steps in concert with those in the amplitude data. (It should be noted that the de-trending used with the phase data was very different than that used with the amplitude data. A polynomial fit over an extended time segment, centered at the analysis window was employed. For this example, the fit was performed over a window extending 150 points before and after the analysis window.) The source of these oscillations is unknown, but their periods are approximately 10-16 samples, which at 50 Hz. gives frequencies from around 3-5 Hz -which matches what is seen in the frequency spectrum. The hump in the spectrum generates magnified oscillations in the likelihood function, seen in the empirical (red) curve in Fig. 29, with the unfortunate by-product that one of these oscillations happens to be the global minimum, in turn skewing the parameter estimation (see the green curve in this figure). A by-eye fit was performed (blue curve) which provided far superior parameter estimates -or at least a better fit to the empirical likelihood function -since the true values are unknown. The estimated and fitted error statistics are EE = 299.2 and E F = 2.4, respectively. For the estimated parameters, c = 2.76 x 10-9, ?1 1 = 17730, Q = 1.8 x 10 -5 and L = 5, and for the fitted parameters, c = 1.31 x 10-9 , q, = 20800, Q = 6.0 x 10 -6 , and L = 7.0. It is encouraging that even with the data quality issues, physically reasonable parameter estimates can be obtained with fits to the likelihood function. The distribution of the ratios for this data also contains artifacts from the oscillations. In fact, these are directly related to the hump in the spectrum around 5 Hz, as those values -being much larger than the model spectrum -in turn produce large ratios. These large ratios in turn produce poor p-values.   (red), model spectrum using estimated parameters (green), and "byeye" fitted parameters (blue). Fig. 29. Log-likelihood functions, as a function of n1 in km, from data (red), estimated parameters (green), and fitted parameters (blue) for occultation 2.

Conclusions
In this paper, an overview of the derivation used to produce the model spectrum was presented. This derivation started from the Helmholtz wave equation, and using a reasonable set of assumptions, resulted in the model spectrum, Eq. (26), and its mid-point approximation, Eq. (28). This latter model was used in the subsequent analysis. The assumptions used in the derivation were spelled out clearly. It was pointed out that, as we considered this a first-order solution, two real world atmospheric situations, inhomogeneity (including layered structures) and anisotropy, were not considered. The model derivation was based on standard weak scattering theory which can be found in many references. However, in our case we needed to modify the standard methods to include the motions of the transmitter, receiver, and atmosphere, as well as placing the origin of the coordinate system at an arbitrary point. It was shown that our results match those in the literature when making the appropriate simplifications. A parameter estimation methodology was presented, followed by a discussion of the statistical properties of the estimators. A detailed study using simulated data was presented after the theoretical discussions. A number of metrics were examined to evaluate the accuracy of the estimates, e.g., median, standard deviation, and various inter-quantile ranges. It was found that the estimates of the turbulence intensity and the location of the turbulence alon g the line of sight could be retrieved in an accurate manner. In this instance, the term "accurate" refers to the global statistics over the 1000 simulated realizations. In a practical application, one would like to have accurate estimates for each measurement, or at least a notion as to when the estimates are unreliable. Goodness of fit metrics (p-values) were used to see if the distribution of the ratios of the measured/simulated spectrum values to those from the true one could be used as an indicator of poor estimates. It was found that this technique was not reliable in that regard. Four cases were shoN\ n that demonstrated the problem, one where the p-values were high and the estimates were poor, and vice versa -as well as cases where the pvalue was high and the estimates were good, and vice versa. This was also borne out with a scatterplot of the parameter estimates and p-values. Even when choosing the maximum p-value over the different goodness of fit tests, the correlation of p-value and quality of the parameter estimate was reasonable -but not great.
Since the parameter estimation technique used the global minimum of the log-likelihood function, random errors in the spectrum could result in significant errors when trying to estimate the location of the turbulence along the line of sight. An important step in improving the parameter estimation was fitting the model-based likelihood function to the empirical one. In addition to incorporating the turbulence length scale parameter into the fitting, a weighting function was introduced into the likelihood estimation methodology. The weights were close to one for most of the n values deemed to be in the earth's atmosphere, and decreased rapidly outside those bounds. This not only allowed for the elimination of the most egregious outliers, but it also lead to improved parameter estimates from the data interior to the weighting function's effective region. Finally, it was shown that, due to reducing the random errors, spectral averaging greatly improved parameter estimation. These smaller errors then propagate through the parameter estimation resulting in more accurate estimates. It was seem that spectral averaging was far superior to the weighting function in improving the parameter estimates. The methods developed for the analysis of the simulated data was then applied to the analysis of real occultation data. In the two cases presented, it could be seen that the general characteristics seen in the simulated data were also seen in the real data -which is encouraging.
Future work includes the implementation of a threeparameter estimation algorithm including the length scale, and a more thorough analysis of real occultation data. This is especially needed for phase data. Some analysis has been performed with simulated and real phase data, with encouraging results, but the extensive analysis presented here with the amplitude data has not been performed. There are distinct issues regarding trend removal with the phase time series. A local polynomial fitting method was discussed briefly; however, the frequency content in the data resulting from polynomial fitting is highly dependent on the order of the polynomial used and the window over which the fit is performed. This indicates that a more robust methodology needs to be developed.
As mentioned above, future work should include the effects of inhomogeneities (including layered structures) and anisotropy. The reason that the inclusion of layered media is difficult is because large-scale layers will refract the incoming wave, which then means that the straight-line approximation for the propagation path needs to be modified. The permittivity field is separated into background and random parts. (Note that in this context, "background" refers to all aspects of the deterministic permittivity field.) One could then rewrite the original differential equation in a non-Cartesian coordinate system, e.g., a local one which has as its axes the tangent, normal and binormal vectors to the "main" propagation path -for example that described by geometrical optics. These are so-called trajectory coordinates (see for example: Hill, 1985;Mazur and Felsen, 1987). Another technique is to use a path integral approach to determine the Green's function in a multiple scattering context (Mazur, 2002). Yet another approach is to use the so-called Distorted-Wave Born (or Rytov) Approximations (see for example: Beylkin and Orista(ylio, 1985;Devaney, 1979). In this method, one assumes that the permittivity field is separated into background and random parts, however the heuristic model is that the incoming waves are refracted by the background and are then scattered by the random field. The contribution of all such scattered fields is then what is measured at the receiver. That is, it is a single-scattering process, but the electric field at the scatterer is now "distorted" by the background field. Of the three methods, this is probably the most tractable one.