EARLINET Lidar Pre-Processor (ELPP)
The typical SCC analysis scheme comprises two steps: the
pre-processing of raw data with ELPP and the subsequent optical processing of
the pre-processed lidar data with ELDA. ELPP is based on open-source
software and will be made available on request to anyone interested in
contributing to its development.
By “pre-processing” we mean the set of operations that must be applied to
the raw lidar data before they can be processed by ELDA. ELPP is designed to
operate on the lidar data measured by all of the EARLINET lidar systems in
a fully automatic way. This is made possible by registering all instrumental
parameters needed for the pre-processing in a centralized SCC
database, where this information is structured in terms of lidar
configurations. Each single lidar system can be linked to several lidar
configurations, which describe different lidar set-ups with specialized
measurement capabilities (for example day-time or night-time conditions).
When a raw measurement is submitted to the SCC, a corresponding entry is
created in the SCC database linking the measurement session to the lidar
configuration to be used for the analysis. Accordingly, the raw lidar
data are handled and corrected for instrumental effects to provide
pre-processed signals that, once saved on a local storage, can be directly
managed by the SCC optical processing module ELDA. The automatic procedures
implemented in ELPP do not require the interaction of a human operator to run
and to produce the final results. This is a fundamental point in the effort
to minimize the manpower needed to perform lidar analysis and consequently to
improve the near-real-time availability of the lidar data at network level.
ELPP has been developed as a very flexible and expandable tool: many
different lidar configurations can be pre-processed using ELPP in different
ways. This is made possible by introducing the concept of SCC usecases. In
short, a usecase represents a procedure to deliver a particular aerosol
product, such as aerosol extinction or backscatter coefficient profiles. Usecases
select specific retrieval schemes to calculate the corresponding optical
products in both the pre-processing and the optical processing analysis
modules. As will be discussed in Sect. , each lidar
configuration is connected to the retrieval of a specific set of aerosol
products. The way in which each product is retrieved is determined by
a specific usecase according to the lidar configuration characteristics. The
aerosol backscatter coefficient, for example, must be retrieved in different
ways depending on the number and the type of available lidar channels. This
means the raw elastic and corresponding nitrogen Raman signals need to be
handled differently depending on whether or not they are split, for example,
into near-range and far-range channels. From the ELPP development point of view, the
implementation of the SCC usecases required the identification of all
pre-processing procedures and instrumental corrections adopted within the
EARLINET community. All these procedures and corrections were critically
evaluated and finally implemented in ELPP, which enables the usage of this
tool by all EARLINET systems. More details about SCC usecases are discussed in
, where a list of all implemented usecases is reported in the
Appendix.
The modular structure of ELPP permits an easy implementation of new usecases
and thereby the use of ELPP and of the SCC for new EARLINET systems,
non-EARLINET systems, and for other lidar networks independent of EARLINET.
As a consequence, ELPP plays an important role in making the SCC extensible
in more general frameworks like, for example, GALION (GAW Aerosol LIdar
Observation Network) as well as in national lidar networks. Typically, in
such networks the optical processing retrieval algorithms are the same as (or
very similar to) the ones used within EARLINET, but the lidar experimental
configurations may differ significantly.
Another key feature of ELPP is the full traceability of the whole data
analysis process. As the input submitted to ELPP/SCC is the raw lidar data
without any post-measurement handling, all operations performed on them in
pre-processing or processing phases are traceable. All of the corrections,
input parameters, and algorithms used to calculate a specific aerosol optical
product are logged and provided together with the output data. In this way
the end user has all of the information necessary to fully utilize and to
evaluate the product, and it is easy to perform a consistent re-analysis of
any data set while keeping track of the history of the analyses performed.
Moreover, ELPP provides the possibility to check the quality of all
corrections and data handling procedures. The implementation of
quality-certified procedures on both pre-processing and processing levels
allows the application of
a rigorous and homogeneous quality assurance program for the data measured by
lidars with different instrumental characteristics. This is particularly
important for a network like EARLINET, where the standardization of aerosol
optical products is a fundamental requirement.
ELPP is also an important tool for the long-term sustainability of the SCC.
While lidar systems in the research community are often upgraded with new
channels or new detection capabilities, ELPP, acting as the interface between
the hardware level and the optical retrievals, delivers the pre-processed
signals always in the same format. This allows the analysis of the data of
a new lidar without modifying any of the other SCC modules.
In the next sub-section we describe the technical aspects of ELPP including
its requirements and its different interfaces. After that, in
Sect. , a description of the implemented algorithms will be
provided specifying the procedures and the parameters that can be configured
for the pre-processing of raw lidar data.
Block structure of the Single Calculus Chain.
ELPP technical aspects
ELPP is a command line tool developed in ANSI C. It can be compiled with any
ANSI-C-compatible compiler, such as the freely available GNU
Compiler Collection (GCC, https://gcc.gnu.org/). GCC can be used in
both 32- and 64-bit environments for a large number of processor
families and on many popular operating systems like Linux, Unix, Mac OS, and
Windows. As ELPP is developed in C, its source code can be compiled on any
platform supported by GCC without any recoding. The main requirements of
ELPP are a MySQL database (http://www.mysql.com) and the NetCDF C
libraries (http://www.unidata.ucar.edu/software/netcdf/). All of the
files used and generated by the SCC are in NetCDF format.
ELPP can be operated as an SCC module or as a stand-alone module.
When it is used as an SCC module, it is automatically started by a further
module (SCC daemon) whenever necessary. Figure shows the
general structure of the SCC and also the role played by ELPP in the
automatic analysis of raw lidar data.
If used as a stand-alone module, the ELPP executable requires some mandatory
command line parameters: the measurement ID of the lidar observation
that should be pre-processed and the name of the MySQL database containing
all of the instrumental parameters needed in the pre-processing phase. As an
optional command line parameter the name of a configuration file can be
provided, which contains, for example, the path of the input, output, and log
files.
ELPP provides a user-configurable logging system, which produces a log file
for each analyzed measurement.
The module ingests a NetCDF file containing a time series of raw lidar data
to be analyzed. Raw time series corresponding to different lidar channels can
be included in one single input file. The raw lidar data set is
a three-dimensional array with the dimensions measurement time, channel, and
range bin. It is fundamental that this array contains the raw lidar data as
measured and without any modification. In particular, the photon-counting
signals should be provided in counts (positive integer numbers) while the
analog signals should be provided in mV (real numbers). If the lidar
acquisition system provides photon-counting and/or analog raw signals in
different units, they need to be converted. The conversion must be applied by
the raw data provider before submitting the data to the SCC. As different
data acquisition systems provide the raw lidar data in different units, which
might even change with system versions, it is not possible to include this
conversion step in the SCC. ELPP checks whether the photon-counting raw profiles
have been submitted using the required units; if not, the corresponding
raw data file is not accepted for the analysis. A further check on the
theoretical maximum count rate is performed when applying the dead-time
corrections as reported in Sect. .
Together with the raw lidar data,
further information can be included in the header of the NetCDF input file. In
particular, all parameters that differ from measurement to
measurement can be provided using dedicated NetCDF variables or global
attributes. These are, for example, the start and the stop time
of each single lidar profile in the time series, the number of laser
shots accumulated for each signal profile, the laser pointing angles,
the measurement ID, and other similar parameters. The parameters that
are related not to the individual measurement but to the lidar
configuration, like the laser repetition rate, the emitted and
detected wavelengths, the channel acquisition modes, and so on, are
retrieved from the SCC database. To retrieve the full set of the
SCC database parameters relevant for a specific raw data set, each single measurement
(i.e. each single NetCDF input file) is registered in the database and
associated with an alpha-numeric string (the measurement ID) which is
defined in the NetCDF input file. To ensure a one-to-one correspondence
between each raw data set and the corresponding measurement ID string,
it is not allowed to submit a NetCDF input file with a measurement ID already present in
the database. Using the measurement ID in appropriate database queries, it is
possible to retrieve all information needed for the analysis of
a specific measurement which is not included in the corresponding NetCDF input file.
The raw data provider should decide how
many raw lidar profiles to include in a single input file, taking into account
several aspects. The total size of a single NetCDF input file should be
less than 200–300 MB to ensure a stable upload to the SCC
server. The maximum time length of
a single NetCDF file also depends on the availability of ancillary information to be used in the analysis (i.e. radio sounding
profiles, overlap correction functions). Each NetCDF input file can be
linked to one specific set of ancillary data sets, which are used for
the analysis of the whole time series. If, for example, there are four radio soundings
per day available for a certain site, the maximum time length of
a single NetCDF input file is 6 h.
The cloud screening is another important operation to be applied to the lidar data before the
submission to the SCC. The quality of the SCC
optical products cannot be assured if there are signatures of low-level
clouds in the raw lidar time series. As a consequence, individual lidar profiles
contaminated by low-level clouds should not be included in the NetCDF
input file. A new module implementing a fully automatic cloud masking
on high resolution lidar data is under development, and it will
be included in the SCC in the framework of the ACTRIS-2 (Aerosol, Clouds and
Trace gases Research InfraStructure Network) project (http://www.actris.eu).
As already mentioned, it is possible to provide other kinds of input files to
ELPP together with the raw data NetCDF input file, i.e. a file containing
pressure and temperature profiles provided by a radio sounding to be used for
the calculation of the signal backscattered by atmospheric molecules (as
explained in Sect. ), a file containing the overlap
correction function, and a file consisting of the lidar-ratio profile to be
used in the retrieval of the particle backscatter coefficient using
elastic-only techniques. Even though these last two files are typically not
needed in the pre-processing phase, ELPP interpolates them to the same
vertical resolution as the pre-processed data and saves the corresponding
interpolated data in new files. In particular, ELDA is designed to use
these files for the retrieval of the aerosol optical properties.
The output files of ELPP are in NetCDF format and contain the pre-processed,
range-corrected signals, the so-called intermediate files, which have been
handled according to all analysis steps reported in Sect. .
Table summarizes the description of all NetCDF
variables used to identify the different types of pre-processed signals in the
output files. For instance, the total elastic range-corrected signal is
represented by the variable elT and the atmospheric nitrogen
vibrational–rotational Raman range-corrected signal by vrRN2. The
range-corrected signals for the near range and for the far range are
represented by variables whose names contain the “nr” or “fr” string,
respectively. By selecting appropriate usecases, it is possible to specify
whether gluing procedures should be performed by ELPP, gluing the raw signals,
or by ELDA, gluing the optical products. The products calculated from
near-range and far-range pre-processed signals are the ones for which ELDA
gluing has been selected by the raw data provider. All of the NetCDF variables reported in
Table are bi-dimensional arrays with dimensions
time and range bin. The time and vertical resolutions of these arrays are
specified in the SCC database for each product as explained in the next
section. Moreover, as the products are defined in the SCC database for
a single emission wavelength (e.g. the aerosol extinction coefficient at
355 nm or the aerosol backscatter coefficient at 1064 nm), each
intermediate file always refers to a single wavelength.
Other information included in the ELPP output files comprises the molecular
extinction and molecular atmospheric transmission profiles, the range and
vertical resolutions, the number of averaged laser shots, and so on. All parameters for the optical
retrieval, which are provided by the user in the input NetCDF file, are
directly transferred to ELDA within the header of the intermediate files.
The input parameters needed for the
analysis of the lidar data are retrieved from the header of the NetCDF
file or, if not provided in the file header, from a relational MySQL
database (the SCC database) containing general values for a certain lidar
system configuration. The structure of this database is described in Sect. 3.1.
Once ELPP has been started, it is possible to monitor the status of the
pre-processing using its return values. ELPP returns a null value if the
pre-processing was successfully performed and positive integer values in case
any error occurred. Each return value is associated with a specific type of
error, such as a failure in gluing the lidar signals or an inconsistency in
the definition of the variables in the raw input file, to provide detailed
information about the problem that occurred.
Description
of different types of pre-processed, range-corrected signals delivered by
ELPP and the corresponding NetCDF
variable name in the ELPP output files. All of the
variables refer to a single emission wavelength. As
a consequence, pre-processed data corresponding to different
wavelengths are saved in separate files.
Pre-processed range-corrected signal | NetCDF variable name
Elastic total backscattered signal | elT
Perpendicular* polarization component of the total backscattered signal | elCP
Parallel* polarization component of the total backscattered signal | elPP
Vibrational–rotational Raman backscattered signal by nitrogen molecules | vrRN2
Near-range elastic total backscattered signal | elTnr
Near-range perpendicular* polarization component of the total backscattered signal | elCPnr
Near-range parallel* polarization component of the total backscattered signal | elPPnr
Near-range vibrational–rotational Raman backscattered signal by nitrogen molecules | vrRN2nr
Far-range elastic total backscattered signal | elTfr
Far-range perpendicular* polarization component of the total backscattered signal | elCPfr
Far-range parallel* polarization component of the total backscattered signal | elPPfr
Far-range vibrational–rotational Raman backscattered signal by nitrogen molecules | vrRN2fr
* With respect to the linear polarization
state of the incident laser beam.
ELPP work flow. ELPP gets the full set
of information and parameters needed for the pre-processing of a specific
measurement ID by performing suitable queries to the SCC database. This
set includes how many products should be calculated (Np), how many lidar channels are needed for
the calculation of each product (Nc(p) with 0<p≤Np), all input
parameters required for the analysis (dead-time value,
trigger delay, etc.), and the name of the input NetCDF file
corresponding to the selected measurement ID. There are two main loops involved in the pre-processing chain: an external loop on the
products to be calculated (index p) and an internal loop in which all of the
product-related channels are pre-processed sequentially (index
c). The operations performed by the single blocks are described in
the text. A single output file (intermediate NetCDF file) is generated
for each product.
Description of implemented algorithms
All corrections and algorithms implemented in ELPP are schematically reported
in Fig. . Most of them are well known and well
described in the literature. For this reason, the relevant literature
is cited in the following sub-sections without providing detailed
descriptions of the implemented formulas. On the other hand, details of the
implementation and user adjustable parameters are explained. The
implementation of the automatic algorithm for the gluing of lidar signals is
discussed in greater detail in Sect. .
As already mentioned in the previous section, ELPP requires the presence of
a MySQL database where the characteristics of the analysis to be performed
are specified. In particular, starting from a measurement ID passed to ELPP
via the command line, it is possible to retrieve from the database all
required information such as how many products should be calculated (Np in
Fig. ), how many lidar channels are needed for the
calculation of each product (Nc(p) with 0<p≤Np), the full set of
the input parameters needed for the analysis (dead-time value, trigger delay,
etc.), and the name of the data file containing the raw input time series
corresponding to all lidar channels linked to the measurement ID to analyze.
Once this information is obtained, ELPP starts to calculate pre-processed
signals for all configured products. There are two main loops involved in the
pre-processing chain: an external loop on the products to be calculated
(index p in Fig. with p=1,…,Np), and an
internal loop in which all the product-related channels are pre-processed
sequentially (index c with c=1,…,Nc(p)). The pre-processing steps
performed to calculate a specific set of optical products can be illustrated
by means of a practical example. Let us suppose two products (Np=2), the
aerosol backscatter coefficient (p=1) and the aerosol extinction
coefficient (p=2), should be calculated for a particular measurement
ID using two elastic channels at 355 nm (elTnr, elTfr) and
two vibrational–rotational N2 Raman channels at 387 nm
(vrRN2nr, vrRN2fr). To calculate the aerosol backscatter coefficient, the channels elTnr
(channel c=1), elTfr (c=2), vrRN2nr (c=3) and vrRN2fr (c=4) are
needed, so Nc(1)=4. Let us also suppose the two near-range channels are
detected in analog mode and the two far-range ones are photon-counted.
During the loop on index c=1,2,3,4, each channel is first identified as
analog or photon counting by querying the SCC database. Dead-time correction is
only applied to photon-counting signals (see Sect. ),
and a different error propagation is used for analog and photon-counting
signals as explained in Sect. . As a consequence,
elTnr and vrRN2nr are recognized as analog channels, while elTfr and
vrRN2fr are labelled as photon-counting signals and corrected for dead
time. After this step, the following operations are performed on the four signals:
atmospheric and (optionally) electronic background subtraction as reported in
Sect. , trigger-delay correction (see
Sect. ), and finally the signals are temporally
integrated over a time window defined in the SCC database, which is larger
than the raw data time resolution. The averaging time window should be
selected by the user to ensure the optimal balance between the stability of
atmospheric conditions and an adequately high signal-to-noise ratio (SNR).
This is particularly important for the analog signals because in this case,
as explained in Sect. , the statistical errors are
estimated by the standard error of the mean calculated within the integration
time interval. The way in which the error is propagated in the case of the
time integration of photon-counting signals is described in
Sect. . When all lidar channels needed for the
calculation of the current product (aerosol backscatter coefficient) have
been pre-processed, ELPP performs the gluing of near-range and far-range
channels. If the gluing of one or more pairs of signals has been configured,
the algorithm described in Sect. is used for the
corresponding signals. According to the example above, two signal gluings
need to be performed: the gluing of elTnr with elTfr and the gluing of
vrRN2nr with vrRN2fr. After this step, ELPP completes the calculation of
the current product performing the operations reported on the right part of
Fig. . Optionally a vertical smoothing of
pre-processed lidar signals is performed. Typically, smoothing is done to
increase the SNR of the pre-processed signals. Different smoothing options
can be selected, like linear, polynomial, and natural cubic
spline. Moreover, the signals are range-corrected and optionally
corrected for incomplete overlap. Finally, the molecular contributions to the
atmospheric extinction and transmissivity are calculated at the same
resolution as the pre-processed lidar signals as described in
Sect. . The pre-processed signals are then stored in
a specific intermediate NetCDF file. This file will be used as input by the
ELDA module to retrieve the aerosol backscatter
product. In the specific case of the example above, this file contains the
time series of the elastic (N2 Raman) glued pre-processed signals
under the variable elT (vrRN2).
Once the pre-processing corresponding to the first product is finished, ELPP
switches to the next scheduled product (p=2), which is, according to the
example above, the aerosol extinction coefficient. The procedure is very
similar to the one already described for the aerosol backscatter coefficient.
The only difference is that for this product there are only two signals to be
pre-processed (vrRN2nr and vrRN2fr, Nc(2)=2) and only one gluing needs
to be performed. The results are stored in another intermediate NetCDF file
which contains the time series of the N2 Raman glued pre-processed
signals under the variable vrRN2. ELDA will use this file to
retrieve the aerosol extinction coefficient profile.
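The two-loop flow of this walkthrough can be condensed into a skeleton like the one below. The per-channel and per-product operations are reduced to comments, and the function name and counting convention are illustrative, not part of the ELPP code base.

```c
/* Skeleton of the two-loop pre-processing flow described above.
   Np: number of products; Nc[p]: number of channels for product p.
   Returns the number of intermediate files written (one per product). */
int run_preprocessing(int Np, const int *Nc)
{
    int files_written = 0;
    for (int p = 0; p < Np; p++) {          /* external loop over products */
        for (int c = 0; c < Nc[p]; c++) {   /* internal loop over channels */
            /* per-channel steps: dead-time correction (photon-counting
               channels only), background subtraction, trigger-delay
               correction, temporal integration over the configured window */
        }
        /* per-product steps: gluing of configured near-/far-range pairs,
           optional smoothing, range correction, molecular extinction and
           transmission profiles */
        files_written++;  /* one intermediate NetCDF file per product */
    }
    return files_written;
}
```

In the example above, the skeleton would run with Np=2 and Nc = {4, 2}, producing two intermediate files.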
Dead-time correction
The dead-time correction of photon-counting signals is non-linear.
A typical lidar photon-counting channel consists of
a photo-multiplier, which ideally generates an electrical pulse
for each photon impacting its photo-cathode (event), a pulse discriminator to
reduce the noise counts, and finally a fast counter to count the number of
events in a fixed interval of time, the time bin. As each electrical pulse
has a certain width, two pulses closer to each other than about the pulse
width cannot be discriminated. The actual minimum time interval between two
subsequently countable events, called dead-time τ,
depends on the setting of the pulse discriminator and on the counting
electronics. The dead-time corresponds to a maximum count rate. The dead-time
causes a non-linearity between the actual intensity at the photo-multiplier
photo-cathode and the counted events, which can be described theoretically by
means of photon statistics. As the real processes are not ideal, the
mathematical correction of the non-linearity works only in first
approximation. Furthermore, there are two models to describe the counting
characteristic of a photon-counter, i.e. the paralyzable and the
non-paralyzable model. A paralyzable counting system is not able to provide
a second output count if a time τ has not elapsed since the previous
pulse. Moreover, if an additional pulse arrives within the dead-time τ,
the actual dead-time of the system is further extended by τ. In this
way, at high count rate, the unit is unable to respond, it is “paralyzed”,
and the count-rate output is 0. In contrast, a non-paralyzable counter
outputs counts at maximum count rate as long as subsequent photon pulses are
not discriminable. ELPP optionally includes both models for dead-time
correction (in first approximation). The formulas used by the SCC to correct
for dead time are the following:

cm = cr exp(-τ cr),

cm = cr / (1 + τ cr),

where cm and cr are the measured and the real count rate,
respectively. Eq. () refers to a paralyzable
counter, while Eq. () is used if
a non-paralyzable counter is assumed.
Once the dead-time value τ and the model to use for the correction are
provided to ELPP, the corresponding photon-counting lidar signal will be
automatically corrected by solving Eq. () or
() for the unknown cr.
Eq. () is solved numerically in the interval
[0, 1/τ] using the well-known secant method .
It is important to underline that Eq. () can be
solved only if cm is less than or equal to the absolute maximum
of the exponential function on the right-hand side. As a consequence, the
following condition on the measured count rate has to be satisfied:

cm ≤ 1/(e τ),

where e is Euler's number.
For the non-paralyzable model, the correction for dead time is made
by inverting Eq. ():

cr = cm / (1 - τ cm).

As cr ≥ 0 and cm ≥ 0, Eq. () can be solved only if the following
condition on the measured count rate is valid:

cm < 1/τ.
According to the selected model, the condition expressed by
Eq. () or () is used
as a constraint on the actual values of the photon-counting signals,
rejecting all cases in which it is not satisfied.
As the dead-time correction is non-linear, it is applied as the first stage of
the pre-processing procedure as shown in Fig. .
Here it should be mentioned that, in general, the reliability of the dead-time
correction decreases with increasing count rate: both
correction models reported above usually fail to reproduce the correct
behaviour of a real counting system at high count rates.
As a consequence, each photon-counting lidar channel should be
carefully adjusted not to exceed a maximum count rate (typically 10–30 MHz
depending on the value of τ) in all the range bins for which the photon-counting signal
is supposed to be used.
The dead time of a photon-counting system can be evaluated by measuring the
counting probability distribution generated by a Poissonian source
(like a tungsten lamp) as described in .
Trigger delay
In general, the data acquisition unit of a lidar system gets a trigger from
the laser to start the signal recording. Due to the electronic circuits in
the laser and in the data acquisition unit, there is always a delay between
the outgoing laser pulse and the time at which the acquisition system
actually starts to record the lidar profile. If this trigger delay is not
properly taken into account, a systematic error is made in associating each
lidar range bin with the corresponding atmospheric range. A delay, for
example, of 100 ns induces a systematic shift of the atmospheric
ranges of 15 m. This shift causes a systematic error in the
range-correction of the lidar signal, which propagates to the calculation of
the final aerosol properties. The error is especially large for the aerosol
extinction coefficient calculated with the Raman method in the near range.
The exact trigger delay can be measured and provided to ELPP as input
parameter for each lidar channel . If ΔT is
the trigger delay of a particular lidar channel and TS1=(t1, t2,
…, tn) is the time scale used by the acquisition system to
sample the lidar profile, the actual lidar range scale is calculated from the
delayed time scale TS2=(t1+ΔT, t2+ΔT, …, tn+ΔT).
If different lidar channels have different trigger delays, ELPP interpolates
all recorded lidar signals from the time scale TS2 (which may change from
channel to channel) to the time scale TS1 (which is the same for all
channels). This operation enables the consistent calculation of the lidar
products for which multiple channels are needed.
It is possible to choose a linear or a natural cubic spline interpolation
. The preferred option is the linear interpolation, as the
trigger-delay correction usually requires only a time shift of the lidar signals. As
a first step, for each value tk of the time scale TS1, the closest higher
and lower values of the time scale TS2 are selected. Let us suppose these
values are tl-1+ΔT and tl+ΔT, respectively. The value of
the lidar signal Stk at tk is then determined by the equation of the
straight line passing through the points (tl-1+ΔT, Stl-1+ΔT) and (tl+ΔT, Stl+ΔT) as follows:

Stk = Stl-1+ΔT + [(Stl+ΔT - Stl-1+ΔT)/Δt] (tk - tl-1 - ΔT),

with tl-1+ΔT < tk ≤ tl+ΔT and Δt = tl - tl-1 representing the lidar signal range bin width.
If the trigger delay is a multiple of the signal range bin width (ΔT = uΔt), Eq. () is equivalent to a re-binning of
the signal (Stl = Stl+u). For all cases in which
Eq. () is not equivalent to a re-binning, the
implemented trigger-delay correction introduces correlations between
neighbouring range bins. ELPP takes these correlations into account by
estimating the statistical errors of the signal corrected for trigger delay
using the Monte Carlo approach described in Sect. .
The natural cubic spline interpolation option should be used only if an
additional smoothing on lidar signals is required.
Background subtraction
A raw lidar signal S(z,λ) can be expressed by Eq. ():

S(z,λ) = Spar(z,λ) + Smol(z,λ) + Satm(λ) + Sel,
where Spar(z,λ) and Smol(z,λ) are the signal contributions
backscattered by particles (par) and molecules (mol) at altitude z
and at wavelength λ. Satm(λ) is the optical signal
background from the atmosphere, i.e. the sky brightness, which is
independent of range, and Sel represents the electronic signal
background, which stems from electronic effects of the signal detection
and data acquisition. Sel can have a temporally constant part and a temporally changing part, i.e. changing with lidar range.
It is fundamental to remove Satm(λ) and Sel from the
measured lidar profiles before applying any optical retrieval algorithm.
The amount of the constant background components
Satm(λ)+Sel can be determined either in the far range of
the lidar signal, far enough that the expected contribution from
atmospheric backscatter is negligible, or in the pre-trigger range
before the laser pulse, where the signal must be free of electronic
distortions that could influence the determination of the constant background.
In both cases the constant background value is calculated as the mean
value over signal ranges that are large enough that the residual
standard error of the mean is negligible.
ELPP implements both options for the calculation of the range-independent
contribution in Eq. (), i.e.
the mean of the lidar signal in the
far-range region;
the mean of lidar signal in the pre-trigger region.
The selection can be done in the SCC database or in the input file.
In the case of option 1, the minimum (zmin) and the maximum
(zmax) ranges (expressed in m) for the background
calculation have to be provided in the raw data input file. ELPP estimates
the background value from the mean and the corresponding statistical
uncertainty from the standard error of the mean of the lidar signal between
zmin and zmax.
In the case of option 2, three parameters are needed: a minimum (imin) and a maximum (imax) range bin index in the pre-trigger region for the calculation of the background value and its uncertainty as above, and a first valid range bin index (i0), with i0≥imax, explained in the following.
After the background value and the corresponding statistical uncertainty
have been calculated, all points up to i0 are removed from the
lidar signal, because they are not necessary for the further
calculations. Then the background is subtracted from the lidar signal.
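The far-range background estimate (option 1) can be sketched as follows; this is a minimal NumPy sketch, and the function name and interface are illustrative, not the actual ELPP API:

```python
import numpy as np

def subtract_background(signal, z, z_min, z_max):
    """Estimate the constant background of a lidar signal as the mean
    over the far-range interval [z_min, z_max] (option 1 in the text)
    and subtract it.  Returns the corrected signal, the background
    value, and its statistical uncertainty, taken as the standard
    error of the mean over the selected range bins."""
    mask = (z >= z_min) & (z <= z_max)
    bins = signal[mask]
    bg = bins.mean()
    # standard error of the mean over the selected range bins
    bg_err = bins.std(ddof=1) / np.sqrt(bins.size)
    return signal - bg, bg, bg_err
```

For option 2, the same computation would simply be carried out over the pre-trigger bin indices imin..imax instead of a range interval.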
Temporally changing and hence range-dependent contributions in Sel
are typically due to electronic distortions, which mainly affect the analog
lidar signals. They can have temporally random components and components which are synchronous with the repetition of the laser pulse. While the random components average out over many subsequent lidar signals, the synchronous components do not and can contribute a significant distortion to the lidar signal. The stationary synchronous components can be determined from
so-called dark signals, which are measured, for example, with a fully
obscured telescope so that no light from the atmosphere reaches the detectors
and only the distortions are left. The dark signals have to be averaged over
a long enough time period in order to decrease the random contributions
sufficiently. ELPP automatically subtracts a dark measurement from the lidar
signal if the former is included in the SCC input file as single dark signal
or as dark time series. If a dark time series is provided, an average dark
profile is calculated automatically and subtracted from the lidar signals.
Both dark signal and background subtraction can be applied together.
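The dark-signal handling described above reduces to a small averaging step; the sketch below assumes NumPy arrays and an illustrative function name:

```python
import numpy as np

def subtract_dark(signal, dark):
    """Subtract a dark measurement from a lidar signal (sketch).
    `dark` may be a single dark profile (1-D) or a dark time series
    (2-D, time x range); a time series is first averaged over time,
    as done automatically by ELPP."""
    dark = np.asarray(dark, dtype=float)
    if dark.ndim == 2:
        dark = dark.mean(axis=0)  # average dark profile over time
    return signal - dark
```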
Numerical values of the parameters involved in Eqs. () and (), calculated for the most common lidar wavelengths according to . The quantity δn represents the molecular depolarization factor for unpolarized (natural) incident light scattered at a right angle, nS is the refractive index of standard air, Lmol the molecular lidar ratio, and σmol the total Rayleigh-scattering cross section per molecule given by Eq. () when ρmol=1 and a value of ρS = 2.54743×10^25 m^-3 is assumed for the molecular number density of standard air in Eq. ().
λ [nm]   δn×10^-2   (nS-1)×10^4   σmol×10^30 [m^2]   Lmol [sr]
 355       3.010        2.9            2.7549          8.503
 387       2.953        2.8            1.9188          8.501
 532       2.841        2.8            0.5148          8.497
 607       2.784        2.8            0.3010          8.494
1064       2.730        2.7            0.0312          8.492
Molecular Rayleigh-scattering calculation
In both the aerosol backscatter and extinction retrievals, the molecular contributions to the atmospheric extinction and transmissivity are required as input. ELPP calculates them at the emission and detection wavelengths as vertical profiles with the same vertical resolution as the pre-processed lidar signals. These profiles are used by ELDA in the extinction and backscatter retrievals. The molecular number density profile (ρmol) is calculated by ELPP from vertical profiles of temperature T(z) and pressure P(z) using the ideal gas law, assuming an air compressibility factor of 1:
ρmol(z) = P(z)/(R T(z)),
where R is the universal gas constant.
The temperature and pressure profiles are either calculated from a standard
atmosphere model, or taken from the measurements of a close-by radiosounding
that can be provided to the SCC as a separate input file. Once the molecular
number density is obtained, the calculation of the molecular optical
parameters, i.e. the backscatter and extinction coefficients, is done
following the procedure reported in and . In
particular, the extinction coefficient (αmol), the lidar ratio
(Lmol), and the atmospheric transmission (Tmol) are
calculated using the following formulas:
αmol(λ,z) = [24π³/(λ⁴ ρS²)] [(nS²-1)/(nS²+2)]² [(6+3δn)/(6-7δn)] ρmol(z),
Lmol(λ) = (8π/3)(1 + δn/2),
Tmol(λ,z) = exp[-(1/cosθ) ∫₀^z αmol(λ,ξ) dξ],
where λ is the wavelength (in cm), z is the altitude above
the lidar station, and θ is the zenith angle of the lidar pointing.
The other quantities, which are the molecular number density for standard air
(ρS), the molecular depolarization ratio for unpolarized
(natural) incident light scattered at right angle (δn), and
the refractive index of standard air (nS), are calculated according
to . The integral in Eq. () is computed numerically using the trapezoidal rule . The numerical values of the parameters involved in Eqs. () and (), calculated for the most common lidar wavelengths, are reported in Table . ELPP writes to its output file the quantities given by Eqs. () and (), as well as the atmospheric transmission given by Eq. (), at both emission and detection wavelengths.
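The molecular calculation above can be sketched compactly. The sketch below is illustrative, not the ELPP implementation: it computes the number density via the ideal gas law written in number-density form (Boltzmann constant instead of the molar gas constant), the Rayleigh cross section per molecule, and the molecular lidar ratio, using the constants of the table above:

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant [J/K]

def number_density(P, T):
    """Molecular number density [m^-3] from pressure [Pa] and
    temperature [K] via the ideal gas law (compressibility = 1)."""
    return P / (K_B * T)

def rayleigh_params(lam, delta_n, n_s, rho_s=2.54743e25):
    """Rayleigh cross section per molecule [m^2] and molecular lidar
    ratio [sr] for wavelength lam [m]; delta_n is the molecular
    depolarization factor and n_s the refractive index of standard air."""
    king = (6.0 + 3.0 * delta_n) / (6.0 - 7.0 * delta_n)
    sigma = (24.0 * np.pi**3 / (lam**4 * rho_s**2)
             * ((n_s**2 - 1.0) / (n_s**2 + 2.0))**2 * king)
    l_mol = (8.0 * np.pi / 3.0) * (1.0 + delta_n / 2.0)
    return sigma, l_mol

def alpha_mol(sigma, rho):
    """Molecular extinction profile: cross section times number density."""
    return sigma * rho
```

With the table values for 355 nm, the computed lidar ratio reproduces the tabulated 8.503 sr, and the cross section agrees with the tabulated value to within the precision of the listed refractive index.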
Gluing
Lidar signals can cover a large dynamic range, because the intensity of the light backscattered from the aerosol-laden boundary layer in the near range (e.g. at 0.5 km altitude) is several orders of magnitude higher than the intensity of the light backscattered from the rather clean troposphere (e.g. at 10 km altitude). As it is demanding to cover this large dynamic range with a single data acquisition channel with linear response, several approaches are used to overcome this problem.
One option is to split the signal output from a single photo-multiplier
into two signals and to record one signal using analog detection mode and the other
with the photon-counting technique .
The analog signal performs well for the strong backscatter from the near range but suffers from high analog noise and distortions in the far range. In contrast, the photon-counting signal saturates in the near range but performs well in the far range. It is therefore appropriate to use the analog signal as the near-range signal Sn and the photon-counting signal as the far-range signal Sf.
Another option is to split the lidar signal optically
using a beam splitter and to detect the split components with two detectors and subsequent data acquisitions.
Both signals are attenuated, if necessary, with neutral density filters
to match the dynamic range of the data acquisitions for the stronger
near-range and the weaker far-range signal. In general, the
photon-counting technique is used for both signals due to its superior
performance regarding detection linearity compared to analog
detection.
A third option is to use two (or more) telescopes with separate
detection electronics, i.e. one small telescope designed to detect the near-range signal
and the other larger telescope optimized to measure the weak far-range signal.
In either case, the complementary signals need to be glued to get
a single “extended” lidar signal for the signal analysis
.
Before gluing, the near-range and the far-range signals need to be
screened for low-level clouds, corrected for instrumental effects like dead time, trigger delay,
etc., and the backgrounds have to be subtracted as explained above.
For the first two options the signals are glued by ELPP and then analyzed by ELDA as one signal. For lidar configurations with multiple telescopes, the gluing is typically done by ELDA at the product level
.
Work flow diagram of the automatic algorithm for the gluing of
near-range and far-range lidar signals implemented in ELPP.
ELPP contains a fully automatic algorithm for the gluing of analog and
photon-counting signals as well as for the gluing of two photon-counting
signals. The algorithm is divided into three main parts. The procedure starts
with the determination of a first guess of the gluing region as described in
Sect. . After that, the algorithm optimizes the
gluing region performing statistical tests as illustrated in
Sect. . Finally, the signals are glued in the optimal
gluing region as reported in Sect. .
First guess of the gluing region
The first guess of the gluing region uses empirical values.
The lower range (z0) of this region is determined from the far-range photon-counting signal by an upper threshold for the count rate below which the dead-time correction (see Sect. ) is considered to work reliably. This upper threshold can be defined in the system configuration of each channel in the SCC database. Typical values are 10–30 MHz .
The upper range (z1) of the gluing region is determined from the
near-range signal, which can be an analog or a photon-counting
signal. Analog signals are in general measured using pre-amplifiers
with several input ranges. Each input range is characterized by
a minimum level below which signal distortions and/or the signal noise
become significant. This minimum level, which is used to determine the upper
altitude (z1) of the gluing region, is expressed by the
ratio S/F where S is the maximum detectable input
signal level and F is a parameter characterizing the analog to
digital converter (ADC). If we assume, for example, that the ADC output is reliable only for values larger than Nres times its resolution, we obtain
F = (2^nb - 1)/Nres,
where nb is the number of the bits of the ADC. The values of the
parameter F can be defined in the system configuration for each channel. If the near-range signal is detected in photon-counting mode, the upper
altitude z1 is determined by setting a lower threshold for the SNR.
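The determination of the minimum reliable analog level and the resulting upper range z1 can be sketched as follows; the helper names and the simple thresholding scheme are illustrative assumptions, not the ELPP code:

```python
import numpy as np

def adc_floor(nb, n_res):
    """F = (2**nb - 1) / N_res for an ADC with nb bits, where the
    output is trusted only above N_res times the ADC resolution."""
    return (2**nb - 1) / n_res

def upper_gluing_range(signal, z, s_max, nb, n_res):
    """Return the largest range z1 at which the analog signal is still
    above its minimum reliable level S/F (illustrative helper).
    s_max is the maximum detectable input signal level S."""
    floor = s_max / adc_floor(nb, n_res)
    valid = z[signal >= floor]
    return valid.max() if valid.size else None
```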
Optimal gluing region
Starting from the values of z0 and z1 determined in the previous
section, ELPP tries to optimize the gluing region using the automatic
algorithm shown in Fig. . Besides z0 and z1, the
algorithm requires the following input data provided in the input file and in
the SCC database, which are explained in detail later:
the near-range and far-range signals Sn and Sf, respectively;
a threshold rth for the linear correlation of Sn and Sf;
the step Δz with which the gluing region is decreased
during the iterations;
the statistical uncertainty limits used to evaluate the slope test and the stability test, given as numbers of standard deviations m and n, respectively.
First, the algorithm determines the number of range bins N between
z0 and z1. If this number is less than 15, the gluing region is
considered too small to perform a reliable gluing and consequently the
gluing is not done. If N is larger than or equal to 15, the
linear correlation r of the signals Sn and Sf is calculated
between z0 and z1. As Sn and Sf should be highly linearly correlated in the gluing region, only regions where r is larger than
the threshold rth (typically 0.9) are accepted; otherwise the gluing
is not performed.
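The two gates described above (minimum region size and correlation threshold) can be sketched as a small check; constants and the function name are illustrative:

```python
import numpy as np

MIN_BINS = 15  # minimum number of range bins for a reliable gluing
R_TH = 0.9     # default linear-correlation threshold

def gluing_region_ok(s_n, s_f):
    """Gate for the gluing algorithm (sketch): require at least
    MIN_BINS range bins and a linear correlation r >= R_TH between
    the near-range and far-range signals in the candidate region."""
    if s_n.size < MIN_BINS:
        return False
    r = np.corrcoef(s_n, s_f)[0, 1]
    return bool(r >= R_TH)
```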
If r≥rth, a further investigation of the gluing
region is done in order to exclude parts of the region with significant deviations
between the two signals and to minimize the gluing error. This is done by iteratively changing the region [z0,z1]
until the signals Sn and Sf are consistent according to the
additional tests described below. This procedure is illustrated by the
block “Slope test” in Fig. .
In the optimal gluing region the signals Sn and Sf should coincide,
even in the fine structure due to aerosol layers and photon noise,
and only differ due to the different electronic noise sources with zero means and slopes.
To investigate this, the following steps are carried out:
the signal Sn is normalized to the signal Sf in the gluing region; this is done by performing the least-squares regression Sf = K Sn in the gluing region and using the obtained K to normalize the signal Sn;
the residuals R = K Sn - Sf are calculated in the gluing region;
the slope of R over range z is evaluated by making the linear least-squares fit R = k z.
If the signals Sn and Sf are statistically equivalent in the gluing region, the slope k should not be significantly different from 0, and the residuals R should be normally distributed around a null mean value. This condition is considered verified if the absolute value of k is smaller than m standard deviations (default 2) of the slope resulting from the least-squares fit.
If the gluing range is large (e.g. if the number of
range bins in the gluing range is greater than 30), there could be
a difference between the first and the second half
of the gluing range. In this case we introduce a constraint on the
absolute value of the curvature C of the residuals, which is estimated from the
difference of the slopes of the residuals of the first and the second half of the gluing range (C=|k1-k2|). The
condition is met if
C < m ΔC,
where ΔC = √(Δk1² + Δk2²). The integer m
represents the confidence level of Eq. () as an exclusion condition. For a Gaussian distribution and m=1, there is about a 32 % probability that the two slopes (k1 and k2) agree (in a statistical sense) even if Eq. () is not satisfied . For m=2 the same probability is reduced to about 5 %.
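The slope test can be sketched as below. This is a simplified sketch, not the ELPP code: it uses a zero-intercept least-squares normalization, an ordinary least-squares fit of the residuals, and the curvature check on the two halves for long regions:

```python
import numpy as np

def slope_test(z, s_n, s_f, m=2):
    """Slope test (sketch): normalize s_n to s_f with a zero-intercept
    least-squares fit, then require that the residuals K*s_n - s_f
    show no significant slope over range (|k| < m standard deviations).
    For long regions (> 30 bins) also apply the curvature condition
    C = |k1 - k2| < m * sqrt(dk1**2 + dk2**2) on the two halves."""
    K = np.dot(s_n, s_f) / np.dot(s_n, s_n)  # Sf = K*Sn, no intercept
    R = K * s_n - s_f

    def fit_slope(x, y):
        # ordinary least-squares slope and its standard deviation
        (k, q), cov = np.polyfit(x, y, 1, cov=True)
        return k, np.sqrt(cov[0, 0])

    k, dk = fit_slope(z, R)
    if abs(k) >= m * dk:
        return False
    if z.size > 30:
        half = z.size // 2
        k1, dk1 = fit_slope(z[:half], R[:half])
        k2, dk2 = fit_slope(z[half:], R[half:])
        if abs(k1 - k2) >= m * np.hypot(dk1, dk2):
            return False
    return True
```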
Figure (block “slope test”) shows the work flow of the
optimization of the gluing region. The starting gluing region [z0,z1] is
changed until the slope test described above is satisfied. First the
algorithm tries to iteratively reduce z1 in steps of Δz while
keeping z0 fixed. In Fig. this phase starts with setting
i=1 and j=0. In each iteration the slope test is evaluated: if the test
is passed, the current region is used as optimal gluing region; if it is not
passed, z1 is further reduced by Δz.
If there is no region in which the slope test is passed, the algorithm starts
to increase iteratively z0 in steps of Δz while keeping z1
fixed at its starting value (i=0 and j=1 in Fig. ). If no
region can be found passing the slope test, the gluing is not done.
If a gluing region has passed the slope test, the stability
test is further applied, which is shown by the block “stability test” of Fig. . The
region, which has passed the slope test, is divided into two equal
subregions, and in each of these subregions the signal Sn is
normalized to the signal Sf, which results in two signals S1=K1Sn and S2=K2Sn, where K1 and K2 are the two
slopes obtained from the two least squares line fits. If the gluing region is
chosen in a proper way, S1 and S2 are indistinguishable taking into
account the corresponding signal uncertainties. To test this, the following
condition (stability test) is evaluated:
|K1 - K2| < n √(ΔK1² + ΔK2²),
where ΔK1 and ΔK2 are the standard deviations of K1 and K2 obtained from the two least-squares line fits, and n is a positive integer (default value 1) with the same statistical meaning as the integer m in Eq. (). If the condition expressed by Eq. () is met, we assume that the selected interval is the optimal gluing region; otherwise the interval is progressively reduced by increasing (decreasing) the lower (upper) border in steps of Δz until the stability test is passed.
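The stability test can be sketched as follows; the zero-intercept fit and its error estimate are a standard formulation assumed here for illustration, not necessarily the exact ELPP implementation:

```python
import numpy as np

def stability_test(s_n, s_f, n=1):
    """Stability test (sketch): split the candidate region into two
    halves, fit Sf = K*Sn in each half, and require
    |K1 - K2| < n * sqrt(dK1**2 + dK2**2)."""
    half = s_n.size // 2

    def zero_intercept_fit(x, y):
        K = np.dot(x, y) / np.dot(x, x)
        resid = y - K * x
        # standard deviation of K for a zero-intercept least-squares fit
        dof = max(x.size - 1, 1)
        dK = np.sqrt(np.dot(resid, resid) / dof / np.dot(x, x))
        return K, dK

    K1, dK1 = zero_intercept_fit(s_n[:half], s_f[:half])
    K2, dK2 = zero_intercept_fit(s_n[half:], s_f[half:])
    return bool(abs(K1 - K2) < n * np.hypot(dK1, dK2))
```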
Example of the results of the automatic gluing algorithm shown in Fig. . The algorithm is applied to the analog (near-range) and photon-counting (far-range) elastic cross-polarized signals measured at 532 nm by the MUSA lidar of the Potenza station. The photon-counting signal is shown in blue, the near-range signal normalized to the photon-counting signal in region A (the first guess of the gluing region) in red, and the near-range signal normalized in region G (the final optimal gluing region) in green. Region B represents the gluing region obtained after the slope test shown in Fig. and discussed in the text. “Gluing” marks the point at which the blue and green signals are glued. The bottom plot shows the relative differences of the two rescaled analog signals with respect to the photon-counting profile.
Signal combination
If the gluing algorithm described in the previous section ends successfully, the optimal gluing region (z0′ and z1′) is returned together with the gluing normalization factor K used to normalize the signal Sn and the corresponding error ΔK resulting from the least-squares line fit. Finally, the signals Sn and Sf are glued by first calculating the quantity Sn′ = K Sn and then determining the gluing point (zg) as the range bin, within the optimal gluing region, that minimizes the squared difference between the signals Sn′ and Sf. The glued signal S(z) and the corresponding statistical error ΔS(z) are:
S(z) = K Sn(z) if z < zg, and S(z) = Sf(z) otherwise;
ΔS(z) = √[(K ΔSn)² + (Sn ΔK)²] if z < zg, and ΔS(z) = ΔSf(z) otherwise.
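The final combination step can be sketched as below; the interface (a boolean mask for the optimal gluing region, precomputed K and ΔK) is an assumption made for illustration:

```python
import numpy as np

def glue(z, s_n, s_f, ds_n, ds_f, K, dK, region):
    """Combine near- and far-range signals (sketch of the final step).
    The gluing point z_g is the bin inside `region` (a boolean mask
    over z) that minimizes (K*s_n - s_f)**2; below z_g the glued
    signal is K*s_n, above it s_f, with errors propagated as in the
    equations above."""
    s_np = K * s_n
    idx = np.flatnonzero(region)
    z_g = z[idx[np.argmin((s_np[idx] - s_f[idx]) ** 2)]]
    near = z < z_g
    s = np.where(near, s_np, s_f)
    ds = np.where(near, np.hypot(K * ds_n, s_n * dK), ds_f)
    return s, ds, z_g
```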
An example of the application of this algorithm to real lidar data is shown in Fig. . The algorithm is applied to the analog (near-range) and photon-counting (far-range) elastic cross-polarized signals measured by the EARLINET reference system MUSA (MUlti-wavelength System for Aerosol) . The blue curve (upper plot) is the photon-counting elastic cross-polarized signal at 532 nm summed over 1 h, which is used as far-range signal. The first-guess gluing region is indicated as region A in Fig. , i.e. between z0 = 2445 m and z1 = 3917 m, and the red curve represents the analog elastic cross-polarized signal at 532 nm normalized to the photon-counting signal in region A.
The region indicated with B (extending from 2445 up to 3097 m) is
the region in which the slope test has passed, and region C
(z0′= 2651 m and z1′= 2891 m) represents the optimal gluing region
after the stability test. Region G is used to finally glue the
signals. The green curve in Fig. is the same as the red but normalized in region G.
The improvement obtained by gluing the signals in region C instead of the first-guess interval A is emphasized by the bottom panel of Fig. , in which the relative differences of the two normalized analog signals with respect to the photon-counting profile are shown. In particular, in the region between 2 and 3 km the red signal lies clearly below the blue one, indicating an unreliable gluing. On the other hand, above 2.5 km the green signal overlaps the blue one better than the red signal does. As a final step, the green and the blue signals are glued at altitude zg = 2775 m.
Error propagation
ELPP propagates the statistical errors in all steps shown in
Fig. . Two different propagation methods are
implemented: one based on the standard formula of statistical error
propagation , and another one based on Monte Carlo
simulations , which is only used when the standard error
propagation is not possible or too complex. This is the case, for example, if
the interpolation or smoothing routines implemented in ELPP have been
applied.
The details of the application of the Monte Carlo method to the error
propagation are given in . In this section only the basic
concepts are briefly discussed. If si is either a raw or a processed lidar
profile, Δsi the corresponding error profile, and F
a generic operator we want to apply to si (for example a smoothing
procedure or a filter) to obtain Si=F(si), the Monte Carlo
method offers an efficient and general procedure to calculate ΔSi,
i.e. the uncertainty of Si. The basic assumption is that each si is
a mean value with an uncertainty width Δsi according to
a statistical distribution. The first step consists of randomly varying all
values si considering their Δsi as standard deviations. ELPP assumes that analog signals are governed by Gaussian statistics and photon-counting signals follow Poisson statistics. In this way a new
synthetic lidar signal si′ can be generated according to the assumed
probability distribution and a corresponding transformed signal
Si′=F(si′) can be calculated. Repeating this procedure
a statistically meaningful number of times, the error profile ΔSi
can be estimated by calculating the standard deviation of the Si′. ELPP uses a default value of 30 realizations of Si′ = F(si′), which has been found to offer the best trade-off between the required computation time and the accuracy of the retrieved errors. Optionally, the number of Monte Carlo realizations can also be specified in the SCC database for each product.
The random number routine implemented in ELPP is based on a so-called Lehmer random number generator, which returns pseudo-random numbers uniformly distributed in the interval [0.0, 1.0] . This uniform distribution is then mapped onto a Poisson or Gaussian one .
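The Monte Carlo procedure can be sketched as follows. The sketch uses NumPy's random generator rather than a Lehmer generator, and the function name and interface are illustrative; the default of 30 realizations follows the text:

```python
import numpy as np

def mc_error(signal, dsignal, operator, n_iter=30, mode="analog",
             rng=None):
    """Monte Carlo error propagation (sketch).  Each range bin is
    perturbed according to its assumed statistics -- Gaussian for
    analog signals (std = dsignal), Poisson for photon counting --
    the operator F is applied to every synthetic profile, and the
    error of F(signal) is the standard deviation over the ensemble."""
    rng = np.random.default_rng() if rng is None else rng
    outs = []
    for _ in range(n_iter):
        if mode == "analog":
            synthetic = rng.normal(signal, dsignal)
        else:  # photon counting: counts follow Poisson statistics
            synthetic = rng.poisson(signal).astype(float)
        outs.append(operator(synthetic))
    return np.std(outs, axis=0, ddof=1)
```

With the identity operator and a large ensemble, the estimated errors converge to the input standard deviations, which is a convenient sanity check.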
ELPP deals with the error propagation of photon-counting and analog signals
in different ways. As the photon-counting signals are assumed to obey Poisson statistics, the statistical error can be evaluated for each
photon-counting raw signal range bin as the square root of the corresponding
count. As a consequence, the uncertainty of photon-counting signals can be
propagated from the beginning to the end of the chain.
In contrast, the evaluation of the statistical error corresponding to each single raw-signal range bin is not trivial for analog signals: for Gaussian distributions the standard deviation cannot be inferred from the mean value as in the Poisson case. To overcome this difficulty two options are implemented in ELPP.
The first is to provide, along with the raw analog signal time series, the corresponding statistical-error time series. This option is applicable only to systems which are able to measure such values, e.g. by storing not only the mean values but also the sums of the squared values. In this case the error of the analog time series is propagated through all the operational blocks shown in Fig. using the standard propagation formula or the Monte Carlo method.
If the statistical error time series are not provided, ELPP calculates the
statistical errors of analog signals only after the time averaging (block
“time integration” in Fig. ) as the standard error
of the mean of each range-bin value. In all the operations made before the
time integration (i.e. background subtraction and trigger-delay correction)
the error of analog signals is not propagated, due to the difficulty of estimating the statistical error of analog signals without additional information.
In this case, the analog signal time series (Sha) and the
corresponding standard errors (ΔSha) after the time
integration are calculated according to the following equations:
Sha(z) = (1/N) ∑_{j=Nh}^{N(h+1)-1} sja(z),
ΔSha(z) = √{ ∑_{j=Nh}^{N(h+1)-1} [sja(z) - Sha(z)]² / [N(N-1)] },
with h ≤ (Nt-N)/N,
where sja(z) is the analog time series before the time integration, with j=0,…,Nt-1, and N is the number of raw profiles belonging to the same time window (defined as the largest integer not larger than the ratio of the integration time window width to the raw time resolution of the sja(z) time series).
To summarize, the statistical errors of analog signals, if not provided directly by the raw-data submitter, are first estimated using Eq. () during the “time integration” stage and then propagated through all the subsequent blocks shown in Fig. .
Finally, in the case of photon-counting detection mode, the signal time series
(Shp) and the corresponding standard errors (ΔShp) after the time integration are calculated using the following
equations:
Shp(z) = ∑_{j=Nh}^{N(h+1)-1} sjp(z),
ΔShp(z) = √{ ∑_{j=Nh}^{N(h+1)-1} [Δsjp(z)]² },
where sjp(z) and Δsjp(z) are the photon-counting time series and the corresponding statistical errors before the time integration (j=0,…,Nt-1).
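The photon-counting integration is the corresponding block-wise sum with root-sum-square error combination; a minimal sketch under the same shape assumption as above:

```python
import numpy as np

def integrate_photon(s, ds, N):
    """Time integration of photon-counting profiles (sketch): counts
    in each non-overlapping window of N profiles are summed, and the
    per-profile errors are combined as the root of the sum of squares."""
    n_win = s.shape[0] // N
    S = np.array([s[N * h:N * (h + 1)].sum(axis=0)
                  for h in range(n_win)])
    dS = np.array([np.sqrt((ds[N * h:N * (h + 1)] ** 2).sum(axis=0))
                   for h in range(n_win)])
    return S, dS
```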
Range-corrected signals pre-processed by ELPP for five lidar systems participating in the EARLI09 inter-comparison campaign (the same colour identifies the same lidar system in all the plots). All profiles were taken from 21:00 to 23:00 UT on 25 May 2009. From left to right, upper panel: elastic backscattered signals at 355 and 532 nm; middle panel: N2 Raman backscattered signals at 387 and 607 nm; bottom panel: elastic backscattered signals at 1064 nm. The dotted grey curves represent the signals backscattered by atmospheric molecules computed using a close-by radiosounding. All signals are normalized in the atmospheric region between 9.5 and 10.5 km, which is assumed to be aerosol free.