Sampling strategies and post-processing methods for increasing the time resolution of organic aerosol measurements requiring long sample-collection times

The composition and properties of atmospheric organic aerosols (OAs) change on timescales of minutes to hours. However, some important OA characterization techniques typically require greater than a few hours of sample-collection time (e.g., Fourier transform infrared (FTIR) spectroscopy). In this study we have performed numerical modeling to investigate and compare sample-collection strategies and post-processing methods for increasing the time resolution of OA measurements requiring long sample-collection times. Specifically, we modeled the measurement of hydrocarbon-like OA (HOA) and oxygenated OA (OOA) concentrations at a polluted urban site in Mexico City, and investigated how to construct hourly resolved time series from samples collected for 4, 6, and 8 h. We modeled two sampling strategies – sequential and staggered sampling – and a range of post-processing methods including interpolation and deconvolution. The results indicated that relative to the more sophisticated and costly staggered sampling methods, linear interpolation between sequential measurements is a surprisingly effective method for increasing time resolution. Additional error can be added to a time series constructed in this manner if a suboptimal sequential sampling schedule is chosen. Staggering measurements is one way to avoid this effect. There is little to be gained from deconvolving staggered measurements, except at very low values of random measurement error (< 5 %). Assuming 20 % random measurement error, one can expect average recovery errors of 1.33–2.81 μg m−3 when using 4–8 h-long sequential and staggered samples to measure time series of concentration values ranging from 0.13–29.16 μg m−3. For 4 h samples, 19–47 % of this total error can be attributed to the process of increasing time resolution alone, depending on the method used, meaning that measurement precision would only be improved by 0.30–0.75 μg m−3 if samples could be collected over 1 h instead of 4 h.
Devising a suitable sampling strategy and post-processing method is a good approach for increasing the time resolution of measurements requiring long sample-collection times.


Introduction
Organic aerosols (OAs) comprise 20-90 % of total, dry, submicrometer atmospheric aerosol mass, and therefore have important influences on air quality and aerosol-climate effects (Fuzzi et al., 2015). OAs can be emitted directly into the atmosphere (primary organic aerosol, POA), or formed in the atmosphere from the oxidation products of precursor gases (secondary organic aerosol, SOA). It is critical to distinguish between POA and SOA since they result from different (natural and anthropogenic) emission and transformation processes, and therefore require separate control and regulation strategies. This separation is complicated by the fact that OAs are complex mixtures of thousands of different individual organic compounds.
A key feature of OA is that its composition and properties change and evolve continually in time. These changes happen on timescales of minutes to hours. OA evolution occurs because organic compounds are subject to continual oxidation throughout their lifetime in the atmosphere, while also mixing with freshly emitted OA. Oxidation changes basic OA molecular properties such as size and degree and type of functionalization. These basic molecular properties determine OA volatility, solubility and hygroscopicity, which in turn determine OA concentrations and the ability of OA to take up water. These effects combined are relevant for assessing aerosol impacts on health and climate. Observation of OA composition over time also permits source resolution important for identifying major contributors to the OA burden in the atmosphere (Corrigan et al., 2013). To capture the evolution of OA composition and properties in the atmosphere it is necessary to measure OA at high time resolution. We define time resolution here as the number of measured values per unit time.
Due to their complexity, OAs cannot be completely characterized by any single measurement technique. A detailed OA picture can only be captured by combining a range of different measurement techniques. Depending on analytical detection limits, some techniques require long sample-collection times (typically greater than a few hours) to collect enough aerosol mass for analysis; these samples are often analyzed off-line in a laboratory facility rather than in the field. Examples of analytical techniques requiring longer sample-collection times at atmospherically relevant aerosol concentrations include: Fourier transform infrared (FTIR) spectroscopy (4-24 h; Russell et al., 2011; Frossard et al., 2014; Corrigan et al., 2013); and nuclear magnetic resonance (NMR) spectroscopy (8-48 h; Finessi et al., 2012; Matta et al., 2003; Decesari et al., 2006). In contrast, measurement integration times can be as short as a few minutes (aerosol mass spectrometry) to 1 h (online GC-MS), and these are often associated with on-line (or in situ) instruments.
Measurements with longer collection times still provide molecular- and functional-group-level information that is valuable for OA characterization (Corrigan et al., 2013). Therefore, to obtain diverse and detailed chemical information at high time resolution, new approaches are desired. One approach is to develop new instrumentation and hardware for rapid sample collection and analysis. For example, an online GC-MS instrument has been developed (Williams et al., 2006). Additionally, aerosol can be concentrated in a particle concentrator prior to sampling, which can decrease FTIR sample-collection times from a few hours to 1 h (Maria et al., 2002). However, due to the costs, complexities, and practical limitations involved (e.g., aerosol concentrators require very large flow rates and virtual impactors are sensitive to operating conditions), instrument development is not always a viable approach to improving time resolution. As an alternative or complement to hardware design, it is possible to devise sampling strategies and post-processing methods for constructing higher time resolution measurements from a set of low resolution samples. This is the approach that we investigate in this work.
We performed numerical modeling to compare the effectiveness of sampling strategies and post-processing methods for achieving 1 h time resolution with measurements requiring 4, 6, and 8 h of sample-collection time. We modeled two sampling strategies: sequential sampling, where successive measurements are collected one after another, and staggered sampling, where each new measurement is regularly initiated before termination of the previous measurement. The time resolution of a sequentially measured time series can be controlled (and increased) by interpolating between measurements. The resolution of a time series obtained by staggered sampling can be controlled through the choice of the staggering interval between samples. A time series resulting from staggered sampling is a running average of the true time series one seeks to measure. In the ideal case, mathematical deconvolution can be used to retrieve the original time series at the resolution of the staggering interval rather than the sample-collection interval. For actual measurements, the process of deconvolution is complicated by unavoidable perturbations to measurement signals due to random measurement errors, so regularization techniques are required.
We examined two concentration time series with contrasting diurnal patterns. Hydrocarbon-like organic aerosol (HOA) and oxygenated organic aerosol (OOA) are major contributors to OA as identified by AMS (aerosol mass spectrometry) and factor analytic decomposition (Zhang et al., 2011). HOA is generally associated with primary organic aerosol (POA) emissions and follows diurnal trends of traffic patterns in urban areas (i.e., early morning and late afternoons during weekdays). OOA is associated with SOA formed from photochemical oxidation in combination with aged background aerosol (de Gouw et al., 2009), and exhibits a peak close to solar noon. The data set we used consists of AMS measurements of HOA and OOA reported by Aiken et al. (2009) at a polluted urban site in Mexico City, Mexico (T0 site, MILAGRO field campaign; Molina et al., 2010). The data set is described fully in Sect. 2.
Section 3 formally introduces and describes the different sampling strategies and post-processing methods we investigated. Section 4 describes the numerical modeling used to apply these sampling strategies and post-processing methods to the test data. The modeled conditions were designed primarily to represent the measurement of functional groups representing HOA and OOA by aerosol FTIR spectroscopy, since this is the primary measurement technique of our research group. However, the results should be applicable to any type of environmental sampling that can be characterized with parameters falling within the ranges that we modeled.
The numerical modeling results were grouped into three major categories: sequential (sequential sampling + interpolation), smeared (staggered sampling with no data processing), and recovered (staggered sampling + deconvolution). In Sects. 5 and 6 the best post-processing methods are identified for the sequential and recovered categories, respectively. An overall comparison of the best-case sequential and recovered solutions with the smeared solution is made in Sect. 7. The advantages and disadvantages of each method are discussed, taking into account the attainability of the modeled best-case scenarios and the practical costs involved. Section 8 discusses the differences between the HOA and OOA results. Finally in Sect. 9 we discuss the interpretation of the error results.

Test case: HOA and OOA concentration time series
To test different methods of increasing time resolution we used time series of HOA and OOA concentrations originally measured at high time resolution by aerosol mass spectrometry at the T0 site in central Mexico City in 2006 during the MILAGRO field campaign. The MILAGRO campaign and T0 site are described by Molina et al. (2010). The aerosol mass spectrometer measurements and the positive matrix factorization (PMF) analysis used to derive the HOA and OOA profiles and concentrations are described by Aiken et al. (2009).
The HOA and OOA concentration time series are displayed in Fig. 1a. The original measurements were collected over the period from 10 to 31 March 2006. To avoid gaps in the time series greater than 1 h we only used the measurements from 23:00 LT (local time) 19 March 2006 to 10:00 29 March 2006, which amounts to a total period of 228 h. This period was chosen because 228 has many factors (7 greater than 12), which was desirable for numerically modeling the effect of the time-series period measured (see Sect. 4). The original measurements were averaged over 1 h intervals to generate hourly-resolution data for the inverse modeling and to smooth out some of the high-frequency perturbations due to random measurement uncertainties. The hourly-resolution data certainly still contain measurement noise, but for the purposes of our modeling we assume that these signals represent the true changes in HOA and OOA concentrations at the T0 site over this time period.
Both the HOA and OOA concentration time series displayed strong and regular daily peaks. The diurnally averaged profiles shown in Fig. 1b indicate that HOA concentrations peaked in the mornings around 07:00. These HOA peaks were coincident with the occurrence of a morning vehicle rush hour period and low atmospheric boundary layer heights. This peak timing suggests the HOA was predominantly primary OA emitted from combustion sources that was able to build up to high concentrations in the shallow morning boundary layers. The daily OOA concentration peaks were broader, beginning around 08:00 and extending to 15:00. This peak timing suggests that the OOA concentration peaks were the result of photochemistry and SOA formation.
The two time series in Fig. 1 were chosen for this analysis because their daily peaks were separated by only a few hours. If these HOA and OOA concentrations (or the concentrations of functional groups or specific molecules representing these OA classes) were measured at poor time resolution (> 4 h), the differences between the daily peaks would not be clearly resolved. In that case it would not be possible to easily recognize that the concentration peaks resulted from two distinct processes: primary particle emission and secondary aerosol formation. Therefore, the ability to clearly resolve the daily HOA and OOA concentration peaks provided an ideal test case for different methods of obtaining hourly time resolution data from measurements requiring longer sample-collection times.
We note that it is not possible to measure HOA or OOA concentrations directly with FTIR spectroscopy. FTIR spectroscopy is used to measure the absorption spectra of aerosol samples. Organic functional group and total OA concentrations can be derived from these measured spectra (Takahama et al., 2013). The ideal conditions we have modeled in this study could represent, for example, the measurement of organic functional groups that represent HOA and OOA. Factor analysis can also be used to calculate the FTIR-equivalent of HOA and OOA species (Corrigan et al., 2013). In this case the relevant time series would be multivariate (many wavelengths or functional group abundances considered together) rather than univariate (concentrations of individual species). The theory developed in Sect. 3 can be extended to the multivariate case. The multivariate extension is the topic of future work and is not covered in the present study. For the current, univariate case we chose to model the measurement of HOA and OOA concentrations because these species display contrasting diurnal profiles and because they illustrate the variations in OA that can be captured at high time resolution.

Sampling strategies and post-processing methods for increasing measurement time resolution
Two simulated sampling strategies were applied to the HOA and OOA test data: sequential and staggered sampling. A variety of different post-processing methods for increasing measurement time resolution were investigated with the two sets of simulated measurements. Figure 2 lists each of the methods applied and each method is explained in further detail below. For each method, the best-case scenario was considered in order to determine the theoretically optimal combination of sampling strategy and data processing method for increasing measurement time resolution.

Sequential sampling
Aerosol samples (and most other environmental samples) are typically collected sequentially, one after another. We refer to this as sequential sampling. Sequential measurements are separated by an interval of time (δτ) equal to the individual sample collection or measurement integration time (τ). Post-measurement, the resolution of sequentially collected measurements can be increased by interpolating between successive points with some chosen function. Here we consider two interpolation methods: step function and linear interpolation (Fig. 3). Although it seems likely that linear interpolation will better represent the original time series we have tested step interpolation as this case is often assumed (at least implicitly). For both interpolation cases we represented a single measurement by the midpoint of a given sample: each measurement occurs at time t_mid (t_mid = t_start + τ/2 = t_end − τ/2). It is also possible to represent individual measurements by the start (t_start) or endpoints (t_end) of each sample. We do not consider those options here because the modeled results do not represent the original time series as well as the simulations with t_mid.
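The two interpolation options can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the `offset` argument (which shifts the start of the first sample), and the synthetic test series are all introduced here for illustration. Hourly values are placed at the centres of their 1 h bins, and the step function is implemented as a nearest-midpoint lookup.

```python
import numpy as np

def sequential_samples(f, tau, offset=0):
    """Average an hourly series f over back-to-back windows of tau hours.

    Returns (t_mid, g): sample midpoints in hours from the start of f,
    and the mean concentration of each sample. `offset` is a hypothetical
    parameter that delays the start of the first sample by whole hours.
    """
    f = np.asarray(f, dtype=float)
    n_samples = (len(f) - offset) // tau
    blocks = f[offset:offset + n_samples * tau].reshape(n_samples, tau)
    g = blocks.mean(axis=1)
    # Sample j spans [offset + j*tau, offset + (j+1)*tau); take its midpoint
    t_mid = offset + tau * np.arange(n_samples) + tau / 2.0
    return t_mid, g

def step_interpolate(t_mid, g, t_out):
    """Hold each sample value constant around its midpoint (nearest midpoint)."""
    idx = np.abs(np.asarray(t_out)[:, None] - t_mid[None, :]).argmin(axis=1)
    return g[idx]

def linear_interpolate(t_mid, g, t_out):
    """Join sample midpoints with straight lines; end values are held constant."""
    return np.interp(t_out, t_mid, g)

# Synthetic hourly series with a morning peak (illustrative only, not the MILAGRO data)
hours = np.arange(48)
f_true = 3.0 + 8.0 * np.exp(-0.5 * ((hours % 24 - 7) / 1.5) ** 2)
t_hourly = hours + 0.5                       # centres of the hourly bins
t_mid, g = sequential_samples(f_true, tau=4)
f_step = step_interpolate(t_mid, g, t_hourly)
f_lin = linear_interpolate(t_mid, g, t_hourly)
```

Changing `offset` from 0 to `tau - 1` generates the different sequential sampling schedules that are possible for a given sample length.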

Staggered sampling
Aerosol sample collection can also be staggered, such that each new sample is regularly initiated before termination of the previous sample. By separating successive measurements by a staggering interval δτ less than the individual sample-collection time τ, it is possible to increase measurement time resolution. The principle of combining multiple, overlapping, lower-resolution samples in order to construct higher spatial- and temporal-resolution information has been used extensively for image processing (Borman and Stevenson, 1998; Shechtman et al., 2005).
Staggered sampling effectively applies a running average to a time series of aerosol concentrations, which produces a smeared version of the original signal, denoted here as g(t). If f(t) represents the true change in aerosol concentrations at some point in the atmosphere from time t = 0 to T, g(t) is the product of the convolution of a boxcar kernel function h(τ) and f(t). This is a specific example of a Fredholm integral equation of the first kind:

$$g(t) = \int_0^T h(t - t')\, f(t')\, \mathrm{d}t' \tag{1}$$

In the case of measured data a smeared signal is more appropriately represented by a finite series of n measurement points g separated by δτ than by the continuous function g(t). In addition, all measurements are subject to some amount of measurement uncertainty ε. A discrete formulation of Eq. (1) that more accurately reflects the actual measurement process is the matrix equation:

$$\mathbf{g} = \mathbf{H}\mathbf{f} + \boldsymbol{\varepsilon} \tag{2}$$

where H is a convolution matrix and f is a finite series of m data points representing f(t). The temporal resolution of f is the same as that of g (i.e., δτ). For staggered samples, the convolution matrix H is an n-by-m Toeplitz matrix. Each of the n rows of H contains a shifted copy of a boxcar function with k = τ/δτ non-zero values equal to 1/k. In general, n = m + k − 1. Figure 5 displays examples of a true time series f of HOA concentrations and corresponding smeared time series without (Fig. 5a) and with (Fig. 5c) measurement error. Equation (2) suggests the following two post-processing methods for recovering a higher time resolution estimate f̂ of the true time series f from staggered measurements.
1. The measured time series is taken as an approximation of the true time series. No further data processing is applied.
2. One attempts to recover f̂ through a deconvolution operation. For example, if H+ is the pseudo-inverse matrix of H one can solve the following inverse problem:

$$\hat{\mathbf{f}} = \mathbf{H}^{+}\mathbf{g} \tag{3}$$

In principle, the true aerosol concentrations f can be recovered precisely from a set of staggered measurements g and solution of Eq. (3) (Fig. 5b). However, in practice the problem is ill-posed. The small perturbations to g due to random measurement uncertainty are strongly amplified in f̂. One can only ever hope to find a solution f̂ that is a good approximation of f (Fig. 5d and e).
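The discrete smearing model and its naive inversion can be sketched as follows. This is an illustrative Python sketch only, assuming a full (untruncated) convolution product and a staggering interval of one time step; the function name `boxcar_convolution_matrix`, the synthetic peak, and the 5 % noise level are assumptions made here, and `numpy.linalg.pinv` stands in for whatever pseudo-inverse routine one prefers.

```python
import numpy as np

def boxcar_convolution_matrix(m, k):
    """Full convolution matrix for a boxcar of k points, shape (m + k - 1, m).

    Column j holds k entries equal to 1/k in rows j .. j+k-1, so interior
    rows compute a k-point running mean of f; boundary rows are partial sums.
    """
    H = np.zeros((m + k - 1, m))
    for j in range(m):
        H[j:j + k, j] = 1.0 / k
    return H

rng = np.random.default_rng(0)
m, k = 24, 4                                           # 24 hourly values, 4 h samples
t = np.arange(m)
f = 5.0 + 4.0 * np.exp(-0.5 * ((t - 8) / 2.0) ** 2)    # synthetic peak, not measured data
H = boxcar_convolution_matrix(m, k)

g = H @ f                                              # smeared, noise-free measurements
f_hat_clean = np.linalg.pinv(H) @ g                    # recovers f to numerical precision

g_noisy = g * (1.0 + 0.05 * rng.standard_normal(g.shape))
f_hat_noisy = np.linalg.pinv(H) @ g_noisy              # unregularized, noise-sensitive

# Compare the largest perturbation in the data with the largest error in the solution
print(np.abs(g_noisy - g).max(), np.abs(f_hat_noisy - f).max())
```

Comparing the two printed numbers for different noise levels and sample lengths gives a feel for how strongly the unregularized inversion reacts to measurement perturbations.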
A variety of different deconvolution methods exist for finding the inverse solution of Eq. (2). For example, the convolution theorem (Arfken and Weber, 2005) states that deconvolution amounts to simple division of the frequency domain representations of g and H (which are typically obtained by Fourier and/or Z transforms). This deconvolution approach has recently been used to improve the time resolution of slow response, broadband terrestrial irradiance measurements (Ehrlich and Wendisch, 2015). However, we choose to frame the deconvolution problem with the discrete matrix-based approach shown by Eq. (3) because it is well suited to the natural, discrete form of measurement data, does not assume periodicity of the time series being studied (as taking Fourier transforms would implicitly do), and allows easy and intuitive implementation of regularization methods (discussed in further detail below). For this work, we use a well-established and tested software package for inverse modeling by regularization (Regularization Tools Version 4.1 for MATLAB; Hansen, 2007).
A further limitation of measured data relates to the extra k measurement values at the boundaries of g (recall for an n-by-m H matrix, n = m + k − 1 where k = τ/δτ). These boundary elements correspond to partial samples with integration times < τ. In some experiments, it may be possible to obtain the boundary values of g by initiating and concluding experiments with partial samples. However, this is not possible in experiments where τ corresponds to the lowest possible sampling time required to exceed the detection limit. Therefore, only a truncated measurement vector g_t with n − 2(k − 0.5) elements will be accessible for measurement in most cases (Fig. 4). There are two general approaches for deconvolving a system with g_t.
1. Accept that the boundary values cannot be known and solve the resulting system of equations where H has more columns than rows, further adding to the ill-posedness of the problem. We refer to this as the truncated method for dealing with unknown boundary values.
2. Pad the truncated measurement vector g_t so that it has the same number of elements as the ideal, full convolution product g. The resulting system of equations will be overdetermined, but g will contain estimated (or guessed) values as well as actually measured values.
For option (2), a variety of different padding methods exist (e.g., Lane et al., 1997). Simple methods include the repetition of the final boundary values (uniform padding) or a reflection of the values about the boundaries (reflective padding). These padding methods are illustrated in Fig. 4. More refined methods concede that boundary conditions cannot be known a priori (e.g., Aristotelian boundary conditions, Calvetti et al., 2006). Here we consider only the simple methods of uniform and reflective padding and compare the results with those obtained from the truncated method (option (1) above) and also from the ideal scenario where the full measurement vector g is accessible for measurement.
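The two simple padding methods map naturally onto `numpy.pad`. The sketch below is an assumption-laden illustration rather than the procedure used in the study: the function name and the explicit `n_left` and `n_right` arguments are hypothetical, and reflective padding is taken here to mean NumPy's `symmetric` mode (the edge value itself is mirrored); NumPy's `reflect` mode, which mirrors about the edge value without repeating it, would be an equally plausible reading.

```python
import numpy as np

def pad_truncated(g_t, n_left, n_right, method="uniform"):
    """Pad a truncated measurement vector back towards the full length.

    method="uniform"    repeats the first/last measured value (np.pad mode "edge").
    method="reflective" mirrors the measured values about each boundary
                        (np.pad mode "symmetric"; an assumption, see lead-in).
    """
    mode = {"uniform": "edge", "reflective": "symmetric"}[method]
    return np.pad(np.asarray(g_t, dtype=float), (n_left, n_right), mode=mode)

g_t = np.array([1.0, 2.0, 3.0, 4.0])
g_uni = pad_truncated(g_t, 2, 2, "uniform")      # [1, 1, 1, 2, 3, 4, 4, 4]
g_ref = pad_truncated(g_t, 2, 2, "reflective")   # [2, 1, 1, 2, 3, 4, 4, 3]
```

The number of values to add on each side depends on how many boundary samples were lost, so it is left as an explicit input here.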
To deal with the sensitivity of the solution to measurement uncertainty perturbations and the loss of boundary measurements some form of regularization is required. Regularization is the introduction of additional information in order to stabilize a solution. In this context, regularization can be achieved by modifying the convolution matrix H so that the components of the matrix that are responsible for explaining most of the variation in the underlying data are emphasized, while the components that are associated with high frequency measurement noise are deemphasized or removed. Regularization methods can be defined through the singular value decomposition (SVD) components of H. SVD is also an important practical tool for solving Eq. (3) (Hansen, 2007) and is defined as

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T} \tag{4}$$

where U is an m × m matrix consisting of the left singular vectors u_1, ..., u_m, V is an n × n matrix consisting of the right singular vectors v_1, ..., v_n, and Σ is an m × n diagonal matrix consisting of diagonal elements σ_i arranged in descending order. The σ_i are non-negative values and characteristic of a given matrix. They are known as singular values. Small singular values are responsible for making f̂ sensitive to perturbations in g (Hansen, 2002).

Figure 3. An illustrative example of interpolation between sequential samples. An original time series f of HOA concentrations, and the time series resulting from step (red) and linear (yellow) interpolation between successive sequential samples, which are indicated by the circle markers.
For example, truncated SVD (TSVD) regularization is the most straightforward regularization method. TSVD involves retaining the first k SVD components of H, which correspond to the largest singular values σ_i, and simply discarding the rest. Tikhonov regularization is another common regularization method (Tikhonov and Arsenin, 1977). It involves minimizing a weighted sum of the residual and solution norms, with weighting parameter λ determining the importance given to the solution norm, or smoothness of the solution. The pseudo-inverse matrix is then defined by each method as (Aster et al., 2012)

$$\mathbf{H}^{+}_{\mathrm{TSVD}} = \mathbf{V}_k \boldsymbol{\Sigma}_k^{-1} \mathbf{U}_k^{T} \tag{5}$$

$$\mathbf{H}^{+}_{\mathrm{Tikh}} = \left(\mathbf{H}^{T}\mathbf{H} + \lambda^{2}\mathbf{I}\right)^{-1}\mathbf{H}^{T} \tag{6}$$

where the subscript k indicates the number of components retained, and I is the identity matrix. As with TSVD, the effect of Tikhonov regularization is to favor the large singular values and deemphasize small singular values. It can be seen that both regularization methods require the introduction and setting of an additional parameter: k for TSVD and λ for Tikhonov regularization. Figure 5d and e illustrate how critical it is to set the regularization parameter to an appropriate value. If too many singular values are retained (large k) or emphasized (small λ), then the solution becomes highly unstable with strongly amplified perturbations. If too few singular values are retained (small k) or emphasized (large λ), then the solution is overly smoothed.

Figure 4. The resulting smeared signal g is the full convolution product of f and a convolution matrix H(τ, δτ). Since f contains 12 data points, g contains 15 (= 12 + (4/1) − 1) data points. The values at the boundaries of g correspond to partial averages of f (samples with sampling time < τ). In practice these values are often not accessible for measurement, and one is left with a truncated measurement vector g_t consisting of only eight (= 15 − 2(4 − 0.5)) data points. The truncated measurement vector can be padded on its edges by the uniform (g_uni) or reflective (g_ref) methods so that it has the same number of elements as the full convolution product g.
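Both regularization methods can be written compactly in terms of the SVD of H. The following Python sketch uses the standard textbook filter-factor forms and is not a port of the MATLAB package used in the study; the function names `tsvd_solve` and `tikhonov_solve` are assumptions introduced here. It uses NumPy's economy-size SVD, for which U has one column and V one row per singular value.

```python
import numpy as np

def tsvd_solve(H, g, k_keep):
    """Truncated-SVD solution: keep only the k_keep largest singular values."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    coeffs = (U[:, :k_keep].T @ g) / s[:k_keep]
    return Vt[:k_keep].T @ coeffs

def tikhonov_solve(H, g, lam):
    """Tikhonov solution, equivalent to (H^T H + lam^2 I)^-1 H^T g.

    Each SVD component is weighted by s / (s^2 + lam^2), which damps the
    contribution of singular values that are small compared with lam.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    weights = s / (s ** 2 + lam ** 2)
    return Vt.T @ (weights * (U.T @ g))
```

With `k_keep` equal to the number of columns of a full-rank H, or with `lam = 0`, both functions reduce to the ordinary pseudo-inverse solution; smaller `k_keep` or larger `lam` trade fidelity for smoothness.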

Description of the modeling
Numerical inverse modeling was conducted with the two test time series to compare the different methods of increasing time resolution (Fig. 2). Table 1 lists the model parameters and their values. The model parameters and values were chosen primarily to represent aerosol sampling for FTIR spectroscopy as detailed further below. However, the calculations are more general, and the results of the numerical modeling are applicable to any type of environmental sampling that can be characterized by parameters falling within the ranges indicated in Table 1. We considered filter sampling periods of 4, 6, and 8 h. A minimum sample length of 4 h represents a typical value for the shortest possible sampling period required for aerosol FTIR spectroscopy (assuming the aerosol is not concentrated before sampling; if the sample is concentrated, FTIR sample-collection time can be as brief as 1 h, Maria et al., 2002). Sequential sampling was modeled by averaging the true aerosol concentrations over sequential intervals of τ hours (e.g., circle markers in Fig. 3) centered at the sample midpoints. Staggered sampling with a staggering interval δτ of 1 h was simulated by constructing a convolution matrix H (which depends on τ) and evaluating Eq. (2).
The period of the time series (T) measured by sequential and staggered sampling was varied from 12 to 228 h.
Initial testing indicated that the start time of a series of sequential samples affected the ability of the resulting measurement signal to represent the true aerosol concentrations. For example, if a long filter sample is initiated at the apex of a sharp peak in concentration, the resulting measurement does not represent the true changes in aerosol concentrations well. This does not occur for staggered filter samples since more than one sample is collected during a sharp peak (assuming δτ < peak width, which is the case for our test data). Therefore, multiple sequential time series, but only a single staggered time series, were generated for each modeling run. For example, for τ = 4 h, four unique sequential sampling schedules were possible, as defined by the filter start times. For τ = 6 h, six unique sequential sampling schedules were possible, and for τ = 8 h, eight unique schedules were possible.
For both the sequential and staggered cases, perturbations due to random measurement error (ε, see Eq. 2) were added to the simulated measurements. Relative measurement errors (κ_m) of 0, 1, 5, 10, 20 and 30 % were considered. A relative measurement error of 20 % is typical for aerosol FTIR spectroscopy (Russell, 2003). The relative errors were applied to aerosol mass, not concentration, since this is the quantity actually probed by FTIR spectroscopy (we use the subscript m to denote mass units). A sampling flow rate of 10 L min−1 was multiplied by the given sampling intervals τ to calculate the sampling volumes used to convert between mass and concentration. We assumed that the relative error in the measurement of sampling flow rate was 2 %. The relative error in the measurement of the sampling time interval τ was assumed to be so small in comparison to the errors in measured mass and flow rate that it could be neglected. The relative uncertainties in measured mass and flow rate were summed in quadrature to calculate total, relative uncertainty in aerosol concentration, denoted as κ_c, where the subscript c indicates concentration units.
The relative error was combined with a fixed error term (σ_0,m). The fixed error term represents, for example, the standard deviation of masses detectable on blank filter samples. The fixed error term is typically on the order of 0.1 µg for aerosol FTIR samples on Teflon filters. We conservatively set σ_0,m to 0.5 µg, which is at the upper end of the range of blank uncertainty values measured in previous FTIR studies (Maria et al., 2003; Gilardoni et al., 2007, 2009). A fixed error of 0.5 µg is consistent with the selected minimum sampling interval of 4 h (Table 1). Defining detection limit as 3σ_0,m, 4 h of sampling would be required to ensure that almost all (> 97 %) of the organic functional group samples representing HOA and OOA collected during the time period covered by the test time series were above the detection limit (Fig. S1 in the Supplement). We also modeled σ_0,m = 0.1 µg. The results were insensitive to this change so are not included here.
Taking the relative and fixed errors, total measurement error σ as a function of concentration c was calculated with the linear error model described by Eq. (7). Linear dependence of total measurement error on concentration is a widely applicable assumption (e.g., Ripley and Thompson, 1987):

$$\sigma(c) = \sigma_{0,c} + \kappa_c\, c \tag{7}$$

where σ_0,c is in units of concentration and is therefore a function of a given τ and the sampling flow rate. The concentration perturbations ε due to the total measurement error were assumed to be normally distributed around a mean of 0 with σ representing 1 standard deviation of the distribution: ε ∼ N(0, σ(c)).
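As a worked illustration of the error model, the Python sketch below converts the fixed mass error to concentration units and draws random perturbations. It assumes the linear model takes the form σ(c) = σ_0,c + κ_c·c, with κ_c the quadrature sum of the relative mass and flow-rate errors and σ_0,c the fixed mass error divided by the sampled volume; the function names and keyword arguments are hypothetical, and the default values are the ones quoted in the text (20 % mass error, 0.5 µg fixed error, 10 L min−1 flow, 2 % flow error).

```python
import numpy as np

def concentration_sigma(c, tau_h, kappa_m=0.20, sigma0_m_ug=0.5,
                        flow_lpm=10.0, kappa_flow=0.02):
    """Total 1-sigma error (ug m-3) for concentration c (ug m-3), assumed linear."""
    volume_m3 = flow_lpm * 60.0 * tau_h / 1000.0      # sampled air volume
    sigma0_c = sigma0_m_ug / volume_m3                # fixed error in ug m-3
    kappa_c = np.hypot(kappa_m, kappa_flow)           # quadrature sum of relative errors
    return sigma0_c + kappa_c * np.asarray(c, dtype=float)

def add_measurement_noise(g, tau_h, rng, kappa_m=0.20, **kwargs):
    """Add zero-mean Gaussian perturbations with standard deviation sigma(c)."""
    g = np.asarray(g, dtype=float)
    if kappa_m == 0:
        return g.copy()                               # ideal case: no perturbation
    sigma = concentration_sigma(g, tau_h, kappa_m=kappa_m, **kwargs)
    return g + rng.normal(0.0, sigma)

# Example: a 4 h sample samples 2.4 m3 of air, so the 0.5 ug fixed error
# corresponds to roughly 0.21 ug m-3 before the relative term is added.
sigma_example = concentration_sigma(10.0, tau_h=4)
```

Averaging results over many independent draws from `add_measurement_noise` mirrors the use of multiple noise realizations per modeling run.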
By setting the means of the distributions to 0 we have assumed that the simulated measurements are not affected by systematic measurement artifacts. Systematic measurement artifacts depend strongly on the measurement technique in question and even the specific batch of materials used (e.g., filter lot). They can be positive or negative, and can depend on sampling time (e.g., Kirchstetter et al., 2001; Subramanian et al., 2004). If known, measurement artifacts could be addressed in this modeling framework by setting the means of the distributions to non-zero, time-dependent values.
For κ_m = 0 %, σ(c) and hence ε were set to 0 to represent the ideal case of absolutely no perturbations due to measurement error. For each modeling run with non-zero κ_m, 20 different realizations of the randomly generated error perturbations were generated and added to the measurement signal. Results are reported as averages over the 20 different realizations of each noisy measurement signal.
Hourly resolved time series were constructed from the simulated measurement signals using the post-processing methods outlined in Fig. 2 as follows. The sequential-interpolated solutions were constructed by interpolating between sequential data points at the chosen resolution of 1 h with step and linear functions. The smeared solutions required no further data processing: the time series g produced by simulating staggered sampling were taken as is. The deconvolution solutions were obtained by first modifying the simulated measurement vectors according to the chosen boundary value method: full, where the full measurement vectors were used in subsequent calculations; truncated, where values at the boundaries of the measurement vectors corresponding to partial samples were removed (and a corresponding truncated convolution matrix H_r was calculated by removing rows in H corresponding to these boundary values); and uniformly and reflectively padded, where boundary values corresponding to partial samples were removed but the measurement vector was then padded back to the original length of g via the uniform and reflective methods, respectively.
Following treatment of the boundary values, deconvolution with TSVD and Tikhonov regularization was performed with the respective functions in Regularization Tools Version 4.1 for MATLAB (Hansen, 2007). These functions utilize the SVD of the given H to find the pseudo-inverse matrix H + and solve Eq. (3). The choice of the TSVD and Tikhonov regularization parameters is critical, as illustrated in Fig. 5d and e. Since we aimed to model the best-case scenario and we had access to the true time series, we chose the optimal regularization parameters k and λ that minimized the RMSE between the hourly resolved solution and the true time series for each simulation run. In reality, the true time series one seeks to measure cannot be known a priori and one must employ an alternative parameter choice method based only on available measurement data. A number of such methods have been devised (e.g., Hansen, 1992, 2007), and two of these methods are discussed briefly in Sect. S1 in the Supplement. Investigation of these methods is beyond the scope of this work, but it must be stressed that less accurate solutions would be obtained with these parameter choice methods than with the optimal, RMSE-minimizing method employed here.
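For readers without access to MATLAB, the SVD-based solutions and the RMSE-minimizing parameter choice can be sketched in NumPy as follows. This is an illustrative stand-in, not the Regularization Tools functions themselves. It assumes standard-form Tikhonov regularization with filter factors s²/(s² + λ²), and the function names are hypothetical.

```python
import numpy as np

def tsvd_solve(H, g, k):
    """Truncated-SVD solution that keeps the k largest singular values of H."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    coeffs = (U[:, :k].T @ g) / s[:k]
    return Vt[:k].T @ coeffs

def tikhonov_solve(H, g, lam):
    """Standard-form Tikhonov solution (filter factors s^2 / (s^2 + lam^2))."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    coeffs = s * (U.T @ g) / (s**2 + lam**2)
    return Vt.T @ coeffs

def rmse(a, b):
    """Root mean square error between two equally long series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def oracle_tsvd(H, g, f_true):
    """Choose k by minimizing RMSE against the true series.

    This is only possible in simulation, where f_true is known.
    """
    ks = range(1, min(H.shape) + 1)
    best_k = min(ks, key=lambda k: rmse(tsvd_solve(H, g, k), f_true))
    return best_k, tsvd_solve(H, g, best_k)
```

An analogous search over a grid of λ values would give the best-case Tikhonov solution. In a real experiment f_true is unavailable, so oracle_tsvd would have to be replaced by a data-driven parameter choice method.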
The post-processing methods for increasing time resolution were judged according to two criteria:
1. Recovery error (RE): the overall ability to recover the true time series from a set of simulated measurements. We define RE as the mean absolute error (MAE) between a given calculated, hourly resolved time series f̂ consisting of n data points and the corresponding true, original time series f:
$$\mathrm{RE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{f}_i - f_i \right| \qquad (9)$$
RE is the combination of two types of errors: the error due to the measurement noise simulated by the linear error model described by Eq. (7) (which we denote as measurement error, ME), and the error resulting from increasing the measurement time resolution from 4, 6, or 8 h to 1 h via one of the post-processing methods. We denote this latter error as upsampling error, UE (upsampling is a signal processing term used to describe the use of interpolation to increase the resolution of a signal; our use of the term here is not restricted to interpolation, but covers methods of increasing resolution in general). UE can be calculated by the following equation:
$$\mathrm{UE} = \mathrm{RE} - \mathrm{ME}, \qquad (10)$$
where ME is defined as the mean absolute error between a true time series f consisting of n data points and the time series that would be produced by a hypothetical instrument subject to the same random error modeled by our linear error model, but capable of measuring at hourly rather than 4-8 h time resolution. We choose to report the bulk of the results as RE to represent the total error resulting from the upsampling of noisy measurements. In the final discussion (Sect. 9) we also report typical UEs to illustrate how much of the total error can be attributed solely to the upsampling process.
2. Peak capture: the specific ability to recover the magnitude and timing of the daily concentration peaks (indicated by the circle markers in Fig. 1). The ability of a method to accurately capture peaks in concentration is important for health and regulatory concerns (e.g., for identifying exceedances of particulate matter air quality guidelines). We assess peak capture through a peak plot, which displays the mean difference between the daily peak concentrations in a calculated hourly resolved time series and the corresponding peak concentrations in the true time series, against the mean difference between the times that the peaks occur in the calculated time series and in the corresponding true time series.
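The two criteria above can be sketched in Python as follows. The sketch rests on two assumptions that should be checked against the definitions in the text: that UE is the difference between RE and ME, and that the daily peak is the maximum value within each consecutive 24 h block counted from the first data point (the peaks in this study are those marked in Fig. 1). The function names are hypothetical.

```python
import numpy as np

def mean_absolute_error(estimate, truth):
    """Mean absolute error between two equally long series."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(estimate - truth)))

def error_budget(f_true, f_upsampled, f_hourly_noisy):
    """Return RE, ME and UE, assuming UE = RE - ME.

    f_upsampled    : hourly series built from long samples by post-processing
    f_hourly_noisy : series from a hypothetical hourly instrument with the
                     same random measurement error
    """
    re = mean_absolute_error(f_upsampled, f_true)
    me = mean_absolute_error(f_hourly_noisy, f_true)
    return {"RE": re, "ME": me, "UE": re - me}

def peak_capture(f_true, f_estimate, hours_per_day=24):
    """Mean differences (estimate minus truth) in daily peak value and peak time.

    Assumption: one peak per consecutive block of hours_per_day points.
    """
    f_true = np.asarray(f_true, dtype=float)
    f_estimate = np.asarray(f_estimate, dtype=float)
    d_conc, d_time = [], []
    for start in range(0, len(f_true), hours_per_day):
        t = f_true[start:start + hours_per_day]
        e = f_estimate[start:start + hours_per_day]
        d_conc.append(e.max() - t.max())
        d_time.append(int(e.argmax()) - int(t.argmax()))
    return float(np.mean(d_conc)), float(np.mean(d_time))
```

The pair returned by peak_capture corresponds to one point on a peak plot: the first value is the mean concentration difference and the second is the mean timing difference in hours.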
In the discussion of the modeling results we pay particular attention to the measurements of 57 h-long time periods with 4 h samples subject to 20 % measurement error. This represents a typical FTIR experiment. However, the dependence of recovery error on time-series period, filter sample length, and the level of measurement error is also discussed.

Sequential sampling results
This section identifies the best representation (step or linear) of atmospheric concentrations using sequential samples and discusses the issue of sequential sampling schedule. These questions are answered with reference to overall recovery error (RE, Sect. 4) since the ability to capture peak concentrations with sequential samples does not depend on the interpolation method employed (unless higher order interpolation functions are used).
Figure 6a-f shows the dependence of RE on the start time of the second sample of the day for HOA and OOA time series that were constructed by step and linear interpolation between sequential samples of sampling length τ = 4, 6, and 8 h (T = 57 h and κ m = 20 %). The start time of the second sample of the day represents the sampling schedule. For both HOA and OOA, RE is generally lower for the linearly interpolated solutions than for the step interpolated solutions, and RE increases with increasing τ . Figures S2 and S3 indicate that linear interpolation results in lower recovery error than step interpolation over the full ranges of simulated time-series periods and relative measurement errors, respectively. Therefore, and not surprisingly, linear interpolation is a more effective method for post-processing sequential measurements than step interpolation. Figure 6g plots the maximum difference in RE between two different sampling schedules (designated as maximum RE) against τ . Maximum RE can be thought of as the extra error that may be incurred if a bad sampling schedule is chosen for a particular type of time series. For τ = 4 h, RE is relatively independent of the particular sampling schedule employed. Additional error of 0.13 to 0.20 µg m −3 is possible if a suboptimal sampling schedule is chosen. This compares with mean REs of 1.49 µg m −3 for HOA and 1.85 µg m −3 for OOA time series constructed with linear interpolation. Maximum RE increases with τ . For τ = 8 h, additional error of 0.42 to 0.90 µg m −3 is possible if a suboptimal sampling schedule is chosen. In comparison, mean REs were 1.96 µg m −3 for HOA and 2.51 µg m −3 for OOA time series constructed by linear interpolation. Since the optimal sequential sampling schedule cannot be known a priori, the additional error that may be incurred due to this scheduling effect must be kept in mind when interpolating between sequential samples, particularly for measurements requiring sample-collection times > 6 h.
This scheduling effect is not as important for staggered samples, assuming the staggering interval is small enough, since measurement data points are collected more frequently.
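The sequential sampling and interpolation steps discussed in this section, and the resulting scheduling effect, can be sketched as below. The sketch is noise-free and makes several simplifying assumptions that are not taken from the model described here: sample values are assigned to the sample midpoint, incomplete samples at either end of the period are discarded, and the ends of the hourly series are filled by holding the nearest sample value constant. All function names are hypothetical.

```python
import numpy as np

def sequential_samples(f, tau, offset):
    """Average f over consecutive tau-hour windows starting at index offset.

    Returns the sample midpoints (hours) and the sample means.
    Incomplete windows at either end are discarded (assumption).
    """
    f = np.asarray(f, dtype=float)
    n_samples = (len(f) - offset) // tau
    windows = f[offset:offset + n_samples * tau].reshape(n_samples, tau)
    starts = offset + tau * np.arange(n_samples)
    return starts + (tau - 1) / 2.0, windows.mean(axis=1)

def upsample(f, tau, offset, kind):
    """Hourly series from sequential samples via step or linear interpolation."""
    t = np.arange(len(f))
    mid, g = sequential_samples(f, tau, offset)
    if kind == "linear":
        return np.interp(t, mid, g)   # end values are held constant
    if kind == "step":
        idx = np.clip((t - offset) // tau, 0, len(g) - 1)
        return g[idx]
    raise ValueError(f"unknown interpolation kind: {kind}")

def schedule_effect(f, tau, kind="linear"):
    """Spread in mean absolute error over the tau possible schedule offsets."""
    f = np.asarray(f, dtype=float)
    errors = [float(np.mean(np.abs(upsample(f, tau, off, kind) - f)))
              for off in range(tau)]
    return max(errors) - min(errors), errors
```

Applying schedule_effect to a test time series for τ = 4, 6, and 8 h gives a noise-free analogue of the schedule dependence plotted in Fig. 6g. The values reported here additionally include 20 % random measurement error averaged over 20 realizations.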

Deconvolution results
Eight different combinations of regularization and boundary value methods (Fig. 2) were used to recover time series by deconvolution for each set of simulated staggered measurements. For T = 57 h and κ m = 20 %, Fig. 7 displays the mean RE of deconvolution solutions recovered by TSVD and Tikhonov regularization as a function of the boundary value method employed (tiled by τ and time series type), and Fig. 8 displays a peak plot for each combination of regularization and boundary value method. At this relatively high level of measurement error, only a small reduction in RE is gained from having access to the full measurement vector (which would require the collection of partial samples, Sect. 3). Furthermore, there is little difference in the mean RE of the three methods that assume boundary values are not accessible for measurement: no clear and consistent advantage can be discerned between the truncated, uniformly, and reflectively padded methods for this T and κ m . Assuming the boundary values are known, the average RE of HOA time series sampled with 4 h filters and recovered with TSVD regularization is 1.16 µg m −3 . If the boundary values are not known, the corresponding value averaged over the three other boundary value methods is 1.34 µg m −3 .
The corresponding OOA-TSVD results tell the same story: RE of 1.42 µg m −3 with the full measurement vector vs. an average of 1.65 µg m −3 over the three methods without. The results are similar over the full range of time-series periods simulated (Fig. S4).
In addition, at this level of measurement error similar recovery errors are obtained with TSVD and Tikhonov regularization. It is only for the OOA time series measured with 4 h samples that a difference between the two regularization methods can be clearly discerned, with TSVD regularization resulting in lower recovery error than Tikhonov regularization. Although the REs are similar, concentrations recovered with Tikhonov regularization are generally lower than the true concentrations. As a result, the overall average concentrations of time series recovered with Tikhonov regularization are 10-20 % below the corresponding averages of the original time series. The average concentrations of the time series recovered with TSVD regularization are very similar to the true values (Fig. S6).
The peak plots (Fig. 8) indicate that in terms of peak capture no boundary value method is clearly better than the others for κ m = 20 %. Solutions with TSVD regularization are marginally better at capturing peak concentrations than solutions with Tikhonov regularization, although the differences are still well within 1 standard deviation of all the modeled solutions (vertical bars in Fig. 8). [...] 2 µg m −3 . The daily HOA and OOA peak times can generally be reproduced to within 1 h. If the level of random measurement error is very low, less than approximately 5 %, recovery error is strongly reduced if one has access to the full measurement vector (Fig. S5). If partial samples cannot be known, solving the system of equations with a truncated measurement vector results in lower error than padding the measurements out via the uniform or reflective methods. Taking all of these results together, we recommend TSVD regularization with the truncated method for dealing with boundary values if partial samples cannot be known. In addition to the analysis presented in this work, further advantages of TSVD regularization are that it is conceptually simple and intuitive, and it is straightforward to apply through the SVD products of the convolution matrix H.

Overall comparison of methods
Based on the findings of Sects. 5 and 6 we now make an overall comparison of methods for increasing measurement time resolution in the context of the practical considerations and limitations of each method. Interpolation between sequential measurements is the least sophisticated, cheapest, and easiest of the methods for increasing time resolution that we have investigated. Staggered sampling requires multiple sampling lines to collect multiple samples at once. More staggered samples are required to cover a given time period than would be required to cover the same time period with sequential samples. This extra cost of staggered sampling compared to sequential sampling is illustrated in Fig. 9. For example, to measure a time series of period 64 h, 61 staggered 4 h samples would be required compared to only 16 sequential 4 h samples. The difference in sample number is even greater for larger τ : to measure a time series of period 64 h, 57 staggered 8 h samples would be required compared to only 8 sequential 8 h samples.
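The sample counts quoted above follow from the expressions given in the caption of Fig. 9 for a staggering interval of 1 h. A short Python check is given below. The function names are hypothetical, and T is assumed to be a whole multiple of τ for the sequential case.

```python
def n_sequential(T, tau):
    """Number of back-to-back tau-hour samples covering a period of T hours."""
    return T // tau

def n_staggered(T, tau):
    """Number of tau-hour samples staggered at 1 h intervals within T hours."""
    return T - tau + 1

for tau in (4, 6, 8):
    seq, stag = n_sequential(64, tau), n_staggered(64, tau)
    print(f"tau = {tau} h: {seq} sequential vs. {stag} staggered samples")
```

For T = 64 h and τ = 4 h this gives 16 sequential and 61 staggered samples, so the staggered schedule requires nearly four times as many filters.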
Figure 9. The number of filter samples N of length 4, 6, and 8 h required to measure time series of period T h sequentially and by staggering the samples at an interval δτ of 1 h. The number of sequential samples is given by T / τ and the number of staggered samples is given by (T − τ + 1)/δτ .
Attempting to recover the true time series from a set of staggered measurements by deconvolution requires even further effort and analysis time and expertise. Although tried and tested deconvolution and regularization algorithms are readily available (Hansen, 2007), the choice of a reasonable regularization parameter may not be straightforward. If a bad regularization parameter is chosen, a substantial additional error could be added to a solution (Fig. 5). Given the extra cost of staggered sampling and the error risk associated with regularization, it is necessary to establish precisely what, if anything, can be gained from the use of these more sophisticated tactics for a variety of different experimental conditions. Figure 10 displays the mean recovery error as a function of κ m for HOA and OOA time series processed by the sequential, smeared, and recovered methods (T = 57 h and τ = 4 h). Two sequential cases are displayed. Both were obtained by linear interpolation. "Sequential low" corresponds to the sampling schedule that resulted in the lowest RE, and "sequential high" corresponds to the sampling schedule that resulted in the highest RE. The RE difference between these two cases is the sequential sampling effect identified in Fig. 6g. The recovered solutions were produced by deconvolution with TSVD regularization and the truncated method for dealing with inaccessible boundary values (Sect. 6). As expected, in the absence of measurement error, recovering a time series through the deconvolution of staggered measurements is the best method for achieving high time resolution.
On average, true concentrations can be reproduced to within 0.25 µg m −3 for HOA and 0.48 µg m −3 for OOA with this method (RE is not zero because of the truncated measurement vector). However, measurement error is unavoidable, and the presence of only 5 % error is sufficient for the recovered method to lose its RE advantage over the less sophisticated sequential and smeared methods. At the 20 % level of relative measurement error characteristic of aerosol FTIR spectroscopy, the differences in mean RE between the optimally scheduled sequential, smeared, and recovered time series are very small. For HOA, mean RE is 1.49, 1.39, and 1.33 µg m −3 for the sequential low, smeared, and recovered time series, respectively. However, if a suboptimal sampling schedule is chosen, mean RE for the HOA time series could be as high as 1.58 µg m −3 . In a real experiment there would be no way of knowing what the optimal sequential sampling schedule was (unless a complementary independent measurement was available), and therefore whether a sequentially measured time series would be subject to the higher amount of error or not. Collecting staggered samples is one option for avoiding the sample scheduling effect. The peak plots corresponding to the REs shown in Fig. 10 for κ m = 20 % are displayed in Fig. 11. Both the optimally and suboptimally scheduled sequential solutions are slightly worse at capturing peak concentrations than the smeared and recovered solutions. For example, peak HOA concentrations are underestimated by an average of 4.28 µg m −3 in the optimally scheduled sequential solution compared to 3.32 and 2.74 µg m −3 for the smeared and recovered solutions, respectively. For the OOA time series, peak concentration values are reproduced, on average, very accurately in the smeared and recovered solutions, being overpredicted by only 0.85 and 0.43 µg m −3 , respectively. The same peak concentrations are underestimated by 1.94 µg m −3 in the optimally scheduled sequential solution.
Figure 11. Peak plots for time series of period 57 h measured with 4 h samples subject to 20 % measurement uncertainty processed by the sequential, smeared and recovered methods. The "sequential high" and "sequential low" time series are constructed by linear interpolation between suboptimally and optimally scheduled sequential measurements, respectively. The recovered solutions were obtained with TSVD regularization and the truncated boundary method. The peak plots are explained fully in the main text in Sect. 4.
A key variable included in our numerical model is the filter sample length τ . Figure 12 displays mean RE against τ for the same cases shown in Figs. 10 and 11. Again T = 57 h and κ m = 20 %. It is interesting to note that mean RE does not depend strongly on τ for the optimally scheduled sequential, smeared, and recovered cases. For example, if 4 h samples are used to construct an hourly resolved OOA time series using the smeared method, true concentrations can be reproduced to within an average of 1.81 µg m −3 . If 8 h samples are used to construct the same hourly resolved time series via the same smeared method, the reproduction error is only slightly greater, 2.15 µg m −3 . However, in the case of suboptimally scheduled sequential measurements the increase in RE with τ is considerably greater because the sequential sampling scheduling effect increases with increasing sample-collection time (Fig. 6g). Whether or not the differences between the sequential, smeared, and recovered methods are significant depends on the specific aims of a given experiment. If the priority is to achieve low overall error over long time periods when measuring a concentration time series with 4 h samples subject to 20 % relative measurement error, linear interpolation between sequentially collected samples is likely to be an adequate choice for achieving hourly time resolution. Additional error may be inadvertently introduced through the choice of a suboptimal sampling schedule, but the extra practical costs of staggered sampling (Fig. 9) would be avoided. On the other hand, if one were particularly interested in accurately measuring peak OA concentrations and had the ability to run multiple sampling lines at once, then staggered sampling with no further data processing would be the best option for achieving hourly time resolution (Fig. 11). A combination of sequential sampling during stable OA concentration periods and staggered sampling during peak periods (e.g., morning rush hours, afternoon peak in photochemistry) could be an excellent strategy for intensive field campaigns.
Our analysis suggests that in scenarios similar to the case studied in this work there is little benefit to be gained (in terms of both overall error and peak capture) by running staggered measurements through a deconvolution algorithm. This is surprising given that in the absence of perturbations to a measurement signal, true concentrations can be recovered precisely from a set of staggered measurements (Fig. 5b). However, once non-ideal, practical realities such as random measurement error (even as low as 5 %) and the inability to collect partial samples are taken into account, signals recovered by deconvolution approximate true concentrations only as well as smeared and interpolated signals, even with optimal choice of regularization parameter. Considering that in a real experiment the optimal regularization parameter is not known, we do not recommend the deconvolution of staggered measurements as a method for increasing time resolution, unless the level of relative measurement error is extremely low (< 1 %).

Comparison of HOA and OOA results
Differences between the HOA and OOA test time series were reflected in the modeled recovery errors and peak concentrations. The absolute concentrations averaged (±1 standard deviation) 4.99 ± 4.85 µg m −3 in the HOA time series compared to 8.09 ± 5.66 µg m −3 in the OOA time series. The daily HOA concentration peaks were sharp and occurred early in the mornings, while the daily OOA concentration peaks were broad and generally extended throughout the full afternoon (Fig. 1). For all post-processing methods, the HOA REs were ∼ 0.5 µg m −3 less than the OOA REs, which is likely because average HOA concentrations were lower than average OOA concentrations. However, OOA peak concentrations were captured more precisely than HOA peak concentrations. On average for 4 h samples, HOA peak concentrations were underestimated by 2.34-4.16 µg m −3 more than OOA peak concentrations (Fig. 11). We speculate that sharper peaks are more difficult to reproduce by upsampling low time resolution measurements than broader peaks. Systematic studies are required to further explore how time series characteristics (e.g., average concentrations and peak widths) affect various metrics of recovery.

Interpretation of errors
The REs (Eq. 9) we have reported indicate to within what concentration range one can measure true aerosol concentrations, on average, with hourly resolved time series constructed from noisy measurement samples of length 4-8 h. These REs are a combination of random measurement error (ME, which we modeled with the linear error model described by Eq. 7) and upsampling error (UE), as explained in Sect. 4. UE represents the error associated solely with the increase in time resolution from 4-8 to 1 h. UE can be calculated with Eq. (10).
To illustrate how the errors break down for the case T = 57 h and τ = 4 h, Fig. 13 displays the upsampling errors, and the UE fractions of the total error as a function of κ m for HOA and OOA time series constructed for the sequential high and low, smeared and recovered cases. In each case, the UE/RE fraction decreases substantially with increasing κ m from 76-84 % at κ m = 1 % to 10-27 % at κ m = 30 %. For the sequential and smeared cases this is because UE decreases and ME increases with increasing κ m . For the recovered case, absolute UE is less dependent on κ m (it is always less than 0.83 µg m −3 ), and the decreasing UE/RE fraction results mainly from the increase in ME with increasing κ m . The inverse relationship between UE/RE and κ m indicates that although total recovery error decreases with an increase in analytical accuracy (decrease in κ m , Fig. 10), the fraction of the total error resulting from the upsampling process increases.
For FTIR levels of relative measurement error of 20 %, UEs represent only 19-47 % of total RE in the sequential, smeared and recovered cases. In absolute terms, 0.30-0.75 µg m −3 of error can be attributed specifically to the process of constructing an hourly resolved time series from a set of 4 h samples. This means that if FTIR sample collection was improved so that it was possible to collect samples over 1 h instead of 4 h, the precision of the resulting hourly resolved measurements would be improved by only 0.30-0.75 µg m −3 , relative to hourly resolved time series constructed from 4 h samples (the accuracy of the measurement will depend on the analytical bias and measurement artifacts of the technique in question). This statement is true even for the simple case of linear interpolation between suboptimally scheduled sequential measurements. This absolute upsampling error range represents only 1.7-4.7 % of the average daily HOA and OOA peak concentrations, and 3.7-15.2 % of the average of all HOA and OOA concentrations in the test time series (Fig. 1).
One way to frame these errors is to consider each combination of noisy 4-8 h measurement samples and post-processing method as a self-contained measurement technique or instrument that measures OA concentrations at hourly resolution. For example, submicrometer size distributions measured with a scanning mobility particle sizer (SMPS) are typically considered as a standard, self-contained measurement. In fact, SMPS measurements are a combination of particle electrical mobility measurements and an inversion algorithm. SMPS inversion algorithms are analogous to the post-processing methods we have tested here, and are even based on the same underlying mathematics of deconvolution (e.g., Pfeifer et al., 2014), although it is not necessary for the modern SMPS user to know this fact. In this framing, the total error of each hourly resolved OA concentration measurement (RE) can be considered as a combination of random error in the underlying measurement (ME) and error introduced by the processing algorithm (UE). UE is the error cost of increasing the measurement time resolution.
Taking this interpretation further, one can also use estimated concentrations to characterize the equivalent bias and error of the hourly-resolution measurements as a whole, analogously to the way bias and error would be characterized for any new instrument. An example of equivalent bias and error characterization is provided in Sect. S5 for the sequential high and low, smeared, and recovered cases considered in Sect. 7. We have not quantitatively characterized equivalent errors for these cases because Fig. S7 indicates that the post-processing methods alter the structure of the errors in the estimated concentrations, and the linear error model described by Eq. (7) is no longer applicable. Therefore, further work would be required to find a more suitable error model and to quantify equivalent error. However, the example still demonstrates how the hourly resolved outputs of the post-processing methods that we have tested can be treated in the same manner as the output of any given instrument or measurement technique.

Conclusions
Aerosol measurement techniques with high analytical detection limits require long sample-collection times at atmospherically relevant concentrations, which results in poorly time-resolved measurements. We investigated combined sampling and post-processing methods for increasing the resolution of time series produced with 4-8 h-long samples. The absolute concentrations we sought to recover ranged from 0.13 to 29.16 µg m −3 with mean values of 4.99 (HOA) and 8.09 µg m −3 (OOA) (Fig. 1). Linear interpolation between sequentially collected samples is cheap, simple and surprisingly effective in terms of both overall recovery error and daily peak capture. However, sequential samples are subject to a sample schedule effect, which can add up to 0.56 µg m −3 to overall recovery error (Fig. 6). Staggered sampling avoids the sample schedule effect and it is up to the experimenter to decide if the extra practical costs of staggered sampling (e.g., Fig. 9) are worth this benefit. Recovering a time series through deconvolution of staggered measurements is only useful at low values of relative measurement error. For κ m > 5 % the recovery errors of recovered solutions are comparable to those obtained via the smeared method (Fig. 10). Since deconvolution costs extra analysis time and expertise, and there is a risk that further error can be added to a solution through the bad choice of regularization parameter, we do not recommend this approach for post-processing staggered measurements in scenarios similar to the case studied in this work. If a deconvolution algorithm is applied, we recommend using TSVD regularization because it resulted in more accurate average concentrations over full sampling periods, and marginally better peak capture and REs than Tikhonov regularization.
Our numerical modeling has indicated that for κ m = 20 %, one can measure concentrations to within a range of 1.33-2.25 µg m −3 , on average, with hourly resolved time series constructed from samples of length 4-8 h using the best-case sequential, smeared or recovered methods. Daily peak concentrations can be reproduced to within an average of 0-4.3 µg m −3 and peak times can be reproduced to within an hour. Surprisingly, for the case T = 57 h and τ = 4 h, only 19-47 % of the overall recovery error can be attributed to the actual upsampling process. In absolute terms, this indicates that measurement precision would only be improved by 0.30-0.75 µg m −3 if samples could be collected over 1 h instead of 4 h.
The total and upsampling errors we have reported represent only small fractions of the average daily peak concentrations in the HOA and OOA test time series. Therefore, post-processing methods are effective techniques for increasing the time resolution of OA measurements requiring long sample-collection times. Application of these methods should be considered as a good alternative or complement to other methods of achieving high time resolution, such as instrument redesign for rapid sample collection, which in many cases may be prohibitively expensive.
These conclusions are based on the two time series we have investigated, which included sharp (high gradients), broad (low gradients), large magnitude, and relatively flat regions (Fig. 1). However, further work is required to test the generality of the conclusions by applying these sampling strategies and post-processing methods to different time series types (e.g., cooking organic aerosols, which may display even sharper peaks in concentrations). The theoretical and modeling frameworks provided in Sects. 3 and 4 do not depend on the specific test case in question and can be applied to time series of any variable.
The Supplement related to this article is available online at doi:10.5194/amt-9-3337-2016-supplement.