Interactive comment on “Improving automated global detection of volcanic SO2 plumes using the Ozone Monitoring Instrument (OMI)”

This paper presents a new technique for detecting volcanic SO2 eruptions from satellite UV data, which allows monitoring of volcanoes even in remote regions. A set of volcanic eruptions detected in OMI data, together with a control data set, has been selected and a logistic regression model has been trained that is able to distinguish real volcanic eruptions exceeding an integrated SO2 emission mass of ∼400 tons in a 2°×2° region around the volcano. The advantage of this model is that even weak eruptive or degassing events can be detected, that no background correction needs to be applied, and that no threshold values need to be defined above which an eruption is detected. The paper is well written and the procedures and methods applied are well


Introduction
Volcanic eruptions pose a global hazard due to the potential for emissions to be entrained into the upper atmosphere and transported globally. The addition of these particles can have significant local impacts, as fine particulate matter in the atmosphere can cause serious health problems (Delmelle et al., 2002; Hansell & Oppenheimer, 2004) and disrupt the aviation industry (Miller & Casadevall, 1999; Prata, 2009). It can also alter radiative transfer through the atmosphere on a global scale, as seen following the eruption of Mt Pinatubo (Self et al., 1993). To mitigate the possible impacts of volcanic eruptions, timely warning of events is essential. The installation of a global network of ground-based monitoring stations would be both costly and impractical; however, satellite-based monitoring provides the spatial and temporal coverage necessary to facilitate near-real-time (NRT) monitoring of global volcanism (Brenot et al., 2014). Existing techniques employ a threshold approach to identify volcanic eruptions; however, this limits their capability to detect smaller events and makes them susceptible to high background noise levels. The following work outlines a method for the identification of volcanic plumes utilising a background correction factor. The resulting output was processed with a binary classification algorithm to assess how well the developed methodology distinguishes volcanic events from control samples.
The Ozone Monitoring Instrument (OMI), launched on NASA's Aura satellite in July 2004, provides near-global daily monitoring of multiple atmospheric trace gases with absorption bands in the ultraviolet (UV) spectral band, and was designed to supersede the Total Ozone Mapping Spectrometer (TOMS) instrument. Due to its strong absorption bands in the UV, sulphur dioxide (SO2) can be discerned by instruments designed to measure ozone (Krueger, 1983). The ability of satellite-based ozone monitoring instrumentation to detect and monitor volcanic SO2 emissions was first demonstrated by the identification of the eruption plume of El Chichón in 1982 (Krueger, 1983), which led to the implementation of satellite-based UV measurements as a volcano monitoring tool (Schneider et al., 1999; Krueger et al., 2008). The low spatial resolution of the TOMS instruments precluded the measurement of SO2 in all but the largest volcanic eruptions (Carn et al., 2003), but OMI's higher spatial resolution (13 × 24 km at nadir) permits detection of smaller eruptions and passive volcanic degassing of SO2, whilst providing daily global coverage (Krotkov et al., 2006; Carn et al., 2013, 2016). This work utilises the continuous global coverage of OMI to identify and automatically classify volcanic eruption events based on common characteristics.

Existing alert systems
An operational alert system known as the Support to Aviation Control Service (SACS) is currently employed in the assessment of SO2 and ash emitted from volcanoes (Brenot et al., 2012). This service provides near-real-time (NRT) alerts of anomalously high SO2 amounts and ash indices recorded by three UV instruments, OMI and the Global Ozone Monitoring Experiment-2 (GOME-2; flown on board two meteorological satellites, MetOp-A and MetOp-B), and three infrared (IR) instruments, the Atmospheric Infrared Sounder (AIRS) and the Infrared Atmospheric Sounding Interferometer (IASI; also flown on the MetOp-A and MetOp-B platforms). The method of SO2 alert generation used by SACS (Brenot et al., 2014; http://sacs.aeronomie.be/info/index.php) involves the initial identification of an anomalously high SO2 column amount (>2 DU). When a pixel is flagged, the area is analysed in greater detail and an alert is only generated if more than half of the neighbouring pixels also display high SO2 values (>2 DU). The technique developed by Brenot et al. (2014) is subject to certain limitations when utilising UV data, including systematic noise in the data, which leads to false alerts, and the restriction of retrievals to those that assume an SO2 plume altitude in the lower stratosphere (STL). Therefore, in the development of an algorithm based on OMI data we aim to account for variable background SO2 levels and systematic noise, and to use SO2 retrievals that assume a lower plume altitude, in an attempt to resolve plumes with lower SO2 amounts, lower injection altitudes and more diffuse characteristics.
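The two-step SACS rule described above (a flagged pixel, then a check of its neighbours) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of the eight surrounding pixels as "neighbouring pixels", and the handling of missing pixels (None) are hypothetical choices, not the SACS implementation.

```python
def sacs_style_alert(grid, row, col, threshold_du=2.0):
    """Return True if pixel (row, col) exceeds the threshold and more than
    half of its valid neighbours do too.

    grid is a list of rows of SO2 column amounts in DU; None marks a
    missing pixel. The 8-pixel neighbourhood is an assumption.
    """
    centre = grid[row][col]
    if centre is None or centre <= threshold_du:
        return False  # the pixel itself is not anomalously high

    neighbours = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre pixel
            r, c = row + dr, col + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                value = grid[r][c]
                if value is not None:
                    neighbours.append(value)

    if not neighbours:
        return False  # nothing to confirm the flagged pixel against
    n_high = sum(1 for value in neighbours if value > threshold_du)
    return n_high > len(neighbours) / 2


# Hypothetical 3x3 scenes: a coherent plume and an isolated noisy pixel.
plume_scene = [[2.5, 2.5, 2.5], [2.5, 3.0, 2.5], [0.5, 0.5, 0.5]]
noise_scene = [[0.1, 0.1, 0.1], [0.1, 3.0, 0.1], [0.1, 0.1, 0.1]]
```

In the plume scene five of the eight neighbours exceed 2 DU, so an alert is raised; in the noise scene the single high pixel is not confirmed by its neighbours.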

Data collection
OMI Level 2 total column SO2 (OMSO2) data are publicly available from the NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC; http://disc.sci.gsfc.nasa.gov/Aura/data-holdings/OMI/omso2_v003.shtml). These data provide global coverage with a temporal resolution of 1 day at low latitudes and multiple daily observations towards the poles, where measurement swaths overlap. OMSO2 data currently provide volcanic SO2 total column amounts calculated using a linear fit (LF) algorithm (Yang et al., 2007) for three distinct layers of the atmosphere, corresponding to centre of mass altitudes (CMA) of approximately 3 km (lower troposphere; TRL), 8 km (mid-troposphere; TRM) and 17 km (lower stratosphere; STL). These altitudes are based upon atmospheric pressure levels and can therefore vary slightly with local conditions such as the temperature profile (Carn et al., 2013). In order to obtain an accurate estimate of the SO2 column amount, the appropriate retrieval must be selected based upon the known or inferred injection altitude of the volcanic plume (Yang et al., 2007), which can be poorly constrained, particularly in remote regions with minimal or no monitoring capabilities (Sparks, 2012). Differences between the altitude assumed in the LF algorithm and the true altitude of the plume can lead to errors of up to 20%, provided the assumption is approximately correct (Yang et al., 2007). Because this work focuses on a variety of plumes displaying diverse eruptive characteristics, we use SO2 data from the TRL OMSO2 product in order to facilitate identification of eruptions confined to the lower troposphere. The use of one retrieval altitude reduces the need for user input or prior knowledge of the injection altitude of the plume; however, it results in the overestimation of plume mass for features injected above the retrieval altitude. This method is therefore intended for identification and alert purposes, as opposed to accurate plume mass calculation.
Previous works have provided in-depth descriptions of the OMI retrieval algorithms (Carn et al., 2013; Krotkov et al., 2006; McCormick et al., 2013; Yang et al., 2007), which have a proven track record in the assessment of volcanic and anthropogenic emissions, including identification of volcanic plume sources (e.g., Carn et al., 2008, 2013, 2016; McCormick et al., 2012, 2013), volcanic plume tracking (e.g., Carn and Prata, 2010; Krotkov et al., 2010; Lopez et al., 2013) and identification of copper smelter emissions (Carn et al., 2007) and other large SO2 emission sources (e.g., Fioletov et al., 2011, 2013). OMI data collected since 2008 are influenced by a row anomaly (the OMI row anomaly; ORA), which results in data gaps in particular rows along the OMI measurement swath. Information on the status of this anomaly is provided by the Royal Netherlands Meteorological Institute (www.knmi.nl/omi/). The ORA data gaps, combined with the variation in viewing angle produced by the 16-day orbital cycle of the Aura satellite, result in a varying influence on OMI SO2 measurements (Flower et al., 2016). Any eruptions identified after the formation of the ORA were investigated with greater scrutiny and excluded where the effect was significant.

Volcanic plume quantification
As a test dataset for our plume identification technique, we identified 79 volcanic eruptions at 27 different volcanoes (Table 1) using the Volcanoes of the World (VOTW) database curated by the Smithsonian Institution's Global Volcanism Program (GVP; http://volcano.si.edu/). Note that, as a result of the way in which eruptions are defined in the VOTW database, several of the eruptions listed in Table 1 actually correspond to the onset of extended periods of volcanic activity, rather than discrete eruptions. For each identified eruption, total SO2 mass detected by OMI was obtained for the registered day of the eruptive event (or the start of the period of unrest), with the preceding and subsequent days analysed where no corresponding plume could be identified on the reported day of eruption. This allowance accounts for any inaccuracies in the assigned eruption date, and allows for the identification of eruption plumes generated after the Aura overpass time (~13:45 local time), resulting in a delay in detection. Identification and quantification of volcanic SO2 emissions is complicated by the presence of variable biases and noise levels in the data. These variations are influenced by several factors including the latitude of the volcano, time of year, proximity to pollution sources, and the presence of meteorological clouds (Krotkov et al., 2006; Yang et al., 2007).
In our analysis, three methods (M1, M2, and M3; Table 1) were used to quantify the SO2 loading detected at each location, with the goal of distinguishing volcanic SO2 from background noise. The procedures were developed with the intention of allowing the calculation of volcanic SO2 loading with minimal user input, reducing the possible effects of human error in the classification of what constitutes the bounds of an identified plume.
Method 1 (M1) and Method 2 (M2) differ only in the geographic extent over which OMI SO2 columns are integrated to obtain total SO2 mass (Fig. 1). For each eruption analysed, M1 calculates integrated SO2 mass in a 4°×4° box centred over the volcano location (thus capturing plumes regardless of wind direction). The 4°×4° box encompasses an area which captures most small to moderate volcanic plumes, with few instances of dispersion of emissions outside the region; however, this relatively large sample area also potentially includes increased background noise, which could generate false alerts in locations with higher noise levels. As an alternative to M1, M2 uses a 2°×2° region which, whilst more susceptible to possible plume dispersion beyond the defined limits, is less influenced by noise contamination (Fig. 1). Manual inspection indicated that plume dispersion beyond the defined geographic limits was only an issue for the largest eruptions in Table 2. Figure 1 shows an example of a small volcanic SO2 plume at Piton de la Fournaise volcano (Réunion); here, the M2 region captures most of the SO2 plume that is visually apparent, only excluding some very diffuse SO2 further downwind that is included in the M1 region.
A third method (M3) was developed in an attempt to intrinsically account for the variable noise levels in SO2 data collected in different geographic regions (Carn et al., 2013). We posit that a background noise correction may be necessary in order to develop a volcanic plume detection methodology without a significant number of false alerts. Our technique is analogous to contextual thermal infrared (TIR) anomaly detection procedures used at active volcanoes, where a background radiance value is calculated as a reference against which anomalously high radiance values can be compared (e.g., Wright et al., 2002; Murphy et al., 2011). In the M3 method, the 2°×2° region (M2) is considered the active emission region, with a background SO2 offset value derived from the total SO2 mass in the 4°×4° M1 region (Eq. 1). Eruptive events that post-date the appearance of the ORA were manually assessed in order to identify whether the ORA data gap significantly impacted the detection of SO2, such as complete masking of the plume in extreme cases (Flower et al., in prep). Additional factors impacting the selection of eruptive events are the presence of meteorological clouds, which can effectively mask any volcanic plume at lower altitudes from a satellite-based sensor (Carn et al., 2013; Krotkov et al., 2006), and the seasonal variation in UV radiation at high latitudes. Cloud masking is due to the high UV albedo of clouds and this, coupled with low UV irradiance, can make SO2 detection at high latitudes during winter months particularly challenging (Telling et al., 2015). Hence the majority of the eruptions analysed here are located at latitudes below 30°.
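Eq. 1 is not reproduced in this text, so the exact form of the background offset is not known here. Purely as an illustration, the sketch below assumes that the background is the mean SO2 mass per unit area in the ring between the two boxes (M1 minus M2), scaled to the area of the M2 box. The function name and this area scaling are assumptions, not the authors' stated formula.

```python
def background_corrected_mass(m1_tons, m2_tons, outer_deg=4.0, inner_deg=2.0):
    """Illustrative M3-style correction (assumed form, not Eq. 1 itself).

    m1_tons: SO2 mass integrated over the outer (4 x 4 degree) box.
    m2_tons: SO2 mass integrated over the inner (2 x 2 degree) box.
    """
    outer_area = outer_deg ** 2          # box areas in square degrees
    inner_area = inner_deg ** 2
    ring_mass = m1_tons - m2_tons        # mass outside the emission region
    background_per_area = ring_mass / (outer_area - inner_area)
    # Subtract the background expected inside the inner box.
    return m2_tons - background_per_area * inner_area
```

Under this assumed form, a scene of spatially uniform noise yields a corrected mass of zero, while a plume confined to the inner box is left unchanged. A plume that drifts into the ring raises the background estimate and suppresses the corrected mass.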

Control samples
A control group is required to assess whether volcanic eruptions can be distinguished from background SO2 levels. Therefore, for each volcanic eruption analysed (Table 2), a control SO2 mass was calculated using each of the three methods (M1, M2 and M3) for a second date at the same volcano. Assignment of control group analysis dates was limited to the period between 1st January 2005 and 31st December 2009. The 2009 cut-off date was employed due to the increasing influence of the ORA after this time, in an attempt to reduce the influence of data gaps on the model output. Control dates were assigned for comparison with each identified volcanic eruption, using an online random number generator (Haahr, 2015; http://www.random.org) to assign a value between 0 and 1825 to each data point. These random values were used to determine the number of days from the beginning of the analysis period at which to assign a control date (Table 2). The identified dates were then assigned to each target volcano alphabetically, with each location receiving the same number of control dates as volcanic eruption analyses performed there (Table 2).
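The mapping from a random value to a control date can be sketched as below. Note that the study drew its values from random.org; Python's pseudo-random generator is substituted here only for illustration, and the function names are hypothetical.

```python
import random
from datetime import date, timedelta

PERIOD_START = date(2005, 1, 1)   # first day of the control period
MAX_OFFSET_DAYS = 1825            # offset that lands on 31 December 2009


def offset_to_date(offset_days):
    """Convert a day offset (0 to 1825) into a calendar date."""
    if not 0 <= offset_days <= MAX_OFFSET_DAYS:
        raise ValueError("offset must lie between 0 and 1825 days")
    return PERIOD_START + timedelta(days=offset_days)


def draw_control_dates(n_events, seed=None):
    """Draw one control date per volcanic event (pseudo-random stand-in)."""
    rng = random.Random(seed)
    return [offset_to_date(rng.randint(0, MAX_OFFSET_DAYS))
            for _ in range(n_events)]
```

Because 2008 is a leap year, the five-year period contains 1826 days, so offsets 0 to 1825 cover every date from 1 January 2005 to 31 December 2009 inclusive.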

Modelling techniques
Modelling procedures were conducted with the Weka 3 software package, a collection of algorithms for data mining tasks (Hall et al., 2009) provided by the University of Waikato (http://www.cs.waikato.ac.nz/ml/weka/). Significant differences in measured SO2 mass were found between the samples due to variations in eruption magnitude, background noise levels and SO2 emission strength at the incorporated volcanoes, which prevented the calculation of a flat emission threshold for the classification of eruptive events.
Within the Weka 3 package, a simple logistic regression analysis (Eq. 2) was found to be an effective technique for the classification of volcanic and non-volcanic events. Simple logistic regression is a binary classification technique, here distinguishing volcanic (v) and non-volcanic control (c) events, in which a linear model is constructed from a transformed target variable (Witten & Frank, 2005). The logistic regression equation used here assigns the probability P of the occurrence of a volcanic eruption or degassing event, where e is the base of the natural logarithm, a determines the probability when the independent variable (X) is equal to zero, and b represents the rate at which probabilities vary with incremental changes in X, which in this case is the volcanic plume SO2 mass measured in tons.
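Eq. 2 is not reproduced in this text. Assuming it takes the standard logistic form, which is consistent with the symbols defined above, it would read:

```latex
P = \frac{1}{1 + e^{-(a + bX)}}
```

Under this assumed form, a is the intercept on the log-odds scale, so the probability at X = 0 is 1/(1 + e^{-a}), and b is the change in log-odds per additional ton of SO2.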
Output of a logistic regression analysis is assessed against a series of validation statistics that test the accuracy of the generated model. These statistics include overall accuracy, precision and recall, in addition to Receiver Operating Characteristic (ROC) curves. In this analysis, the overall accuracy is the percentage of correctly classified events across both the volcanic and control (non-volcanic) samples; however, this statistic alone cannot account for preferential classification of one sample over another (Oommen et al., 2010). Hence precision and recall statistics, characterised by values between 0 and 1, are incorporated in order to identify whether preferential classification is occurring. Precision is the fraction of events predicted to belong to a sample group (volcanic or non-volcanic) that truly belong to it, whilst recall is the fraction of events truly in that group that are predicted as such (Oommen et al., 2010). In the context of this study, if a volcanic classification has a precision of 0.9, then 90% of the events predicted as being volcanic in nature are volcanic events, whilst the remaining 10% are non-volcanic events misclassified as volcanic, referred to here as 'false alerts'. In contrast, a recall value of 0.8 would correspond to 80% of observed volcanic events being correctly classified, with the remaining 20% misclassified as non-volcanic and termed here 'missed alerts'; recall does not take into account any false alerts. The final validation statistic used here is the ROC curve, which plots the rate of correctly classified events against the rate of falsely classified events. ROC values summarise the accuracy of the classification system, with a value of 1 indicating accurate prediction of all events (Oommen et al., 2010; Witten & Frank, 2005).
Logistic regression model calculation was conducted using the k-fold cross-validation technique incorporated into the Weka 3 software package. This method segregates the data into k partitions, allowing k-1 folds of the data to be used as a training set with the remaining data used for validation purposes. This method is then repeated with each of the k partitions being used to validate the corresponding model from which it was withheld, with the final statistics comprising an average of the output of all k models (Oommen et al., 2010). We implement a k value of 10 due to the associated reduction in bias compared to k values <5 (Rodríguez et al., 2010; Witten & Frank, 2005).
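As a worked illustration of these statistics, the sketch below computes them from the four cells of a two-class confusion matrix. The counts are hypothetical and chosen only to reproduce the illustrative precision (0.9) and recall (0.8) quoted above; they are not results from this study.

```python
def classification_stats(tp, fp, fn, tn):
    """Validation statistics for a volcanic-vs-control classifier.

    tp: volcanic events classified as volcanic
    fp: control events classified as volcanic
    fn: volcanic events classified as control
    tn: control events classified as control
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "volcanic_precision": tp / (tp + fp),
        "volcanic_recall": tp / (tp + fn),
        "control_precision": tn / (tn + fn),
        "control_recall": tn / (tn + fp),
    }


# Hypothetical counts: 90 volcanic and 90 control events.
stats = classification_stats(tp=72, fp=8, fn=18, tn=82)
```

Here 72 of the 80 events predicted as volcanic are truly volcanic (precision 0.9), and 72 of the 90 observed volcanic events are recovered (recall 0.8).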
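The partitioning step of k-fold cross-validation can be sketched as follows. This is a minimal, unshuffled illustration with a hypothetical function name; Weka's own implementation may additionally randomise and stratify the folds, which is not reproduced here.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k folds.

    Folds are contiguous and differ in size by at most one sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        # Everything outside the withheld fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Example: 79 volcanic plus 79 control samples, k = 10.
folds = list(k_fold_indices(158, k=10))
```

Each sample is withheld for validation exactly once, and the validation statistics of the k models would then be averaged.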

OMI SO2 measurements
Of the 79 volcanic eruptions analysed, 13 displayed low SO2 amounts (<100 tons), following application of the SO2 correction (M3), on the identified day of eruption. Two eruptions produced very large amounts of SO2: Nyamuragira (Nov 2006; 46 kt) and Rabaul (Oct 2006; 550 kt), although use of the OMI TRL SO2 columns is likely to overestimate the actual SO2 amounts in these upper tropospheric or lower stratospheric plumes (Carn et al., 2013).
Excluding the aforementioned very high values, the average M3 plume contained 680 t SO2, approximately 60% of the average of the M2 analysis and 25% of the M1 average (Table 3). The control dataset displays significantly lower SO2 loadings than the volcanic events, with an average corrected SO2 mass of 90 t and a maximum corrected SO2 mass of 1040 t. This variation indicates that the volcanic data display generally higher SO2 levels than the control data, as would be expected. In all of the selection methodologies the SO2 mass detected on control dates was 14-17% of the average mass detected in the volcanic dataset. Box plots were generated to assess the general dynamics of the volcanic and control datasets (Fig. 2). Comparison of these plots confirms the pattern identified in Table 3, with the SO2 measurements on 'eruption' days displaying significantly higher values than the control data.

Initial review
Of the three SO2 mass calculation procedures employed (M1, M2 and M3), the most success was achieved with the background-corrected dataset (M3). None of the logistic regression model investigations undertaken with the M1 and M2 datasets produced more than 55% overall accuracy in the classification of volcanic events, and therefore these data were not investigated further. In contrast, the M3 technique provided the best results, with a 77% overall accuracy and no additional data pre-processing required; this technique was therefore employed for all further assessments and model development.

Model output
The most accurate model consisted of a simple logistic regression applied to the M3 SO2 dataset, with an overall accuracy of 76.6% and an ROC of 0.843. This model favoured volcanic precision (0.83 vs 0.72 for the control class) and control recall (0.86 vs 0.67 for the volcanic class), which indicates that the model preferentially classifies alerts as control samples, therefore reducing the number of false alerts generated relative to missed alerts. Investigations were undertaken to identify characteristics of volcanic events that facilitated classification and to elucidate the likely cause of the 23% error associated with the model.
Removal of volcanic plumes containing less than 50 t SO2 from the M3 dataset resulted in a ~6% increase in model accuracy.
Eight data points produced false alerts, with control events classified as volcanic eruptions, whilst 18 volcanic events were misclassified as controls, producing missed alerts (Table 4). The misclassified alerts were isolated to assess whether any common characteristics of these events could be identified, with each individual alert incorporated into Figure 2. Investigation of the false alerts (Fig. 2; Table 4) revealed that, due to the random selection procedure used for assigning control sample dates, some of the control SO2 values corresponded to periods of ongoing volcanic activity.
These anomalous control values relate to stronger, persistent plumes, despite not being associated with large or 'initiating' events as reported in the VOTW database; this was the case for five of the nine false alerts (C1, 8, 32, 54 and 57; Table 4).
Two additional alerts were generated as a result of a data gap in the OMI measurements (C10 and 24). This indicates that missing values (characterised by a blank cell to differentiate these from days with data available but no recordable SO2 emissions) are likely to be incorrectly classified by the incorporated model as volcanic events; samples must therefore be screened for data gaps prior to incorporation into the model. The one remaining false alert (C29) was the result of increased noise levels preferentially affecting the M2 over the M1 region, resulting in an artificially high SO2 mass derived from the M3 calculation.

Missed alerts
Missed alerts occurred at a higher frequency than false alerts, but a common characteristic of all missed alerts is an SO2 plume mass below 325 t (Fig. 2; Table 4). We attribute the misclassification of volcanic events to four main causes. The first influenced eight of the volcanic events (V3, 13, 20, 23, 28, 32, 33 and 48; Table 4) and is the result of eruptions producing diffuse plumes containing low SO2 amounts close to the OMI detection limit (e.g., small eruptions and/or eruptions to low altitudes). The second cause of misclassification, affecting eight samples (V5, 17, 21, 24, 34, 43, 64 and 67; Table 4), is the drifting of the volcanic plume out of the geographic area of analysis (M2) into the region utilised for background classification (M1), causing signal suppression in the M3 methodology. One event (V19; Table 4) was impacted by increased noise in the background classification region, also suppressing the plume SO2 loading in the M3 calculation. The final factor preventing the correct identification of a volcanic eruption (V53; Table 4) occurred at Popocatepetl (Mexico), through the masking of a moderate eruption plume when a large SO2 cloud from another volcano (Soufriere Hills, Montserrat) drifted into the M1 region, causing an anomalously high background SO2 mass in the M3 calculation.

Optimisation of event classification
We assessed the impact of varying the minimum SO2 plume mass included in the logistic regression model, to investigate whether the use of a threshold SO2 loading improved the classification capabilities of the model. The volcanic dataset was incrementally filtered to remove a proportion of the data, to identify how this influenced the validation statistics. Each reduced volcanic dataset was incorporated into a logistic regression model with a k-fold validation system; however, the control sample was maintained throughout all of the analyses. The variation in class size produced by the removal of volcanic data actually provides a more accurate representation of the natural system (Oommen et al. 2011), with more control samples than volcanic, as more days are characterised by quiescence than volcanic activity. In each instance the overall accuracy, precision and recall statistics were tracked (Fig. 3) to assess the changes in the model as the minimum incorporated SO2 mass varied.
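The incremental filtering step can be sketched as below, assuming that the smallest plumes are removed first so that each retained fraction implies a minimum incorporated SO2 mass. The function name and the example masses are hypothetical and are not values from Table 2.

```python
def retain_largest_fraction(masses_tons, fraction):
    """Keep the largest `fraction` of plume masses.

    Returns the retained masses and the smallest retained mass, which
    acts as the implied minimum SO2 loading for that model.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    ordered = sorted(masses_tons, reverse=True)
    n_keep = max(1, round(len(ordered) * fraction))
    kept = ordered[:n_keep]
    return kept, kept[-1]


# Hypothetical corrected plume masses in tons (illustration only).
example_masses = [10, 50, 120, 300, 800, 1500, 40, 90, 600, 2000]
kept_60, threshold_60 = retain_largest_fraction(example_masses, 0.6)
```

Each reduced volcanic set would then be paired with the unchanged control set and passed to the cross-validated logistic regression, with the statistics recorded against the implied threshold.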
The linear correlation between control recall and volcanic precision is evident in the comparison of these statistics (Fig. 3b) as well as that between the control precision and volcanic recall.
When all data are incorporated, the model appears to favour volcanic precision and control recall, resulting in a model that will display a larger number of missed volcanic alerts than false classifications of control samples. When 60% of the dataset is used, the volcanic precision and recall are equal, as are the control precision and recall, all displaying values greater than 0.9. The threshold SO2 loading in this case is 360 tons, i.e., if this model were to be implemented, any volcanic plume containing less than 360 tons of SO2 would not be identified as a volcanic event. The use of 75% of the volcanic dataset appears to represent a good compromise between variation in the statistics and the elimination of smaller plumes (Fig. 3). The volcanic and control precision are almost equal, indicating that this model is equally effective at predicting volcanic and non-volcanic events, with a higher control recall than volcanic recall (Fig. 3) indicating the tendency of the model to miss smaller volcanic events rather than falsely classify control samples displaying moderate noise levels as volcanic eruptions. Favouring missed over false alerts is a characteristic of the MODVOLC automatic volcanic alert system designed to detect volcanic thermal anomalies (Wright et al., 2002, 2004). Comparison of these models could not be conducted, as assessment of the MODVOLC system was performed in a qualitative manner, assessing whether alerts were identified in locations where they would be expected (e.g. lava flow fields).
Figure 4 shows the variation of ROC values associated with each of the logistic regression models and minimum SO2 plume mass with the percentage of the total dataset analysed, with the total change in each normalised. The trends in both ROC and SO2 mass threshold show 2nd-order polynomial characteristics with R² values of 0.985 and 0.993, respectively. The intersection of these trend lines represents model optimisation, offering the greatest gain in accuracy (ROC) combined with the least impact on the identifiable SO2 plume mass. This optimisation point corresponds to the removal of 22% of the volcanic data, resulting in a minimum incorporated SO2 mass of ~150 t, and correlates with that inferred through the comparison of precision and recall statistics (Fig. 3). Application of a 150 t SO2 mass threshold prevents the resolution of smaller plumes, but the original assessment (Fig. 2; Table 4) indicates that SO2 loadings below this value tended to be misclassified anyway.
The model based on 78% of the volcanic dataset has an overall accuracy of 85.7% and an ROC of 0.95, producing 8 false alerts that correspond to those identified in the original assessment, with the exception of C8 (Table 4) which was accurately classified with this model.In contrast, 27.8% of the missed alerts originally identified were no longer flagged; of these five instances, four were eliminated due to their low SO2 loadings with the remaining alert correctly classified as a result of improvements in event classification by the optimised model.
Parameterization of Equation 2 using the 78% model output facilitates the validation of individual records and allows the incorporation of new data points (Eq. 3) through the substitution of X with measured volcanic SO2 mass in tons:

Independent validation
A secondary testing procedure was employed to assess the efficacy of the developed logistic regression models on an independent test dataset consisting of 12 volcanic eruptions (Global Volcanism Program, 2013) not initially identified and displaying variable plume characteristics, and 12 corresponding control samples resulting in 24 data points (Table 5).
The incorporation of an independent investigation allowed the data characteristics isolated in the original analysis to be tested against data not utilised in the training of the model. Classification of the data with the original model containing all data points resulted in an accuracy of 75%, whereas analysis with the optimised model (78% of the data) produced an overall accuracy of 79.2%; a detailed overview of the validation statistics of each model is given in Table 6. The optimised model resulted in no false detections, although four volcanic events were missed; these consisted of one sample in which the SO2 plume had drifted out of the analysis area (Soufriere Hills), two weak plumes with SO2 loadings below 60 tons (Cleveland & Lascar) and one moderate plume with an SO2 loading of 255 t (Colima). All SO2 plumes exceeding 390 t were correctly classified as volcanic; we therefore conclude that events emitting less than 390 t SO2 are likely to be misclassified with this methodology. Taking into account the thresholds of the incorporated methods (Table 6) and solving Eq. 3, we find that the minimum SO2 mass that would be classified as volcanic in origin by this model is 378 t.
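Solving a fitted logistic model for the smallest SO2 mass classified as volcanic can be sketched as follows. The fitted coefficients of Eq. 3 are not reproduced in this text, so the values of a and b used below are placeholders chosen for illustration and do not reproduce the 378 t result. The standard logistic form and a probability cut-off of 0.5 are also assumptions.

```python
import math


def volcanic_probability(mass_tons, a, b):
    """Probability of a volcanic event under an assumed logistic form."""
    return 1.0 / (1.0 + math.exp(-(a + b * mass_tons)))


def mass_threshold(a, b, cutoff=0.5):
    """SO2 mass at which the modelled probability equals `cutoff`."""
    if b <= 0:
        raise ValueError("b must be positive for a minimum-mass threshold")
    logit = math.log(cutoff / (1.0 - cutoff))  # zero when cutoff is 0.5
    return (logit - a) / b


# Placeholder coefficients (NOT the fitted values from this study).
a_demo, b_demo = -3.0, 0.01
demo_threshold = mass_threshold(a_demo, b_demo)
```

With a cut-off of 0.5 the threshold reduces to -a/b, so any plume with a corrected mass above that value would be classified as volcanic.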

Limitation
This analysis has indicated that, prior to implementation of the incorporated classification technique (logistic regression), prescreening of data samples is required to account for the influence of missing data points and meteorological cloud cover. The incorporated modelling technique automatically interpreted missing values as volcanic alerts, subsequently influencing the alert threshold, and therefore data gaps must be removed prior to logistic regression analysis. Persistent meteorological cloud cover can mask SO2 plumes at lower altitudes from satellite sensors, precluding detection. This effect can be significant at higher latitudes, particularly in winter, and therefore the methodology described here may be limited in these locations. Where high latitude data were available and incorporated into this trial (Bezymianny, Okmok and Cleveland), correct classification occurred in all but one of those days where data were available (one additional control sample characterised by no available data was misclassified), indicating the robust nature of the M3 pre-processing technique employed. However, further investigation is required to accurately assess the capabilities in high latitude regions, particularly regarding the influence of persistent cloud cover.
The main constraint on SO2 plume detection using this methodology is the detection limit of the satellite measurements used as input (here, the OMI TRL SO2 columns). Indeed, this analysis indicates that the minimum SO2 mass that could be reliably classified as volcanic in origin using the OMI TRL SO2 data is on the order of 400 tons. The lack of a priori knowledge of volcanic SO2 plume altitude restricts the classification technique to SO2 retrievals corresponding to a single CMA, and our use of the TRL SO2 product does not imply any knowledge of SO2 altitude (which is not required for eruption detection). However, the use of OMI SO2 products with lower noise (e.g., STL columns) or more sensitive SO2 algorithms (e.g., Li et al., 2013) should result in lower detection limits. Furthermore, future UV satellite instruments such as the Tropospheric Monitoring Instrument (TROPOMI; http://www.tropomi.eu/), with better spatial resolution than OMI, should also have lower SO2 detection limits. Another limitation of this methodology relates to issues in the original data, in particular those restricting the minimum plume that can be resolved. This factor is related to the spatial resolution of the original data, which cannot be overcome through processing techniques. In order to resolve smaller plumes, an instrument with a higher spatial resolution would be required; however, existing higher resolution instruments sacrifice temporal resolution in order to facilitate the identification of small features and therefore do not provide the daily coverage necessary for the creation of a global near-real-time alert system.
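The prescreening for data gaps can be sketched as below. This is only an illustration: representing a missing value as None or NaN, the function name, and the sample identifiers are all hypothetical.

```python
import math


def screen_samples(samples):
    """Split (sample_id, mass_tons) pairs into usable and rejected lists.

    A missing mass (None or NaN) is rejected; a mass of zero is kept,
    because it means data were available but no SO2 was recorded.
    """
    usable, rejected = [], []
    for sample_id, mass_tons in samples:
        is_missing = mass_tons is None or (
            isinstance(mass_tons, float) and math.isnan(mass_tons)
        )
        if is_missing:
            rejected.append(sample_id)
        else:
            usable.append((sample_id, mass_tons))
    return usable, rejected


# Hypothetical samples: two data gaps, one quiet day, one plume day.
demo_samples = [
    ("gap_day_a", None),
    ("quiet_day", 0.0),
    ("plume_day", 420.0),
    ("gap_day_b", float("nan")),
]
usable, rejected = screen_samples(demo_samples)
```

Only the usable samples would be passed to the classifier, which keeps data gaps from being interpreted as volcanic alerts.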

Conclusion
Through the analysis of operational OMI SO2 measurements (TRL SO2 columns) for 79 volcanic eruptions, a simple logistic regression model allowed classification of volcanic from non-volcanic control events with an accuracy of 80%. Optimisation of the model by progressive removal of input data enabled volcanic plumes containing at least 400 tons of SO2 to be consistently resolved and correctly classified. With an appropriate training dataset, this technique could form the basis of a near real-time volcanic eruption detection scheme, with minimal user input necessary. Individual assessment of specific regions could provide more accurate plume classification; however, this would require a significant number of eruptions to facilitate training of the model and therefore would only be feasible where persistently degassing volcanoes are present, such as Vanuatu or Indonesia.
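The progressive removal of the lowest-mass events can be sketched as follows. This is a minimal illustration only: the masses are hypothetical values, not data from this study, and the refitting of the classifier after each removal step is indicated by a comment rather than implemented.

```python
def trim_lowest(masses, fraction):
    """Return the masses with the lowest `fraction` (0..1) of events removed."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(masses)
    n_drop = int(len(ordered) * fraction)
    return ordered[n_drop:]

# Hypothetical integrated SO2 masses (tons) for ten volcanic events.
volcanic_masses = [35, 80, 150, 240, 410, 620, 900, 1500, 3200, 7800]

for fraction in (0.0, 0.2, 0.4):
    kept = trim_lowest(volcanic_masses, fraction)
    # In a full workflow the classifier would be refitted on `kept` here
    # and its ROC statistic recorded alongside the minimum mass.
    print(f"removed {fraction:.0%}: {len(kept)} events, minimum mass {kept[0]} t")
```

Recording the minimum retained mass at each step gives the smallest plume that remains in the training data for that model.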
We identified some common factors resulting in misclassification of control or volcanic events, including: contamination of the background analysis region with SO2 emissions from a separate volcano; low SO2 emissions and/or low plume altitude (i.e., resulting in emissions below detection limits); advection of SO2 emissions out of the analysis region prior to the satellite overpass; and data gaps.
The implementation of an NRT volcanic eruption alert system based on the technique described here would represent an advance over current systems, such as SACS, which use a simple threshold SO2 column amount to identify significant volcanic degassing events (Brenot et al., 2014). In dispersed volcanic clouds, SO2 column amounts may be low yet the total SO2 loading could be high; hence alerts based on SO2 mass rather than column amount may be more effective in certain situations. However, techniques based on a threshold SO2 column amount would be more effective at identifying drifting volcanic clouds far from the source, since a reference background region is not required and elevated SO2 amounts may be detected regardless of location. Hence some combination of both approaches would likely yield an optimal NRT volcanic cloud detection system.
For example, drifting volcanic clouds could be located based on elevated SO2 column amounts, with SO2 loading then quantified using the approach described here.
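The difference between a column-based and a mass-based alert can be illustrated with a short sketch. It rests on several assumptions: the 2 DU column threshold is a hypothetical value, the 400 t cut-off stands in for the decision of the logistic regression model (which does not itself use a fixed threshold), no background correction is applied, and every pixel is given the OMI nadir footprint of 13 x 24 km. The conversion factor follows from 1 DU = 2.6867e16 molecules per cm2 and the molar mass of SO2.

```python
DU_TO_TONS_PER_KM2 = 0.02858  # 1 DU of SO2 over 1 km^2 is about 0.0286 t

def so2_mass_tons(columns_du, pixel_area_km2):
    """Integrate SO2 column amounts (DU) over equally sized pixels."""
    return sum(c * pixel_area_km2 * DU_TO_TONS_PER_KM2 for c in columns_du)

def alerts(columns_du, pixel_area_km2, column_threshold_du, mass_threshold_t):
    """Compare a peak-column alert with a total-mass alert for one scene."""
    mass = so2_mass_tons(columns_du, pixel_area_km2)
    return {
        "column_alert": max(columns_du) >= column_threshold_du,
        "mass_alert": mass >= mass_threshold_t,
        "mass_t": mass,
    }

PIXEL_AREA = 13 * 24  # OMI nadir footprint in km^2

# Dispersed cloud: many pixels with low columns (hypothetical values).
dispersed = alerts([0.5] * 200, PIXEL_AREA,
                   column_threshold_du=2.0, mass_threshold_t=400.0)
# Compact plume: few pixels with high columns (hypothetical values).
compact = alerts([3.0, 3.0], PIXEL_AREA,
                 column_threshold_du=2.0, mass_threshold_t=400.0)

print(dispersed)
print(compact)
```

In this example the dispersed cloud triggers only the mass-based alert and the compact plume triggers only the column-based alert, which is the complementarity that motivates combining the two approaches.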
for comparison with the overall dynamics of the data. The comparison of missed alerts indicates that each one falls within the lower quartile of the volcanic dataset, whilst the false alerts displayed values consistent with the upper quartile of the control data range (with one exception; Fig. 2). The potential causes of the misclassification of events are discussed further in section 4.1.

Figure 2: Box and whisker plots displaying the spread and distribution of volcanic and control data with lines indicating upper and lower quartiles of the data and the remainder represented by the box region. Additional data points indicate the individual missed alerts in the volcanic data and false alerts in the control data detailed in Table 4.

Figure 3: Result of the application of a threshold SO2 loading to the volcanic dataset on (a) accurately classified events and (b) the precision (no false alerts) and recall (no missed alerts) values for both the volcanic and control datasets.
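The precision and recall values referred to in Figure 3 can be computed per class as in the following minimal sketch. The labels and predictions are hypothetical and serve only to show the calculation; they are not results from this study.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall for the class named `positive`.

    Precision falls with false alerts; recall falls with missed alerts.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: four volcanic events followed by four control samples.
actual = ["volcanic"] * 4 + ["control"] * 4
predicted = ["volcanic", "volcanic", "control", "control",
             "control", "control", "control", "volcanic"]

for label in ("volcanic", "control"):
    p, r = precision_recall(actual, predicted, label)
    print(f"{label}: precision {p:.2f}, recall {r:.2f}")
```

A precision of 1 for the volcanic class corresponds to no false alerts, and a recall of 1 corresponds to no missed alerts.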

Figure 4: The effect of proportional removal of the lowest data points on the minimum incorporated SO2 mass from the volcanic dataset and the ROC (receiver operating characteristic) statistic of each model, where ROC = 1 implies all events correctly classified.

Table 1: Characteristics of the methods incorporated in the development of an automatic classification technique.
Method | Sample area size | Position | Correction technique
Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2016-206, 2016. Manuscript under review for journal Atmos. Meas. Tech. Published: 4 July 2016. © Author(s) 2016. CC-BY 3.0 License.