Interactive comment on “FATES: A Flexible Analysis Toolkit for the Exploration of Single Particle Mass Spectrometer Data”

This is a great piece of coding and work. Owing to the great variety of data formats of particle mass spectra and the copious numbers and types of clustering and relational information that is sought, having the flexibility provided by FATES is critical. The authors have included every iteration of data analysis and presentation that I could think of. Nonetheless, I am confident that the user base for this wonderful analytical tool will grow quickly and with the added insight of many users will become even more useful.


Introduction
Single-particle mass spectrometers (SPMSs) yield the size and chemical composition of individual aerosol particles in real time. Using laser desorption-ionization (LDI), SPMSs can generate tens of single-particle mass spectra per second. However, ion signals in LDI mass spectra depend only qualitatively on particle chemical composition (e.g., Ge et al., 1998; Gross et al., 2000; Hinz and Spengler, 2007), and they can also vary greatly from particle to particle even for chemically uniform particles (e.g., Steele et al., 2005; Wenzel and Prather, 2004; Zelenyuk et al., 2008a, b). Thus SPMSs generate data sets that are both large and highly complex, requiring sophisticated data analysis techniques to explore them and distill the information they contain.
As Table 1 illustrates, individual laboratories have independently developed a variety of SPMSs, and two commercial versions have also been produced. Because many iterations of SPMSs exist and there is no standard data format, individual laboratories have had to build their own data analysis software, though these toolkits are often not reported in the literature (Table 1). Only two of these data analysis toolkits have been made publicly available: YAADA (www.yaada.org) and ENCHILADA (www.cs.carleton.edu/enchilada). YAADA is specific to the lab-built and commercial versions of the aerosol time-of-flight mass spectrometer (ATOFMS), a version of SPMS (Allen, 2005). ENCHILADA is reported to be compatible with three SPMS designs: SPASS, PALMS, and TSI ATOFMS (Gross et al., 2010). However, the authors could only find reported use of the ENCHILADA toolkit for TSI ATOFMS and SPASS data sets. Despite their age, these toolkits are still used, with YAADA being the toolkit of choice for the burgeoning SPMS community in China. The differences between these two software tools, and their limitations, have been described extensively elsewhere (Gross et al., 2010), but a brief summary is given here. YAADA is an object-oriented framework implemented in MATLAB that allows user-developed, script-based data exploration and can also leverage the extensive set of built-in functions within MATLAB. This allows a degree of flexibility in creating graphical outputs and exploring ATOFMS data in tandem with other data types. However, the extensive amount of code that was required at the time of development to create YAADA's object-oriented framework has made the toolkit highly susceptible to updates and changes in MATLAB. Thus continued use of YAADA requires either using outdated MATLAB versions or extensively maintaining the scripts underlying the toolkit. Considerable knowledge of YAADA-specific data classes and framework, in addition to a general understanding of MATLAB, is also required to manipulate the data. Additionally, YAADA is of limited accessibility to novice users because it has no graphical user interfaces (GUIs) for data exploration.

Table 1 (partial; earlier rows not recoverable). SPMS designs and their data analysis toolkits.
Laboratory-built instruments:
  RSMS (Carson et al., 1995), with RSMS-II and RSMS-III: not reported
  SPASS: ENCHILADA
  SPLAT, with SPLAT II and mini-SPLAT: SpectraMiner, ClusterSculptor
Commercial instruments:
  Guangzhou-Hexin ATOFMS/SPAMS (currently manufactured): YAADA
  TSI ATOFMS (discontinued): YAADA, ENCHILADA
Table footnote references (partial): a Brands et al. (2011); b Klimach (2012); c Gard et al. (1997); d Su et al. (2004); e Allen (2005); f Hinz et al. (1994); g Trimborn et al. (2000); h Hinz et al. (2011); i Thomson et al. (2000); j Carson et al. (1995).

Published by Copernicus Publications on behalf of the European Geosciences Union.
In comparison, ENCHILADA is a software package with a graphical user interface. Data analysis functions and workflows built into ENCHILADA are therefore used by interacting with the GUI, without the need to write scripts or use a command-line interface. However, adding any functionality requires modifying the underlying source code and rebuilding the software. ENCHILADA relies primarily on SQL for accessing and storing the mass spectral database and on Java for the GUI, though a number of other drivers, toolkits, and C++ code are also integrated into its implementation. Thus modifications are a significant programming task and likely infeasible for scientists who are not highly experienced in programming and computer science.
Motivated by the continued use of SPMSs and the limitations of the currently available software, we have developed a new flexible analysis toolkit for the exploration of single-particle mass spectrometer data (FATES). To encourage widespread adoption, the toolkit was purposely designed to be extensible so that it can adapt to the ever-evolving and varied implementations of SPMS. Building open-source tools on a standard, well-known platform and creating a workflow with user-defined parameters for data analysis would benefit the SPMS community, increasing the rate of knowledge discovery and enabling collaboration between researchers. In particular, maintaining and altering the software should be feasible for chemists and aerosol scientists without extensive training in computer science. In addition, a new toolkit should not be limited to the common analyses that are expected and may be built into GUIs; it should give users complete freedom to access, explore, and use SPMS data and to integrate them with other temporally and spatially resolved data sets. Finally, any framework must carefully consider the memory and speed constraints imposed by the potentially large size of SPMS data sets. Given these constraints, the FATES toolkit (Sultana et al., 2017) was developed entirely in the MATLAB environment, and an extensive manual is provided in the Supplement. MATLAB is popular among scientists for numerical data analysis because it has an extensive library of well-documented built-in functions, uses libraries optimized for fast matrix manipulation, and supports both graphical and script-based data exploration. By taking advantage of native MATLAB data types, FATES is easier to maintain and computationally more efficient than YAADA, the previous publicly available MATLAB toolkit for SPMS analysis. The FATES framework allows users to explore their data creatively, without prior assumptions or constraints, using simple scripts and built-in MATLAB functions. Additionally, FATES offers a suite of GUIs for interactive visualization that can aid both novice and expert users in calibrating raw data; exploring data sets using temporal, size, and mass spectral filters; and investigating clustered data sets. FATES is the first publicly available SPMS toolkit to allow creative, efficient script-based data mining along with GUI-based visual data exploration and calibration, all within a single programming environment.

FATES software description
FATES is implemented entirely in MATLAB; no other languages, drivers, or software are needed to use it. In addition, FATES was purposely developed to make few presumptions about the instrument, particle, and spectral variables collected by the SPMS. For example, one SPMS may record only the speed and detection time of each particle, while another may also record the power of the desorption-ionization laser pulse. These differences are handled easily because FATES lets users specify, define, and change the instrument, particle, and spectral variables they would like imported into and saved to a study. To make these alterations, users need only modify simple scripts in which the desired variables are listed; the changes then carry over throughout the source code. This flexible but simple design is highly useful to the SPMS community because users do not need expert knowledge of any language, nor do they have to search for and make line-by-line or structural changes within the source code. Detailed instructions for making these modifications are included in the FATES manual (Supplement M-5) and commented within the code. As distributed, the FATES source code already contains the modifications needed to read data sets from three SPMS designs: ATOFMS, ALABAMA, and TSI ATOFMS. In addition, FATES avoids the explicit creation of new class objects, which reduces the number of lines of source code and the number of scripts by over an order of magnitude compared with YAADA. This greatly reduces the maintenance needed to keep FATES compatible with future versions of MATLAB. FATES has been tested for compatibility with MATLAB versions 2014b through 2016b.
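The idea that a single user-edited list of variable names drives both import and column lookup can be sketched as follows. This is a minimal Python illustration under stated assumptions, not FATES code (FATES implements this in MATLAB scripts that are not reproduced here); the variable names, the NaN-filling behavior, and the helpers `import_rows` and `column_index` are hypothetical.

```python
import math

# Hypothetical user-edited list of particle variables. Adding or removing a name
# here is the only change needed; the code below looks columns up by name.
PARTICLE_VARIABLES = ["experiment_id", "particle_num", "time", "speed", "laser_power"]


def column_index(variables, name):
    """Return the column holding `name` in a matrix laid out per `variables`."""
    return variables.index(name)


def import_rows(raw_records, variables):
    """Build one row per particle containing only the listed variables, in order.

    Variables missing from a record are filled with NaN (an assumption), so an
    instrument that does not log, e.g., laser power still imports unchanged.
    """
    return [[float(rec.get(name, math.nan)) for name in variables]
            for rec in raw_records]


# Two particles; laser power was logged for only one of them.
records = [
    {"experiment_id": 1, "particle_num": 1, "time": 736390.5, "speed": 310.2},
    {"experiment_id": 1, "particle_num": 2, "time": 736390.6, "speed": 295.7,
     "laser_power": 0.8},
]
matrix = import_rows(records, PARTICLE_VARIABLES)
speeds = [row[column_index(PARTICLE_VARIABLES, "speed")] for row in matrix]
```

Supporting an instrument that records an extra variable then amounts to appending its name to the list; no other line changes.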

FATES data architecture
SPMS data imported into FATES are stored in separate variables for the experiment description, the particle data, and the spectral data. A SPMS data set imported into MATLAB via FATES is referred to as a FATES study; its data architecture is detailed comprehensively in the FATES manual (Supplement M-4). Logically, the data consist mostly of one-to-many relationships: from study to experiment, from experiment to particle, and from particle to spectral peaks. The data are typically loaded once and then accessed and filtered in bulk. It is therefore more efficient to organize the observed measurements into denormalized matrices for particle and spectral data, in which key information is duplicated in each matrix, so that selections do not require the relationships to be reassembled at query time.
Each FATES study stores a data structure containing a number of user-defined fields (e.g., instrument name, operator, location) that describe the experiment in which the data in the study were collected. Each row of the structure describes a unique experiment, identified by a unique experiment identifier (ID). All particle data (e.g., speed, power of the desorption-ionization laser pulse) are stored in a MATLAB matrix. Each particle within a FATES study has a unique two-column particle ID, the first column of which is the ID of the experiment to which the particle belongs. This framework allows users to easily select particle or spectral data collected during a specific experiment within a FATES study that contains data from multiple experiments. The mass spectral data for all particles in the FATES study are held in an external binary file. Users can easily and quickly retrieve spectral peak data (e.g., m/z, area, height) for user-selected particles using functions provided by the FATES toolkit (Supplement M-6). Once retrieved, the spectral data are stored in MATLAB cell arrays or matrices. Each peak of every spectrum within a FATES study has a unique four-column peak ID, the first three columns of which are the experiment ID, particle ID, and polarity indicator of the spectrum to which the peak belongs. Each FATES study also contains auxiliary data structures that list the name of the variable (e.g., particle speed, peak area, peak ID) held in each column of a data matrix. Thus all data within a FATES study, from experimental conditions to peak information, are self-contained and self-describing. Therefore, despite the flexibility of the FATES framework, users can share FATES studies without confusion and without external README files to determine the source and identity of the data.
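A minimal Python sketch of this denormalized, self-describing layout follows. It is illustrative only: the column names are hypothetical, and treating the fourth peak-ID column as a per-spectrum peak number is an assumption, since the text does not name it. Because the experiment and particle IDs are repeated in every peak row, selecting one experiment's spectra is a simple row filter rather than a join.

```python
# Hypothetical column layouts. In a FATES study, auxiliary structures list the
# name of each matrix column; plain Python lists play that role here.
PARTICLE_COLUMNS = ["experiment_id", "particle_num", "time", "size_um"]
PEAK_COLUMNS = ["experiment_id", "particle_num", "polarity", "peak_num", "mz", "area"]

# Denormalized storage: the experiment and particle IDs are repeated in every
# peak row, so selections need no joins across tables.
particles = [
    [1, 1, 10.0, 0.45],
    [1, 2, 11.0, 1.30],
    [2, 1, 12.0, 0.80],
]
peaks = [
    [1, 1, +1, 1, 23.0, 1500.0],
    [1, 1, -1, 1, -35.0, 800.0],
    [1, 2, -1, 1, -97.0, 2100.0],
    [2, 1, +1, 1, 39.0, 900.0],
]


def particles_in_experiment(rows, experiment_id):
    """Two-column particle IDs of every particle recorded in one experiment."""
    e = PARTICLE_COLUMNS.index("experiment_id")
    p = PARTICLE_COLUMNS.index("particle_num")
    return [(r[e], r[p]) for r in rows if r[e] == experiment_id]


def peaks_for(rows, particle_ids):
    """Peak rows whose experiment and particle columns match a requested ID."""
    wanted = set(particle_ids)
    e = PEAK_COLUMNS.index("experiment_id")
    p = PEAK_COLUMNS.index("particle_num")
    return [r for r in rows if (r[e], r[p]) in wanted]


ids = particles_in_experiment(particles, 1)
selected = peaks_for(peaks, ids)
```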

FATES optimization
Considerable work has gone into optimizing the FATES framework for memory demands, speed, and ease of use. An ATOFMS data set collected at Bodega Bay, CA, in February and March of 2016 is used throughout this paper to illustrate the speed of data analysis within the FATES toolkit. This data set contains 1 386 042 dual-polarity single-particle mass spectra, as well as particle data for an additional 11 454 356 particles that were detected in the light-scattering region but did not generate spectra. All FATES analyses were performed in MATLAB 2014b on a computer with an Intel Core i7-4930K CPU running at 3.4 GHz and 16.0 GB of RAM. Run time comparisons, summarized in Table 2, were made on the same computer using a version of YAADA that had been maintained by Kim Prather's research group for compatibility with MATLAB 2013a.
To begin working with a SPMS data set, a new FATES study has to be created (Supplement M-2). This process only […] 2). Note that the version of YAADA maintained by Kim Prather's research group is not able to import these data sets into MATLAB for comparison. FATES has also been designed so that additional data can be added to an existing study without having to re-initialize the entire data set (Supplement M-A). This is especially useful for field studies, where daily examination of the data is required but initializing increasingly large data sets can become onerous and time-consuming.
Once a FATES study is initiated, the spectral data must be handled efficiently. Users may wish to examine data sets with millions of mass spectra, and each spectrum can contain hundreds of peaks. SPMS spectral data formats usually contain the mass-to-charge ratio (m/z) and area of each peak, but they may also specify peak width, peak height, and other values. This amounts to many gigabytes of data, so the trade-off between making all the spectral data available and managing memory requirements had to be taken into consideration. MATLAB tables were considered, but they are more appropriate for heterogeneous data, whereas in our case all the spectral data are numeric or binary indicators. We also found MATLAB memory-mapped files to have unpredictable performance, and it was difficult to append rows of data because matrices are stored in column-major order. We determined that the best way to build up and maintain a large matrix of spectral data without keeping it in memory was to create a single external binary file, append to it as needed, and provide a lightweight interface so that FATES programs, or other users, can easily execute functions against the file. Essentially, this interface is an application programming interface (API) that takes a regular MATLAB command or script, shuffles data in and out of memory in blocks of rows, executes the commands against the data in memory, and gathers the results. The block sizes are set to default values that are reasonable for current workstation capacities but can be changed as appropriate in the future. The possible commands are unconstrained, but summaries and filtering operations are the most appropriate and the most likely to be called for.
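The append-then-stream pattern described above can be sketched in a few lines. This is a hedged Python illustration, not the FATES API: the row layout (three single-precision columns per row, written in the machine's native byte order) and the names `append_rows` and `apply_in_blocks` are assumptions made for the example.

```python
import os
import tempfile
from array import array

NCOLS = 3  # hypothetical row layout: particle_num, mz, area (float32 each)


def append_rows(path, rows):
    """Append rows to the binary file without rewriting existing data."""
    flat = array("f", (v for row in rows for v in row))
    with open(path, "ab") as f:
        flat.tofile(f)  # native byte order, single precision


def apply_in_blocks(path, func, block_rows=1000):
    """Stream the file through memory `block_rows` rows at a time.

    `func` receives a list of rows and returns a partial result; the partial
    results are gathered into a list for the caller to combine.
    """
    total_rows = os.path.getsize(path) // (array("f").itemsize * NCOLS)
    results = []
    with open(path, "rb") as f:
        done = 0
        while done < total_rows:
            n = min(block_rows, total_rows - done)
            block = array("f")
            block.fromfile(f, n * NCOLS)
            rows = [block[i * NCOLS:(i + 1) * NCOLS].tolist() for i in range(n)]
            results.append(func(rows))
            done += n
    return results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spectra.bin")
    append_rows(path, [[1, 23, 1500], [1, -35, 800]])                 # first import
    append_rows(path, [[2, -97, 2100], [3, 39, 900], [3, -35, 50]])   # later append
    # A summary operation: count peaks with area above 1000, two rows per block.
    partial = apply_in_blocks(path, lambda rows: sum(r[2] > 1000 for r in rows),
                              block_rows=2)
    n_large = sum(partial)
```

Only one block is ever resident in memory, so the peak memory cost is set by `block_rows`, not by the size of the file.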
In addition, the binary format minimizes both the time required to write and retrieve spectral data and the storage requirements for the file. Retrieving all 1 386 042 dual-polarity mass spectra in a single call from the external binary file created for the Bodega Bay study and loading them into a MATLAB array took only 3.3 min. This example is used for benchmarking purposes; users would rarely need or choose to load all spectral information for an entire large data set into memory at once. The FATES framework automatically employs data pointers so that the whole binary file does not need to be read when the user retrieves spectra for only a subset of the particles in the FATES study.
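How a pointer index avoids reading the whole file can be illustrated with a short Python sketch. The paper does not describe FATES's pointer format, so the `(first_row, n_rows)` index, the two-column float32 row layout, and the helper names below are all assumptions.

```python
import os
import tempfile
from array import array

NCOLS = 2  # hypothetical row layout: mz, area (float32)
ROW_BYTES = array("f").itemsize * NCOLS


def write_spectra(path, spectra):
    """Write each particle's peak rows contiguously and return a pointer index.

    The index maps particle ID -> (first_row, n_rows), so a later read can seek
    straight to one particle's rows instead of scanning the whole file.
    """
    index, row = {}, 0
    with open(path, "wb") as f:
        for pid, peak_rows in spectra.items():
            array("f", (v for peak in peak_rows for v in peak)).tofile(f)
            index[pid] = (row, len(peak_rows))
            row += len(peak_rows)
    return index


def read_particles(path, index, particle_ids):
    """Return {particle ID: peak rows}, reading only the requested byte ranges."""
    out = {}
    with open(path, "rb") as f:
        for pid in particle_ids:
            first, n = index[pid]
            f.seek(first * ROW_BYTES)
            block = array("f")
            block.fromfile(f, n * NCOLS)
            out[pid] = [block[i * NCOLS:(i + 1) * NCOLS].tolist() for i in range(n)]
    return out


spectra = {
    (1, 1): [[23.0, 1500.0], [-35.0, 800.0]],
    (1, 2): [[-97.0, 2100.0]],
    (2, 1): [[39.0, 900.0], [-46.0, 300.0], [-62.0, 120.0]],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spectra.bin")
    index = write_spectra(path, spectra)
    subset = read_particles(path, index, [(2, 1)])
```

For large subsets, sorting the requested IDs by `first_row` before reading keeps the seeks moving forward through the file.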
Run times for retrieving all of the dual-polarity mass spectra, and contiguous subsets of them (i.e., subsets drawn from contiguous raw data files from which the study was created), from the FATES and YAADA studies are summarized in Table 2. Retrieving a subset of 50 000 mass spectra from the FATES study (2.7 s) was over 6 times faster than from the YAADA study (17.3 s). Searching and sorting data by particle information is also fast in the FATES framework. Because all hit-particle data are held in memory, operations that query the particle data require no input/output calls and are therefore nearly instantaneous in MATLAB. For example, retrieving the particle IDs for all submicron particles in the Bodega Bay study took only 0.01 s, whereas a similar analysis on the much smaller YAADA study required 0.6 s.
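In MATLAB these in-memory queries are one-liners built on logical indexing and on the built-in functions histc and intersect named in the data analysis section of this paper. The Python analogue below is only a sketch of the same three operations on a hypothetical particle table; note that, unlike histc, this binning does not count a value exactly equal to the last edge.

```python
from bisect import bisect_right

# Hypothetical in-memory particle table: (experiment_id, particle_num, time_h, size_um)
particles = [
    (1, 1, 0.2, 0.45),
    (1, 2, 0.7, 1.30),
    (1, 3, 1.1, 0.62),
    (2, 1, 2.5, 0.95),
    (2, 2, 2.9, 2.10),
]

# Logical-indexing analogue: IDs of all submicron particles.
submicron = [(e, p) for e, p, t, d in particles if d < 1.0]

# histc-style binning: particles per 1 h bin with edges 0, 1, 2, 3.
# Values outside [edges[0], edges[-1]) are ignored.
edges = [0.0, 1.0, 2.0, 3.0]
counts = [0] * (len(edges) - 1)
for _, _, t, _ in particles:
    k = bisect_right(edges, t) - 1
    if 0 <= k < len(counts):
        counts[k] += 1

# intersect-style comparison of two particle ID lists.
experiment_1 = {(e, p) for e, p, _, _ in particles if e == 1}
submicron_in_exp1 = sorted(experiment_1 & set(submicron))
```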
The speed of the FATES framework depends partly on minimizing retrieval calls to files outside the MATLAB workspace. The format of the data held within the MATLAB workspace has therefore been carefully chosen to minimize the memory demands of the FATES framework. Because spectral data are held in an external binary file, users can store spectral data in the study at high resolution without increasing the study's working memory. When retrieving spectra from the external binary file, users may specify the resolution at which the data are held in the workspace. This lets users tailor the resolution of the spectra in the workspace, and therefore the memory requirements, to the application. Mass spectral data loaded into the MATLAB workspace are stored in single-precision floating-point format, which requires half the memory of MATLAB's standard double-precision format. Particle data stored within a FATES study have also been formatted to minimize memory demands. If the user loads data into a FATES study for both detected particles that generated mass spectra (hit) and detected particles that did not (missed), only hit-particle data are stored in the particle matrices in MATLAB. Most data analyses use spectra, so only hit-particle information is necessary, yet hit particles usually make up a small fraction of all particles detected in the light-scattering region of the SPMS. Storing missed-particle data in MATLAB memory would therefore take up large amounts of space needlessly. All missed-particle data are written to an external binary file and can be loaded into MATLAB using a script provided in the FATES toolkit. Furthermore, particle data stored in MATLAB memory are split between a single-precision and a double-precision matrix. Most particle variables (e.g., speed, laser power) do not need to be stored in double precision, so this choice further reduces the memory required to store all particle data. Storing data for 1 million hit particles, where three variables require double precision (the two-column particle ID and time) and three need only single precision (speed, size, laser power), therefore requires only 0.036 GB, which is very feasible for most modern desktop computers. Finally, because all SPMS data loaded into a FATES study are held in native MATLAB data types, interacting with the data requires very few FATES-specific functions. Almost all common analyses can be patterned on a basic script, provided with demonstration data in the FATES toolkit, that relies on a handful of MATLAB built-in functions and matrix indexing, which makes the FATES framework accessible and powerful for both expert and novice users.
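The 0.036 GB figure follows from 8 bytes per double-precision value and 4 bytes per single-precision value, assuming the particle ID counts as two double-precision columns (three double-precision columns in total with time) and decimal gigabytes:

```latex
\begin{aligned}
b_{\text{particle}} &= 3 \times 8\ \text{B} + 3 \times 4\ \text{B} = 36\ \text{B},\\
M &= 10^{6} \times 36\ \text{B} = 3.6 \times 10^{7}\ \text{B} = 0.036\ \text{GB}.
\end{aligned}
```

Holding all six columns in double precision would instead need 48 B per particle, or 0.048 GB for the same million particles.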

Data analysis within FATES
In this section we provide a brief overview of common analyses that can be performed on SPMS data within a FATES study. It is, however, impossible to describe or predict all the data analyses and plotting options available to FATES users, given the extensive library of built-in and user-developed MATLAB functions. A large array of analyses can be performed with concise code (Supplement M-6); only a few examples are discussed here. Using logical indexing, particles and spectra can be filtered by any single particle or mass spectral characteristic or any combination of them (e.g., particle size, peak area at a certain m/z). Binning particles and spectra by these characteristics, for example by time, can be accomplished in a single line with the built-in function histc. Additionally, lists of particles can be compared with the built-in function intersect. Grouping data by algorithmic clustering of the spectra is also straightforward. Clustering methods commonly used by the SPMS community, such as k-means, hierarchical clustering, and k-medoids, are built into MATLAB, and ART-2a, a fast adaptive resonance algorithm popular among ATOFMS users, is supplied in the FATES toolkit. Clustering, which requires a large number of matrix operations, can be performed quickly even with naïve user scripts because MATLAB uses BLAS, LAPACK, and proprietary libraries that speed up common linear algebra computations. Clustering 100 000 particles from the Bodega Bay study with ART-2a (vigilance factor = 0.80, learning rate = 0.05) required 70 min in the YAADA study, whereas improvements in the ART-2a scripts in FATES allow the same analysis to be completed in only 2.1 min. With the built-in MATLAB k-means function, the same data were grouped into 15 clusters in 2.9 min (77 iterations) in FATES. Finally, other types of data can easily be loaded into MATLAB and examined alongside the SPMS data.
While the FATES toolkit allows flexibility in script-based SPMS data analysis, graphical tools can also be an effective way to explore the data and quickly identify trends and patterns. To this end, the FATES toolkit includes GUIs, built within MATLAB, that allow users to easily examine trends in spectra based on particle metrics such as size and time, as well as cluster and spectral characteristics. Figure 1 is a screen capture of the FATES spectra explorer guiFATES, displaying data for 46 432 particles. This spectra explorer was modeled after ClusterSculptor, a SPMS data analysis GUI developed by Zelenyuk et al. (2008a) that has not been made publicly available. To initiate guiFATES, the user provides the function with the mass spectra, two user-selected particle metrics, and cluster data for a set of particles. The functionality of guiFATES is described below.
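For readers unfamiliar with ART-2a, the sketch below shows the algorithm in the generic form commonly applied to ATOFMS spectra, using the vigilance factor (0.80) and learning rate (0.05) quoted in the clustering benchmark earlier in this section. It is not the FATES implementation, whose internals are not described here; the number of passes, the random presentation order, and reporting labels from the last pass are assumptions.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def art2a(spectra, vigilance=0.80, learning_rate=0.05, passes=5, seed=0):
    """Generic ART-2a sketch returning (weight vectors, labels from the last pass).

    Each spectrum is a list of peak intensities on a shared m/z grid. In every
    pass the spectra are presented in random order; a spectrum joins the
    best-matching cluster if its dot product with that cluster's unit weight
    vector is at least `vigilance`, otherwise it seeds a new cluster.
    """
    rng = random.Random(seed)
    data = [normalize(s) for s in spectra]
    weights = []
    labels = [None] * len(data)
    for _ in range(passes):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            scores = [dot(w, x) for w in weights]
            best = max(range(len(scores)), key=scores.__getitem__, default=None)
            if best is not None and scores[best] >= vigilance:
                # Move the winning weight vector toward the sample, then renormalize.
                weights[best] = normalize([(1 - learning_rate) * w + learning_rate * xi
                                           for w, xi in zip(weights[best], x)])
                labels[i] = best
            else:
                weights.append(list(x))
                labels[i] = len(weights) - 1
    return weights, labels


spectra = [
    [10, 0, 1], [9, 0, 2], [11, 1, 0],   # group A: dominated by the first m/z bin
    [0, 10, 1], [1, 9, 0], [0, 12, 2],   # group B: dominated by the second m/z bin
]
weights, labels = art2a(spectra)
```

Many ART-2a applications finish with a separate assignment pass that compares every spectrum with the final weights without updating them; that step is omitted here for brevity.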

The main panel of the guiFATES display is a heat map of the individual particle mass spectra, in which each row is an individual mass spectrum and peak intensity is indicated by color. The user can choose to display peak intensity on a linear or log10 scale. The logarithmic scale makes relatively small peak intensities easier to detect visually, while the linear scale helps users visualize absolute differences between peak intensities. In Fig. 1 the logarithmic scale has been selected. Users can provide any two characteristic particle metrics, such as particle size, time of detection, laser pulse energy, or total ion intensity, which are displayed in the left panels. In Fig. 1 particle time and size have been provided. Clustering information is displayed in the right panel. The cluster or group assigned to each particle is indicated by the color of the points on the right, while the position on the x axis is a user-provided clustering statistic for each particle. The clustering statistic displayed in Fig. 1 is the dot product of each normalized particle spectrum with the normalized representative spectrum of the cluster to which the particle was assigned. However, the user can provide any clustering or neighbor statistic they find effective for exploring their data set. The top plot in guiFATES is the average of all the provided spectra, and immediately below it is the average spectrum of a selected cluster, specified by the user in the display parameters. The line color in the average cluster spectrum plot matches the color used to indicate the assigned cluster for each particle in the right vertical plot. The bottom of the guiFATES window contains all the display, sorting, filtering, and grouping parameters that the user may select and change.
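The clustering statistic used in Fig. 1 can be computed as shown in this Python sketch, which also illustrates log10 scaling of peak intensities for display. The function names are hypothetical, and the clipping floor used to keep zero intensities finite on the log scale is an assumption; the paper does not say how guiFATES handles zeros.

```python
import math


def unit(v):
    """Scale a spectrum to unit Euclidean length (zero vectors are returned as-is)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cluster_statistic(spectra, labels, representatives):
    """Dot product of each normalized spectrum with its cluster's normalized representative."""
    return [sum(a * b for a, b in zip(unit(s), unit(representatives[k])))
            for s, k in zip(spectra, labels)]


def log10_display(spectrum, floor=1.0):
    """Log10 intensities for a heat map; values below `floor` are clipped (assumption)."""
    return [math.log10(max(x, floor)) for x in spectrum]


spectra = [[3.0, 4.0, 0.0], [0.0, 0.0, 5.0]]
labels = [0, 1]
representatives = [[6.0, 8.0, 0.0], [0.0, 1.0, 1.0]]
stats = cluster_statistic(spectra, labels, representatives)
scaled = log10_display([100.0, 0.0, 10.0])
```

A value near 1 means the particle's spectrum closely matches its cluster's representative spectrum; lower values flag particles that may be misassigned.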
guiFATES provides the user with many options for displaying and exploring the data, all of which are detailed in the manual (Supplement M-7). A check box allows the user to display all data with or without grouping by cluster. In addition, the user can sort the data by any of the particle metrics in the vertical side panels or by the peak intensity at an m/z value in the spectra. In Fig. 1 the data are displayed by cluster and sorted by size. Figure S1a in the Supplement is a screen capture in which the same data are not grouped by cluster and have been sorted by peak intensity at m/z −35. While users may initially provide guiFATES with a large amount of data, they will likely want to display smaller selections at a time to enable better visual exploration. This can be accomplished in a number of ways within guiFATES. Users can use mouse clicks to quickly zoom in and out of a single plot using MATLAB's native figure-handling capabilities; guiFATES is designed so that when this occurs, all plot axes within the GUI are rescaled appropriately and instantaneously. Figure S1b is a screen capture in which the user used this functionality to select the bottom half of the particles in Fig. S1a and also narrowed the range of m/z values displayed. For more complex selections, users can enter filtering parameters so that the displayed particles fall only within a desired range of particle metrics, peak intensity at a certain m/z value, or any combination thereof. Figure S1c is a screen capture in which the data, sorted by cluster, have been filtered by size (1 to 2 µm), m/z −35 peak area (0 to 3000), and clustering statistic (0.8 to 1). Lastly, users can choose to display only selected clusters. Figure S1d is a screen capture using the same filters as Fig. S1c but limiting the display to clusters 2 and 5. These visual sorting and filtering methods enable users to efficiently inspect data sets and visually discover mass spectral trends, differences, and similarities, both between distinct particle types and within populations of chemically similar particles.
Due to the high variability and qualitative nature of single-particle mass spectra generated by laser desorption-ionization, clustering algorithms used to group SPMS mass spectra within a data set often do not produce a one-to-one relationship between the chemical particle types in the population and the spectra clusters generated (e.g., Giorio et al., 2012; Murphy et al., 2003; Rebotier and Prather, 2007; Wenzel and Prather, 2004; Zelenyuk et al., 2006, 2008a). It is therefore necessary to leverage expert knowledge, either to combine multiple algorithmically generated spectra clusters into a single chemical particle type or to split clusters further into smaller groups, as has been noted in many SPMS studies of unconstrained aerosol populations (e.g., Dall'Osto and Harrison, 2006; Pratt et al., 2009; Qin et al., 2012). The authors emphasize that there is no consensus on the most suitable algorithms and thresholds for SPMS analysis and suggest that users consult the references listed above before embarking on algorithmic analysis based on mass spectra. Regardless of the initial clustering conditions, guiFATES aids this process by allowing users to visualize all clustered particles at once and to combine any number of clusters, or split any cluster at any location, during data exploration. Users can output the particle identifiers of any cluster in the guiFATES window to the MATLAB workspace. All plotting, sorting, filtering, and grouping functions of guiFATES have been tested on a set of 100 000 particles with dual-polarity mass spectra; at this size, all updates to the displayed plots occurred nearly instantaneously, making guiFATES an appropriate and efficient tool for the large data sets common in SPMS analysis.

Figure 2. Screen capture of a dendroFATES window showing the cluster tree, or dendrogram, for 30 input clusters (panels: cluster dendrogram; population metrics for the selected node, showing all clusters in the node). The cluster contributions to the user-selected node are shown in the plot on the left. The particle data for the selected node are automatically plotted in a guiFATES window (Fig. S2).
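The filter combination used for Fig. S1c and S1d, together with merging clusters into one chemical particle type, reduces to a few list operations. This Python sketch is hypothetical: the record layout, inclusive range bounds, sort order, and the merged label 25 are illustrative choices, not guiFATES internals.

```python
# Hypothetical per-particle records: (cluster, size_um, area_mz_minus35, statistic)
particles = [
    (2, 1.2, 1500.0, 0.91),
    (5, 1.8, 2900.0, 0.85),
    (2, 0.7, 1000.0, 0.95),   # fails the size filter
    (3, 1.5, 500.0, 0.88),    # cluster not selected
    (5, 1.1, 3500.0, 0.90),   # fails the m/z -35 area filter
    (5, 1.9, 200.0, 0.75),    # fails the clustering-statistic filter
]


def in_range(x, lo, hi):
    return lo <= x <= hi  # inclusive bounds (assumption)


def select(rows, size=(1.0, 2.0), area=(0.0, 3000.0), stat=(0.8, 1.0), clusters=(2, 5)):
    """Indices of particles passing every filter, sorted by cluster, then size."""
    keep = [i for i, (k, d, a, s) in enumerate(rows)
            if k in clusters and in_range(d, *size) and in_range(a, *area)
            and in_range(s, *stat)]
    return sorted(keep, key=lambda i: (rows[i][0], rows[i][1]))


def merge_clusters(labels, to_merge, new_label):
    """Relabel every particle in the `to_merge` clusters as one group."""
    return [new_label if k in to_merge else k for k in labels]


shown = select(particles)
merged = merge_clusters([p[0] for p in particles], {2, 5}, 25)
```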

The advantages of this general method of data visualization and exploration for refining particle clusters have been discussed at length previously (Zelenyuk et al., 2008), and with the publication of FATES the method will be available to the SPMS community at large. Of particular note, Zelenyuk et al. (2008) demonstrated that discontinuities in the size distributions of particle clusters were characteristic of misclassified mass spectra. Because this technique does not depend on specific ion markers, it has the potential to be effective for a broad range of particle types, but it has yet to be explored extensively. guiFATES also enables future investigation of extending this cluster-discriminating technique to other common particle metrics, such as total ion intensity. Finally, many studies have examined the influence of particle and experimental characteristics on the mass spectra generated from particles of uniform composition (e.g., Neubauer et al., 1998; Reinard and Johnston, 2008; Steele et al., 2003; Zelenyuk et al., 2008b). guiFATES can also be used to explore such data sets consisting of a single particle type, where algorithmic grouping of particles by mass spectra is unnecessary or even inappropriate.

dendroFATES: hierarchical cluster relations
FATES also includes two supplementary GUIs that allow users to graphically select the particles to feed into the guiFATES spectra explorer. dendroFATES is a GUI to which the user supplies the clusters and representative cluster mass spectra output by any clustering algorithm of the user's choice. The clusters are then automatically grouped into a cluster tree by a hierarchical analysis performed within MATLAB, which is displayed in the dendroFATES GUI window. Hierarchical analyses have been used previously with SPMS data sets (Giorio et al., 2012; Hinz et al., 2006; Murphy et al., 2003; Rebotier and Prather, 2007; Zelenyuk et al., 2006), but a brief description is given here. The dendrogram links clusters in a binary fashion, creating new groups that are then linked further. Lower linkage heights indicate a higher degree of similarity between groups, and large distances between levels in the dendrogram indicate natural divisions in the data set. Figure 2 is a screenshot of the dendroFATES window with a dendrogram generated from the 30 most populous clusters obtained by using the ART-2a algorithm to cluster a subset of 166 666 particles from the Bodega Bay data set. Zooming in and out of the dendrogram is handled by MATLAB's native graphics functionality, which makes it possible to supply dendroFATES with hundreds of clusters and still explore the cluster tree quickly and intuitively. Because the dendrogram lets the user easily visualize similarities and natural groupings among the clusters, it is an excellent tool for selecting clusters for further exploration of the particle and spectral data with guiFATES. Clicking a linkage in dendroFATES automatically opens a guiFATES window displaying all particles belonging to the selected node. When a linkage is selected, the fractional contribution of each cluster to the selected node is displayed on the right in the dendroFATES window, and the fraction of the total population belonging to the selected node is also displayed in text. Figure S2 illustrates the guiFATES window generated from the node selection made in Fig. 2 when the user chooses to display particles by their cluster label (Fig. S2a) or grouped by the left and right branch (Fig. S2b). As illustrated in Fig. S2a, when guiFATES is populated by dendroFATES, the clusters are displayed in the same order as in the dendrogram. Very similar clusters are therefore adjacent in the guiFATES window, which assists intuitive visual comparison and combination of data. Because all FATES GUIs are in MATLAB and the user can also access the data programmatically, it is straightforward and fast to iteratively select clusters from the dendrogram in dendroFATES, refine them in guiFATES, output new clusters to the workspace, and feed the new cluster results back into dendroFATES until the grouping of the data set is satisfactory.
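The paper does not state which distance metric or linkage method dendroFATES uses. As an assumption, the Python sketch below builds the cluster tree with average linkage on cosine distance (1 minus the dot product of normalized representative spectra) and then computes the two quantities reported for a selected node: each cluster's fractional contribution to the node and the node's share of the whole population.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_distance(a, b):
    """1 minus the dot product of the two normalized spectra."""
    return 1.0 - sum(x * y for x, y in zip(unit(a), unit(b)))


def average_linkage(reps):
    """Agglomerative clustering of representative spectra.

    Returns merges as (left, right, height). Leaves are numbered 0..n-1 and the
    node created by the m-th merge (0-based) gets ID n + m, mirroring the node
    numbering of MATLAB's `linkage` output (which is 1-based).
    """
    n = len(reps)
    dist = {(i, j): cosine_distance(reps[i], reps[j])
            for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}

    def avg(a, b):
        pairs = [(min(x, y), max(x, y)) for x in members[a] for y in members[b]]
        return sum(dist[p] for p in pairs) / len(pairs)

    merges, next_id = [], n
    while len(members) > 1:
        ids = sorted(members)
        a, b = min(((p, q) for k, p in enumerate(ids) for q in ids[k + 1:]),
                   key=lambda pq: avg(*pq))
        merges.append((a, b, avg(a, b)))
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


def leaves_under(merges, n, node):
    """Leaf cluster indices below `node` (leaves < n, merge nodes >= n)."""
    if node < n:
        return [node]
    left, right, _ = merges[node - n]
    return leaves_under(merges, n, left) + leaves_under(merges, n, right)


def node_summary(merges, counts, node):
    """Per-cluster fraction of the node's particles, and the node's share of all particles."""
    leaves = leaves_under(merges, len(counts), node)
    node_total = sum(counts[k] for k in leaves)
    contributions = {k: counts[k] / node_total for k in leaves}
    return contributions, node_total / sum(counts)


reps = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.2]]
counts = [50, 30, 15, 5]          # particles per input cluster
merges = average_linkage(reps)
contributions, share = node_summary(merges, counts, 4)   # node 4 = first merge
```

This brute-force version recomputes average distances at every step, which is fine for the tens to hundreds of clusters dendroFATES is meant to display but would be slow for raw particles.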

scatterFATES: user-defined particle relations
The complexity of SPMS data sets means that there are numerous relationships that could be explored, and predicting all desired comparisons is impossible. scatterFATES is another GUI used to populate guiFATES with user-selected particles. However, rather than grouping particles by cluster as in dendroFATES, scatterFATES creates a scatterplot of particles using any two user-supplied particle data metrics as the axes. The points are then color-coded by cluster or group. Figure 3 is an example scatterFATES window, in which the −35 to −93 m/z ratio is plotted against particle size for the 166 666 previously clustered particles. Once a scatterplot is created in scatterFATES, the user can click on the figure to draw regions within the scatterplot, such as the two regions shown in Fig. 3. All particle data within a drawn region can then be selected and automatically loaded into guiFATES for spectra visualization and exploration.
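Selecting the particles inside a user-drawn region amounts to a point-in-polygon test on the two plotted metrics. The sketch below is hypothetical and is not scatterFATES code: it assumes each particle is a dict with an `id`, a size, and a spectrum stored as a mapping from signed m/z to peak area; it computes the −35 to −93 m/z ratio as a ratio of peak areas and applies the even-odd (ray-casting) rule to the polygon formed by the clicked vertices. Particles lacking the m/z −93 peak are simply skipped here; how scatterFATES treats undefined ratios is not stated. In MATLAB, the built-in `inpolygon` performs the equivalent test.

```python
def ion_ratio(spectrum, numerator_mz=-35, denominator_mz=-93):
    """Peak-area ratio of two ions; None if the denominator peak is absent."""
    denominator = spectrum.get(denominator_mz, 0.0)
    if denominator <= 0:
        return None
    return spectrum.get(numerator_mz, 0.0) / denominator


def point_in_polygon(x, y, vertices):
    """Even-odd (ray-casting) test: is (x, y) inside the closed polygon?

    vertices: list of (x, y) points in click order; the last point is
    implicitly joined back to the first.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_region(particles, vertices, x_metric, y_metric):
    """Ids of particles whose (x_metric, y_metric) point lies in the region."""
    selected = []
    for particle in particles:
        x, y = x_metric(particle), y_metric(particle)
        if x is None or y is None:
            continue  # metric undefined, so the particle is not plotted
        if point_in_polygon(x, y, vertices):
            selected.append(particle["id"])
    return selected
```

Passing the axes as functions, for example `lambda p: p["size_um"]` and `lambda p: ion_ratio(p["spectrum"])` (the `size_um` key is hypothetical), mirrors the idea that any two particle metrics can serve as the scatterplot axes.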

calibFATES: raw spectra calibration
FATES has been designed so that all aspects and functionalities of SPMS data analysis and exploration are contained within a single programming environment and language. To this end we developed calibFATES, a GUI to quickly scan through raw spectra data files before importation into FATES and to generate calibrations that convert raw time-of-flight spectra to mass-to-charge spectra. calibFATES allows SPMS users to visually examine generated spectra on the fly without any time-consuming processing, even during data acquisition, to ensure the quality and consistency of the data being acquired. While calibFATES is currently written to read the raw spectra files generated by the ATOFMS and TSI ATOFMS, it could easily be modified to read other raw spectra file formats (Supplement M-B). Figure 4 is a screenshot of a calibFATES window displaying a single uncalibrated raw spectrum. Users can scan through and display spectra contained in any raw spectra file within the folder. A calibration is generated by assigning user-entered m/z values to selected flight times. To generate as accurate a calibration as possible, users are advised to choose peaks with a diverse set of m/z values that span the SPMS mass spectral range and to use multiple raw spectra to generate a single calibration. Generating calibration parameters from 20 peaks selected from five spectra has been found to produce generally satisfactory results for ATOFMS data sets. Calibration parameters can be output to a text file for future reference, and any calibration file generated can be loaded into calibFATES and applied to the raw spectra so that they are displayed as calibrated mass spectra rather than time-of-flight spectra.
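The calibration step can be illustrated with a least-squares fit. The functional form used by calibFATES is not given here; the sketch below assumes the idealized time-of-flight relation t = t0 + k·sqrt(m/z), rewritten as sqrt(m/z) = a·t + b, and fits a and b to (flight time, m/z) pairs pooled from several spectra, as the text recommends. Function names are hypothetical, and practical calibrations often add a higher-order term.

```python
import math


def fit_calibration(peaks):
    """Fit sqrt(m/z) = a * t + b by ordinary least squares (assumed model).

    peaks: (flight_time, mz) pairs pooled from one or more spectra; mz is
    the unsigned m/z magnitude (negative-ion peaks entered without sign).
    Returns (a, b).
    """
    if len(peaks) < 2:
        raise ValueError("need at least two calibration peaks")
    ts = [t for t, _ in peaks]
    ys = [math.sqrt(mz) for _, mz in peaks]
    n = len(peaks)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    if sxx == 0:
        raise ValueError("calibration peaks must have distinct flight times")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    a = sxy / sxx
    b = y_mean - a * t_mean
    return a, b


def time_to_mz(t, a, b):
    """Convert a flight time to m/z with the fitted parameters."""
    return (a * t + b) ** 2


def max_residual(peaks, a, b):
    """Largest absolute m/z error over the calibration peaks."""
    return max(abs(time_to_mz(t, a, b) - mz) for t, mz in peaks)
```

Fitting in sqrt(m/z) space keeps the assumed model linear in its parameters, so two peaks suffice in principle; pooling many peaks across spectra averages out timing noise, and `max_residual` offers one simple check of calibration quality before the parameters are saved.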

Conclusions
FATES is the first software package for SPMS data sets to integrate flexible script-based data analysis and graphical user interfaces for data exploration within a single programming language. Because FATES is designed to be easily extensible to diverse input data formats and is implemented entirely in MATLAB, a well-documented language popular among scientists, it should be accessible to and usable across the SPMS community despite the many independent instrument designs. SPMS data importation and programmatic and graphical data analyses can be performed quickly in FATES, even for large data sets, thanks to speed and memory optimizations and the use of native MATLAB data types and built-in functions. Within a FATES study, data are structured so that complex analyses can be performed using concise code with little reliance on FATES-specific functions. In addition, a set of GUIs with many display, sorting, filtering, and grouping functions has been developed to help both expert and novice users intuitively visualize a complex SPMS data set and create robust particle groupings. For these reasons we believe FATES will greatly improve the efficiency of data processing and knowledge discovery from SPMS data sets.

Figure 1. Screen capture of a guiFATES window with data from 46 432 individual particles.

Figure 3. Screen capture of a scatterFATES window showing the −35 to −93 m/z ratio plotted against particle size for 166 666 particles. Any two particle metrics can be input into scatterFATES. Two regions have also been created by the user for further inspection in guiFATES.

Figure 4. Screenshot of a calibFATES window displaying a single-particle uncalibrated mass spectrum. Calibration data are input and displayed on the right, and particle size and time are displayed on the bottom.

Table 1. Summary of SPMSs developed and data analysis packages used.

Table 2. Comparison of run times for various operations in YAADA and FATES.
needs to occur only once for any data set, but the source code was still designed to minimize the time for study initialization. Despite the large size of the Bodega Bay data set, creation of the FATES study took only 28.4 min. By comparison, initializing in YAADA a subset of the Bodega Bay data set roughly one-tenth the size of the FATES study (127 077 dual-polarity mass spectra) still required 20.8 min. Small ALABAMA and TSI ATOFMS data sets were also initialized quickly in FATES (Table