Comparison of nitric oxide measurements in the mesosphere and lower thermosphere from ACE-FTS, MIPAS, SCIAMACHY, and SMR

We compare the nitric oxide measurements in the mesosphere and lower thermosphere (60 to 150 km) from four instruments: the Atmospheric Chemistry Experiment–Fourier Transform Spectrometer (ACE-FTS), the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY), and the SubMillimetre Radiometer (SMR). We use the daily zonal mean data in that altitude range for the years 2004–2010 (ACE-FTS), 2005–2012 (MIPAS), 2008–2012 (SCIAMACHY), and 2003–2012 (SMR). We first compare the data qualitatively with respect to the morphology, focussing on the major features, and then compare the time series directly and quantitatively. In three geographical regions, we compare the vertical density profiles on coincident measurement days. Since none of the instruments delivers continuous daily measurements in this altitude region, we carry out a multi-linear regression analysis. This regression analysis considers annual and semi-annual variability in the form of harmonic terms, and inter-annual variability by responding linearly to the solar Lyman-α radiation index and the geomagnetic Kp index. This analysis helps to find similarities and differences in the individual data sets with respect to the inter-annual variations caused by geomagnetic and solar variability. We find that the data sets are consistent and disagree only on minor aspects. SMR and ACE-FTS deliver the longest time series in the mesosphere, and they agree with each other remarkably well. The shorter time series from MIPAS and SCIAMACHY also agree with them where they overlap. The data agree within 30 % when the number densities are large, but they can differ by 50 to 100 % in some cases.


General comments
In this work, the authors compare nitric oxide concentrations in the mesosphere/lower thermosphere (MLT) retrieved from four instruments: three limb sounders and one solar occultation spectrometer, using different spectral ranges: infrared, sub-mm, and ultraviolet. Tracing the NO concentrations in the MLT is important for distinguishing the roles of anthropogenic factors and solar activity impacts on the atmosphere. Continuous monitoring of the MLT is necessary for a proper understanding of the observed effects, but difficult to achieve from the technical point of view, so intercomparisons of NO retrievals in the overlapping periods are invaluable for estimating long-term trends.
I think that the work fits the scope of the journal and is definitely worth publishing. However, the way the data are presented does not allow me to recommend publishing the paper "as is". At the moment, the manuscript is overloaded with figures (I counted 56 panels in the main text and 60 panels in the Appendix, while the total volume of the paper is 23 pages including references). At the same time, certain comparison plots, which one can expect to see in this kind of study, are not shown. I believe that the readers will benefit from the changes suggested below. These should not take much time, but the results will be clearer, and the overall impression will be better.
First, it is obvious that the data are of the same order of magnitude, and one does not expect large differences. In this case, showing absolute values on the majority of plots is not recommended, since the human eye cannot tell a 10 % difference coded in shades of red (or blue). From my point of view, difference plots or "anomalies" are much more representative. At the beginning, one can show a plot with the spatio-temporal coverage of the different data sets to illustrate the problems of finding a good area for one-to-one comparisons, but the rest should be done in differences. It is also difficult to draw any conclusion when plots like Figs. 3-5 are cluttered with points and their error bars. On the other hand, differences from the averages in the running mean will clearly show which instruments group together and which deviate from the mean. The same is true for the vertical profiles: instead of showing 3 × 4 = 12 panels with large error bars, one can present the same information on 3 panels in the form of deviations of the running mean from the average over the four instruments. This will remove the redundancy and make the presentation clear.
We agree that the number of figures is large, and we have removed Fig. 4 and Fig. 6 from the main text in an updated version of the manuscript. Instead, we refer to the corresponding figures in the appendix. The comparison at hand, however, is difficult for several reasons, and we wanted to make sure not to omit any possibly important figure.
We are dealing with sparsely sampled data, e.g., only every 10 days (MIPAS) or every 15 days (SCIAMACHY). In the case of MIPAS and SCIAMACHY, their MLT measurements were scheduled to coincide on one day every month. But these are not many days, and even fewer remain when looking for coincident days of all four instruments. Therefore, one-to-one comparisons are not meaningful because of the insufficient number of matching pairs. Furthermore, because of the large gaps, a running mean is not meaningful, as deviations from it can have sources other than measurement or retrieval errors. We tried to address both issues: the low coincidence statistics by averaging the profile differences over a whole latitude range, and the low time statistics by using the multi-linear regression. Thereby, the multi-instrument regression fit serves as a kind of "running mean". Using a mean profile to compare against is not feasible because, as mentioned above, such a profile exists only on coincident days of all four instruments. These days are few and would give no reasonable statistics.
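The regression model described in the abstract (annual and semi-annual harmonics plus linear responses to the Lyman-α and Kp proxies) can be sketched as follows. This is an illustrative reconstruction in Python/NumPy under the stated assumptions; the function and variable names are ours, not the authors' actual implementation, and details such as proxy lags or weighting are omitted:

```python
import numpy as np

def design_matrix(t_days, lyman_alpha, kp):
    """Design matrix for a multi-linear regression with a constant offset,
    annual and semi-annual harmonics, and linear proxy responses.
    Illustrative sketch only."""
    omega = 2.0 * np.pi / 365.25  # annual angular frequency [1/day]
    return np.column_stack([
        np.ones_like(t_days),        # constant offset
        np.cos(omega * t_days),      # annual harmonic
        np.sin(omega * t_days),
        np.cos(2 * omega * t_days),  # semi-annual harmonic
        np.sin(2 * omega * t_days),
        lyman_alpha,                 # solar Lyman-alpha proxy
        kp,                          # geomagnetic Kp proxy
    ])

def fit_no_series(t_days, no_density, lyman_alpha, kp):
    """Least-squares fit of a (sparse) daily zonal-mean NO time series.
    Returns the regression coefficients and the fitted 'composite' series."""
    X = design_matrix(t_days, lyman_alpha, kp)
    coeffs, *_ = np.linalg.lstsq(X, no_density, rcond=None)
    return coeffs, X @ coeffs
```

Because the fit only needs the days on which an instrument actually measured, it provides a smooth composite even for sparsely sampled series, which is why it can play the role of a "running mean" here.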
Second, the manuscript lacks a discussion of the sources of the differences, while the work is not supposed to be purely descriptive. The authors represent different groups, and they know their retrievals from the inside; this is not a third-party analysis of "black-box" data. I believe a sentence or two can be added to each significant difference discussed in the manuscript (e.g. "we explain the differences of MIPAS by an uncertainty/bias of the radiative input from below, which affects the non-LTE populations and, therefore, the interpretation of the radiation" or "SMR has large error bars in this area due to ..."). Physical reasons such as diurnal or day-to-day variation can also be added to the discussion. They are partially present in the manuscript, but this is not sufficient for creating a sound picture.
The occasionally large error bars on individual daily zonal mean data points result from the low number of measurements from which the daily zonal mean value is derived.
The assumption about the MIPAS bias does not apply: the impact of the upwelling radiation flux on the non-LTE populations is negligible. As a possible explanation, the following sentence was added to the third paragraph of the conclusions: "It is likely that this MIPAS high bias around 110 km is introduced by an inappropriate temperature a priori (MSIS, known to be too low in this region) which maps onto the retrieval."
Third, seasonal variations are shown and discussed, but correlation coefficients for the series and correlations between the NO concentrations retrieved from the different instruments, the Kp index, and the Lyman-α index are not presented, while it would be interesting to see these links in the form of a table with correlation coefficients built separately for each instrument and for three latitude zones. If the coefficients appear to be negligible, this should be mentioned and explained.
In a multi-variable case like the one here, plain correlation coefficients contain no useful information. Without considering partial correlation, any correlation coefficient would be small, even if there is a clear relationship. We have, however, improved the regression coefficient figures by omitting non-significant values instead of merely marking them.
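The partial-correlation argument can be illustrated with a small synthetic sketch (hypothetical example, not taken from the manuscript): when two series share a common driver, their plain correlation can be small or even of the wrong sign, while the relationship becomes obvious once the confounder is regressed out:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear influence
    of the confounder z. Illustrative sketch only."""
    Z = np.column_stack([np.ones(len(x)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residual of x
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residual of y
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic example: x and y share the residual e but respond to the
# common driver z with opposite signs.
rng = np.random.default_rng(1)
z = rng.normal(size=500)
e = rng.normal(size=500)
x = z + 0.5 * e
y = -z + 0.5 * e
plain = np.corrcoef(x, y)[0, 1]   # small/negative despite the shared signal
partial = partial_corr(x, y, z)   # close to 1 once z is removed
```

Here the plain coefficient is dominated by the opposite responses to z, which is exactly why a table of plain correlation coefficients would be misleading in the multi-variable setting of this comparison.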
Maybe, instead of showing the maps of regression coefficients, one can use the regressions to retrieve comparable NO distributions for the same time and show difference maps for them. One-to-one probability density plots, typical for intercomparisons of this kind, are missing, although they provide information both about the biases and about the spread of the retrieved parameter in the compared data sets.
The information on the biases and the spread is shown in the lower panels of Figs. 11-14. As mentioned above, this composite regression fit is as close as we can get to a reasonable multi-instrument mean. The residuals shown in the lower panels of Figs. 11-14 are our measure of the differences of the individual instrument data. The map in Fig. 15 then summarises the mean differences of the individual data from this regression fit. Concerning the significance of the fit, we have updated the discussion about the ACE-FTS and MIPAS relative mean residuals in paragraph 7 (the middle part) of Sect. 6 to read: "The values shown in Fig. 13 are above the 95 % significance level determined by the F-test of the regression fit (Brook and Arnold, 1985; Neter et al., 1996). We find that the ACE-FTS number densities agree with the composite within about ±30 % with only a few large negative values at low latitudes between 80 km and 90 km. The MIPAS number densities are consistently larger than the composite fit by about 10 % to 30 % above 105 km. Between 70 km and 105 km at middle to high northern latitudes, the MIPAS number densities are smaller by about the same amount. In the southern hemisphere at these altitudes, the MIPAS number densities agree with the composite fit within ±30 %."

Minor comments
Besides the excessive number of plots, which is discussed above, there is also a problem with the technical organization of the information on the plots themselves: the columns in figures like Figs. 1, 2, and 15-19 share the same color scheme and x axis, which is absolutely logical. What is not logical is duplicating the color bar and the x-axis labelling. I believe the readers will benefit from larger plots and a single color bar at the bottom of each plot. If the scientific software does not allow this kind of change, it can be done in any graphical software afterwards.
This will be taken into account in an updated version of the manuscript.
The color scheme used in the maps shown in Figs. 15-17 differs from the one used in Figs. 1 and 2 and Figs. 18 and 19. I understand that this is defined by the software used for building the maps, but I think it is worth using the same scheme for consistency (for example, both Fig. 17 and Fig. 18 show regression coefficients, while the representation differs, puzzling the reader).
Figure 15 is a difference plot, and the intention of the color scheme is to easily identify positive (red) and negative (blue) deviations. The color scheme of Figs. 16 and 17 will be changed to be consistent with the other figures mentioned in an updated version of the paper.
The y axis in Figs. 7-10 should be zoomed to 70-120 km. Otherwise, more than half of the useful space is lost.
The y scale in the mentioned figures was chosen to leave room for the legend on top. If the reviewer insists on changing the scale, we will do so in an updated version of the manuscript.