Suggestions for revision or reasons for rejection

I would like to thank the authors for considering the comments by both
reviewers, and for addressing my main concerns. I think the paper can
be published after addressing three minor points:
* The treatment of uncertainties and the conclusions drawn from
comparisons of numbers with or without uncertainties needs more care (my
reply to points 2.11 and 2.16);
* why I think y - x vs x plots are more useful than y vs x (point 2.14);
* and that specific Jacobians for Manus and Sodankylä should be in the
paper (reply to points 2.12 / 2.13).
See below for a detailed outline of what I believe needs to be addressed
prior to publication.
2.1 / General:
A clarification: There are two relevant components of physics here:
atmospheric physics and instrument physics. To me, a physics-based
calibration means a calibration that takes the instrument physics
into account. To the authors, it means a calibration that takes the
atmospheric physics into account, as radiative transfer indeed does.
The authors take into account explicitly atmospheric physics, but not
instrument physics. I think this needs to be clarified early in the
paper, including in the abstract, as some readers may (like me) expect
an explicit consideration of instrument physics when it says ``based
on physical considerations'', such as some instrument calibration
papers have.
I have noticed that the authors, in several locations in their replies
to comments of both reviewers, draw conclusions from the expectation that
otherwise ``the data could not be used in any meaningful way''.
I find this reasoning problematic. HIRS is an operational instrument
and was designed for weather forecasting. Producing climate data
records is a climate application. It is possible that instrument
behaviour poses problems for climate but not for weather applications.
Therefore, the conclusion ``HIRS is used, thus it can't be too bad''
is not generally applicable. A changing SRF is an example: 4DVAR /
reanalysis might cope with that through semi-automatic bias correction,
but it may pose a problem for climate data record development.
Although I don't expect that the authors undertake a full physics-based
recalibration/intercalibration/harmonisation of HIRS as this is a massive
undertaking, I appreciate that the authors now state explicitly in the
conclusions that they assume spectral and radiometric calibration is
constant over the study period. This assumption is only stated in the
very final line of the conclusion; this should additionally be stated
early in the paper.
2.11 / Section 4.2:
I assume those are 1-sigma uncertainty estimates like earlier
uncertainties (it would not hurt to state that here, too). That means
(0.5 ± 1.1) K is a good result, and (0.8 ± 0.5) K is still consistent.
However, (1.2 ± 0.4) K is statistically significantly different from 0,
which means there is some residual that the regression fails to correct
for. There are good reasons why this is the case, but this should be
explicitly noted.
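For readers who want to verify this, the significance of each residual can be gauged from its z-score, assuming Gaussian 1-sigma uncertainties (a sketch, not the authors' calculation; the two-sided normal p-value is my assumption):

```python
import math

def two_sided_p(value, sigma):
    """Two-sided p-value for a normally distributed estimate vs. zero."""
    z = value / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))

# (0.5 +/- 1.1) K and (0.8 +/- 0.5) K are consistent with zero;
# (1.2 +/- 0.4) K has z = 3, i.e. significantly different from zero.
for value, sigma in [(0.5, 1.1), (0.8, 0.5), (1.2, 0.4)]:
    z = value / sigma
    print(f"({value} +/- {sigma}) K: z = {z:.1f}, p = {two_sided_p(value, sigma):.4f}")
```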
2.12 / 2.13 / Section 5.1:
I understand the authors' reasoning, and a generic weighting function is
useful for the stated purposes. But specific weighting functions for the
relevant atmospheric conditions are relevant to interpret the differences
in brightness temperatures. As your reply shows, even at Lindenberg with
its relatively mild winters, NOAA-14 observes the ground in some cases
(which, by the way, is not at 0 m at either Lindenberg or Sodankylä; I don't
know about Manus, so the Jacobian should stop above 0 m). I would expect
that to be common in the Sodankylä winter, which is why I asked for
a Sodankylä winter Jacobian, which unfortunately the authors have not
provided either in the reply or in the revised paper.
2.14 / Sections 4.1-4.2 / Figures 6-7:
As the authors are unwilling to repeat their arguments from the
Author Comment (Gierens, 2017), I will repeat them here myself for the
benefit of other readers:
``As UTHi cannot be negative, the difference
UTHi(N15) - UTHi(N14) strongly tends to negative values (i.e. UTHi(N14)
> UTHi(N15)) when UTHi(N15) is small. Given that UTHi(N15) is small, it
is quite improbable that UTHi(N14) is even smaller. At the other end of
the distribution we have a similar phenomenon, as values exceeding 115%
do not occur in our data sets. Thus, given that UTHi(N15) reaches the
upper extreme, it is much more probable that UTHi(N14) remains smaller
than that it would be even larger. This means that, unless all data pairs
agree perfectly, a scatter plot like that in figure 1 must have a rhombic
shape with a surplus of negative ordinate values at small abscissa values
and a surplus of positive ordinate values at large abscissa values. It is
clear that a linear fit through such a shaped cloud of data points must
have a positive slope. It might be that the slope depends on the width
(standard deviation) of the individual distributions but our statement
that it “differs quite substantially from the ideal value of zero” is,
albeit true, meaningless for the problem at hand. Therefore we replace
such plots by simple y vs x plots, see the new figure 1 in the revised
version. In such a plot the problem becomes evident through an unequal
number of points above and below the y = x diagonal line.''
My role here is to review the current paper and not the GE17 paper, but
the reasoning is both incoherent and irrelevant. The authors argue why a
regression line for a UTH difference should be expected to be positive.
That may be, and is equivalent to stating the regression line for a
UTH y vs x is expected to be larger than one. Whether one plots y vs x
or y - x vs x makes no difference to this reasoning. And then the GE17
reasoning does not even apply for the present paper, because here the
quantity is in brightness temperature, not UTH, and the authors already
point out the slope is near 1 and the intercept near 0. So, their GE17 reply is
both incoherent and irrelevant.
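The equivalence is exact: for any paired data, the ordinary-least-squares slope of y - x against x equals the slope of y against x minus one, so switching between the two plots cannot change the conclusion. A quick numerical check with synthetic data (the numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 115.0, size=1000)           # e.g. values from one satellite
y = 0.9 * x + 5.0 + rng.normal(0.0, 3.0, 1000)   # paired values from the other

slope_y, _ = np.polyfit(x, y, 1)       # fit of y vs x
slope_d, _ = np.polyfit(x, y - x, 1)   # fit of y - x vs x

# Identical information: slope(y - x vs x) = slope(y vs x) - 1
print(slope_y, slope_d)
assert abs(slope_d - (slope_y - 1.0)) < 1e-10
```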
In figures 2, 3, and 5, the authors plot y - x vs x. They should do the
same for figures 6 and 7. This will make them much easier to read. As
currently presented, it is hard to tell the range of y-values where the
x-value is 240 K.
2.16 / Section 5.2:
I disagree that uncertainty estimates are not required to conclude
that numbers are similar. Evidently the authors have calculated
those uncertainties, so it should be easy to add them, along with a
note of how they were calculated. With the estimated uncertainties,
there is a significant difference between the GE17 estimate and
the present estimate. This should be admitted and commented upon.
I suspect the uncertainties in both cases are underestimated due to
the use of a simple linear regression, and that a more sophisticated
form of regression such as errors-in-variables models would be required
to get realistic uncertainties. If two independent estimates (GE17 and
the present) of the same value differ by more than one would expect from
their uncertainties, then either the estimates or their uncertainties are
inconsistent/incomplete. There are almost certainly good reasons that
the estimates are different, so what this shows is that the uncertainty
from the linear regression underestimates the actual uncertainty on the
parameters, which includes many other aspects. I don't expect that the
authors go through the effort of a complete uncertainty calculation,
as this is a major undertaking. However, I do think the authors need
to comment that the two estimates differ by more than their uncertainty
and that additional work would be needed to determine the cause.
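To illustrate why ordinary least squares can mislead when both variables carry noise, here is a small synthetic comparison of OLS with Deming (orthogonal) regression; the data, the error magnitudes, and the 1:1 error-variance ratio are all assumptions of the sketch, not values from the paper:

```python
import numpy as np

def deming_slope(x, y, delta=1.0):
    """Deming regression slope; delta = ratio of y- to x-error variances."""
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    return (syy - delta * sxx
            + np.sqrt((syy - delta * sxx) ** 2
                      + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)

rng = np.random.default_rng(0)
truth = rng.normal(0.0, 2.0, 20000)        # common true signal
x = truth + rng.normal(0.0, 1.0, 20000)    # both measurements are noisy
y = truth + rng.normal(0.0, 1.0, 20000)

ols_slope = np.polyfit(x, y, 1)[0]
# OLS attenuates the slope toward var(truth)/(var(truth)+1) = 0.8;
# Deming recovers a slope near the true value of 1.
print(ols_slope, deming_slope(x, y))
```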
2.18 / Section 6:
My comment on ``modest and unsurprising'' referred to my conclusion from
your results, which was that it is impossible to produce a homogeneous
data series (I think there is no need for sarcastic remarks in the
response to reviewers). If the authors were to show that the results
using two independent methods were consistent within traceable uncertainty
estimates, which I don't believe they have, that would be more impressive.
But thank you for drawing a less strong conclusion; that addresses the
main objection I had to the paper in its previous form.
2.23 / Figure 4:
It has improved, but I still think it's hard to tell what's going on. The
authors may want to experiment a bit more with even thinner lines,
but perhaps the nature of the data simply does not allow for a
visualisation that can be followed.
I think it would be useful to add a note to Section 3 of the paper
that relative humidities are reported as integers, as this explains the
substantial digitisation seen in the figures.
2.25 / Figure 8:
I believe the averaging kernel gives the derivative of the retrieved
parameter with respect to the true parameter, whereas the weighting
function or Jacobian gives the derivative of the measured quantity with
respect to the atmospheric state.
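In the standard linear retrieval formalism, the Jacobian K = dy/dx maps state to measurement, the gain G = dxhat/dy maps measurement to retrieval, and the averaging kernel A = GK = dxhat/dx relates retrieved to true state. A minimal sketch with made-up matrices (dimensions and covariances are purely illustrative):

```python
import numpy as np

# Illustrative Jacobian: 4 channels x 3 atmospheric levels (made-up numbers)
K = np.array([[1.0, 0.5, 0.1],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0],
              [0.2, 0.2, 0.8]])
Se_inv = np.eye(4) / 0.1   # inverse measurement-noise covariance
Sa_inv = np.eye(3)         # inverse prior covariance

# Gain matrix G = dxhat/dy, averaging kernel A = G K = dxhat/dx
G = np.linalg.solve(K.T @ Se_inv @ K + Sa_inv, K.T @ Se_inv)
A = G @ K

print(A.shape)       # (3, 3): state space, not measurement space
print(np.trace(A))   # degrees of freedom for signal, between 0 and 3
```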

Please see the comments from the reviewer. The referee has made a significant effort to carefully read the revised manuscript. I have also carefully read the manuscript as well as the comments from the referee. I believe the recommendations made by the reviewer can significantly improve the quality of the manuscript, so please consider them and revise accordingly.
Suggestions for revision or reasons for rejection

Thank you for the constructive conversation. I recommend that the manuscript be published as it now stands.

Thank you for your patience and for comprehensively responding to the reviewers' comments, and of course congratulations!
Isaac Moradi, Ph.D.
AMT Editor
NASA Global Modelling and Assimilation Office
Greenbelt, MD 20771 