Reply on RC1 (Specific Comments)

Line 13 – 14. The author states that “the available methods to determine energy and scalar fluxes from terrestrial land surface are relatively imprecise due to a multiscale of irregularities in the land surface and the turbulent transport mechanisms”. I think that this statement is not very clear. Does this imprecision originate from limitations in the precision of the measuring methodologies, or does it refer to the need for spatially distributed measurements?

Response: I agree that this concept is not properly introduced and suggest the following.
'During intensive field campaigns at DE-Fen, additional experiments were conducted for the investigation of scale interactions between the atmospheric boundary layer and the surface, as well as validation of measurement techniques (ScaleX; …).
Line 74: Why is the period between 18 -22 Jul 2016 considered as a reference period?
Response: This is indeed not a necessary qualification. I suggest removing 'reference'.
Line 75: What was the purpose of the UAV use? And how could they have an impact on this study?
Response: UAVs were used for mapping surface brightness temperature at a larger spatial scale and for in situ measurement of wind field and air quality properties. Those studies were also part of ScaleX and are referenced in the paragraph text. In some instances, horizontal transects were flown above and upwind of the setup. The heavy airborne platforms in particular, e.g., those carrying a gas analyzer, generated a downwash jet that could be sensed at some distance. Also, some UAV operations required teams of people moving in and out of the field, e.g., for hourly off-site charging of batteries. The flight tracks were recorded in detail; the movement of people and vehicles was not.

Line 78: What does EC stand for?
Response: The definition used in the preprint is 'Ultrasonic anemometer (EC)' (line 77). EC refers to the Eddy Covariance technique in which these instruments are used. No changes are made to the text.
Line 78: I think that it would be very helpful for the reader if the author specified that this refers to Figure 1c and Table 1.
Response: I fully agree. The references should be updated as suggested.
Line 82: What was the reasoning for the number of sonic anemometers used, the selection of the locations of the tripods and the heights of the sonic anemometers?
Response: The number of sonic anemometers was limited by available hardware at the time. Ideally, only 3-axis sonic anemometers would be used. A compromise had to be made for the number of 3-axis sonic anemometers deployed here and elsewhere during the campaign. The height of the 3-axis type instruments on the tripods was chosen to be similar to the ICOS station and other studies using the Eddy Covariance technique on permanent grassland (including DE-Fen; see also the study by Mauder and Zeeman cited on line 80). There are sensitivity limitations for working with current model ultrasonic anemometers close to the surface. A level of 2-axis sonic anemometers closer to the ground (0.25 m) was planned but the appropriate mounting hardware could not be arranged in time for deployment.
Also, from the Figures 1 and 2 it is visible that the sonic anemometers were located between the supporting poles of the DTS mast. Could there be any interference to the sonic anemometers measurements acquired during the period selected in this study from wakes generated from the supporting poles?
Response: Small-scale interference in the wake is possible. The distance from the DTS masts to the EC profiles (on tripods) was 3 m. The DTS masts had a diameter of 0.1 m. Increasing the distance would have required the suspension cable to be mounted higher and under greater tension to keep the steel cable straight. This could not (safely) be realized during the deployment.
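For scale, the stated geometry can be expressed as a separation in mast diameters. The short Python sketch below only restates the two numbers given above (3 m distance, 0.1 m mast diameter). The function name is hypothetical, and the ratio by itself says nothing about the strength of any wake at the sensors.

```python
def separation_in_diameters(distance_m: float, diameter_m: float) -> float:
    """Return the obstacle-to-sensor distance expressed in obstacle diameters."""
    if diameter_m <= 0:
        raise ValueError("diameter must be positive")
    return distance_m / diameter_m


# Values taken from the response above: 3 m separation, 0.1 m mast diameter.
ratio = separation_in_diameters(3.0, 0.1)
print(f"Separation: about {ratio:.0f} mast diameters")
```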
Line 97: Where was the TIR system pointed to?
Response: The TIR system was pointed at the ground at a slanted angle to include as much surface within the DTS box as possible, as well as static objects for georeferencing. The guyed lattice mast was planned to be taller, but had to be kept below 10 m for the safety of nearby glider planes. This limited our options for the camera viewpoint during the deployment.
Line 101: What is meant that the location was determined in post-processing?
Response: Thank you for the comment. This means that a georeferencing step was required, as described in the text below that line and in Appendix A. I suggest changing the text to make this clear.
'Each EC, DTS and TIR record was stored with an accurate time stamp and locations were georeferenced in post-processing. The calibration and georeference details are provided in Appendix A.'

Line 116. The air temperature (Ta) is mentioned here, but it is only discussed how it is measured in Appendix A3.3. I would suggest a brief statement about those measurements also in section 2.4.
Response: I agree. This is an issue and I agree with the suggested solution.
'Reference air temperature measurements were made using resistance temperature devices in fan-aspirated enclosures (Table 1; Appendix A)'
Additionally, regarding Figure 3. What is the sampling frequency of the time series presented in Figure 3?
DE-Fen station indicated wind direction. The north wind sector is frequently observed in summer due to the proximity of the Alps to the south. The situation is maintained for several half-hour periods during the day. It was assumed that wind from this sector would be subject to limited wake effects from mast structures or topography at any of the sonic anemometers.
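Selecting periods by wind sector needs some care because a northerly sector wraps around 0°/360°. The following is a minimal Python sketch of such a filter. The sector center (0°) and half-width (45°) are illustrative assumptions, not values from the preprint, and the function name and sample directions are hypothetical.

```python
def in_wind_sector(direction_deg: float,
                   center_deg: float = 0.0,
                   half_width_deg: float = 45.0) -> bool:
    """True if a wind direction lies within +/- half_width_deg of center_deg."""
    # Smallest signed angular difference, mapped to [-180, 180),
    # so that 350 deg and 10 deg are both 10 deg away from north.
    diff = (direction_deg - center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_width_deg


# Illustrative half-hourly mean wind directions (not measurements).
half_hourly_directions = [350.0, 10.0, 90.0, 315.0, 180.0]
selected = [d for d in half_hourly_directions if in_wind_sector(d)]
print(selected)
```

A plain comparison such as `315 <= direction <= 45` would never be true, which is why the difference is wrapped before testing.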
Lines 369 -380. Why is this paragraph in the appendix? Isn't this part of the results?
Response: Thanks for pointing this out. Yes, I agree that this paragraph should be in the results section.
What is the physical meaning of the grouping of the clusters presented in Figure A7?
Response: The original study on the TED method discusses the extraction of key variance features from idealized data for each cluster (e.g., a sine wave, or more of a ramp shape). As far as I can tell, it did not suggest the same clusters for application to real-world data, only the use of the same number of clusters. This is a shortcoming of unsupervised machine learning methods: some outcomes are not easily translatable or transferable. In this preprint we do see that TED clusters appear with different spatiotemporal patterns (Figure 8), which I think highlights further promise. Exploring the physical meaning would require the development of an approach to reliably aggregate (and/or normalize) data corresponding to each cluster.

Also, what is the impact of variations of atmospheric stability in the results presented in Figure A7?
Response: This is a good question. The preprint does not specifically explore possible correlations between the spatiotemporal patterns in stability (Figure 4) and patterns in TED classes (Figure 8 and Figure A7).

Table 2. What is the reason for mentioning the different ways of parameterizing the atmospheric stability? How is this used in this study?
Response: The different parameterizations of atmospheric stability are provided as background information for the reader. I am not sure at this point if or how any of the parameterizations can help improve the classification of turbulence events. Personally, I found the differences between a lower (1.0 m) and higher (3.0 m) location in the gradient intriguing and indicative, without exploring possible explanations.

Response: Thank you for the suggestion. The colors were picked using recommendations for color-blind-safe color scales (see, e.g., ColorBrewer by Cynthia Brewer). An online Daltonism simulator reveals a diverging gradient with distinguishable colors between blue/purple and yellow. I suggest leaving this aspect of the figures unchanged.