Reply on RC1


RC: The paper claims that "Globally, stem water was more depleted in 2H than soil water (SWexcess< 0)", but the study finds that "SW-excess was negative in 184 campaigns (out of 642 campaigns)". Thus, SWexcess values below zero are actually globally rare. When we look at Figure 2, which shows +/-1SE, it is apparent (e.g., imagine a doubling of the error bars to represent an approximation of a 95% CI) that very few sites would have mean SWexcess values significantly below zero. The discussion, abstract, and conclusions are framed around a claim that is not entirely consistent with the findings.
AC: In the previous version of our manuscript, we did not succeed in conveying our main findings while remaining faithful to the results of the statistical model. In this revised version, we have changed important sections of the results, discussion and conclusions following the comments by this reviewer and the related comments by the second reviewer. Below, and in line with these comments, this reviewer points out that we had incorrectly stated that our calculated negative SW-excess was "ubiquitous". As the reviewer notes, this would have led the reader to conclude that a significantly negative SW-excess was present across all studies. Instead, we found that significantly negative estimates of SW-excess have been measured in many (but not all) types of ecosystems across the globe, regardless of the prevailing climate. We have now clarified this issue in the revised version by replacing the word "ubiquitous" with "widespread" and by stating: "Our meta-analysis revealed that the isotopic composition of plant water did not always faithfully reflect that of its most likely source and this was evident from results from many different types of biomes. The isotopic composition of stem water varied substantially in size and direction of deviation from soil water, but on average was slightly lower than soil water".
RC: While I realize that the authors are referring to the mean SWexcess of -3.02 +/-0.65 permil, this is a case where the average is not especially representative of the global behavior, but it is instead driven by outliers (again, see Figure 2). What is the median SWexcess? The authors need to also explain how the "+/-0.65" was calculated, because it is unclear whether this uncertainty value only reflects the variation among the campaigns; does it also include the error from the calculation itself (eq. 3)?
The manuscript needs to include a more nuanced interpretation of the findings, recognizing the wide range in values observed rather than over-relying on the mean value.
AC: Our global estimate of SW-excess is not merely the calculated average of all SW-excess values. This value (-3.02‰) is the significant estimate (P < 0.05), with its corresponding standard error (± 0.65‰), of the null model, which accounts for the random variability across studies and is also weighted by the sample size of each study. This is now briefly indicated in the abstract: "(P < 0.05 according to estimates of our linear mixed model and weighted by sample size within studies)". Calculated values of SW-excess followed a normal distribution, with a mean (±SE) of -3.56 ± 0.33‰, a median of -2.58‰ and lower and upper limits of the 95% confidence interval of -4.22 and -2.9‰, respectively, which also suggests that overall there was a significantly negative SW-excess.
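
As a rough check on the interval quoted above, the following minimal Python sketch computes a normal-approximation 95% confidence interval from a mean and its standard error. The function name `normal_ci` is hypothetical, and the normal quantile is an assumption; the reply does not state how the reported limits were obtained, so small differences (for example from rounding of the SE or from using a t quantile) would be expected.

```python
from statistics import NormalDist


def normal_ci(mean, se, level=0.95):
    """Return (lower, upper) limits of a normal-approximation confidence interval.

    Assumption: the estimate is approximately normally distributed, so the
    interval is mean +/- z * se, with z the two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * se, mean + z * se


# Mean and SE of the calculated SW-excess values, as quoted in the reply (in permil).
lower, upper = normal_ci(-3.56, 0.33)
print(f"95% CI: {lower:.2f} to {upper:.2f} permil")

# An interval that excludes zero is consistent with a significantly negative mean.
print("excludes zero:", upper < 0)
```

Under these assumptions the interval lies entirely below zero, which is the property the reply relies on.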

RC: The closing statement of the abstract, "Our results would imply that plant-source water isotopic offsets may lead to inaccuracies when using the isotopic composition of bulk stem water as a proxy to infer plant water sources", still remains true even if only 184 campaigns support it. Perhaps more importantly, the inconsistency and non-ubiquity of negative SW-excess values allows the authors to find the climatic effects, which may hint at ways to anticipate or predict plant-source water isotopic offsets.
AC: We thank this reviewer for this constructive comment. We have edited the last part of our abstract (L33) which now reads: "Contrary to previous expectations, we argue that these potential biases are not restricted to saline or arid environments but should extend to many other ecosystems, notably those from wet and cold environments."
RC: Lines 30-31: It is speculative to suggest in the abstract that these findings "support the idea that these offsets are caused by isotopic heterogeneity within plant stems". No data were used to directly test this and thus this statement has the potential to mislead readers. I recognize that the authors use their data to argue that they can rule out alternative explanations, and I know that this message is one made and supported in other works by members of this research team, but partially ruling out alternatives does not automatically lead to support for one specific explanation. If the authors choose to keep this line, it should be appropriately framed, for example, as prefaced with a phrase that explains their logic, such as "Because of a lack of alternative explanations, we suggest that these data support". In my opinion, the abstract is more robust without this sentence at all, as the prior sentence is the one that can be more robustly defended.
AC: We agree that this statement could have come across as speculative and therefore, we have removed this sentence from the abstract, also in line with a similar suggestion by reviewer 2.
RC: Lines 124-125: "We expected that in the case where these offsets were the result of methodological artifacts, we would not find any correlation between the magnitude of this offset and environmental or biological variables." I appreciate the authors lay out their forthcoming interpretation, but I do not understand the logic here. There could be methodological artefacts but also biological or environmental controls that are strong enough that they show up despite the influence of methodological artefacts. It also might help to be more specific about which methodological artefacts are being referred to here, because effects of using a mass spec vs a laser spec are directly tested and discussed later. Also, how does this relate to the statement on line 164, "our database did not allow for robust analysis of the potential effects of the water extraction methodology".

AC: We have replaced this sentence with the following: "We expected to find significant correlations between these offsets and environmental or biological drivers, which should help us identify possible mechanisms underlying these offsets. In contrast, a lack of significant correlations could suggest that methodological artefacts (mainly due to CVD, see Chen et al. 2020) would be more likely to be the main cause of these offsets." This change is intended to emphasize that we did expect to find significant correlations that would help us unveil the most plausible mechanisms underlying these offsets. Had we not found such significant correlations, that result could have hinted that artefacts related to the process of cryogenic distillation were the main cause of these offsets, as detailed in the previous paragraph (L87-98). In the methods section, we provide a detailed explanation of why potential artefacts associated with CVD could not be explicitly tested here (L158-165).

RC: Line 141: What would happen if the analysis was restricted to a larger "n" value? The uncertainties in "a-s" and "b-s" must often be very large when n is only equal to 3. Is the number of values that are not significantly different from zero a consequence of this analysis including studies where soils were undersampled? I am not suggesting that the authors change the threshold, but also doing the analysis with a higher threshold might support useful further insights into the meaning of the findings.
AC: Here, we gathered the soil water isotopic compositions reported from 508 campaigns, of which only four were discarded for having fewer than three observations. Among the remaining 504, there were 20 campaigns that only had three observations. The linear regressions for all these 20 campaigns were non-significant and therefore, although in theory our criteria allowed for estimates of the slope and intercept term with only three points, in practice the effective threshold was four observations. Furthermore, the proportion of campaigns for which we estimated the slope and intercept of the SWL with fewer than five observations was low (<10%). Therefore, we do not believe that the limited number of observations for a very small number of campaigns could have biased our estimates. Indeed, if we run our analyses to calculate SW-excess excluding all campaigns with fewer than ten observations for soil water isotopic composition, we obtain an overall estimate of SW-excess, according to the null linear mixed model, that is similar: -3.47 ± 0.69‰. We have added a sentence in the Material and methods section of the manuscript that provides this further information: "The number of slopes and estimates of SWL estimated with less than five observations (n < 5) was low (<10%), and we ran parallel analysis limiting fitting the SWL with higher observations (n ≥ 10) and obtained similar results as with n ≥ 3".
RC: Line 147: Because "mobile water" is so frequently used to refer to soil water extracted by suction cup lysimeters in the isotope ecohydrology community, this term should be changed. Given that "mobile" is only used a few times throughout the paper, I suggest simply saying "precipitation, groundwater, and stream water" each time.
AC: We thank the reviewer for this comment and we have modified the text accordingly (L27, L147, L278 & L494).

RC: Line 176: Were the SWLs and LMWLs calculated by orthogonal least squares fitting? They should be because both the X and Y data have uncertainty, and fitting by standard least squares can artificially reduce slopes, which could have consequences for the findings.
AC: For those campaigns for which the regression of the SWL was significant, there were no differences between slopes estimated with standard or orthogonal (total) least squares (Fig. A). In addition, we did not calculate the slope and intercept of the local meteoric water line (LMWL) for each study site; instead, these parameters were obtained from the corresponding studies (L150-154). We have added a sentence in the Material and methods section of the manuscript that provides this further information: "The fitting method used was standard least squares fitting, orthogonal least squares fitting was tested parallelly and there were no differences in the estimation of slopes and intercepts of the SWL between both methods".

Fig. A. Comparison between slopes of the soil water line (SWL) estimated with either ordinary least squares (OLS) or total least squares (TLS). The line depicts the significant linear relationship between estimates (P < 0.001) with a slope not significantly different from 1 (0.92, with a 95% confidence interval of 0.77-1.06).
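
The comparison summarized in Fig. A can be illustrated with a minimal Python sketch that fits the same points with ordinary least squares (vertical residuals only) and with total, i.e. orthogonal, least squares (perpendicular distances). The function names and the synthetic δ18O and δ2H values are hypothetical and are not taken from the database; the sketch is not the authors' code and only shows how the two slope estimates are obtained.

```python
import numpy as np


def ols_slope(x, y):
    """Ordinary least squares slope: minimizes vertical residuals only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sum(xc * xc))


def tls_slope(x, y):
    """Total (orthogonal) least squares slope via SVD of the centered data.

    The first right singular vector gives the direction that minimizes the
    sum of squared perpendicular distances (assumes the line is not vertical).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    return float(dy / dx)


# Synthetic soil water samples (hypothetical values, in permil).
rng = np.random.default_rng(0)
d18o = rng.uniform(-12.0, -2.0, size=30)
d2h = 6.5 * d18o - 10.0 + rng.normal(0.0, 2.0, size=30)

print("OLS slope:", round(ols_slope(d18o, d2h), 3))
print("TLS slope:", round(tls_slope(d18o, d2h), 3))
```

When the scatter around the line is small relative to the range of the data, the two slopes are nearly identical, which is in line with the agreement reported in Fig. A. Note that orthogonal fitting weights both axes equally, so its result depends on the units of each variable.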
RC: Line 194: What are the criteria for plant and soil sampling to be considered "concurrent"?
AC: By "concurrent", we meant that soil and plant samples had been collected during the same sampling day. We have clarified this in the text which now reads "We discarded 14 campaigns because simultaneous observations for plant water collected on the same day were lacking".
RC: Line 201: Is this equation missing a term for the product of sigma-as and delta18O? Also, please check your citations: it refers to a book review in Physics Today rather than the book itself. Another citation issue is on 670.
AC: There was a typo in the manuscript (but not in the code used for data analysis). This equation is now corrected in the revised manuscript, and the appropriate citation for it has been updated.
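
The term the reviewer asks about can be seen in a minimal Python sketch of first-order error propagation. It assumes that SW-excess is defined as δ2H − a_s·δ18O − b_s, with a_s and b_s the slope and intercept of the SWL, and that the four inputs are independent; both are assumptions made for illustration. The corrected manuscript equation is not reproduced in this reply and may differ, and the function names are hypothetical.

```python
import math


def sw_excess(d2h, d18o, a_s, b_s):
    """SW-excess under the assumed definition: d2H - a_s * d18O - b_s."""
    return d2h - a_s * d18o - b_s


def sw_excess_sigma(d18o, a_s, sigma_d2h, sigma_d18o, sigma_as, sigma_bs):
    """First-order propagated standard deviation of SW-excess.

    Assumption: all inputs are independent, so covariances (for example
    between the SWL slope and intercept) are ignored.
    """
    return math.sqrt(
        sigma_d2h ** 2
        + (a_s * sigma_d18o) ** 2   # uncertainty of d18O scaled by the slope
        + (d18o * sigma_as) ** 2    # product of sigma-as and d18O
        + sigma_bs ** 2
    )


# Hypothetical values in permil, for illustration only.
print(sw_excess(-60.0, -8.0, 6.5, -10.0))
print(sw_excess_sigma(-8.0, 6.5, 1.0, 0.2, 0.5, 3.0))
```

The third term is the product of the slope uncertainty and δ18O; dropping it would understate the propagated uncertainty whenever δ18O is far from zero.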
RC: Line 282: The logic underlying this attribution of these results to "evaporative enrichment affecting stem water" needs to be more clearly explained. Evaporative enrichment could lead to positive or negative SW excess values, depending on the slope of that evaporative enrichment. More generally, the conceptual model and assumptions that are guiding the interpretation of the LC-excess values should be more clearly stated.
AC: We agree with this reviewer that this concept is too succinctly explained here, and this statement has been removed from the results section.

RC: Line 304: Is this SE a pooled standard error of the by-campaign SWLs, or is it just of the variation among the individual values?
AC: The text now clarifies that this estimated standard error is the SE of the null model.
RC: Lines 377-385: As written, this statement is not true because plant water generally (by study or by campaign) did overlap with the corresponding soil water, even if the average value is significantly different from zero. It would be more accurate to say "the isotopic composition of stem water varied substantially in size and direction of deviation from soil water, but on average was slightly lower than soil water".
AC: We agree that our original wording benefits from this clarification, which better reflects our results. We have changed this sentence to agree more closely with the statistical results: "Our meta-analysis revealed that the isotopic composition of plant water did not always faithfully reflect that of its most likely source and this was evident from results from many different types of biomes. The isotopic composition of stem water varied substantially in size and direction of deviation from soil water, but on average was slightly lower than soil water".