General comments:
The authors made significant improvements to the manuscript. In my view, there remain four areas of concern, which limit the quality of this work.
1) The language still overstates the findings in some parts of the manuscript, where toning it down would achieve more. It is well understood that developing and maintaining a robust analytical framework to measure isotopes in CO2 in small-volume samples is challenging. It is beneficial to be open and transparent about remaining challenges and imperfections.
2) The wording becomes misleading when the small but real offset between the target values of the applied WS and the experimental results is dismissed as indistinguishable. The authors themselves acknowledge these offsets in a later discussion, which the reader perceives as a contradiction of earlier statements.
3) Measurement details: It is great that the authors provide a few more details on the composition of the gases. However, the provided information falls short of what isotope papers are required to provide (i.e., Camin et al., 2025), which was requested in the initial comments. The traceability of the measurements is not clear. The authors state that they use NOAA values. However, to the best of my knowledge, NOAA does not measure isotopes; these are measured at INSTAAR in all samples and reference gases that NOAA provides. Are the isotope values from NOAA/INSTAAR used to assign the final isotope values to the samples? What is the nature of the pure CO2 for the square peaks in Figure 2B? Does this cylinder contain liquid CO2 and how is this calibrated? Is this gas used for value assignment of the final samples? What is the traceability chain, and what reference material are these measurements based on? What is the estimated uncertainty/bias resulting from the WS gases having a composition of 20% O2 in N2, rather than a natural air matrix (78% N2, 21% O2, 1% Ar)? What is the applied 17O correction? The measurements potentially cover a wide isotopic range, from NBS19 and the atmospheric range at INSTAAR to the –35‰ range of the WS. How is scale compression accounted for?
4) The discussion section does not fulfil the common goal of a discussion section, where data should be discussed to demonstrate the performance and the shortcomings of this method. Only after that could it potentially seem appropriate to list potential applications of the method and future work. In my view, at least half of the text in the discussion sections would be better placed in the conclusions. Therefore, the discussions and the conclusions sections should be re-structured or re-written.
Specific comments:
L15: …using a malleable… to minimise gas leakage… (it doesn’t totally prevent leakage or diffusion, just makes it less significant and thereby workable over practical time scales).
L17: “Standard material” is misleading; it can equally refer to “measurement standards” or “commonly used materials”, or both. It would be useful if this was more precise.
L22: What are low impact d13C-CO2 measurements?
L23ff: Here and in other places in the manuscript: The authors still appear to try hard to maximise impact, when toning it down might be more helpful to allow the reader to make up their mind based on the presented facts. This point has been made by the other reviewers as well and should be followed. This writing style risks achieving the opposite of what the authors intend. “Toning it down” feels appropriate for the degree of novelty of this paper and is strongly recommended. After reading only the first paragraphs of the revised manuscript, it feels appropriate to suggest going through the manuscript again to do that. (I might be misled by a few instances in the early part of the manuscript.) The word “accessibility” is listed twice, and all studies are constrained by time and costs. I’d suggest stopping that sentence after “…constrained by sample volume.”
L44: The statement “remain unsuitable” is only true for off-the-shelf instruments, and this should be stated explicitly. Laser-based instruments have been developed that measure single air samples of 1 mL for d13C-CO2 with precisions of 0.04 ‰, published 5 years ago (https://doi.org/10.5194/amt-13-6391-2020). Again, the writing style creates the feeling that the paper could be improved by toning it down.
L65: Here and everywhere else: the term “concentrations” should be avoided and mole fractions used instead (as done in L108).
L65f: “maintaining sufficiently high analytical performance”, or “while achieving our data quality objective for measurement precision of 0.1 ‰.”
L87: This is quite repetitive; the analytical precision achieved with this method was mentioned only 20 lines before.
L91: Change ecologicak to ecological
L122: The blank analysis is a great addition! Looking at Figure 2B, it seems the m/z44 amplitude is around 2V. If this was a typical sample measurement, this could be used to state that the blank peak of max. 15 mV would result in a blank contribution of less than 1%, which would be a valuable statement here?
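As a rough check of the figure suggested in the comment on L122, the blank contribution can be estimated as the ratio of the blank amplitude to the sample amplitude. The Python sketch below is only illustrative: the function name is hypothetical, the readings (about 2 V for the sample and at most 15 mV for the blank) are taken at face value from Figure 2B, and it assumes the m/z 44 amplitude scales linearly with the amount of CO2.

```python
def blank_fraction(blank_mv: float, sample_mv: float) -> float:
    """Return the blank amplitude as a fraction of the sample amplitude.

    Assumes the m/z 44 signal is proportional to the amount of CO2.
    """
    if sample_mv <= 0:
        raise ValueError("sample amplitude must be positive")
    return blank_mv / sample_mv


# Approximate readings from Figure 2B: 15 mV blank, 2 V (2000 mV) sample peak.
fraction = blank_fraction(15.0, 2000.0)
print(f"blank contribution: {fraction:.2%}")  # 0.75%
```

Under these assumptions the blank stays below the 1% level mentioned in the comment.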
L136: Please specify what “a simple septum vial” is, or change the sentence to something like “injected through the septum into the target vial.”
L137: Thanks a lot for adding this sentence. It is a lot easier to understand and follow.
L138: Can any gas leakage really be effectively prevented? Or would “to minimise gas leakage” be more accurate?
L148: I am confused. This paragraph begins with the reference to Figure 2A, which shows that a needle with He flow is inserted into the sample vials to flush the sample through the Nafion and into the cryotrap, thereby removing N2, CH4, most of the O2, etc. The purified sample is then taken through a GC column with a He carrier gas, followed by a second Nafion before injection into the open split/IRMS. Thus, the CO2 from the air sample is separated and carried to the IRMS in a He stream for d13C analysis. This seems consistent with the description of the protocol in L154-L168. However, the authors say here that the analysis is directly performed on the atmospheric gas matrix, rather than flushing the sample vial with He etc., which seems to me is exactly what is shown in Figure 2A. I might be getting this wrong, but I feel it is important to clarify this as it seems very contradictory.
L151: I am not sure I understand what “optimise separation peak shape” means. Lowering the temperature of the GC column increases peak separation, but to my understanding, will reduce the sharpness of the peaks. Please clarify.
L160: These rectangular peaks are typically generated from a cylinder with liquid CO2. The authors have stated in their response that no cylinder with liquid CO2 is used in this system. Please include this statement here, especially as it is mentioned that the isotopic composition of this gas is somehow calibrated to the VPDB scale. It is of course OK for this gas to be in the cylinder in the liquid phase, but it needs to be very clearly stated that this gas is only used as mediator and not for final value assignment of the samples. Alternatively, if it is used for final value assignment, this needs to be expanded on, to include how the isotopic composition was assigned to the VPDB scale, and what materials were used etc. Follow Camin et al., (2025), which the authors refer to in their response.
L169: It seems to me that most GC-IRMS systems produce a single peak per sample injection, so this statement is confusing. The wording that this is in contrast to “traditional” systems seems misleading, as there is no novelty with a single injection per sample to be claimed. Rather, it confirms the feeling that this may be fishing for novelty in an attempt to make this manuscript seem stronger and the method more novel than what they really are. There is no loss in saying that “This configuration produces a single, well-defined CO2 peak for isotopic analysis from each sample.”
L171-175: Informative addition, very nice!
L176: Please be more specific and follow Brand 2014 and Camin et al., 2025 in the description of the traceability chain. This is not clear to me. It has to include the fundamental RM this is based on (NBS19, …), its assumed isotope value and date. Later on, it seems that isotope values are assigned to the samples based on NOAA standards. As far as I understand, this would be INSTAAR for CO2 isotopes, which has recently migrated from their original scale to JRAS-06, with implications for the isotopic composition of dependent gases. It is important to be absolutely clear.
L178: Here and everywhere else, please ensure consistent use of – and - in d13C–CO2 or d13C-CO2 throughout the manuscript.
L180: The reference gases have no Argon and only 20% O2, which is different from a natural air matrix with 21% O2 and 1% Ar. This may cause significant offsets, especially if these gases were calibrated using optical instruments for mole fractions or isotope ratios. The authors should clarify to what degree the calibration of these gases and the dependent measurements may suffer from this effect. If no effect is to be expected, this should be stated and a reason provided. Otherwise, an estimate for the degree of uncertainty resulting from this effect should be provided in the discussion of the analyses.
L182: RAMCES platform needs to be explained or referenced. What does this mean? What type of instrument is used for that? Laser-based? Can the magnitude of the effect on the sample results that stems from the difference between the matrix of the working standards WS1-WS3 and the matrix of natural air be estimated?
L192: Figure 2B would be more relevant if it included a zoom to the baseline or the 45/44 ratio trace, to show that separation and stability are achieved, as stated in L237ff.
L232: The red area in Figure 3 can either indicate the measurement precision, or the uncertainty of the d13C-CO2 value for WS1, but not the precision of WS1.
L235: Not sure if “described” is the right word here? Could this be changed to “…is shown in Figure 2B”?
L237: See above comment on L192, these statements could be demonstrated in Figure 2B.
L263: What bias is this referring to? The lack of accuracy resulting from storage at room temperature? More clarity on this is needed here.
L265: Can “while” be changed to “because”? Or can this sentence be broken in two: “These tests suggest that the septum configuration is not the cause of the 1‰ offset. Additional sealing and sample preservation strategies must therefore be explored to improve the stability of small-volume air samples.”
L279: What is “simple sealing” in Figure 5? Using the butyl-rubber at the septum or the threaded end of the cap only, in comparison to using it at the septum and the threaded end of the cap? I also assume simple sealing cannot refer to the septum+cap on the vial without any butyl-rubber, as this would be the same as shown in Figure 4, and should lead to a 1‰ offset? Please be more specific and relate the terminology in the text to the terminology used in the figures.
L287: “…at room temperature.” It seems the word temperature is missing.
L303: Please add a sentence on the small offset between WS2 target and the test results in the –80 °C data shown in Figure 6.
L334: I am not fully convinced that these are fully indistinguishable. There seems to be a consistent pattern in Figures 6 and 7 suggesting that the d13C-CO2 averages are shifted towards 13C enrichment by about 0.1 ‰. I strongly feel that this cannot be ignored and has to be mentioned. The authors themselves start to discuss this in L397, confirming this suspicion. However, this acknowledgement comes out of the blue and is a stark contradiction to the claims of “indistinguishability” the authors communicated throughout the manuscript.
L340: Similar to comment on L17, please be more clear when using terminology such as “standard material”, when referring to commonly available materials.
L343: What is “straightforward” vial preparation? Can “straightforward” just be deleted?
L340: I’d suggest using the discussion section to discuss experimental results and potential limitations or unknowns. Section 4 sounds like a sales pitch or a conclusion section.
L378: I am not fully convinced that the presented method “effectively prevents isotopic drift”, but rather that it “effectively reduces isotopic drift” to enable high quality measurements when following the described protocol. The authors state this themselves in section 4.3, when they recognise that the offset patterns were sometimes bidirectional, hence most of the time they were not. Therefore, please tone down the statements of indistinguishability etc. They add no value; if anything, they have the opposite effect.
L405: A systematic rationale also expressed in this sentence could be included in the introduction: The smaller the measured sample volume and the lower the CO2 mole fraction, the stronger the impact of systematic analytical errors on the results. As this method seeks to optimise a method for analysis of very small analyte amounts, it is of paramount importance to identify and eliminate all factors that contribute bias and uncertainty, including those that seem irrelevant in the analysis of large sample volumes.
L419: It was suggested that the authors might want to consider showing their d18O-CO2 data. However, they decided not to do so, citing the lack of a reliable d18O-CO2 calibration as the reason. This is unexpected, as INSTAAR, the laboratory that probably provided the isotope calibration scale, definitely measures d18O-CO2 very well. However, even without a solid calibration, d18O-CO2 data could help to diagnose a wide range of potential problems, which the authors state themselves in L420. In the presence of an unresolved offset, it is a mystery that this data opportunity is not even considered for exploitation.
L431: Again, this paragraph, starting with a suggestion for future work, sounds much more suited for the conclusions than the discussions.
L440: The conclusions appear as a shorter version of the discussion section. These sections have different purposes and should therefore have a different rationale.
L444: Standard based… Please see comments on confusion with standards above.
L456: …without significant isotopic drift…
Comments on the manuscript “High-precision δ13C-CO2 analysis from 1 mL of ambient atmospheric air via continuous flow IRMS: from sampling to storage to analysis.” by Joana Sauze et al., 2025.
Sauze et al., (2025) developed a method for the preparation and handling of small air sampling vessels with a volume of 1 mL STP for the analysis of stable carbon isotopes in CO2. The authors describe a series of experiments that lead to a proposed protocol for the preparation and handling of the air sample containers. They find the preparation steps are critical to achieve their best precision. The authors suggest this method opens a door to new analyses of stable carbon isotopes in CO2 from environments where the amount of available sample volume is critically limited, such as rhizosphere and chamber studies. Those applications hold the potential for important improvements of our understanding of carbon cycle processes, which is of great importance to atmospheric science and biogeochemistry, and the future of life on Earth as we know it in general.
I share the view that sampling volumes that small for high precision analyses are technically challenging. The topic of the manuscript is well suited for publication in AMT. However, I’d suggest significant rewriting, including further analysis of existing data, and potentially making additional measurements. Some of the results do not seem to be of sufficient quality. The manuscript lacks fundamental basics on conventions and guidelines in isotope research. I will list a few examples and refrain from commenting on details.
Accuracy
The elephant in the room seems to be the lack of accuracy in the presented experiments. The authors focus entirely on “precision” as a quality objective (I can’t remember seeing a definition of precision, but assume it is the standard deviation of a set of experiment results). Almost all figures, including all of the results underpinning the finally suggested protocol (Figure 7), show significant offsets between target values and the achieved average values of a series of experiments. To me, the lack of accuracy suggests that something is not quite right with the method, and that a critical experimental process is not sufficiently controlled. The manuscript does not seek to explore the causes of the inaccuracy. I may be convinced otherwise, but the lack of technical details leaves a lot of room for speculation on the lack of accuracy. I suggest that the accuracy problem be fully explored, ideally with additional measurements, and that the details be provided.
Protocol and gases used to prepare sample vials
The authors report a difference when flushing the vials for 8 s with CO2-free air. Unfortunately, the flow rate of that flushing is not stated, which should be reported to understand the protocol. Afterwards, the protocol includes four cycles of evacuation to 0.1 bar, followed by filling with pure N2. The authors find a substantial reduction in variability of d13C-CO2 values when the initial flush with CO2-free air is included in the protocol (Figure 2). This suggests to me that the residual CO2 is not sufficiently removed by the four evacuation-flush cycles alone. CO2-free air and pure N2 should lead to the same result, if both have sufficiently low CO2 blank, and if both gases were used to quantitatively remove residual CO2. Have the authors tested the effect when pure N2 or CO2-free air are used interchangeably for flushing or evacuation/fill cycles under the same conditions? Demonstrating the effectiveness of the sample vessel preparation with increasing initial flush flow rate and/or flush time, as well as numbers of evacuation-fill cycles would be useful to determine a protocol that leads to accurate and precise values.
Standard practice in atmospheric science
It should be noted that preparing vessels for atmospheric sampling and isotope analysis by evacuating, flushing and even thermal treatment is absolute standard practice. A comprehensive protocol for glass flasks was previously published (Steur et al., 2023, DOI:10.1080/10256016.2023.2234594). Successful protocols work based on a large number of gas exchanges within the vessel to eliminate remains from previous samples (memory) as well as dealing with the small but significant amount of surface water on the internal surfaces.
Use of d18O-CO2 as indicator of analytical performance
The protocol described by Steur et al., 2023 is especially relevant for the analysis of oxygen isotopes in CO2, which are not considered in this manuscript. d18O-CO2 can be a very useful indicator for analytical problems. Therefore, I wonder if the d18O-CO2 data could help to identify the cause of the inaccuracy? For the purpose of method refinement, the precision (standard deviation) of d18O-CO2 from different experiments might be useful.
Control of residual CO2 and possible impact on d13C
The authors evacuate their sample vessel to 0.1 bar. In other words, around 10 % of the previous gas would still be present in every following preparation cycle. This seems to include the internal volume of the manifold (Figure 1), which is substantial in comparison to the volume of individual sample vials. This could potentially result in different gas compositions across the vials, especially as the manifold is filled with N2 from one side, pushing the gas from the previous filling towards the other side (pump side) of the manifold, where the vials at the pump end may potentially receive larger fractions of the previous gas filling. This contains some degree of speculation on my side, but I am not convinced that four cycles of evacuating to 0.1 bar and filling with N2 are a guarantee for quantitative replacement of the previous gas. It only takes 1 % of atmospheric CO2 with a d13C of around –8 ‰ and 99 % of a working standard CO2 with a d13C of around –36 ‰ to cause an offset of 0.3 ‰. Even a 1 % blank of a small sample peak might appear relatively small in the blank test the authors performed (line 93). Because of the small sample volume the presented method is targeting, system blanks are very important, which the authors are well aware of. Vessels with septa can easily be evacuated to fractions of a mbar. Why did the authors choose not to evacuate to much lower pressure levels?
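The two estimates in this comment can be sketched as follows. This is a minimal Python illustration, not the authors' calculation. It assumes ideal mixing in the vial, refilling to the same pressure in every cycle (taken here as 1 bar, which the manuscript does not confirm), and linear mixing of delta values weighted by CO2 fraction, which is an approximation. The function names are hypothetical.

```python
def residual_fraction(p_evac_bar: float, p_fill_bar: float, cycles: int) -> float:
    """Fraction of the original gas left after repeated evacuate/fill cycles.

    Best case: assumes the gas is perfectly mixed before every evacuation.
    """
    return (p_evac_bar / p_fill_bar) ** cycles


def mixed_delta(f_contaminant: float, delta_contaminant: float,
                delta_sample: float) -> float:
    """Delta value (in per mil) of a two-component CO2 mixture.

    Uses the linear mixing approximation, weighted by CO2 fraction.
    """
    return f_contaminant * delta_contaminant + (1.0 - f_contaminant) * delta_sample


# Four cycles of evacuating to 0.1 bar and refilling to an assumed 1 bar.
print(f"residual after 4 cycles: {residual_fraction(0.1, 1.0, 4):.0e}")  # 1e-04

# 1 % atmospheric CO2 (about -8 per mil) in a working standard (about -36 per mil).
offset = mixed_delta(0.01, -8.0, -36.0) - (-36.0)
print(f"offset: {offset:.2f} per mil")  # 0.28
```

The residual fraction is a best-case figure. Incomplete mixing in the manifold, as described above, would leave more of the previous gas in some vials, and a 1 % contribution is already enough to produce the roughly 0.3 ‰ offset.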
Blank test data
The authors performed blank tests (i.e., line 93), but do not present blank data, instead stating they didn’t find a detectable blank signal. I haven’t yet seen a system that has virtually no blank. There has always been some blank, and that blank is ideally smaller than the defined limits, above which Isodat/Qtegra automatically identifies a peak. Just because the software does not report the peak, this doesn’t necessarily mean there is no blank. Especially when measuring isotopes in small sample quantities, a small blank can have significant impact. I am not totally convinced that the variability in the “no flush” scenario shown in Figure 2 is not resulting from incomplete removal of the previous gas in the vial (memory). The d13C range in the “no flush” experiment is almost 2 ‰. A quick back of the envelope calculation suggests that around 6 % of ambient air CO2 with d13C of –8 ‰ would be needed to shift a CO2 with d13C of –38.7 ‰ by 2 ‰. This amount might well be detectable in the peak sizes of the measurements. It doesn’t seem that insufficient memory and blank control are fully explored in the manuscript and the underlying experiments. This information, or these experiments, should be provided in a manuscript that seeks to establish a new sample vial preparation protocol as the primary objective.
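The back of the envelope estimate above can be reproduced by inverting a two-component mixing relation. The Python sketch below assumes linear mixing of delta values weighted by CO2 fraction, which is an approximation, and the function name is hypothetical.

```python
def contaminant_fraction(shift: float, delta_contaminant: float,
                         delta_sample: float) -> float:
    """CO2 fraction of a contaminant needed to shift a sample by `shift` per mil.

    Derived from delta_mix = f * delta_contaminant + (1 - f) * delta_sample,
    so that f = shift / (delta_contaminant - delta_sample).
    """
    span = delta_contaminant - delta_sample
    if span == 0:
        raise ValueError("contaminant and sample have the same delta value")
    return shift / span


# Ambient CO2 at -8 per mil shifting a -38.7 per mil CO2 by 2 per mil.
fraction = contaminant_fraction(2.0, -8.0, -38.7)
print(f"required ambient CO2 fraction: {fraction:.1%}")  # 6.5%
```

This is consistent with the figure of around 6 % quoted above. A contribution of that size should change the peak area measurably.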
Undisclosed modifications to the IRMS instrument
The authors state their IRMS method includes modifications from Fiebig et al. (2005) to work for ambient CO2 mixing ratios (line 57), and “adapted here for high precision at trace levels of CO2” (line 112). However, there is no reference, no description or proof of what is done differently from Fiebig et al. (2005) and how that improved the analysis. I regard this as essential information. What does “trace-level CO2” mean in an atmospheric research journal? A fraction of lower tropospheric mole fraction averages?
Insufficient specification of used components
A large part of the success of the suggested sample vessel preparation protocol seems associated with the use of Terostat. Terostat seems to be a brand name for a range of sealing products and not a unique product. At no place do the authors explain what specific product they use and what it is made of, so it is impossible for a reader to gauge, or even to follow and adopt. Also, the description of how this is applied to the top and bottom could be more detailed, as the authors suggest this is a significant part of achieving high data quality.
Accurate description of system performance in context of literature
The authors state that the method is of “high” precision in the title and throughout the text (e.g., line 113). However, 0.1 ‰ precision in an air sample of 1 mL STP is not particularly high or novel, e.g., Brand et al. (2016), DOI:10.1002/rcm.7587, achieve 0.04 ‰ for d13C in CO2 on 1 mL of air with GC-IRMS. Schmidt et al. (2011), https://doi.org/10.5194/amt-4-1445-2011, achieve 0.05 ‰ on around 3 mL pre-industrial air sublimated from an ice core sample. These methods have demonstrated accuracy to within their much smaller measurement uncertainty or better. The additional challenge and potentially the source of the additional uncertainty/inaccuracy of the method described by Sauze et al. (2025) may thus be associated with the sample vials, not really with the sample amount.
Data quality objectives and indicators for instrument performance
The authors seem to develop their quality control criterion around the precision value of 0.1 ‰. At no point do they provide an explanation why this value is needed for any analytical purpose. The paradigm seems to be: the smaller the precision value, the better the protocol. Given the precision criterion is the only data quality objective the authors seem to apply, the rationale for the choice of this value should be presented. A good example to question that approach is shown in Figure 6: The values from storage temperatures of –20 °C show a precision of 0.24 ‰ and are distributed around the target value with good accuracy. In contrast, a precision of 0.1 ‰ is found at –80 °C, but then the values appear inaccurate. Yet, the storage at –80 °C is preferred because of the better precision, accepting inaccurate results. I’d suggest being very cautious of that rationale. Accuracy and reproducibility are at least as important as precision.
Composition of applied gases and gas equipment
Unfortunately, the authors do not disclose the compositions of the applied gases. Besides “ambient” there is no statement on the CO2 mole fractions in any of the applied gases. In a technical manuscript on CO2 isotope analysis, with particular focus on small sample sizes, knowing CO2 mole fractions of the applied gases is essential to understand experimental processes and results. Including data on the composition of used gases is obligatory for such a manuscript. For all experiments, the compositions of the gases and especially the CO2 mole fractions must be stated. The manuscript should be clear on mole fraction scales, impurities, calibration uncertainties, etc., as well as manufacturers and models of pressure regulators on those cylinders.
Isotope conventions
The manuscript ignores basic isotope conventions. All isotope reference gases used need to be stated with uncertainty and traceability chain (Camin et al., 2025, https://doi.org/10.1002/rcm.10018). This is important best practice, even though this manuscript does not show atmospheric data that a reader can compare to other measurements. The authors may have used a cylinder containing liquid CO2 as one of the reference gases (Figure 3), and possibly for some of the gas mixing etc. It should be stated whether this contains liquid CO2 as well as gaseous CO2, as this may affect the isotopic composition over time or with different use, i.e., when used for mixing. When referring to isotope values, the isotope (d13C) is combined with the molecule (CO2) as d13C-CO2 or d13C(CO2), where the “delta” is italicised. Negative isotope values are expressed with a long dash, rather than a simple dash (Coplen 2011, https://doi.org/10.1002/rcm.5129).
References
The manuscript includes a lot of old references. There is nothing wrong with old references and credit should be given to original ideas, but in many cases, things have moved on and improved over several decades.