Reply on RC2

Thank you for providing your review, which we considered very helpful in strengthening the presentation of our study. Most of your comments request an extension of the manuscript with additional information, which we are happy to implement. In particular, we aim to add a dedicated discussion section addressing the biogeochemical interpretation of our NCP estimates. Several points you raised were also in agreement with RC1, and we cross-reference our answers where this applies.

Enabling reliable NCP estimates for the mid-summer cyanobacteria bloom is the core aim of this study. A reliable hindcast of the mid-summer NCP will only be possible when the findings of this study are applied to almost two decades of available SOOP pCO2 data. In the absence of this information, the assessment of the importance of the spring bloom relative to the mid-summer bloom is highly uncertain. Nevertheless, we agree that some more information on the spring bloom and a rough approximation of its contribution to the annual NCP should be given. Accordingly, we will extend section 1.3 of the introduction with the following information: "The first production event is the spring bloom, which is controlled by the availability of nitrate and shifted from being dominated by diatoms to dinoflagellates in the late 1980s (Wasmund et al., 2017; Spilling et al., 2018).

After a so-called bluewater period with close-to-zero NCP rates, the second type of production events are mid-summer blooms of nitrogen-fixing cyanobacteria that develop in most years depending on meteorological conditions. Although cyanobacteria NCP is still poorly constrained, its relative contribution to the annual NCP in the Eastern Gotland Sea in 2009 was estimated to be on the order of 40 % (Schneider and Müller, 2018; Schneider et al., 2014), though the uncertainty is high. This preliminary estimate further needs to be interpreted with care, as cyanobacterial NCP varies significantly between years and regions."
We intend to address aspect (2) as follows: We will add a dedicated section in L399 of the discussion to describe the biogeochemical relevance and interpretation of our NCP estimates. Among other things, this section will address the general relation between NCP, organic matter export and deoxygenation as follows:

"Our best-guess of cumulative NCP on July 24 (~1.2 mol m⁻²) represents the net amount of organic matter that was produced throughout the bloom event in the surface waters above the compensation depth at 12 m. After subtracting ~20 % dissolved organic carbon (DOC) production, our NCP estimate equals the produced particulate organic carbon (POC) that is potentially available for export. [...] However, the potential POC export constrained by our NCP estimate is not equivalent to the supply of organic matter to the deep waters of the Gotland Basin, because POC might be (partly) remineralised before sinking beneath the permanent halocline. Remineralisation of POC that occurs during the bloom event above the compensation depth is, according to our definition of NCP, already included in our estimate. In contrast, any additional remineralisation of POC that occurs between the compensation depth and the halocline, or above the compensation depth after the end of the bloom event, reduces the organic matter supply to the deep waters and thereby mitigates deoxygenation. Indeed, our profiling measurements indicate a steady accumulation of C_T* beneath the compensation depth (Fig. 4), likely fueled by the remineralisation of organic matter. However, our measurements neither allow us to constrain the budget of this C_T* accumulation, nor could we attribute the source of organic matter."
Section 1.3 - It is not very clear how this study will "disentangle" the multiple stressors on cyanobacteria blooms in the Baltic Sea. The way this section is written promises the reader that the results and discussion will address this challenge, even though it does not clearly do so. I would instead focus the background here more on the relationship among cyanobacteria blooms, hypoxia and NCP, which better relates to the study's aims.
This issue was also raised by RC1 and we agree that this study itself does not explicitly address the controlling factors of the blooms. However, we expect a major contribution to this question when applying the new NCP reconstruction approach to almost two decades of SOOP observations. In order to clarify this, we will add the following sentences in L62:

"A long-term hindcast of cyanobacteria NCP and the attribution of its strength to prevailing environmental conditions in particular years could improve our understanding of controlling factors and facilitate more reliable predictions of the blooms. However, such a hindcast of cyanobacteria NCP has so far been impossible due to a lack of vertically resolved observations that would allow their organic matter production to be constrained."
We will further clarify how hindcasts based on our findings will support the disentangling of drivers by adding the following information to our conclusions (L437):

"The application of this approach will allow for the detection and attribution of trends in cyanobacteria NCP across decades. In particular, the comparison of NCP estimates of bloom events that occurred under different environmental conditions will provide a better understanding of the controlling factors. Factors to be tested include the environmental parameters used to constrain NCP (pCO2, SST, and TPD), but also additional observations of nutrients and phytoplankton composition routinely determined on SOOP Finnmaid and in the framework of the Baltic Sea monitoring program. The recently started initiative to deploy biogeochemical ARGO floats in the Baltic Sea will further help to link surface NCP estimates and deep water deoxygenation, and thereby constrain biogeochemical budgets in the Baltic Sea."
Lines 71-73 - It is not clear, in the context of this study, why using oxygen measurements to estimate NCP is inferior to using pCO2 to estimate NCP, except that pCO2 data are perhaps more readily available from ships of opportunity. There are multiple examples of NCP being estimated using dissolved oxygen time series (e.g., O2/Ar time series) on the time scales of phytoplankton blooms. If O2 equilibrates more quickly than pCO2, would it not be preferable for studying Baltic Sea dynamics over a few weeks? In any case, in this study, the authors report cumulative NCP estimates over time, which suggests that, despite the different equilibration time scales for O2 and CO2 in the mixed layer, cumulative NCP estimates based on each parameter should approximate each other over the time scale of a bloom.
We think that an essential piece of information was missing, which made our statement easy to misunderstand. NCP estimates based on either CO2 or O2 time series require a correction of the observed concentration changes for the air-sea flux of the respective gas. The calculation of this air-sea flux is associated with uncertainties. As a consequence, the higher flux rates of O2 lead to a higher uncertainty in the derived NCP estimate. We will try to clarify this by modifying Lines 71-73 as follows: "In principle, NCP could as well be estimated from O2 time series. However, the equilibrium reactions of carbon dioxide (CO2) in seawater result in slower re-equilibration of CO2 with the atmosphere compared to O2 (Wanninkhof, 2014). This results in substantially longer preservation of the C_T signal and a lower uncertainty contribution of the required air-sea CO2 flux corrections, and makes C_T the preferred tracer for NCP."
Section 2.2.3 - It would be useful here to state the different phytoplankton that were collected by name, rather than in Appendix B2 only.

This information will be included in Line 164 as follows: "Phytoplankton samples were fixed with Lugol solution, and cyanobacteria community composition and biomass were determined by microscopic counts of the genera Aphanizomenon, Dolichospermum and Nodularia according to the Utermöhl method (HELCOM, 2017)."
Section 2.4 - It would be very helpful to provide the equations used to calculate the "best-guess" NCP values, including how the integration depth was determined in this approach.
In the results, the authors imply that they used a depth of 12 m (lines 308-310), but this should be explained in the Methods rather than in the results.
In agreement with a comment from RC1, we will clarify that our determination of the integration depth aligns with the traditional concept of the compensation depth, i.e. the depth at which primary production and respiration balance out. Accordingly, we will clarify in Line 180 that our NCP best-guess is constrained to the compensation depth:

"... we derive the column inventory of incremental changes of ΔC_T* (iΔC_T*) between two cruise events through vertical integration of ΔC_T* from the sea surface to the compensation depth (cd), i.e. the depth (z) at which no net drawdown of CO2 was observed"
This clarification in the methods section will further be supported by two equations summarizing our NCP calculation.
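The vertical integration described above can be sketched as follows. This is a minimal illustration and not the equations that will appear in the manuscript: the function name `column_inventory`, the 1 m layer thickness, the seawater density of 1005 kg m⁻³ and the example profile are all assumptions made for this sketch.

```python
def column_inventory(delta_ct, dz=1.0, density=1005.0):
    """Integrate changes of C_T* from the surface to the compensation depth.

    delta_ct : changes of C_T* between two cruises in umol kg-1,
               one value per depth layer, surface layer first
    dz       : layer thickness in m (assumed 1 m here)
    density  : seawater density in kg m-3 (assumed value)

    Returns (inventory in mol m-2, compensation depth in m).
    """
    inventory_umol = 0.0  # running sum in umol m-2
    cd = 0.0              # compensation depth in m
    for change in delta_ct:
        if change >= 0:
            # No net drawdown observed in this layer: stop integrating.
            break
        inventory_umol += change * dz * density
        cd += dz
    return inventory_umol * 1e-6, cd


# Hypothetical profile: 50 umol kg-1 drawdown in the upper 12 layers,
# no drawdown below.
profile = [-50.0] * 12 + [0.0, 5.0]
inventory, cd = column_inventory(profile)
# The drawdown inventory is negative; its contribution to NCP has the
# opposite sign (before any air-sea flux or mixing corrections).
print(f"inventory = {inventory:.3f} mol m-2, compensation depth = {cd:.0f} m")
```

In this sketch the compensation depth falls out of the integration itself, as the first layer in which no net drawdown is found.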
Lines 184-191 - It would be useful to clarify here that applying an average alkalinity to derive Ct* is valid because, as the results will show, the biogeochemical variability across stations of interest was low.

The information given in lines 190-191 was rephrased and now reads: "The uncertainty in the determination of changes of C_T* is below 2 μmol kg⁻¹ when the mean A_T is constrained within the observed standard deviation of ±27 μmol kg⁻¹ (see Appendix C1 for a detailed assessment)."
Section 2 [...] that could be removed)
Line 249 - Is it fair to say that, because the BloomSail data exhibit little regional biogeochemical and physical variability across the stations of interest, the cruise tracks encompassing this same region should not be variable either? If so, perhaps the authors could clarify that here as well.
This is absolutely fair to say. It will be clarified in line 250 that the regional variability of averaged physical parameters from the GETM model was low.
Lines 259-265 - While the general idea is there, the authors need to better explain how they obtain TPD values. It was unclear in this paragraph and in Fig. C4A how they choose an actual depth. Is there a threshold for change in temperature between cruises that helps one select the right depth (e.g., the depth when change in temperature is 0.2 °C, according to Fig. C4A?), analogous to using a density difference criterion for deriving mixed layer depth?
No, for TPD no temperature threshold is required. We will try to clarify this and the approach in general by rephrasing the respective section to:

"TPD characterises the mean penetration depth of surface warming that occurred between two sampling events. TPD was defined as the integrated warming signal across the water column, i.e. the sum of all positive temperature changes within 1 m depth intervals, divided by the SST increase (for a graphical illustration see Fig. C4A). According to this definition and in contrast to MLD, TPD takes gradual changes of temperature across depth into account and does not require a fixed threshold value. TPD is only applicable when SST increases and has units of metres. To illustrate the TPD concept, it should be noted that a homogeneous warming signal that ceases abruptly at 10 m water depth would result in the same TPD as a warming signal that decreases linearly from the surface to 20 m water depth (TPD is 10 m in both cases)."
Lines 308-310 - How does this choice of integration depth compare to a best-guess NCP estimate calculated using a mixed layer depth based on a density difference criterion? Should the "best-guess" NCP estimate also be calculated using this density difference approach so that it is more comparable to the reconstructed NCP calculations later on in the paper?
We did in fact calculate NCP based on a density difference criterion (MLD), and the results are displayed in Fig. 6c (left panel). One could argue that this estimate is based on surface CO2 observations only, and is therefore a reconstruction and not a best-guess. However, the vertical variability of the C_T* profiles above the MLD is very small (compare Fig. 4, a2) and accordingly there is no significant difference between integrating the surface values and integrating the vertically resolved C_T* values across the MLD. We therefore argue that the requested comparison is already covered in Fig. 6c.
Figure 5 - I am used to reporting NCP as positive if Ct* decreases, and negative if Ct* increases. Thus, I do not understand how NCP was negative until August 6, and positive from August 6-13, unless the authors are using an opposite sign? (I would expect the opposite because inorganic carbon drawdown would indicate more production over respiration.) I suggest the authors reevaluate the sign of the NCP values they report here. This is another reason that it would be important for them to share their NCP calculation equations in Section 2, as well.
This comment is in agreement with a remark by RC1. To clarify the issue, we will indicate in Figure 5 and the corresponding caption that the sign of NCP is indeed the opposite of the changes in C_T*. We will also include the equation for NCP calculation in Sect. 2 and explicitly mention the interpretation of the sign of the three components (observed C_T* changes, air-sea fluxes, and mixing).
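As a sketch of how such a sign convention could be written (not the final manuscript equation), assume that both flux terms are counted as positive when they add C_T* to the integrated surface layer. The symbols F_as (air-sea CO2 flux) and F_mix (mixing) are placeholders introduced here for illustration:

```latex
\mathrm{NCP} = -\, i\Delta C_T^{*} + F_{\mathrm{as}} + F_{\mathrm{mix}}
```

Under this assumed convention, a decrease of C_T* (negative iΔC_T*) translates into positive NCP, and any CO2 supplied to the layer by uptake from the atmosphere or by mixing adds to the NCP estimate, because it partly masks the biological drawdown.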
Lines 321-322 - It would be interesting for the authors to discuss this peak cumulative NCP value more in the discussion. How does resolving the change in cumulative NCP over the course of a bloom improve understanding around hypoxia in the Baltic Sea?
The contribution of cumulative NCP estimates to the better understanding of hypoxia in the Baltic Sea will be addressed in a dedicated section in the discussion. Please refer to our answer to your comment on Line 46 above and also the reply to RC1.
Lines 323-324 - If the authors focused on reconstructing NCP over July 6 to July 24, why does Figure 6 show reconstructed values beyond July 24? The figure is not consistent with the text in this respect.