Reply on RC1

We sincerely thank you for evaluating this work and for your valuable comments and suggestions. We carefully considered each point raised and worked to improve the rigor, logic, and clarity of our manuscript titled "A comprehensive geospatial database of nearly 100,000 reservoirs in China". Here we submit the revised version, which has been modified according to the comments from the editor and reviewers. The response letter below indicates the paragraphs and sections where each change was made. The major changes in the revised manuscript are summarized as follows:

(1) To further illustrate the accuracy of the CRD database, we added a validation experiment that followed the same sampling scheme (the Create Random Sampling Points method): we randomly selected ten sub-basins from the remaining sub-basins, comprising 1,752 reservoirs. The results were added to the 'Accuracy evaluation of the CRD database' section.
(2) We added one paragraph in the 'Comparisons with other reservoir databases' section to state the contributions of the CRD database. Also, Figure 10 is added to show comparisons between GRanD v1.3, GeoDAR v1.2, GOODD, and CRD in selected regions of China.
(3) We provided the residence time information of reservoirs in the revised manuscript and database and supplemented the 'Methodology' section.
(4) As suggested, we changed the unit of reservoir storage to 'km³' and updated all full names of basins.
(5) We also updated the database simultaneously. Three attributes of river order, discharge, and residence time of reservoirs were added to the revised database. The revised China Reservoir Dataset (CRD v1.1) is publicly available at https://doi.org/10.5281/zenodo.6984619.
We attach the detailed item-by-item response to all comments and suggestions for your evaluation.

The manuscript entitled "A comprehensive geospatial database of nearly 100,000 reservoirs in China" proposes an improvement to the dam and reservoir dataset by compiling existing global data and local inventories. The authors assess the quality of their data and compare their data with the existing datasets, and they announce that their dataset shows great improvement and, in many respects, is better than others. The topic of the manuscript is interesting and relevant to the earth science data community, as dam datasets are a critical part of earth system science. Overall, this is a well-written manuscript.

Response:
We highly appreciate Referee #1's concise summary and positive review of the manuscript, and we are grateful for all the constructive comments. We revised the manuscript with sincere consideration of these points. The revisions and related explanations are given item by item in the revised manuscript and in this response letter.
I have a few comments about the accuracy assessment:

1. I have concerns about the accuracy evaluation of their data (Section 4.2). Line 338: is the random selection process manual? In this section, I cannot understand what accuracy means here. Do you mean area? Please provide more detail.
Response: Thanks for pointing out the unclear information. We corrected the statements by specifying that the random selection process is based on the Create Random Sampling Points tool from the ArcGIS Pro Data Management menu. Accuracy of the CRD database in Section 4.2 refers to the evaluation of the commission and omission errors of the database itself. Here, the commission error represents geocoding errors where the CRD information is inconsistent with the validation reference, and the omission error indicates the number of missing reservoirs in the samples.
To evaluate the commission and omission accuracy of the CRD database, we randomly selected sub-basins in each first-level river basin across China and manually checked 1,882 reservoirs. We used the Create Random Sampling Points tool from the ArcGIS Pro Data Management menu to randomly select sub-basin areas from each first-level river basin in China. Most of them are third-level river basins. However, for the Yangtze River and Yellow River basins, which contain more reservoirs, three level 6 sub-basins from HydroBASINS were selected so that the sampled reservoirs were evenly distributed. A total of 1,882 reservoir samples were selected, distributed across 14 sub-basins. For each reservoir sample, we manually checked whether the spatial coordinates were consistent with those recorded in the Tiandi Map. In addition, we conducted a second round of quality control to check whether any reservoirs were missing. Validation results show that the overall evaluation accuracy of the CRD database is 96.55%, ranging from 95.47% to 98.15% across basins.
To further illustrate the accuracy of the CRD database, we followed the same sampling scheme and randomly selected ten validation sub-basins from the remaining sub-basins, comprising 1,752 reservoirs. The distribution of all sampled validation reservoirs is shown in Figure 4. Consistent with the first validation result, the evaluation accuracy in every river basin is higher than 90%. The accuracy of the CRD database ranges from 90.70% to 97.64% among basins, with an overall accuracy of 93.61%. Integrating the two validation results, our overall evaluation accuracy is 95.13% in terms of commission and omission errors (Table 3).
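As a quick arithmetic check, the combined figure can be reproduced from the two rounds reported above. The sketch below assumes the overall accuracy is the sample-size-weighted mean of the two per-round accuracies; this is an assumption made for illustration (Table 3 may instead be built from raw per-basin counts), and the function name `pooled_accuracy` is hypothetical.

```python
def pooled_accuracy(rounds):
    """Sample-size-weighted mean accuracy (in percent).

    `rounds` is a list of (n_reservoirs, accuracy_percent) pairs.
    Assumption: rounds are pooled by weighting each accuracy by its sample size.
    """
    total = sum(n for n, _ in rounds)
    return sum(n * acc for n, acc in rounds) / total


# Values reported in this response: first round (1,882 reservoirs, 96.55%)
# and second round (1,752 reservoirs, 93.61%).
rounds = [(1882, 96.55), (1752, 93.61)]
print(f"{sum(n for n, _ in rounds)} reservoirs, {pooled_accuracy(rounds):.2f}%")
```

Under this assumption the pooled sample is 3,634 reservoirs and the weighted mean agrees with the reported 95.13%.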
Additionally, we clarified the method for selecting sub-basins, updated Table 3, and added Figure 4 in the revised manuscript to address the referee's concern. (Lines 366-369)

"To evaluate the commission and omission accuracy of the CRD database, we randomly selected sub-basin areas in each first-level river basin across China and manually checked 3,634 reservoirs (Figure 4). The collection of the validation sub-basins followed the Create Random Sampling Points method."

2. For dataset comparison, you can not say improvements just based on more count, area, and storage. Does this data have better accuracy over other ones? Please try to show that. Overall the accuracy assessment is not clear, which directs an unclear description of their contribution.

Response:
We agree with the reviewer. The main purpose of the CRD database is to catalogue a more complete spatial distribution of reservoirs in China, especially by supplementing medium- and small-sized reservoirs. Therefore, we cannot claim that it improves on those global reservoir products. To clarify this point, we changed the relevant statement. Following this suggestion, we also added Figure 10 and one paragraph in the 'Comparisons with other reservoir databases' section to describe what the CRD database supplements. (Lines 481-501) Figure 10a shows the distribution of large reservoirs (storage capacity larger than 3 million m³) in the upper reaches of the Yangtze River in GRanD v1.3, GeoDAR v1.2, and CRD. Because the GOODD dataset lacks basic properties (reservoir storage capacity, dam height), it was not included in the comparison of large reservoirs. GeoDAR v1.2 incorporates GRanD v1.3, so the pattern of large reservoirs in the upper Yangtze River is consistent between the two databases. Compared with GRanD v1.3 and GeoDAR v1.2, CRD adds 16 large reservoirs in the upper reaches of the Yangtze River, with a total storage capacity of 52.60 km³, of which new reservoirs built in the past five years account for 77.00% (40.50 km³). Large reservoirs dominate the total storage capacity in the basin. Therefore, the addition of large reservoirs dammed in recent years is one of the major differences of CRD in storage capacity.
Another supplement of CRD is that it adds local detail for smaller reservoirs while also enlarging the total area and storage capacity. Figure 10b shows the reservoir distribution in the 10-level sub-basin of Poyang Lake, and Figure 10c-d contains the enlarged details of Figure 10b. GRanD v1.3, GeoDAR v1.2, GOODD, and CRD all digitize reservoirs on rivers with catchments of more than 10 km² (Figure 10b-c). However, many smaller reservoirs were not digitized by GRanD v1.3, GeoDAR v1.2, and GOODD. Overall, CRD is relatively better at mapping reservoirs at smaller watershed levels (Figure 10d). CRD contributes by supplementing and updating new and smaller reservoirs. Nevertheless, it has a few limitations. The CRD contains only a few basic reservoir attributes, such as location information (longitude, latitude, province, state, county), inundated area, and estimated water storage, and it still needs to be further supplemented and improved. Although we added reservoir residence time in the updated version, we calculated it for only about 17,000 reservoirs because of the limited accuracy of the discharge data.
"Figure 10a shows the distribution of large reservoirs (storage capacity larger than 3 million m³) in the upper reaches of the Yangtze River in GRanD v1.3, GeoDAR v1.2, and CRD. Because the GOODD dataset is limited by the basic property (reservoir storage capacity, dam height), it was not included in this comparison. GeoDAR v1.2 incorporates GRanD v1.3 so that the pattern of large reservoirs in the upper Yangtze River is generally comparable between the two databases. Compared with GRanD v1.3 and GeoDAR v1.2, CRD has added 16 large reservoirs in the upper reaches of the Yangtze River, with a total storage capacity of 52.60 km³, of which the total storage capacity of new reservoirs constructed in the past five years accounted for 77.00% (40.50 km³). The large reservoirs dominate the total storage capacity in the basin. Therefore, the increase of new large reservoirs dammed in recent years is one of the major differences of CRD in storage capacity. As shown in Figure 10b-c, GRanD v1.3, GeoDAR v1.2, GOODD, and CRD can all digitize reservoirs on rivers with catchments of more than 10 km². However, many smaller reservoirs were not compiled in GRanD v1.3, GeoDAR v1.2, and GOODD."

3. What is the normal elevation in table 1. Do you mean height? It is a little confusing.
Response: Thanks for pointing out the confusing statement. The 'normal elevation' in the original version of Table 1 should be termed the "water level of normal storage capacity" of the reservoir. "Normal storage capacity" means that the reservoir reaches the storage capacity that can actually be used to regulate runoff.

4. Do you have residence time? Without this information, it is really hard for use in hydrological modeling?

Response:
We appreciate the reviewer raising this point. Following the suggestion, we calculated the residence time of reservoirs using river discharge data from HydroRIVERS. Moreover, we updated Table 2 and added the residence time information of reservoirs to the revised manuscript and database. (Lines 311-331) HydroSHEDS provides hydrographic baseline information in a consistent and comprehensive format to support regional and global watershed analyses and hydrological modeling. It is currently considered the leading global product in terms of quality and resolution (Lehner and Grill, 2013). HydroBASINS and HydroRIVERS are extracted from HydroSHEDS at a 15 arc-second resolution. HydroRIVERS represents a vectorized line network of all global rivers that have a catchment area of at least 10 km² or an average river flow of at least 0.10 m³/s, or both. For each river, HydroRIVERS provides an estimate of long-term average discharge as an attribute. Therefore, we extracted the discharge at the pour point of each reservoir from the HydroRIVERS product. Here, the average residence time of each reservoir was calculated as the ratio between reservoir storage capacity and discharge.
The HydroRIVERS dataset covers all rivers in the Pfafstetter Level 12 sub-basins of HydroBASINS, so we focused on the reservoirs (17,185) located on these rivers, which cover 96% of CRD reservoirs larger than 1 km². The remaining reservoirs are not located on HydroRIVERS rivers, and it is difficult to obtain the discharge of such smaller reservoirs. Therefore, they are generally not included in hydrological simulations. In addition, the R² between our estimated reservoir residence times and the corresponding values provided for HydroLAKES reservoirs is 0.82. Although the CRD database provides information about reservoir discharge and residence time, these data can be updated as needed for specific hydrological modeling.
"HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) provides hydrographic baseline information in a consistent and comprehensive format to support regional and global watershed analyses and hydrological modeling. It is currently considered the leading global product in terms of quality and resolution (Lehner and Grill, 2013). HydroBASINS and HydroRIVERS are extracted from HydroSHEDS at a 15 arc-second resolution. HydroRIVERS represents a vectorized line network of all global rivers with a catchment area of at least 10 km² or an average river flow of at least 0.10 m³/s, or both. HydroRIVERS covers all rivers in the Pfafstetter Level 12 sub-basins of HydroBASINS and contains the attribute information of each river about an estimate of long-term average discharge. Here, we focused on reservoirs (17,185) located on HydroRIVERS rivers and extracted reservoir discharges based on HydroRIVERS. Moreover, these reservoirs cover 96% of CRD reservoirs larger than 1 km². The remaining smaller reservoirs, on the one hand, are not on the HydroRIVERS rivers, on the other hand, it is difficult to obtain the discharge of smaller reservoirs. Therefore, they are generally not included in hydrological simulations. Notably, while the CRD database provided information about reservoir discharge and residence time, these data can be updated for specific hydrological modeling. The equation of average residence time is as follows:

RES_T = V / DIS_AV_CMS    (3)

where DIS_AV_CMS represents the reservoir discharge in the unit of m³/s, and RES_T represents the reservoir residence time in the unit of year. The R² of the estimated reservoir residence times and the corresponding results of HydroLAKES reservoirs is 0.82."
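For readers who wish to recompute residence time with their own discharge data, Equation (3) can be sketched as below. This is a minimal illustration and not the exact processing code behind the database: it assumes that V is the storage capacity in km³ (the storage unit used in the revised manuscript), that DIS_AV_CMS is the long-term average discharge in m³/s, and that one year is 365.25 days. The function name, the constants, and the example values are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length (Julian year)
M3_PER_KM3 = 1e9                       # 1 km^3 = 10^9 m^3


def residence_time_years(storage_km3, dis_av_cms):
    """Average residence time RES_T = V / DIS_AV_CMS, returned in years.

    storage_km3: reservoir storage capacity V in km^3 (assumed unit)
    dis_av_cms:  long-term average discharge in m^3/s
    """
    if dis_av_cms <= 0:
        raise ValueError("discharge must be positive to compute residence time")
    storage_m3 = storage_km3 * M3_PER_KM3
    return storage_m3 / (dis_av_cms * SECONDS_PER_YEAR)


# Hypothetical reservoir: 1 km^3 of storage on a river carrying 100 m^3/s.
print(round(residence_time_years(1.0, 100.0), 3))  # about 0.317 years
```

The unit conversion is the only step beyond the ratio itself. Users who replace DIS_AV_CMS with gauged or modeled discharge can reuse the same function.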

Bibliography for response letter:
Lehner, B., Grill, G., 2013. Global river hydrography and network routing: baseline data and new approaches to study the world's large river systems. Hydrological Processes 27, 2171-2186.