Three-way Calibration Checks Using Ground-Based, Ship-Based

This study uses weather radar observations collected from Research Vessel Investigator to evaluate the Australian weather radar network calibration monitoring technique that uses spaceborne radar observations from the NASA Global Precipitation Measurement (GPM) mission. Quantitative operational applications such as rainfall and hail nowcasting require a calibration accuracy of 1 dB for radars of the Australian network covering capital cities. Seven ground-based radars along the coast and the ship-based OceanPOL radar are first calibrated independently using GPM radar overpasses over a 3-month period. The calibration difference between the OceanPOL radar and each of the seven operational radars is then estimated using collocated, gridded radar observations to evaluate the accuracy of the GPM technique. For all seven radars the calibration difference with the ship radar lies within ± 0.5 dB, therefore fulfilling the 1 dB requirement. This result validates the concept of using the GPM spaceborne radar observations to calibrate national weather radar networks (provided that the spaceborne radar maintains a high calibration accuracy). The analysis of the day-to-day and hourly variability of calibration differences between the OceanPOL and Darwin (Berrimah) radars also demonstrates that quantitative comparisons of gridded radar observations can accurately track daily and hourly calibration differences between pairs of operational radars with overlapping coverage (daily and hourly standard deviations of ~ 0.3 dB and ~ 1 dB, respectively).

variability in these comparisons owing to all the sources of errors involved in such comparisons (differences in exact time of observations of a grid, imperfect attenuation corrections, gridding artefacts, differences in implicit resolution of radar volumes at different ranges, differences in minimum detectable signal …). The gridding technique used for all radars is the same and follows Dahl et al. (2019). This gridding technique uses a constant radius of influence (3.5 km) and a weighted summation with distance to the centre of the grid for points belonging to the same elevation angle but a linear interpolation between elevation angles in the vertical. This technique has the great advantage of not producing the typical artificial vertical spreading of observations below / above the lowest / highest elevation angles observed when using a radius of influence in all directions. Depending on how old the ground radars are, different minimum reflectivity thresholds are used in the comparisons to mitigate potential artefacts in calibration difference estimates due to the degraded sensitivity and reflectivity resolution of the older radars for low to intermediate reflectivities. In general, a relatively high threshold of 20-25 dBZ was required, which also had the advantage of reducing the potential impact of different non-uniform grid filling at the edges of the convective systems due to different radar detection capabilities. It must be noted that additional comparisons done without attenuation corrections of the ground radars did not yield large differences (less than 0.5 dB in all sensitivity tests conducted). This is presumably due to the fact that there are many more points below 30-35 dBZ than above in those comparisons, resulting in a relatively minor impact of attenuation on these statistical comparisons. Also, the ship and ground radars were generally not far away from each other (typically 20-40 km), so the viewing geometry of the storms was quite similar from both radars in most cases, resulting in similar levels of attenuation along the two different paths through the storms.

https://doi.org/10.5194/amt-2021-257 Preprint. Discussion started: 14 September 2021 © Author(s) 2021. CC BY 4.0 License.

The scanning sequence employed for OceanPOL uses the exact same 14 elevation angles used throughout the operational radar network. The start of each OceanPOL scanning sequence is synchronized with that of the operational radars running a 6-minute sequence (starts on the hour then every 6 minutes), which implies that temporal differences in volumes sampled by OceanPOL and the radars running the 6-minute sequence are minimal.

The impact of temporal evolution on the comparisons between OceanPOL and the radars running a 10-minute sequence will naturally be larger. To minimize this impact in our comparisons, we have discarded files for which the start time differs from the OceanPOL start time by more than 2 min.
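The start-time screening described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `match_volumes`, the nearest-in-time pairing rule, and the example start times are hypothetical and are not taken from the processing chain used in this study; only the 2-min tolerance comes from the text.

```python
from datetime import datetime, timedelta

MAX_START_OFFSET = timedelta(minutes=2)  # tolerance quoted in the text


def match_volumes(ship_starts, ground_starts, max_offset=MAX_START_OFFSET):
    """Pair each ground-radar volume with the nearest OceanPOL volume in time.

    Returns (ship_start, ground_start) pairs whose start times differ by at
    most max_offset. Ground volumes with a larger offset are discarded.
    Nearest-in-time pairing is an assumption made for this sketch.
    """
    pairs = []
    if not ship_starts:
        return pairs
    for g in ground_starts:
        nearest = min(ship_starts, key=lambda s: abs(s - g))
        if abs(nearest - g) <= max_offset:
            pairs.append((nearest, g))
    return pairs


# Hypothetical start times (minutes after an arbitrary reference time).
t0 = datetime(2019, 12, 26, 0, 0)
ship = [t0 + timedelta(minutes=m) for m in range(0, 60, 6)]       # 6-min sequence
ground = [t0 + timedelta(minutes=m) for m in (0, 10, 21, 30, 44)]  # irregular starts
pairs = match_volumes(ship, ground)  # the volume starting at minute 21 is dropped
```

In this made-up example the ground volume starting 21 min after the reference is 3 min away from the closest OceanPOL volumes and is therefore excluded, while a volume exactly 2 min away is retained.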

Finally, to mitigate the potential impact of wet radome attenuation at C-band on the comparisons, we have screened out of the comparisons all observations where precipitation was present within 5 km of either radar.

More precisely, for each volumetric scan we estimate the precipitation fraction within 5 km, and if more than 20% of this area is covered with precipitation, we conservatively discard this scan. However, it must be noted that results obtained when changing that threshold were very similar, with maximum differences in the estimated calibration difference of less than 0.3 dB (not shown). From a visual inspection of radar scans, we inferred that this was due to rainfall generally not being observed over and around the radars when such comparisons were made.
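A minimal sketch of this wet-radome screening is given below. The 5 km radius and the 20% area threshold are those quoted above; the function names, the use of a single 2D gridded reflectivity field centred on the radar, and the 10 dBZ value used to flag a grid cell as precipitating are assumptions made purely for illustration.

```python
import numpy as np


def precip_fraction_near_radar(refl_dbz, x_km, y_km, radius_km=5.0, precip_dbz=10.0):
    """Fraction of grid cells within radius_km of the radar that contain precipitation.

    The radar is assumed to sit at (0, 0). NaN marks cells without echo.
    precip_dbz is a hypothetical threshold, not a value from the study.
    """
    near = np.hypot(x_km, y_km) <= radius_km
    wet = np.nan_to_num(refl_dbz, nan=-np.inf) >= precip_dbz
    return float(wet[near].mean())


def keep_scan(refl_dbz, x_km, y_km, max_fraction=0.20):
    """Keep the scan only if at most max_fraction of the nearby area is wet."""
    return precip_fraction_near_radar(refl_dbz, x_km, y_km) <= max_fraction


# Synthetic 1 km grid centred on the radar.
axis = np.arange(-10.0, 11.0, 1.0)
x_km, y_km = np.meshgrid(axis, axis)
dry = np.full(x_km.shape, np.nan)   # no echo anywhere
wet_east = dry.copy()
wet_east[x_km >= 0] = 30.0          # rain over the eastern half of the domain

keep_dry = keep_scan(dry, x_km, y_km)        # scan retained
keep_wet = keep_scan(wet_east, x_km, y_km)   # scan discarded
```

Since the screening applies to precipitation near either radar, such a check would be run once for the ship radar and once for the ground radar, and the pair of scans kept only if both pass.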

In this section, we present the main results of this three-way calibration comparison exercise. As illustrated in Fig. 1, the first part of the calibration consistency check is to calibrate OceanPOL and the ground radars using the same single independent source, the GPM spaceborne radar. All calibration results are summarized in Fig. 2. We are fortunate enough that over two months including the YMCA and ORCA observational periods, the rainfall activity allowed us to collect a reasonable number of GPM overpasses over each radar (except for Learmonth, radar 29, Fig. 2). As a result, for radar 29, we will use an older calibration estimate (-2.6 dB), derived from a GPM overpass with many matched volumes in July 2019. Additional checks of the outputs of the RCA technique for radar 63 (discussed later and shown as black dots in Fig. 4) indicated that the calibration of these two radars had not changed over that period, which means that we can simply average all the estimates of calibration error from individual overpasses to come up with a more accurate estimate for these radars. Looking at the time series of GPM calibration estimates for radars other than 63 and considering the expected typical error of 2 dB for individual GPM overpasses as a guideline, it seems reasonable to assume that the calibration of the OceanPOL, Warruwi (77), Dampier (15), Broome (17), and Serpentine (70) radars has not changed over the observational period either, with fluctuations around the mean calibration error estimate of less than ~1.5 dB. The Port Hedland (16) radar is more problematic, as the time series shows calibration error estimates ranging from -8 dB to -2.5 dB over that period. However, the three overpass points closest to the date when collocated observations with OceanPOL were collected (26 December 2019) seem to agree reasonably well (around the mean value of -5 dB), so we will use this value of -5 dB in the following but will keep in mind the lower confidence in this calibration figure.

(77) radars also offers an opportunity to estimate daily calibration differences and take a closer look at the day-to-day variability of calibration differences. We will get back to that point shortly.

When including all days of observations for radars 63 and 77 (25 days for radar 63 and 4 days for radar 77 with precipitation), the mean calibration differences between OceanPOL and radars 63 and 77 are 0.4 dB and -0.3 dB, respectively (Fig. 4 for radar 63, Fig. 5a for radar 77, see also Table 2 for a summary of all calibration differences found in this study). The next best operational radar is radar 70 (Perth). For this radar, only short-duration drizzle and scattered showers were observed when RV Investigator approached its destination (Fremantle port), resulting in fewer points for the calibration difference estimate. Despite the short duration of the dataset for radar 70, the 2D joint histogram of reflectivities shows a consistent difference across the whole reflectivity range, with a mean calibration difference of -0.4 dB (Fig. 5f). These three estimates are well below the required accuracy of 1 dB for operational applications, which indicates that for these four good-quality radars (OceanPOL and radars 63, 77, and 70), the GPM comparisons provided a consistent calibration to within ± 0.5 dB. However, those are the comparisons where errors were expected to be smallest, given the large number of days included in the comparisons for radar 63, and the excellent synchronization of the 6-min scanning sequences with OceanPOL for these three radars.
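To make the type of estimate discussed here concrete, the sketch below computes a mean calibration difference and a 2D joint histogram from collocated gridded reflectivities. It is not the code used in this study: the function names, the sign convention (ship minus ground), the averaging of differences in dB units, the 1 dB histogram bins, and the synthetic data are all assumptions. Only the idea of a minimum reflectivity threshold of about 20 dBZ comes from the text.

```python
import numpy as np


def calibration_difference(ship_dbz, ground_dbz, min_dbz=20.0):
    """Mean (ship - ground) reflectivity difference in dB over common grid cells.

    Only cells where both radars report a finite reflectivity at or above
    min_dbz are used. Returns (mean difference, number of cells used).
    """
    ship = np.asarray(ship_dbz, dtype=float).ravel()
    ground = np.asarray(ground_dbz, dtype=float).ravel()
    valid = np.isfinite(ship) & np.isfinite(ground)
    # Apply the threshold only to finite cells to avoid NaN comparisons.
    valid[valid] = (ship[valid] >= min_dbz) & (ground[valid] >= min_dbz)
    if not valid.any():
        return float("nan"), 0
    diff = ship[valid] - ground[valid]
    return float(diff.mean()), int(valid.sum())


def joint_histogram(ship_dbz, ground_dbz, min_dbz=20.0, max_dbz=60.0, bin_db=1.0):
    """2D joint histogram of collocated reflectivities (counts per dBZ bin pair)."""
    ship = np.asarray(ship_dbz, dtype=float).ravel()
    ground = np.asarray(ground_dbz, dtype=float).ravel()
    ok = np.isfinite(ship) & np.isfinite(ground)
    edges = np.arange(min_dbz, max_dbz + bin_db, bin_db)
    counts, _, _ = np.histogram2d(ship[ok], ground[ok], bins=[edges, edges])
    return counts, edges


# Synthetic, noise-free example: the ground radar reads 0.4 dB lower than the ship radar.
rng = np.random.default_rng(0)
truth = rng.uniform(10.0, 50.0, size=5000)
ship = truth.copy()
ground = truth - 0.4
ground[::50] = np.nan  # a few cells without ground-radar echo

diff_db, n_cells = calibration_difference(ship, ground)
counts, edges = joint_histogram(ship, ground)
```

In this idealised case the recovered difference equals the imposed 0.4 dB offset. With real data, the threshold interacts with noise and with the differing sensitivities of the two radars, which is one reason why the joint histogram is worth inspecting across the whole reflectivity range.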

Let us now turn our attention to the quantitative comparisons between OceanPOL and the older operational radars (15, 16, 17, 29) running with a 10-minute scanning sequence and / or a degraded range resolution (as reported in Table 1), and only a few opportunistic hours of collocated samples with precipitation (see list of time spans in Table 2). Visual inspection of gridded radar data revealed the presence of strong anomalous propagation (AP) signal in the lower levels (up to about 2 km height ASL) for radars 15, 16, and 29, which has not been filtered correctly by the operational radar post-processing suite. This problem is well known to the BoM forecasters. As a result, for these radars, two sets of results are presented in Table 2. Calibration differences obtained from all data are labelled "AP" and those obtained when screening out all common grids below 2 km height are labelled "noAP". Figure 5 shows the 2D joint histograms of reflectivity when the anomalous propagation is screened out. The largest impact of anomalous propagation is found for radar 16, with a difference of 0.9 dB between estimates with and without AP screening. For the two other radars, 15 and 29, the impact is modest (0.3 to 0.5 dB). This is due to the higher proportion of samples located below 2 km height for the radar 16 case (not shown) than for the two other cases.
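The "AP" versus "noAP" sensitivity test can be illustrated with the short sketch below. The 2 km height limit is the one quoted above; the function names, the ship-minus-ground sign convention, the 20 dBZ threshold default, and the synthetic contamination of the low levels are hypothetical and only serve to show the mechanics of the screening.

```python
import numpy as np


def mean_difference(ship_dbz, ground_dbz, keep):
    """Mean (ship - ground) difference in dB over the cells flagged in keep."""
    diff = ship_dbz[keep] - ground_dbz[keep]
    return float(diff.mean()) if diff.size else float("nan")


def ap_sensitivity(ship_dbz, ground_dbz, height_km, min_dbz=20.0, ap_top_km=2.0):
    """Calibration difference from all cells ("AP") and from cells at or above ap_top_km ("noAP")."""
    ship = np.asarray(ship_dbz, dtype=float).ravel()
    ground = np.asarray(ground_dbz, dtype=float).ravel()
    height = np.asarray(height_km, dtype=float).ravel()
    valid = np.isfinite(ship) & np.isfinite(ground)
    valid[valid] = (ship[valid] >= min_dbz) & (ground[valid] >= min_dbz)
    above = height >= ap_top_km
    return {
        "AP": mean_difference(ship, ground, valid),
        "noAP": mean_difference(ship, ground, valid & above),
    }


# Synthetic example: true offset of -0.5 dB, with the ground radar
# artificially enhanced by 3 dB below 2 km to mimic residual AP echoes.
rng = np.random.default_rng(1)
height = rng.uniform(0.5, 10.0, size=2000)
ship = rng.uniform(25.0, 45.0, size=2000)
ground = ship + 0.5
ground[height < 2.0] += 3.0

result = ap_sensitivity(ship, ground, height)
```

In this synthetic case the "noAP" estimate recovers the imposed offset, while the "AP" estimate is biased by an amount proportional to the fraction of samples located below 2 km, which mirrors the explanation given above for radar 16.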

Overall, this result is shown to illustrate that particular attention needs to be paid in regions prone to anomalous propagation effects. From Table 2

As introduced earlier, the day-to-day variability of calibration differences between ship and ground-based radars can be analysed using the month of collocated samples between OceanPOL and the Berrimah radar collected during YMCA (coloured points in Fig. 4). From Fig. 4, some simple statistics can be derived and discussed. The minimum and maximum calibration differences over the month-long time series are -0.2 and +1.1 dB, which correspond to deviations of -0.6 and +0.7 dB from the mean value of 0.4 dB. The colour of the points indicates the number of samples that were available to estimate the daily calibration difference. The coloured error bars are estimates of the hourly standard deviation of calibration difference for each day, which will be discussed in more detail later. From a close inspection of the location of points with respect to the mean value for the period, there does not seem to be any obvious relationship between the number of points and how close the estimates are to the mean value of 0.4 dB. This result suggests that the number of samples is not the main source of differences between daily estimates.

The standard deviation of daily calibration difference between Berrimah and OceanPOL over this month of data is 0.33 dB (Fig. 4). Since this standard deviation value includes any potential natural variability of the daily calibration difference and the variability due to uncertainties in these daily ship-ground radar comparisons such as

The last thing we explore with this Darwin dataset is the potential for tracking calibration differences at the hourly time scale rather than the daily time scale. To do so, for each day of observations, we have estimated the calibration difference from 1-hour chunks of collocated data, then estimated the standard deviation of the hourly estimates for each day. An example of such daily analysis is shown in Fig. 6 for a day (08/12/2019) where 15 successive hours of collocated samples were available. Although this example includes more hours of comparisons than most other days, it is very typical in terms of the hour-to-hour variability we observe each day, making it a good candidate for illustrative purposes. We have elected not to screen out hours with fewer points, which, as can be seen from hours 14 and 15, would have resulted in a lower hourly standard deviation for that case. This should probably be done in an operational implementation. In this respect, the standard deviation of hourly calibration difference presented in Fig. 4 can be considered as an upper bound for the hourly standard deviation. The hourly standard deviation is shown in Fig. 6 as a red error bar on top of the daily average point, and as a coloured error bar over each daily average in Fig. 4. Over the 1-month study period, the average hourly standard deviation derived from all daily estimates is 0.8 dB, which is within the 1 dB requirement, but the two extreme values are 0.5 and 1.5 dB (Fig. 4), indicating that occasionally the hourly estimates of calibration difference would not fully meet this requirement. From Fig. 4, it also appears that there is no inverse relationship between the number of samples and the hourly standard deviation, which might perhaps have been expected. For instance, the two points with the highest hourly standard deviation (02 and 06 December 2019) are at both ends of the number of samples spectrum, and the three points with the lowest hourly standard deviations are in the lower half of the number of samples spectrum. Fig. 4 also shows that when using the hourly standard deviation as an error bar, the mean value over that period (0.4 dB) is always included within one standard deviation of the daily estimate. These results would obviously need to be confirmed with more observations in the future but do highlight the potential for hourly tracking of calibration differences, enabling very early detection of issues with operational radars.
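The hourly analysis described in this paragraph can be sketched as follows, assuming that each collocated sample carries a timestamp and a ship-minus-ground reflectivity difference. The function name, the grouping by calendar hour, the use of the sample standard deviation (`ddof=1`), and the example values are assumptions made for illustration and do not reproduce the processing used for Figs. 4 and 6.

```python
from collections import defaultdict
from datetime import datetime

import numpy as np


def hourly_spread(times, diffs_db):
    """Standard deviation of hourly calibration-difference estimates for each day.

    times: datetime of each collocated sample.
    diffs_db: (ship - ground) reflectivity difference of each sample, in dB.
    Returns {date: (number of hours, std of the hourly means)}.
    No screening on the number of samples per hour is applied.
    """
    by_hour = defaultdict(list)
    for t, d in zip(times, diffs_db):
        by_hour[(t.date(), t.hour)].append(d)

    hourly_means = defaultdict(list)
    for (day, _hour), values in sorted(by_hour.items()):
        hourly_means[day].append(np.mean(values))

    spread = {}
    for day, means in hourly_means.items():
        std = float(np.std(means, ddof=1)) if len(means) > 1 else float("nan")
        spread[day] = (len(means), std)
    return spread


# Hypothetical samples for one day: hourly means of 0.0, 0.5 and 1.0 dB.
times = [
    datetime(2019, 12, 8, 0, 5), datetime(2019, 12, 8, 0, 35),
    datetime(2019, 12, 8, 1, 5), datetime(2019, 12, 8, 1, 35),
    datetime(2019, 12, 8, 2, 5),
]
diffs = [0.0, 0.0, 0.4, 0.6, 1.0]
spread = hourly_spread(times, diffs)
n_hours, std_db = spread[datetime(2019, 12, 8).date()]
```

An operational implementation would probably add a minimum number of samples per hour before an hourly estimate is accepted, in line with the remark above about hours with few points.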

Values of calibration differences are also reported in Table 2