Comparison of planetary boundary layer height from ceilometer with ARM radiosonde data

. Ceilometer measurements of aerosol backscatter proﬁles have been widely used to provide continuous planetary boundary layer height (PBLHT) estimations. To investigate the robustness of ceilometer-estimated PBLHT under different atmospheric conditions, we compared ceilometer-and radiosonde-estimated PBLHTs using multiple years of U.S. Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) ceilometer and balloon-borne sounding data at ARM ﬁxed-location atmospheric observatories and from ARM mobile facilities deployed around the world for various ﬁeld campaigns. These observatories cover from the tropics to the polar regions and over both ocean and land surfaces. Statistical comparisons of ceilometer-estimated PBLHTs from the Vaisala CL31 ceilometer data with radiosonde-estimated PBLHTs from the ARM PBLHT-SONDE Value-added Product (VAP) are performed under different atmospheric conditions including stable and unstable atmospheric boundary layer, low-level cloud-free conditions, and cloudy conditions at these ARM observatories. Under unstable conditions, good comparisons are found between ceilometer- and radiosonde-estimated PBLHTs at ARM low- and mid-latitude land observatories. However, it is still challenging to obtain reliable PBLHT estimations over ocean surfaces even using radiosonde data. Under stable conditions, ceilometer- and radiosonde-estimated PBLHTs have weak correlations. We compare different PBLHT estimations utilizing the Heffter, the Liu–Liang, and the bulk Richardson number methods applied to radiosonde data with ceilometer-estimated PBLHT. We ﬁnd that ceilometer-estimated PBLHT compares better with the Liu–Liang method under unstable conditions and compares better with the bulk Richardson number method under stable conditions.


Introduction
The planetary boundary layer is the lowest part of the troposphere that directly interacts with the earth's surface. The effects of surface friction, heating, and cooling cause significant exchanges of heat, mass, moisture, and momentum between the planetary boundary layer and the earth's surface through turbulent motions (Stull, 1988). Therefore, the planetary boundary layer structure responds quickly to surface forcing and may have large temporal and spatial variations, especially over land (Seidel et al., 2010;von Engeln and Teixeira, 2013). The structure and the depth of the planetary boundary layer play a critical role in near-surface air quality, land-atmosphere interactions, and a wide range of atmospheric processes such as cloud formation and evolution, aerosol mixing and transport, and aerosol-cloud interactions (Seinfeld et al., 2006;Konor et al., 2009;LeMone et al., 2019).
Following Stull (1988) and Liu and Liang (2010), the boundary layer structure can be classified into three major regimes depending on the atmospheric thermodynamic environment: convective boundary layer (CBL), stable boundary layer (SBL), and residual layer (RL). Under the CBL condition, which generally occurs during the daytime, the strong turbulence and convection causes intense mixing within the boundary layer. The top of the boundary layer (PBLHT) is often characterized by an inversion layer of temperature and a pronounced decrease in moisture and pollutant concentration. For deep CBLs such as in the tropics, however, it might be difficult to determine the top of the boundary layer using the temperature inversion (Kepert et al., 2016). The SBL is commonly formed during nighttime by surface radiative cooling or when warm air is advected over a cool surface. Under the SBL condition, virtual potential temper-Published by Copernicus Publications on behalf of the European Geosciences Union. 4736 D. Zhang et al.: Comparisons of PBLHT from ceilometer with radiosonde data ature increases with altitude in the boundary layer. Turbulence tends to be suppressed and occurs sporadically. The PBLHT is defined as the top of the stable layer or the height where turbulence is negligible compared to its surface value (Stull, 1988). The RL is usually formed during the evening or morning transition time. A RL that is associated with nearneutral conditions in the surface layer is neutrally stratified and keeps similar state variables and pollutant profiles as the recently decayed CBL and is referred to as the neutral residual layer (NRL) hereafter. It should be noted that atmospheric boundary layer stability ranges from very stable to strongly unstable. Classification of atmospheric boundary layer stability into these three major regimes is simplified and may not be appropriate for transient atmospheric conditions (Mahrt, 1999).
The PBLHT in numerical weather prediction and climate models is usually calculated using the Richardson number profile to find the first level where the Richardson number exceeds a critical value and in large-eddy simulation models using turbulence kinetic energy or eddy diffusivity thresholds (Seibert et al., 2000;Noh et al., 2003;Seidel et al., 2012). The PBLHT has been widely determined using radiosonde observations that provide profiles of atmospheric temperature, pressure, and moisture (Seibert et al., 2000;Liu and Liang, 2010;Seidel et al., 2010). Methods have been developed using the elevated temperature inversion, the maximum vertical gradient of potential temperature, the minimum vertical gradient of moisture, or the surface-based inversion to determine PBLHT under different regimes (Stull, 1988;Bradley et al., 1993;Kurowski et al., 2009;Seidel et al., 2010;Von Engeln and Teixeira, 2013;Bopape et al., 2020). For example, both Heffter (1980) and Liu and Liang (2010) use potential temperature gradient as a key parameter to determine the PBLHT for CBL and NRL regimes. However, radiosonde data have poor temporal resolutions and are subject to sampling errors. Most radiosonde stations launch a sounding system twice daily and thus cannot capture the diurnal evolution of the PBLHT (Seidel et al., 2010). Observing atmospheric boundary layer transitions with high temporalspatial resolution is required to investigate the evolution of PBLHT, which will help to improve its representation in models (Su et al., 2020;Fritz et al., 2021).
Remote sensing systems such as sodars, radio-acoustic sounding systems, wind profiling radars, and lidars provide continuous high-temporal observations that can be used to estimate PBLHT (Seibert et al., 2010). In recent years, aerosol lidar systems measuring vertical aerosol backscatter profiles with high temporal and vertical resolutions have also been widely used to derive PBLHT (Steyn et al., 1999;Brooks, 2003;Sawyer and Li, 2013;Dang et al., 2019;Su et al., 2020). Space-borne lidar such as the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) on board the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite can provide a global PBLHT climatology, although it is unable to capture the diurnal cycle (Luo et al., 2016). Atmospheric lidars use aerosol as tracers, and the gradient of the aerosol backscatter signal is generally used to derive the PBLHT. Numerous methods that use a prescribed lidar backscatter signal threshold (Frioud et al., 2003), first and second derivative of lidar signals (Sicard et al., 2006;Luo et al., 2014), lidar signal wavelet transformation (Brooks, 2003;Davis et al., 2000), and curve-fitting to idealized lidar profiles (Steyn et al., 1999) have been proposed to estimate the PBLHT.
A laser ceilometer is a type of atmospheric lidar that measures backscattered laser signals from atmospheric particles such as aerosols and cloud droplets. In particular, laser ceilometers are low-cost and reliable systems that provide fully automated all-weather measurements. Laser ceilometers have been deployed over many locations around the world, and their measurements have been widely used for cloud base detections and atmospheric aerosol and cloud structure analyses (Martucci et al., 2010;Kotthaus and Grimmond, 2018). To take the advantage of those continuous long-term ceilometer measurements, several PBLHT retrieval techniques using ceilometer aerosol backscatter data have been adopted to study the characteristics and evolutions of the boundary layer at various locations and to monitor the temporal and spatial variations in PBLHT (Münkel et al., 2007;Caicedo et al., 2017). Evaluations from previous studies show good agreement between PBLHT derived from ceilometer and radiosonde data for a limited number of cases (Haeffelin et al., 2012;Haman et al., 2012). However, those evaluations are based on limited data from a single location or a short-term campaign. The robustness of the estimated PBLHT from laser ceilometer measurements has not yet been validated under various atmospheric conditions and over multiple locations with different surface properties.
In this study, we use multiple years of US Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) ground-based remote sensing measurements and balloonborne sounding data to compare and evaluate PBLHT estimated from ceilometer backscatter with ARM sounding data at four ARM fixed-location atmospheric observatories and three ARM mobile facilities (AMFs) deployed around the world for various field campaigns. Multiple years of data at these climatologically significant locations allow us to statistically investigate how surface properties impact PBLHT estimation methods, how well PBLHT estimation methods perform under different atmospheric boundary layer regimes, and PBLHT diurnal and seasonal variations. The paper is organized as follows: Sect. 2 presents a brief description of ARM ground-based remote sensing measurements and methodologies used to derive PBLHT from sounding data and ceilometer measurements. Section 3 shows statistical comparisons of ceilometer-and radiosonde-estimated PBL-HTs under different atmospheric conditions including stable and unstable atmospheric boundary layers, low-level cloudfree conditions, and cloudy conditions at various ARM observatories. PBLHT diurnal evolution and its seasonal varia-tions are also presented. Summary and conclusions are given in Sect. 4.

Datasets and methodology
The DOE ARM user facility provides continuous field measurements of atmospheric conditions by deploying remote sensing and in situ atmospheric observatories at climatically significant locations. ARM operates three fixed-location atmospheric observatories at US Southern Great Plains (SGP), North Slope of Alaska (NSA), and Eastern North Atlantic (ENA) located in the Azores. These fixed-location observatories have been acquiring long-term measurements of cloud, aerosol, precipitation, and atmospheric dynamic and thermodynamic data for over 25 years at some locations. In this study, we also use data from the former ARM Tropical Western Pacific (TWP) observatory located at Darwin, Australia. In addition, ARM operates three mobile facilities (AMFs) which can be requested by scientists through a proposal process for various field campaigns that deploy ARM instruments anywhere in the world for roughly a year. We use observations from five AMF field campaigns including the Observations and Modeling of the Green Ocean Amazon -GOAMAZON (MAO); the Layered Atlantic Smoke Interactions with Clouds -LASIC (ASI); the Cloud, Aerosol, and Complex Terrain Interactions -CACTI (COR); the ARM West Antarctic Radiation Experiment -AWARE (AWR); and the AMF3 deployment at the Oliktok Point AK (OLI). The three-letter ARM identifier in parentheses is defined by a geographic reference or the International Air Transport Association (IATA) three-letter airport code to indicate approximate location. In total, measurements from nine ARM atmospheric observatories are used. The geographical locations of ARM fixed-location observatories and AMF field campaign deployments used in this study are shown in Fig. 1. Table 1 lists the site elevations above sea level (ASL), surface characteristics, time periods, and the number of radiosonde releases during the period for ARM observatories and AMF deployments.
ARM deploys various state-of-the-art instrument platforms at each observatory including radiometers, radars, lidars (including ceilometers), total sky imagers, surface meteorological instrumentation, aerosol observing systems, and radiosondes. Details on these instruments and their measurements are presented in Mather and Voyles (2013) and can also be found in each ARM instrument handbook (https: //www.arm.gov/capabilities/instruments, last access: 9 August 2022). Furthermore, ARM produces higher-order data products named value-added products (VAPs) using existing ARM data streams as inputs. VAPs use quality-controlled data to derive higher-order atmospheric quantities that can be more directly used for atmospheric research and by global climate models. A full list of ARM VAPs can be found at the ARM VAP website (https://www.arm.gov/capabilities/vaps, last access: 9 August 2022). All data obtained at ARM fixedlocation observatories and AMF field campaigns and derived VAPs are available at the ARM Data Discovery website (https://adc.arm.gov/discovery, last access: 9 August 2022). In this study, we mainly focus on analyses of ARM radiosonde data, ceilometer measurements, and corresponding VAPs.
The balloon-borne sounding system (SONDE) is launched four times a day (at 05:30,11:30,17:30,and 23:30 UTC) at most of the ARM observatory sites, except twice a day at NSA (05:30 and 17:30 UTC) and OLI (17:30 and 23:30 UTC). SONDE provides measurements of vertical profiles of atmospheric thermodynamic state such as atmospheric pressure, temperature, moisture, and the wind speed and direction with a 1 s temporal resolution. The accuracies of radiosonde-measured temperature and wind speed are 0.2 • C and 0.2 m s −1 , respectively. Table 2 gives a brief summary of methods used to estimate PBLHT in this study. The ARM PBLHT-SONDE VAP implements three commonly used methods including the Heffter method (Heffter, 1980), the Liu and Liang method (Liu and Liang, 2010), and the bulk Richardson number method (Seibert et al., 2000) to estimate PBLHT from radiosonde data (Sivaraman et al., 2013). To reduce the identification of spurious layers due to noisy data, the radiosonde data are subsampled at a 5 mb resolution, corresponding to vertical height resolutions of 30 to 60 m depending on the atmospheric environment.
The Heffter method (referred as PBLHT Heffter hereafter) determines the PBLHT from a potential temperature gradient profile. It first identifies each large potential temperature gradient layer, which is defined as two or more continuous heights where the potential temperature gradient (θ ) is greater than 0.005 K m −1 . PBLHT is then determined as the base height of the lowest layer in which the potential temperature difference between the base and top of the layer is greater than 2 K. If the algorithm does not identify a layer below 4 km above the ground level (a.g.l.) that meets the criteria, the Heffter method identifies the base height of the layer that has the largest potential temperature gradient within 4 km a.g.l. as the PBLHT and flags the derived PBLHT as indeterminate.
The Liu-Liang method (referred to as PBLHT Liu-Liang hereafter) uses different algorithms for different boundary layer regimes to determine PBLHT. The first step is to identify the boundary layer regime by examining the near-surface thermal gradient. Following Liu and Liang (2010), the potential temperature (θ ) difference between the fifth and second levels of sounding data (θ 5 −θ 2 ) is used to represent the nearsurface thermal gradient. The Liu-Liang method classifies the boundary layer regime as CBL, SBL, or NRL by comparing θ 5 −θ 2 with a stability threshold δ s . For CBL, θ 5 −θ 2 <− δ s ; for SBL, θ 5 −θ 2 >+δ s ; and for NRL, −δ s <θ 5 −θ 2 <+δ s . For the CBL and NRL regimes, the PBLHT is determined following Stull (1988) as the height at "which an air parcel rising adiabatically from the surface becomes neutrally buoy-  Table 2. Methods used to estimate PBLHT from radiosonde data and ceilometer measurements.

Method Algorithms Reference
Heffter method Layers where two or moreθ values are greater than 0.005 K m −1 Height at which θ = θ z − θ base is greater than 2 K Heffter (1980) Liu-Liang method CBL and NRL: Height at which θ k −θ 1 >δ u anḋ θ k >θ r SBL: top of the stable layer above the surface or the height of the LLJ nose Liu and Liang (2010) Bulk Richardson number method Height at which Ri b is greater than Ri bc Seibert et al.
Enhanced gradient of ceilometer backscatter Height at which ceilometer backscatter gradient is strongly negative Münkel and Roininen (2010) ant". Practically, the Liu-Liang method searches upwardly for the PBLHT as the level k at which θ k − θ 1 >δ u , where δ u is another stability threshold, and theθ k is larger than a gradient thresholdθ r . The threshold values of δ s , δ u , andθ r are dependent on the surface type and are empirically determined in Liu and Liang (2010). For the SBL regime, however, the determination of the PBLHT is much more challenging as the SBL turbulence can result from either buoyancy forcing generated by the stable layer above the surface or wind shear that is usually associated with low-level jet (LLJ). The Liu-Liang method determines the PBLHT for the SBL regime as the top of the stable layer above the surface or the height of the LLJ nose, whichever is lower. The bulk Richardson number Ri b represents the ratio of thermally produced turbulence to that generated by vertical wind shear. Since wind-shear-produced turbulence is greatly reduced above the top of atmospheric boundary layer, Ri b increases dramatically at the top of SBL. The PBLHT is de- termined as the first level at which Ri b is greater than a critical value Ri bc . According to Sørensen et al. (1998), Ri b between the surface and a given altitude z can be calculated from sounding data with the following equation: where g is the gravitational acceleration, z is the height in a.g.l., θ vz and θ v0 are the virtual potential temperatures at the surface and height z, and u z and v z are the wind speed components at height z. The magnitude of Ri bc employed in previous studies ranges from 0.25 to 0.5 (Mahrt, 1981;Holtslag et al., 1990). Seibert et al. (2000) suggest an optimal Ri bc value of 0.25 when applied to radiosonde data. The ARM PBLHT-SONDE VAP provides estimated PBLHTs using two Ri bc values of 0.25 and 0.5 (referred to as PBLHT Richardson_p25 and PBLHT Richardson_p5, respectively). Each PBLHT estimation is given a quality control (QC) flag to indicate possible issues that are related to bad input data or unreasonable estimations (e.g., estimated PBLHT >4 km a.g.l.). Only PBLHT estimations with clear QC flags are selected in this study. Overall, unreasonable PBLHT estimations removed by QC flags are less than 10 % of the total data. It should be noted that different algorithms used in the PBLHT-SONDE VAP could produce dramatically different PBLHT estimations, especially for the SBL regime. The challenge is that there is no ground truth measurement to determine which method performs better than others. The performance of each method depends on the surface type and boundary layer conditions. For example, previous studies suggested PBLHT estimated with the Liu-Liang method generally agrees better with lidar estimations than the Heffter and bulk Richardson number methods (Sawyer and Li, 2013;Su et al., 2020). However, Lewis (2016) argued that the Liu-Liang and bulk Richardson number methods did not produce realistic PBLHT estimations while the Heffter method pro-duces reasonable PBLHT values based on careful inspection of temperature and humidity profiles during the Marine ARM GPCI Investigation of Clouds (MAGIC) field campaign. Although the Heffter and Liu-Liang methods generally provide more reliable PBLHT estimations for the CBL and NRL regimes, the bulk Richardson number method provides better PBLHT estimations for the SBL regime (Seibert et al., 2000).
ARM uses the Vaisala CL31 ceilometer model, which has a maximum vertical range of 7.7 km (Münkel and Räsänen, 2004). The Vaisala CL31 ceilometer detects up to three cloud layers simultaneously and measures vertical visibility. Ceilometer cloud detections are used to distinguish cloudy and cloud-free conditions. In addition, the Vaisala CL31 ceilometer also provides total attenuated backscatter coefficient profiles at the wavelength of 910 nm with a vertical resolution of 10 m and temporal resolution of 2 s, which have been used widely to derive continuous estimations of PBLHT (Münkel et al., 2007). To estimate PBLHT, the Vaisala CL31 ceilometer employs the gradient method that searches for local gradient minima of the range-and overlap-corrected total backscatter coefficient profile. To get more reliable aerosol signals, ceilometer data are first averaged to a temporal resolution of 16 s. To search for local gradient minima of the total backscatter coefficient profile, ceilometer data are further applied with 30 min temporal and 360 m vertical sliding average. By taking advantage of the high-temporal-resolution measurements, the CL31 ceilometer software called "BL-VIEW" provides PBLHT estimations with a temporal resolution of 16 s and a vertical resolution of 10 m. Compared with vertical resolutions of 30 to 60 m for PBLHT-SONDE, CEIL has a higher vertical resolution. The presence of cloud layers and precipitation might impact PBLHT estimations. Therefore, the enhanced gradient method applies a cloud and precipitation filter during the averaging process and suppresses false layer identification, which allows for robust es-timations of PBLHT under all weather conditions (Münkel and Roininen, 2010). The Vaisala CL31 incorporates the enhanced gradient method into the "BL-VIEW" software that provides real-time monitoring of boundary layer structures and identifies up to three boundary layer height candidates. The BL-VIEW algorithm gives a quality index from 1 to 3 to each boundary layer height candidate. The quality index value is determined based on the gradient magnitude, detected cloud base, and the distance of the local gradient minimum to other gradient minima. A low gradient results in a high quality index, clouds detected in the vicinity of a boundary layer reduce its quality index, and a large distance to other gradient minima results in a high quality index. We select the boundary layer height candidate with the highest quality index as the ceilometer-estimated PBLHT (referred to as PBLHT CEIL hereafter). Figure 2 shows an example of Vaisala CL31 total attenuated backscatter coefficient profiles and estimated PBLHT from ceilometer data as well as from the ARM PBLHT-SONDE VAP on 9-10 February 2015, at the ARM SGP site. From Fig. 2a we can see the presence of several aerosol layers and their evolution with time. Starting at 18:00 LT local time (LT) on 9 February, a residual aerosol layer with a top of ∼ 0.7 km a.g.l. was present, and its top descended gradually with time, which caused a large variation in the ceilometer-estimated PBLHT. This situation represents a challenging scenario to use the aerosol gradient to estimate PBLHT. At ∼ 21:00 LT on 9 February, another dense aerosol layer was formed near the surface and started to grow steadily to a height of approximately 0.3 km a.g.l. until ∼ 23:00 LT. After 23:00 LT the residual aerosol layer and the dense surface aerosol layer were separated. The residual aerosol layer was forced to ascend slightly and then disappeared at ∼ 02:00 LT on 10 February, probably due to advection out of the ceilometer field of view or being entrained into the lower mixed-layer zone. The dense surface aerosol layer stayed quite stable within the boundary layer until ∼ 09:00 LT and then started to grow quickly from ∼ 0.3 km a.g.l. at 09:00 LT to ∼ 1.2 km a.g.l. at 18:00 LT. As the aerosol layer expanded to a higher altitude, its density decreased as revealed by the decrease in ceilometer backscatter coefficient values after 12:00 LT. Figure 2b shows that ceilometer-estimated PBLHTs match the evolution of aerosol layers shown in Fig. 2a well, which demonstrates the advantage of using continuous high-temporal-resolution PBLHT data for studying boundary layer structures. PBLHT from the ARM PBLHT-SONDE VAP at each radiosonde launching time are also plotted in Fig. 2 with different colors for different PBLHT estimation methods. Ceilometer-estimated PBLHT agrees well with that from the PBLHT-SONDE VAP at 05:30 and 11:30 LT on 10 February radiosonde launches, and PBLHT from the PBLHT-SONDE VAP has a narrow range. Ceilometer-estimated PBLHT agrees well with the bulk Richardson number method at 23:30 LT on 9 February and with the Heffter method at 17:30 LT on 10 February. PBLHT from the PBLHT-SONDE VAP spans a large range at these two time periods.
To better understand PBLHT estimations from ceilometer data and radiosonde data, Fig. 3 shows profiles of ceilometer backscatter coefficient, radiosonde-derived potential temperature, and Richardson number that are used to estimate PBLHT in different methods. Estimated PBLHT from ceilometer data and the ARM PBLHT-SONDE VAP are also plotted. At 23:30 LT on 9 February, the boundary layer is stable as seen in the potential temperature profile. The ceilometer backscatter coefficient profile shows a strong negative gradient, but potential temperature and bulk Richardson number show a strong positive gradient at the height of 0.3 km a.g.l., which agrees well with PBLHT CEIL and PBLHT Richardson. PBLHT Heffter and PBLHT Liu-Liang are underestimated. At 05:30 LT on 10 February, the boundary layer is still stable. PBLHT CEIL, PBLHT Richardson, and PBLHT Liu-Liang agree well, but PBLHT Heffter is underestimated. At 11:30 LT on 10 February, the boundary layer is well-mixed and all PBLHT estimations agree well. At 17:30 LT on 10 February, there is a weak stable layer developed near the surface, where the low-altitude atmosphere is still well-mixed. This is a typical structure of a residual layer overlaying a weak stable layer. PBLHT CEIL and PBLHT Heffter captured the top of the residual layer, while PBLHT Liu-Liang is underestimated. PBLHT bulk Richardson is quite low, because it identifies the top of the weak stable layer as the PBLHT.
It should be noted that PBLHT CEIL performed well for this day. However, it is not uncommon that there are days when aerosol loading is not strong or there are advected aerosol layers that cause trouble for accurate PBLHT estimations from ceilometer measurements. Therefore, for the rest of the sections, we will focus on statistical comparisons of these PBLHT estimations using ARM measurements and the PBLHT-SONDE VAP at different ARM fixed-location observatories and AMF field campaigns.

Results and discussions
As discussed in the preceding section, the performance of PBLHT estimation methods might be impacted by the boundary layer stability and the surface type. Literature suggests that the presence of low-level clouds could also impact PBLHT estimations. Therefore, we separate comparisons of PBLHT CEIL and PBLHT SONDE for different boundary layer regimes and cloudy and cloud-free conditions. Figure 4 shows occurrence fractions of different boundary layer regimes, cloudy conditions, and cloud-free conditions at ARM fixed-location observatories and AMF deployments. Figure 4a shows that MAO, NSA, OLI, and AWR are dominated by the SBL regime, while TWP, ASI, SGP, ENA, and COR are dominated by the NRL regime. The CBL regime generally has a small fraction for all the observatories and is negligible for the TWP, MAO, and SGP observatories. Since the CBL and NRL have similar state variables and pollutant profiles, we combine the CBL and NRL regimes and refer to it as the unstable boundary layer condition, in contrast to the SBL regime, which stands for the stable boundary layer condition. To investigate possible impacts of clouds, comparisons are also separated for conditions with and without the presence of low-level clouds below 4 km a.g.l. (referred to as LLC and LLC-free, respectively), as detected by the ceilometer at the time of the radiosonde launch. Figure 4b shows that most of the observatories have an LLC fraction greater than 0.6. In particular, ASI, ENA, NSA, and OLI are largely dominated by LLC. Since the vertical resolution for PBLHT-SONDE is 30 to 60 m, the minimum PBLHT from PBLHT-SONDE is usually higher than 90 m above the surface. Therefore, we only compare PBLHT higher than 90 m from both PBLHT-SONDE and PBLHT CEIL.

Low-level cloud-free unstable boundary layer conditions
To statistically compare PBLHT CEIL and PBLHT SONDE estimations, Fig. 5 shows the correlation coefficients between PBLHT CEIL and PBLHT SONDE estimations with different methods at the nine ARM observatories under LLCfree unstable boundary layer conditions. As a reference, correlation coefficients between PBLHT Heffter and PBLHT Liu-Liang are also plotted (R Heffter−LiuLiang ). From Fig. 5a, PBLHT CEIL has higher correlation coefficients with PBLHT Liu-Liang (R CEIL−LiuLiang ) than PBLHT Heffter (R CEIL−Heffter ) and PBLHT Richardson (R CEIL−Richardson_p25 and R CEIL−Richardson_p5 ) at all ARM observatories except OLI. One reason that PBLHT Liu-Liang performs well might be because the Liu-Liang method uses different algorithms and thresholds for different boundary layer regimes and surface types as discussed in Sect. 2. Sawyer and Li (2013) and Su et al. (2020) suggested that their PBLHT estimations using micro-pulse lidar (MPL) measurements compared better with PBLHT Liu-Liang and preferred to use PBLHT Liu-Liang data to evaluate their PBLHT estimations at the ARM SGP observatory. Su et al. (2020) show that their PBLHT estimations with 8 years of MPL data using a wavelet method have a correlation coefficient of 0.61 with PBLHT Liu-Liang under NRL boundary layer conditions at the ARM SGP site, which is slightly higher than the correlation coefficient of 0.54 for our PBLHT CEIL and PBLHT Liu-Liang comparisons. This could be because MPL operates at the wavelength of 532 nm, which is more sensitive to sub-micron aerosol particles, while ceilometer operates at the wavelength of 910 nm, which is less sensitive to sub-micron aerosol particles and might miss thin aerosol layers. PBLHT CEIL has higher correlation coefficients with PBLHT Heffter than PBLHT Richardson at most ARM observatories. Seibert et al. (2000) suggested that parcel methods using potential temperature profiles are more reliable for PBLHT estimations under convective boundary layer conditions. PBLHT Richardson using Ri c values of 0.25 and 0.5 does not produce statistically different comparisons with PBLHT CEIL. At different ARM observatories, PBLHT CEIL and PBLHT SONDE comparisons show dramatic differences. Low-and mid-latitude land observatories including MAO, SGP, and COR have higher correlation coefficients between PBLHT CEIL and PBLHT SONDE than other observatories, indicating surface type impacts on the comparisons. PBLHT Heffter and PBLHT Liu-Liang comparisons also show high correlation coefficients at MAO, SGP, and COR and weak correlation coefficients at TWP, ASI, and ENA, suggesting that it is still challenging to provide reliable PBLHT estimations at these locations even using radiosonde measurements. It is also noted that correlation coefficients at ASI and AWR show a broad spread, which might be caused by small samples over these two sites. From Fig. 4, these two observatories are either dominated by LLC or under SBL conditions. Although these correlation coefficients show the covariances between PBLHT CEIL and PBLHT SONDE, they do not provide information on absolute differences between PBLHT CEIL and PBLHT SONDE. Kernel density estimates (KDEs), which represent the continuous probability density function of observations in datasets, are shown in Fig. 6 for PBLHT CEIL and PBLHT Liu-Liang under LLCfree unstable boundary layer conditions at the nine ARM observatories. Since PBLHT CEIL has higher correlation coefficients with PBLHT Liu-Liang (R CEIL−LiuLiang ) under LLC-free unstable boundary layer conditions, we prefer to show KDE plots for comparisons between PBLHT CEIL and PBLHT Liu-Liang among all PBLHT estimation methods using radiosonde data. Consistent with correlation coefficients in Fig. 5, MAO, SGP, and COR, which have high correlation coefficients, also show better comparisons between PBLHT CEIL and PBLHT Liu-Liang. At ASI and ENA, which have low correlation coefficients, PBLHT CEIL is generally lower than PBLHT Liu-Liang, probably because these observatories are over the ocean and do not have strong aerosol loadings. While at NSA and OLI, PBLHT CEIL sometimes is much higher than PBLHT Liu-Liang, probably because free-tropospheric aerosol layers transported from low latitudes have larger CEIL backscatter gradients than boundary layer aerosols, and the top of the elevated aerosol layer is misidentified as the PBLHT by CEIL.

Low-level cloud-free stable boundary layer conditions
As was pointed out by previous studies, it is still challenging to obtain reliable PBLHT estimations under stable boundary layer conditions even using in situ radiosonde data (Seibert et al., 2000). Figure 7 shows the correlation coefficients between PBLHT CEIL and PBLHT SONDE estimations with different methods at the nine ARM observatories under LLCfree stable boundary layer conditions. As expected, most correlation coefficients including those of PBLHT Heffter and PBLHT Liu-Liang are close to zero, and some comparisons are even negatively correlated, suggesting significant differences in PBLHT estimations under stable boundary layer conditions among different methods. R CEIL−Richardson_p25 , R CEIL−Richardson_p5 , and R CEIL−Heffter are weakly positive at TWP, SGP, COR, and AWR. Su et al. (2020) show a correlation coefficient of 0.27 for MPL-derived PBLHT and PBLHT Liu-Liang under stable boundary layer conditions at the ARM SGP site, compared to a correlation coefficient of −0.03 for PBLHT CEIL and PBLHT Liu-Liang comparison in this study. This could be because the method they used can only estimate PBLHT several hundred meters above the ground due to MPL near-surface "blind zone", overlap corrections, and the "dilation" parameter used to conduct the wavelet transform. Correlation coefficients at ASI show a broad spread, which might be caused by small samples as ASI is dominated by LLC.
Since R CEIL−Richardson_p25 shows positive values for several ARM observatories, KDE plots for comparisons between PBLHT CEIL and PBLHT Richardson_p25 under LLC-free stable boundary layer conditions are shown in Fig. 8. KDE plots for comparisons between PBLHT CEIL and PBLHT Heffter, and PBLHT Liu-Liang, as well as related discussions, are presented in the Supplement (Figs. S1 and S2, separately). From Fig. 8, although correlation coefficients are low, absolute differences between PBLHT CEIL and PBLHT Richardson_p25 are not large, as can be seen,  and the maximum occurrences of KDE are generally located close to the 1 : 1 line. This is because PBLHT is low under stable boundary layer conditions. PBLHT CEIL often shows much larger values than PBLHT Richardson_p25. This is expected because PBLHT CEIL tends to pick the top of the residual layer or elevated aerosol layer under stable boundary layer conditions.

Low-level cloudy conditions
In the presence of LLC, the ceilometer backscatter coefficient profile shows a sharp gradient at the cloud layer level due to strong attenuation of ceilometer signal by cloud droplets, which could be captured by the ceilometer PBLHT detection algorithms as a PBLHT candidate. Most LLC cloud bases occur at or close to the top of the boundary layer. Indeed, comparisons of PBLHT CEIL and ceilometer LLC cloud base show that in general, they match well, except that sometimes ceilometer LLC cloud bases are higher than PBLHT CEIL (Fig. S3). This is because ceilometer-detected clouds could be advected from other locations or could be formed from moisture layers that are advected from other locations. Correlation coefficients for PBLHT CEIL and PBLHT SONDE comparisons under LLC unstable boundary layer conditions are shown in Fig. 9. Similar to the LLC-free unstable boundary layer conditions, low-and mid-latitude land observatories including MAO, SGP, and COR have higher correlation coefficients between PBLHT CEIL and PBLHT SONDE, while ASI and ENA have low and even negative correlation coefficients. KDE plots for PBLHT CEIL and PBLHT Liu-Liang are shown in Fig. 10. Good agreements between PBLHT CEIL and PBLHT Liu-Liang are shown at MAO, SGP, and COR. Under LLC stable boundary layer conditions, LLCs are often decoupled from the boundary layer. Therefore, correlation coefficients between PBLHT CEIL and PBLHT SONDE are low (Fig. S4), and PBLHT CEIL is generally higher than PBLHT SONDE (Fig. S5).

PBLHT diurnal evolution and seasonal variations
One advantage of PBLHT estimations with remote sensing measurements is that they provide continuous PBLHT estimations that can be used to study PBLHT diurnal evolution. To compare PBLHT CEIL and PBLHT SONDE at different times of a day, Fig. 11 shows box-and-whisker plots of PBLHT diurnal cycles and their seasonal variations from PBLHT CEIL, PBLHT Liu-Liang, and PBLHT Richardson_p25 at the ARM SGP observatory. There are clear PBLHT diurnal evolutions at all seasons from all PBLHT estimation methods at the ARM SGP observatory. In general, the boundary layer stays shallow within about 1 km a.g.l. during nighttime and starts to grow at ∼ 09:00 local time, reaches its peak value at late afternoon, and then begins to decay. This is consistent with past studies (Sawyer and Li, 2013;Su et al., 2020). Comparing different PBLHT estimation methods, PBLHT CEIL produces overall higher PBLHTs during nighttime when boundary layers are mostly SBL and PBLHTs are comparable with PBLHT SONDE during daytime. In particular, after ∼ 20:00 local time, PBLHT CEIL is much higher than PBLHT SONDE because PBLHT CEIL tends to detect the residual layer, while PBLHT SONDE tends to detect the stable boundary layer. PBLHT Liu-Liang produces low PBLHTs during nighttime that are generally within 0.4 km a.g.l., which are always lower than PBLHT Richardson_p25. This suggests that PBLHT Liu-Liang may need to adjust the thresholds used to derive PBLHT under stable boundary layer conditions. PBLHT Richardson_p25, on the other side, often produces low PBLHTs in the afternoon compared with PBLHT CEIL and PBLHT Liu-Liang, which indicates that the bulk Richardson method is not suitable to provide reliable PBLHT estimations for strong convective boundary layer conditions. For different seasons, summer has the largest PBLHT diurnal evolution as well as the highest PBLHTs during the afternoon under convective boundary layer conditions, while winter has the smallest PBLHT diurnal evolution and lowest PBLHTs, mainly because summer has stronger surface convections than other seasons.

Summary and conclusions
Ceilometer observations facilitate continuous measurements of aerosol backscatter profiles, which have been widely used to estimate the planetary boundary layer height (PBLHT). Good agreements between the ceilometer and radiosonde estimations have previously been reported for short-term campaigns at single locations. In this study, we extend that comparison to multi-year time series for nine different DOE ARM sites located over land and ocean in different climate zones.
The ARM PBLHT-SONDE Value-added Product (VAP) implements three commonly used methods including the Heffter method, the Liu-Liang method, and the bulk Richardson number method to estimate PBLHT from radiosonde data. The Vaisala CL31 ceilometer at ARM observatories identifies up to three boundary layer height candidates from total backscatter coefficient profile measurements using the enhanced gradient method and assigns a quality index to each candidate. The boundary layer height candidate with the highest quality index is selected as the ceilometerestimated PBLHT (PBLHT CEIL).
We first compared PBLHT CEIL and PBLHT SONDE estimations for an example day on 10 February 2015, at the ARM Southern Great Plains (SGP) observatory. By examining the ceilometer backscatter coefficient, radiosondederived potential temperature, and Richardson number profiles, we found that PBLHT CEIL performed well at all four radiosonde launching times for this day. Then, statistical comparisons of PBLHT CEIL and PBLHT SONDE under different atmospheric conditions including stable and unstable boundary layers, low-level cloud-free conditions (LLCfree), and LLC cloudy conditions were performed at different ARM observatories. Under unstable boundary layer conditions, ARM low-and mid-latitude land observatories have higher correlation coefficients and good comparisons between PBLHT CEIL and PBLHT SONDE, while ARM observatories at the ocean surface have weak correlation coefficients. Comparisons between different methods used in PBLHT SONDE show similar features, indicating that it is still quite challenging to provide reliable PBLHT estimations over the ocean surface. Under stable boundary layer conditions, however, most correlation coefficients including those for comparisons between different methods used in PBLHT SONDE are close to zero or even negative, except those comparisons between PBLHT CEIL and PBLHT Richardson show agreement at several ARM observatories. This suggests that it is still challenging to obtain reliable PBLHT estimations under stable boundary layer conditions, even using in situ radiosonde data, and the Richardson method is more suitable for estimating PBLHT for these conditions. Overall, the presence of LLC has little impact on the comparisons between PBLHT CEIL and PBLHT SONDE. We further compared PBLHT CEIL and PBLHT SONDE at different times of the day by examining the PBLHT diurnal evolution at  the ARM SGP observatory. PBLHT CEIL produces overall higher PBLHTs during nighttime when boundary layers are mostly under stable conditions and comparable PBLHTs with PBLHT SONDE during daytime.
Our statistical comparisons between PBLHT CEIL and PBLHT SONDE at the ARM SGP observatory are similar to past studies that compared PBLHTs estimated with micropulse lidar (MPL) and with radiosondes using multiple years of data (Sawyer and Li, 2013;Su et al., 2020), but are not as good as those comparisons using limited data from a single location or a short-term campaign (Haeffelin et al., 2012;Haman et al., 2012). The main reason is that in those studies PBLHTs were manually selected from ceilometer backscatter local gradient minimum, while we used an automatic selection method that may fail to pick up the correct PBLHT candidate under stable boundary layer conditions or when a strong elevated aerosol layer is detected. Therefore, advanced PBLHT estimation methods are still needed to improve PBLHT estimations from both ceilometer and radiosonde data. Comparisons of different PBLHT estimation methods could help provide an uncertainty range for PBLHT. On the other hand, the residual layer top detected by ceilometer during nighttime could provide useful information to quantitatively study the impacts of the residual layer on the development of the boundary layer.
Data availability. The ceilometer backscatter and boundary layer height data and ARM PBLHT VAP data used in this study can be downloaded from the ARM data archive site: https://doi.org/10.5439/1095593 (Morris and Brian, 2022) and https://doi.org/10.5439/1150253 (Riihimaki and Zhang, 2022).
Author contributions. DZ analyzed the data and wrote the manuscript draft; JC and VM reviewed and edited the manuscript.