Clouds over Hyytiälä, Finland: an algorithm to classify clouds based on solar radiation and cloud base height measurements

. We developed a simple algorithm to classify clouds based on global radiation and cloud base height measured by pyranometer and ceilometer, respectively. We separated clouds into seven different classes (stratus, stratocumulus, cumulus, nimbostratus, altocumulus+altostratus, cirrus+cirrocumulus+cirrostratus and clear sky+cirrus). We also included classes for cumulus and cirrus clouds causing global radiation enhancement, and classiﬁed multilayered clouds, when captured by the ceilometer, based on their height and characteristics (transparency, patchiness and uniformity). The overall performance of 5 the algorithm was nearly 70 % when compared with classiﬁcation by an observer using total sky images. The performance was best for clouds having well-distinguishable effects on solar radiation: nimbostratus clouds were classiﬁed correctly in 100 % of the cases. The worst performance corresponds to cirriform clouds (50 %). Although the overall performance of the algorithm was good, it is likely to miss the occurrence of high and multilayered clouds. This is due to the technical limits of the instrumentation: the vertical detection range of the ceilometer and occultation of the laser pulse by the lowest cloud layer. 10 We examined the use of brightness parameter, which is deﬁned as a ratio between measured global radiation and modeled radiation at the top of the atmosphere, as an indicator of clear sky conditions. Our results show that cumulus, altocumulus, altostratus and cirriform clouds can be present when the parameter indicates clear sky conditions. Those conditions have previously been associated with enhanced aerosol formation under clear sky. This is an important ﬁnding especially in case of low clouds coupled to the surface which can inﬂuence aerosol population via aerosol-cloud interactions. Overall, caution is 15 required when the parameter is used in the analysis of processes affected by partitioning of radiation by clouds. contributions. and MK developed the concept of the study. MP and MV initiated the data analysis and development of the algorithm. SK reﬁned the parameter ranges of the algorithm using total sky images, and estimated the performance of the algorithm. IY performed the data analysis for the paper and wrote it. EE, DT and VMK helped with the interpretation of the results. All authors gave critical feedback and helped with shaping the paper.

In order to reduce the overlap of parameter ranges between different cloud types (Duchon and O'Malley, 1999), and to make sure that the parameter ranges are applicable for conditions in Hyytiälä, we compared the cloud classification made by a human 125 observer, using total sky images, to corresponding radiation characteristics and CBHs between 01 May and 31 July 2014 ( Fig.   1). In the validation process, we used total sky images and ceilometer data from the BAECC campaign. The ceilometer used in the campaign was also Vaisala CL31, but it was positioned about 500 m away from the standard ceilometer of SMEAR II. We discuss the consequences of ceilometer position for cloud classification in Appendix A1 (Fig. A1).
In the cloudiness and cloud classification analysis, we used quality checked pyranometer and ceilometer data measured at 130 SMEAR II in 2014 and 2016-2017. Data from 2015 was excluded because the data availability was low due to instrumental issues of the ceilometer. The time resolution of the data was 1 min, and gaps were interpolated with the nearest value. The interpolation was important only for intermittent measurements of precipitable water and AOD. Otherwise the data availabilities of the measured variables were high during the measurement period, ca. 90 %. When conducting seasonal analysis, we determined the seasons so that spring included March, April and May, followed by summer (June, July and August), autumn 135 (September, October and November) and winter included (December, January and February).
For the cloud type classification and cloud occurrence analysis based on pyranometer measurements, we used only data when solar zenith angle (SZA) was less than 70 • as the pyranometer data are not reliable when the Sun is close to horizon.
Because SZA is always larger than 70 • before 27 February and after 16 October, we included only data from March to September so that we used only months with full data availability. However, for the cloud occurrence and CBH analysis using 140 the ceilometer measurements, we used data independent of the time of day and season, because the ceilometer is not as sensitive to SZA as the pyranometer. We calculated the value of SZA with Solar Position Algorithm (SPA) online calculator, available in https://midcdmz.nrel.gov/solpos/spa.html (last access: 09 January 2020).

Cloud type classification parameters
Our algorithm uses three parameters to classify clouds: transparency (TR), patchiness (PA) and measured CBH. Transparency (1) Transparency describes how effectively clouds block solar radiation. Transparency is equal to 1 in clear sky conditions and it approaches 0 for an overcast sky. In this work the chosen time interval is 21 min similar to Duchon and O'Malley (1999).

150
The length of the time interval is based on empirical experience: the time interval should be long enough to capture the cloud variability, but it should not be too long so that the prevailing cloud type changed within one interval.
Patchiness determines the variability of the cloud layer. The same time window of 21 min is used. The scaling of the global 155 radiation is discussed later in this section.
The third criterion is the running minimum of the lowest CBH over a 21 min interval. To assess the patchiness of cumulus clouds, an additional parameter (TR max ) is included. TR max is 21 min moving maximum of the relation between measured global radiation and modeled clear sky radiation, and hence it describes the cloud free moments when cumulus clouds are present.

160
In this study, we used Solis clear sky model to calculate the amount of global radiation that would reach the surface of the Earth in case there were no clouds (Ineichen, 2008). From the model, the obtained global radiation at ground level is where I 0 is the solar flux density at the top of the atmosphere (I 0 ) multiplied by a factor associated with AOD and precipitable water, τ g is global total optical depth, and g is a fitting parameter related to AOD and precipitable water. The detailed 165 descriptions of the parameters can be found in Ineichen (2008).
The relationship between the measured global radiation and the modeled global radiation gives the fraction of radiation that reaches the surface of the Earth. For the cloud classification algorithm, we scaled the measured radiation because the magnitude of the oscillations in the radiation due to clouds are different depending on the time of day. Since the amount of incoming solar radiation is lower in the morning and evening compared to the noon, the fluctuations due to same types of clouds are higher 170 around noon. We used 1400 Wm −2 for scaling because it is slightly higher than the theoretical maximum of incoming solar 6 https://doi.org/10.5194/amt-2020-130 Preprint. Discussion started: 8 June 2020 c Author(s) 2020. CC BY 4.0 License.
radiation. The scaling factor (s) was calculated as in Duchon and O'Malley (1999): We multiplied the measured global radiation by the scaling factor in order to obtain the scaled radiation:

Calculating cloud occurrence from pyranometer data
We determined the cloud occurrence from the pyranometer measurements as the ratio between the measured global radiation and modeled radiation at the top of the atmosphere (I). The radiation at the top of the atmosphere is We used 21 min running average of data in 1 min time resolution. When the radiation measured with the pyranometer was less 180 than a certain percentage of the modeled top of the atmosphere radiation, we assumed that the data point corresponded to cloudy conditions. For summer months, the percentage that we used was 70 %, for April 65 %, and for March and September 55 %.
We estimated the percentages separately for each month using a clear sky model with relatively high aerosol load (AOD 675 nm = 0.17). The percentages were different for different months because the position of the Sun is higher in summer than in spring and autumn. For this analysis, we used only data from March to September to avoid errors in the measurements caused by large 185 SZA.

Results and discussion
First, we analyzed the ceilometer and the pyranometer data to study the seasonal variation in cloud occurrence in order to gain insight into how often clouds are observed over Hyytiälä, and what are the typical CBHs (Sect. 3.1). Second, the cloud classification algorithm is introduced along with the evaluation of the performance of the algorithm (Sect. 3.2). We study the 190 statistics of the automatically produced cloud types in Sect. 3.3. In Sect. 3.4, we discuss the use of brightness parameter as an indicator of clear sky condition, and finally in Sect. 3.5 we compare the main findings of this work with other studies.

Cloud properties
We studied the seasonal variation of cloud occurrence measured with the ceilometer (Fig. 2a) and the pyranometer (Fig.   2b). We obtained the cloud occurrence by dividing the number of cloud observations in one month by the total number of 195 data points in the month. Cloud occurrence calculated using ceilometer data (Fig. 2a) included all the data, whereas cloud occurrence calculated using the pyranometer data (Fig 2b) included only times when SZA was less than 70 • . The ceilometer measurements show a robust seasonal variation in cloud occurrence in Hyytiälä, with cloud occurrence being lower during summer months (56 %) and higher during winter months (79 %) (Fig. 2a). The overall cloud occurrence was about 66 %, and from March to September it was about 55 % (Fig. 2a). Cloud occurrence calculated from the pyranometer data did not show a 200 seasonal cycle, and also the cloud occurrence was higher (73 %) compared to the ceilometer measurements (Fig. 2b).
Diurnal variation of cloud observations by the ceilometer in different seasons is shown in Fig. B1. A diurnal cycle in cloud occurrence was observed in summertime (Fig. B1b). The cloud occurrence had a maximum around 14:00, likely being associated with the development of convective clouds. In May and September, a robust diurnal cycle was also observed whereas in other months the variation was absent ( Fig. B1a and c). We did not investigate the diurnal variation from the pyranometer 205 measurements as the method is limited by SZA, and hence the observations were not distributed evenly throughout the day.
From the ceilometer data, we could retrieve the occurrence of the two-layered and three-layered clouds. The second and the third cloud base were observed about 2-10 % and less than 1 % of the time, respectively, depending on the month (Fig. B2).
Hence, the frequency of the times when single-layered clouds were detected by the ceilometer, was approximately the same as the observed cloudiness in total. Both the second and third cloud layer seemed to have higher frequencies of occurrence 210 during summertime compared to winter, even though there were substantial differences between the years (Fig. B2). When a multilayered cloud was observed, it was two-layered in 92 % of the cases.
To identify the most common CBHs observed over Hyytiälä, we investigated the seasonal (Fig. 3) and diurnal ( Fig. B3) variation of CBHs measured with the ceilometer. In each month, we divided every CBH record from the lowest cloud layer into 400 m bins. We calculated the frequency of CBH records in each bin as a ratio between the number of CBH records in the bin and the total number of CBH records in that month. Figure 3 shows that when a cloud base was observed, the most frequently observed CBH was below 800 m for all months, although the relative amount of CBH records below 800 m was higher in winter than in summer. Figure 3 also shows that in spring and summer, the CBH distribution was more dispersed, and a second maximum at about 1 600 m was detected. The measured CBHs were most likely associated with cumulus clouds as they were more often observed in summer (see Fig. 5 in Sect. 3.3). This is also supported by the fact that a pronounced diurnal 220 cycle in CBH, with higher values in afternoon compared to morning, was measured in summer, whereas in winter no diurnal cycle was observed (Fig. B3). In summertime, the frequently detected CBHs around 3 000-4 000 m are probably middle level altocumulus and altostratus clouds. Overall, low clouds (CBH < 2000 m) were observed 87 % of the time when clouds were detected, middle clouds (2000 m < CBH < 7000 m) 13 %, and high clouds (CBH > 5000 m) 1 % of the time. The limits of the cloud level classification follow the criteria defined in Houze (1993).

225
The seasonal variation of CBH distribution of all clouds, and single, two and three-layered clouds, are presented in Fig. B4.
It confirms the observation of single-layered clouds dominating the CBH distribution based on ceilometer data that we found 9 https://doi.org/10.5194/amt-2020-130 Preprint. Discussion started: 8 June 2020 c Author(s) 2020. CC BY 4.0 License. also in Figs. 2 and B2, as the distribution of single-layered clouds resembles the distribution of total observed cloudiness. The seasonal variation of the CBHs of multilayered clouds reflects the seasonal variation of the lowest cloud layer (Fig. 3). Figure B5 displays the height differences between cloud layers in different seasons. In all the seasons, the most common 230 height difference between all the layers (1st and 2nd in two-layered clouds, 1st and 2nd in three-layered clouds and 2nd and 3rd in three-layered clouds) was less than 400 m (50-70 %). Towards the higher end, the distribution decreased gradually.
The large distances between cloud layers were slightly more common in autumn and winter than in spring and summer ( Fig.   B5). In cases when three cloud layers were detected, the distance between the first and the second cloud layer was usually smaller than the distance between the second and the third cloud layer. Also, the first and the second cloud layer, in cases with 235 three-layered clouds, were found more often close to each other (distance less than 400 m) than the first and the second cloud layer in two-layered cloud cases (Fig. B5).

New cloud classification algorithm and its evaluation
The algorithm classifies clouds based on three parameters, determined in Sect. 2.2: CBH, transparency and patchiness. We adjusted the ranges of the parameters, corresponding to different types of clouds, by constructing planes (TR,PA) for different 240 types of clouds based on total sky images from Hyytiälä ( Fig. 1). We took uniformly and randomly a sample of 665 total sky image-measurement data pairs. To ensure that the middle and high clouds were represented in the analysis, we took another sample of 320 pairs with the condition that the minimum CBH at that time was at least 2 000 m.
We first classified the clouds into cumulus, stratus, stratocumulus, nimbostratus, altocumulus, altostratus, cirrus, cirrocumulus, cirrostratus or clear sky, and the corresponding transparency, patchiness and CBH were recorded. We put the transparency 245 and patchiness values in the plane of parameters (TR,PA) in order to determine the regions in the plane, corresponding to different cloud types. Some cloud types had a significant overlapping in the plane of parameters (TR,PA), and thus could not be distinguished from each other from the point of view of their influence on solar radiation. We created rectangular segmentations in the (TR,PA) plane based on those cloud classes, and thus gained the new parameter 255 ranges for each cloud type. After that, we implemented the parameter ranges into the cloud type classification algorithm. The whiskers inserted into Fig. 4 indicate the transparency and patchiness ranges for different cloud type classes. The CBHs and parameter ranges of radiation characteristics for all cloud types are listed in Table 1.
If the parameters did not fit to the parameter ranges of any of the listed cloud types (Table 1), or the data were missing, we classified the cases into separate classes based on whether the ceilometer did ("Base, no class") or did not ("No base, no 260 class") capture a cloud base. As the ceilometer data were quality checked, the latter class contains basically data points when the sky was clear but the values of other parameters did not fit to the clear sky parameter ranges, or the data were missing.
Additionally, as the field of view of the ceilometer is narrow, the class contains data points when the cloud was not in the field of view of the ceilometer although there were clouds present.
We also examined the characteristics of the second and third cloud layer in cases of multilayered clouds as measured by 265 the ceilometer. We classified these clouds based on the height of the second or third cloud base, and characteristics defined by transparency and patchiness. We used three height classes: "Low level" (LL), "Middle level" (ML) and "High level" (HL).
If the second or third cloud base was below 2000 m, the case was classified as "Low level", if between 2000 and 7000 m situation was "Middle level", and if above 5000 m the situation was "High level" (Houze, 1993). We classified the different cloud layers separately. Hence, even though e.g. the second cloud layer was middle level cloud, simultaneously there might 270 also exist low or high level cloud layers. As other cloud layers could be difficult to be detected above the first cloud layer with the ceilometer, additional condition low level cloud layer was determined: if the difference between the 21 min moving maximum and minimum CBH (CBH max − CBH min ) of the lowest cloud layer was more than 1000 m, and CBH max was less than 2000 m, the case was considered as low level multilayered cloud (Table 1).
We divided multilayered clouds into three characteristic classes: "Multilayer uniform" (MuUni) for uniform and thick cloud 275 layers such as stratus and nimbostratus, "Multilayer transparent" (MuTr) for uniform and transparent cloud layers like cirrostratus and "Multilayer patchy" (MuPa) for patchy clouds with varying transparency such as altocumulus. However, the actual Middle level (ML) 2000-7000 2 or 3 High level (HL) ≥ 5000 2 or 3 Multilayer uniform (MuUni) < 0.5 < 120 2 or 3 Multilayer transparent (MuTt) > 0.5 < 120 2 or 3 Multilayer patchy (MuPa) > 0 > 120 2 or 3 cloud types of the second and third cloud layer could not be determined with the current algorithm. Hence, multilayered classes rather inform of the presence of other cloud layers on top of the lowest, classified, cloud layer.
Before analyzing the cloud type data produced by the algorithm, the performance of the algorithm was investigated. To "Other types" includes cases when the cloud type was changing, and two types of clouds were present in the same image, or cases when it was hard for the observer to distinguish between two similar cloud types (e.g. stratocumulus and stratus).
The data presented in the (TR,PA) plane forms an upside down facing U-shaped pattern (Fig. 4). The physical reason for the U-shape is as follows: when the transparency is high, the patchiness may be either low or high (clear sky vs. cumulus cloud 290 streets), and when the transparency is low, the patchiness is also low (stratiform clouds). There are simply no clouds with low patchiness that would simultaneously have moderate transparency (0.6-0.8). Those transparency values are typical for cumulus clouds which are naturally patchy.

Cloud statistics using the new cloud algorithm
The algorithm was applied to the data from 2014 and 2016-2017 from SMEAR II. Only cases when SZA was less than 70 • 295 were included. Figure 5 displays the monthly occurrence of each cloud type with respect to the total number of data points in that month. Figure 5a-d represent the classified cloud types of the lowest cloud layer detected by the ceilometer. As clear sky cases and cases when the cloud class could not be determined are included ( Fig. 5c and d), Fig. 5a-d will give the frequency of occurrence of each cloud type month-wisely. Therefore, summing the percentages corresponding to these classes ( Fig.   5a-d) monthly will give 100 %. Figure 5e and f represent the second and the third cloud layer in multilayered cloud cases.

300
Overall, the most commonly observed cloud types were stratus (28 %), cumulus (26 %) and stratocumulus (18 % The relative share of "Base, no class" and "No base, no class" cases peaked in the beginning and end of the period of 310 investigation (Fig. 5d). This indicates that the classification of clouds was more difficult in spring and autumn compared to summer. This may be caused by the fact that the used total sky images were taken between 1 May and 31 July, leading to overrepresentation of summertime clouds. Thus, the number of undefined cases could increase in spring and autumn. It should, however, be noted that wintertime was not included. Figure 5e shows that the most commonly observed multilayered cloud type was the low level class (79 %). This is in line 315 with the CBH observations (Fig. B4). Multilayered clouds with middle level cloud layer were also observed often (25 %), whereas high level multilayered clouds were seldom observed (2 %). Overall, the relative fraction of high clouds was smaller compared to the other cloud types ( Fig. 5a and b, 2 and B2).
Multilayered clouds were characterized by low transparency: 41 % of the clouds were patchy and 38 % uniform (Fig. 5f).
Multilayered clouds were transparent in 5 % of the cases. The reason why the numbers do not sum up to 100 % is the missing 320 radiation data needed for the calculation of transparency and patchiness.
When multilayered clouds were present, the lowest cloud layer was most often stratus (29 %), stratocumulus (23 %) or cumulus (18 %). The first cloud layer was determined as low cloud in 72 % of the multilayered cloud cases. Hence, there was typically a low cloud layer above another low cloud layer. This can also be seen in Fig. B4 and B5 as the distance between the cloud layers was usually less than 400 m. Accordingly, the seasonal variations of multilayered uniform and multilayered 325 patchy clouds follow the seasonal variations of stratiform and cumulus clouds, having the maxima in early and late summer, respectively (Fig. 5f). The multilayered transparent cloud type did not show seasonal variation. Transparent clouds are cirriform clouds, and nor did they show seasonal variation. The lack of seasonal variation of cirriform clouds may partially be related to the relatively high occurrence of clear sky and cirrus cases in summertime, because also this class contains cirriform clouds as they cannot be distinguished from the clear sky (Fig. 5b).

330
The diurnal variation of cloud types show that low cumulus clouds peak in the afternoons (Fig. B6a). Similar diurnal variation can also be seen in the frequency of low and patchy multilayered clouds ( Fig. B6e and f). Stratus, nimbostratus, and altocumulus and altostratus were more common in the morning and evening compared to noon ( Fig. B6a and b). As multilayered uniform and middle level layered clouds showed similar diurnal variation ( Fig. B6e and f), those clouds were probably responsible for the observed variation in multilayered clouds. Clear sky combined with cirrus clouds were most often observed in the morning 335 (Fig. B6b). Global radiation enhancement during the presence of cumulus clouds were more common in late afternoons and evenings compared to mornings while during the presence of cirrus clouds it took place both in the mornings and evenings ( Fig. B6c).

Brightness parameter
Brightness parameter (P B ) is determined as a relation between the measured global radiation and the radiation at the top of the 340 atmosphere averaged over half an hour: The parameter has been used as a simplified measure of the prevailing cloudiness in terms of "cloudy" or "clear sky" (Kulmala et al., 2010(Kulmala et al., , 2014aDada et al., 2017). In Kulmala et al. (2010), the limit of clear sky was set to P B > 0.50, in Kulmala et al.
(2014a) it was P B > 0.60, and in Dada et al. (2017) it was P B > 0.70. The limit of cloudy sky in all the three articles was 345 P B < 0.30. Figure 6 demonstrates the brightness parameter values obtained when different cloud types were present. We can see that even when the brightness parameter was above 0.7 (black line in Fig. 6), different types of clouds were present. Only stratus and nimbostratus were not observed with the lowest brightness parameter limit (P B > 0.5). According to our results, cumulus, altocumulus, altostratus and cirriform clouds occurred when the brightness parameter was above the 0.6 or 0.7 limit.

350
The analysis of Dada et al. (2017) was related to aerosol formation. They concluded that aerosol formation was enhanced under clear sky conditions that were determined by the brightness parameter. Our findings indicate that clouds could be present during those days. A possible implication is that there could be a mechanism similar to that in the tropics where aerosol particles

Discussion
Karlsson (2003), Pipatti et al. (2010), and Joro et al. (2010) reported similar frequency of cloud occurrence and yearly variations in Finland, based on satellite and surface observations, as we found in Hyytiälä from the ceilometer measurements (Fig. 2a).
The cloud occurrence retrieved with the pyranometer method were found to be higher than those measured with the ceilometer, 365 and the seasonal variation was absent (Fig. 2b). One reason for the difference between ceilometer and pyranometer results might be the limited vertical resolution of the ceilometer, in which case some of the highest clouds would not be detected, lowering the observed cloud occurrence. As pyranometer measures only the attenuation of solar radiation, the altitude of the cloud does not affect its performance. However, this explanation is improbable because the cloud occurrence estimated using the pyranometer method gave higher values than those found in the literature. Furthermore, the several studies mention the 370 capability of ceilometers to detect clouds reliably, though their field of view is narrow, and performance is better with low clouds (Rodriguez, 1998;Kalb et al., 2004).
In Fig. 2, the cloud occurrence from the ceilometer observations contained also nighttime data points contrarily to the pyranometer data which were filtered by SZA. When only daytime (9:00-15:00) data from the time when SZA was less than 70 • were used in both methods, the difference between the calculated cloud occurrences was reduced slightly (Fig. B7). The 375 cloud occurrence estimated from the ceilometer measurements increased, presumably because the cloud occurrence had a maximum during daytime (Fig. B1). Additionally, the cloud occurrence from the pyranometer method decreased, implicating that the pyranometer method overestimated the cloudiness when data from early mornings and evenings were included, despite the filtering with SZA. Due to Finland's northern location, SZA is high throughout the year compared to locations closer to the Equator. Hence, as the pyranometer method is sensitive to SZA, the most reliable results are obtained during the hours when 380 the Sun is at the highest position, especially in summertime when cloud occurrence had a diurnal cycle (Fig. B1). Moreover, as shown also in Fig. 6, the simple limits set for determining cloudiness may not be efficient in all cases.
The cloud occurrence by the pyranometer was also modulated by the averaging over a 21 min time window. Thereby also cloudless data points might have been considered as cloudy whereas the ceilometer separated clear and cloudy periods.
However, the best practice to separate between clear and cloudy cases depends on the application. For example, if the objective 385 is to quantify albedo, it is reasonable to rely on ceilometer data. Yet, if the objective is to study the effect of clouds generally on the ecosystem, pyranometer data averaged over 21 min are more appropriate in describing the integrated effect of changing light conditions on plants.
Despite the good agreement with the frequency of the cloud occurrence with values found in other studies, we are likely to miss the occurrence of the second and third cloud layer (Fig. B2). Costa-Surós et al. (2013) found similar occurrence 390 of multilayered clouds in Girona, Spain, using identical ceilometer. They compared their results with observations from the nearby airport, and noticed that the ceilometer overestimated the occurrence of single-layered clouds. They hypothesized that it might be due to the occultation of the laser pulse by the first cloud layer, and the fact that the vertical resolution of the ceilometer was too low to detect all high clouds. The occultation by the first cloud layer might be an important phenomenon in Hyytiälä where low and stratiform clouds were frequently observed (Fig. 5). Other studies have also shown higher frequencies 395 of multilayered clouds (Wang and Rossow, 1995;Wang et al., 2000;Li et al., 2015), and differences in detection of cloud layers depending on the method (Wang et al., 1999(Wang et al., , 2000Rossow et al., 2005;Rossow and Zhang, 2010). Additionally, when multilayered clouds were observed, they predominately were two-layered, having two low cloud layers on top of each other (Fig. B4). Comparing the results with previous publications, this indicates that we miss middle and high clouds (Rossow et al., 2005;IPCC, 2013;Li et al., 2015).

400
The high contribution of low single-layered clouds also modulates the observed CBHs. Joro et al. (2010) investigated cloudiness in Finland by combining satellite and ceilometer data. They found that in February low clouds dominated the CBH distribution while in August there was a second maxima around 2 500 m. We found a similar second maximum in April, May and June but in August the distribution decreased towards the end of higher CBHs ( Fig. 3 and B4). The second maximum that we found was around 1 600 m, i.e. at lower altitude compared to results by Joro et al. (2010). However, Joro et al. (2010) reported 405 results from only two months whereas we had data from three years. Despite the high frequency of low clouds, our findings produce the distribution of CBHs similar to Wang and Rossow (1995) and Wang et al. (2000) who reported averaged CBH distribution of satellite and rawinsonde, respectively, data from many stations.
Our observation that there is often a low level cloud layer on top of low clouds explains the small difference between cloud layers (Fig. B5). Wang and Rossow (1995) reported separation distances between cloud layers. They found that most often the 410 separation distance was about 1 km whereas we found that the distance between two consecutive cloud layers was about 400 m. However, the results are not completely comparable due to different data analyzing procedures.
The cloud classification algorithm was able to produce the correct cloud type in about 70 % of the cases (Table 2). When other types of clouds than those that are classified by the algorithm were excluded, the performance was up to 84 %. The performance was better with clouds having distinguishable effects on radiative conditions. For example, very opaque nimbostratus clouds, the algorithm identified correctly in 100 % of the cases. The least accuracy was obtained with cirriform clouds (50 %). This may be caused by the weaker detection of the high clouds by the ceilometer.
The performance of our algorithm was significantly better compared to the 45 % agreement of the original algorithm by Duchon and O'Malley (1999), and 45 % agreement of an other algorithm employing also solar radiation measurements (Calbó et al., 2001). Moreover, when Calbó et al. (2001) reduced the number of cloud classes from nine to five, the classifier reached 420 58 % agreement with human-observed cloud classes. The performance of the new algorithm was approximately similar to the average performance of the reviewed cloud classification algorithms in Tapakis and Charalambides (2013). Our simple algorithm is based on measurements by two common instruments: pyranometer and ceilometer, and hence the good performance compared also to other, more sophisticated or expensive methods, is remarkable.
We found that low clouds were frequently observed (Fig. 5). When comparing the results with surface observations from 425 Finland, we found that the algorithm produced approximately a similar frequency of occurrence and diurnal variation (Fig. 5 and B6) as the observations in Eastman and Warren (2014) and Climatic Atlas of Clouds Over Land and Ocean (available online at https://atmos.uw.edu/CloudMap/, last access: 10 January 2020; method explained in Hahn and Warren (2007)). However, the frequency of middle, and especially high cirriform clouds, were up to tenfold smaller compared to the values in Climatic Atlas of Clouds Over Land and Ocean (Fig. 5b). This can partly be explained by the fact that our clear sky class contained 430 also cirrus cloud cases. The better accuracy for low clouds was, however, likely caused by the limitations of the ceilometer to observe high clouds as discussed above. Moreover, Li et al. (2015) reported that high and middle clouds often coexist with other types of clouds. According to their results, at 60 • N high clouds are often observed together with low or middle clouds.
We could not capture cases with many cloud layers because of the occultation of the laser pulse. When studying our results, high level multilayered clouds were seldom observed but rather two low cloud layers coexisted (Fig. 5).

435
Many cloud types showed seasonal variation, having a maximum in summertime, e.g. cumulus clouds peaking in late summer (Fig. 5). The algorithm reproduced the seasonal variation of clouds reported in Climatic Atlas of Clouds Over Land and Ocean. However, as our analysis does not cover winter months, some possible discrepancies were observed: according to Climatic Atlas of Clouds Over Land and Ocean, the occurrence of nimbostratus has a minimum in summertime, and the occurrence of stratus has a maximum in autumn. In our study, nimbostratus showed relatively constant frequency of occurrence 440 from April to August but was almost absent in March and September. Stratus had a maximum in June but the differences in frequency with August and September were minor (Fig. 5a). Hence, we cannot conclude the deviation from the cloud observations reported in Climatic Atlas of Clouds Over Land and Ocean, and overall the performance of the simple algorithm was very good.

445
The present study included a formulation of a cloud type classification algorithm, and investigation of cloud properties at SMEAR II measurement site in Hyytiälä, Finland. The overall cloud occurrence measured by the ceilometer was in agreement with the reported values in literature, though the frequency of single-layered clouds were likely overestimated, and the occurrence of middle and high clouds underestimated. We hypothesize that this is caused by the facts that the vertical maximum measurement height of the ceilometer did not allow it to detect all the high clouds, and that the occultation of the laser pulse 450 by the lowest cloud layer prevented the observation of other cloud layers.
The developed cloud classification algorithm is based on two variables measured continuously at the station: global radiation and CBH. Despite the simplicity of the algorithm, it can identify seven different cloud types along with classification of multilayered clouds based on their base height and characteristics (uniform, transparent or patchy). The overall performance of the algorithm was almost 70 %, indicating a good ability to distinguish cloud types observed over a boreal forest. The 455 algorithm may, however, be utilized also in other environments. Because the algorithm is based on attenuation of solar radiation, the performance is better with cloud types that have a distinguishable impact on radiative conditions on the Earth, such as nimbostratus. We are confident that the algorithm is able to reproduce the cloud types rather reliable in common situations, though it is probable that it does not reproduce all the high and multilayered clouds due to the limitations of the performance of the ceilometer, as discussed above. Indeed, we showed that low and optically thick stratiform and cumulus clouds occurred 460 frequently, indicating the high probability for occultation of the laser pulse.
The brightness parameter is defined as a ratio between the measured global radiation and calculated radiation at the top of the atmosphere, and has been used as an indicator of clear sky or presence of clouds. We found that cumulus, altocumulus, altostratus and cirriform clouds were present when brightness parameter indicated clear sky conditions. Thus, the studies defining clear sky cases based on brightness parameter, may be biased. The new algorithm may be utilized in the future to 465 distinguish clear sky conditions in a more reliable way.
As the focus of this study was in the development of the algorithm, we used data only from three years. The measurements of the CBH and the global radiation at SMEAR II have been ongoing since 2008, and therefore the analysis can easily be extended in the future for longer time periods and different data sets. The current algorithm is the first one indicating the prevailing cloud types at SMEAR II, and we encourage to use it in studies related to the boundary layer interactions involving 470 radiation processes and clouds.
Data availability. The used data measured at SMEAR II can be accessed with Smart-SMEAR online tool (https://avaa.tdata.fi/web/smart, last access: 09 January 2020, ). AOD and precipitable water were obtained from AERONET data base are available at https://aeronet.gsfc.nasa.gov/ (last access: 04 February 2020, (Holben et al., 1998)). Total sky images are available at https://adc.arm.gov/ discovery/#v/results/s/fsite::tmp (last access: 11 March 2020, (Petäjä et al., 2016)). The cloud classification produced in this study is available were examined. Thus, the classification algorithm developed with one instrument is also applicable with another instrument.    Figure A1. Comparison of the frequencies of the produced cloud types with the ceilometer of SMEAR II (blue) and ARM campaign (red)

31-Jul-2014
during four random days. The frequencies were obtained by dividing the number of cloud type records by the total number of data points in the day. The abbreviations of the cloud types are found in Table 1. Note the different limits of the y-axes.   Figure B4. The frequency of single-layered clouds (a,d,g,j), two-layered clouds (b,e,h,k), and three-layered clouds (c,f,i,l). The CBH records of one season were divided into 400 m bins, and the frequencies were obtained by dividing the number of CBH records in each bin by the total number of CBH records in the season. Figures (a-c) represent spring, (d-f) summer, (g-i) autumn and (j-l) winter. Note the different limits of the y-axes.  Figure B5. Distribution of the height difference between cloud layers. Blue represents the difference between the lowest and the second cloud layer in two-layered clouds, red similarly but in three-layered clouds, and yellow represents the difference between the second and the Daytime (9:00-15:00) values from March to September when SZA was less than 70 • were used. The number of cloud observations was divided by the total number of data points in one month to obtain the cloud occurrence. The occurrence estimated from the ceilometer measurements (64 %) was lower compared to the occurrence estimated from the pyranometer data (71 %).