An all-sky camera image classification method using cloud cover features

All-sky camera (ASC) images reflect local cloud cover, and cloud cover is one of the first factors considered in astronomical observatory site selection. Automatic classification of ASC images therefore plays an important role in site selection. In this paper, three cloud cover features are proposed for the TMT (Thirty Meter Telescope) classification criteria, namely cloud weight, cloud area ratio and cloud dispersion. After the features are quantified, four classifiers are used to recognize the class of each image. Four classes of ASC images are identified: “Clear”, “Inner”, “Outer” and “Covered”. The proposed method is evaluated on a large dataset of 7328 ASC images taken by an all-sky camera located in Xinjiang (38.19° N, 74.53° E). With a random forest (RF) classifier, the method achieves an accuracy of 97.28% and an F1_score of 96.97%, which greatly improves the efficiency of automatic processing of ASC images.

With the emergence of more and more all-sky imaging devices, such as the Whole Sky Imager (WSI; Sneha et al., 2020), Total Sky Imager (TSI; Ryu et al., 2019), All Sky Imager (ASI; Nouri et al., 2018) and All Sky Camera (ASC; Fa et al., 2019), a large number of all-sky images have been generated, providing a basis for the development of automatic cloud image classification algorithms.
Many researchers have focused on feature extraction for different cloud types. Calbo and Sabburg (2008) classified eight types of sky images using statistical features (smoothness, standard deviation, uniformity) and features obtained after a Fourier transform of the images. Heinle et al. (2010) proposed a classification algorithm based on spectral features in RGB color space and texture features extracted from grey-level co-occurrence matrices (GLCMs); this method classifies seven different cloud classes with high accuracy. Sun et al. (2009) classified sky conditions using the local binary pattern (LBP) operator and the contrast of local cloud image texture. Li et al. (2016) represented each image as a feature vector generated by calculating the weighted frequency of the microstructures observed in the image. Dev et al. (2015) investigated an improved texton-based classification method that combines color and texture features, achieving an average accuracy of 95% on the Singapore Whole-Sky Imaging Categories (SWIMCAT) dataset. Gan et al. (2017) worked on image regions of different sizes and used a sparse-coding-based method that is effective in terms of localization and computation. Wan et al. (2020) showed that cloud classification performs poorly when only texture or color features are used; they therefore extracted a mixture of texture, color and spectral features from color images and fed them into a random forest (Svetnik et al., 2003).
In recent years, convolutional neural networks (CNNs) have performed very well in image classification and have been applied to all-sky image classification. Shi et al. (2017) used the output of the shallow convolutional layers of a CNN as cloud features and fed it into a support vector machine (SVM) (Cristianini and Shawe-Taylor, 2000). Ye et al. (2017) extracted multiscale feature maps from a pretrained CNN, encoded them with Fisher vectors and finally fed them to a classifier. Zhao et al. (2019) used a 3D-CNN model to extract texture features of the images and output the classification results with a fully connected layer. Zhao et al. (2020) proposed an improved Frame Difference Method extractor that detects and crops small images from large images, which are then fed to a multi-channel CNN classifier. Liu et al. (2021) proposed a context graph attention network (CGAT) for cloud classification, in which the context graph attention layer learns context attention coefficients and obtains aggregated node features.
The algorithms above, both traditional and deep learning methods, classify clouds based on texture, color and spectral features, and all of them address cloud shape classification of local images or all-sky images. For astronomical observatory site selection, however, cloud cover is also an important factor that must be considered besides cloud shape.
At present, few researchers have studied classification algorithms based on cloud cover. Existing cloud cover calculation methods only compute the ratio of the cloud-covered area to the effective area of the ASC image and do not describe the cloud cover in detail (Esteves et al., 2021). We therefore propose three cloud cover features that introduce cloud thickness and distribution position into the cloud cover calculation, and classify the ASC images according to the TMT classification criteria. The rest of the paper is organized as follows. Section 2 introduces the dataset and the TMT classification criteria. Section 3 describes the three cloud cover features and the proposed classification method. Section 4 presents the experimental results and their analysis. Finally, Section 5 concludes the paper.
https://doi.org/10.5194/amt-2021-379 Preprint. Discussion started: 6 December 2021 c Author(s) 2021. CC BY 4.0 License.
Dataset and image classes

Dataset
Images used in this paper were taken by an all-sky camera located in Xinjiang, China (38.19° N, 74.53° E), and provided by the Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences. The ASC consists of two parts, a Sigma 4.5 mm fisheye lens and a Canon 700D camera, providing a 180° × 180° field of view. Images are taken every 20 minutes in the daytime and every 5 minutes at night, and the exposure time is adjusted between 15 and 30 s depending on the phase of the moon. The ASC images are stored in color JPEG format with a resolution of 480 × 720 pixels. To facilitate subsequent processing, the images are cropped to retain only the central valid area, giving a cropped size of 370 × 370 pixels. Note that the captured image is rectangular, whereas the mapped all-sky area is circular, with the zenith at the center and the horizon at the boundary. Figure 1a shows an example of an original ASC image and Fig. 1b the cropped ASC image.
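The cropping step can be sketched as follows, assuming that the fisheye circle is centred in the 480 × 720 frame (the crop offsets are not given in the text). The sketch also builds a boolean mask of the circular field of view, whose pixel count can serve as the effective area of the image. The function names `crop_center` and `circular_mask` are hypothetical.

```python
import numpy as np


def crop_center(img: np.ndarray, size: int = 370) -> np.ndarray:
    """Crop a size x size window around the frame centre.

    Assumes the fisheye circle is centred in the frame; a real camera
    would need a calibrated centre instead.
    """
    h, w = img.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def circular_mask(size: int) -> np.ndarray:
    """Boolean mask of the circle inscribed in a size x size crop (True = sky)."""
    c = (size - 1) / 2.0  # centre in pixel coordinates
    yy, xx = np.mgrid[0:size, 0:size]
    return (yy - c) ** 2 + (xx - c) ** 2 <= (size / 2.0) ** 2


frame = np.zeros((480, 720, 3), dtype=np.uint8)  # placeholder frame
crop = crop_center(frame)
mask = circular_mask(crop.shape[0])
```

The mask's pixel count is close to π · 185², i.e. roughly 107 500 effective pixels for a 370 × 370 crop.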
We screened the ASC images taken from January 2019 to November 2020 and removed those that were overexposed or taken in bad weather. The screened images were all classified by professionals using a manual cloud identification method (see the next section) similar to that described in the Thirty Meter Telescope (TMT) site testing campaign paper (Skidmore et al., 2008). The final dataset contains 7328 ASC images, and we did our best to balance the number of images in each class.

Image classes
Traditionally, ASC image classification takes cloud shape as the basic element, while also considering the shape development and internal microstructure of the clouds. For astronomical observatory site selection, however, the cloud cover in the image is the foremost factor. We therefore classify the ASC images following the TMT classification criteria (Skidmore et al., 2008). The ASC images are divided into an inner circle and an outer circle with zenith angles of 44.7° and 65°, respectively, and the cloud cover is classified as "Clear", "Outer", "Inner" or "Covered" according to the thickness and distribution of the clouds in the ASC image. The class definitions are given in Table 1, and typical images of each class are shown in Fig. 2.
Table 1. The ASC image classes and corresponding descriptions.

Class     Description                                                              Example
Clear     No thick cloud within the inner circle.                                  Fig. 2a
Outer     No thick cloud within the inner circle; cloud within the outer circle.  Fig. 2b
Inner     No more than 50% cloud within both the inner and outer circles.
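Applying the inner and outer circles to an image requires mapping the zenith angles 44.7° and 65° to pixel radii. The text does not state the lens projection, so the sketch below is only an illustration under assumptions: the horizon (90°) lies on the edge of the 370 × 370 crop, and either an equidistant (r ∝ θ) or an equisolid-angle (r ∝ sin(θ/2)) fisheye model applies. The function `zenith_to_radius` is hypothetical.

```python
import math


def zenith_to_radius(zenith_deg: float, horizon_radius_px: float,
                     projection: str = "equidistant") -> float:
    """Radial pixel distance of a zenith angle in a circular fisheye image.

    Assumes the horizon (90 deg) lies on the image circle of radius
    horizon_radius_px. 'equidistant': r proportional to theta;
    'equisolid': r proportional to sin(theta / 2).
    """
    if not 0.0 <= zenith_deg <= 90.0:
        raise ValueError("zenith angle must lie in [0, 90] degrees")
    theta = math.radians(zenith_deg)
    if projection == "equidistant":
        return horizon_radius_px * theta / (math.pi / 2)
    if projection == "equisolid":
        return horizon_radius_px * math.sin(theta / 2) / math.sin(math.pi / 4)
    raise ValueError(f"unknown projection: {projection}")


R = 185.0  # half of the 370-pixel crop (assumption: the sky circle fills the crop)
r_inner = zenith_to_radius(44.7, R)  # about 91.9 px under the equidistant model
r_outer = zenith_to_radius(65.0, R)  # about 133.6 px under the equidistant model
```

Under the equisolid-angle model the same circles fall several pixels further out (about 99.5 px and 140.6 px), so the choice should follow the actual lens calibration.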

Classification based on cloud cover features
In this section, we first describe the proposed cloud cover features and then introduce the overall process of the ASC image classification method based on these features. The framework is illustrated in Fig. 3.

Cloud cover features
The current TMT classification criteria are based essentially only on the area ratio and distribution of the cloud regions in the ASC images. They give insufficient consideration to cloud thickness and distribution density and provide no intuitive, reliable quantitative indicators, so the classification results are easily affected by human subjectivity. To address these problems, we propose three cloud features: cloud weight, cloud area ratio and cloud dispersion. Cloud weight indicates the thickness of the cloud, cloud area ratio is the ratio of the cloud area to the effective area of the image, and cloud dispersion reflects the distribution of cloud regions around the center of the ASC image.

Cloud weight
The grayscale value of any pixel in the ASC image can be regarded as a weighted superposition of the grayscale values of multiple elements, such as cloud, sky background and impurities, each in an ideal state, as shown in Eq. (1). The ideal state means that the grayscale value of a pixel is contributed by only one element, with no other elements involved; such a pixel can be regarded as a "pure pixel" of that element.

$$G = a_1 g_1 + a_2 g_2 + \cdots + a_n g_n \tag{1}$$

where $g_1, g_2, \ldots, g_n$ denote the grayscale values of the "pure pixels" of the different elements and $a_1, a_2, \ldots, a_n$ are the weights of the contributions of each element's "pure pixel" to the actual grayscale value $G$ of the image pixel.
For an ASC image, the grayscale value of the cloud region is ideally a superposition of the grayscale values of the sky background and the cloud. The grayscale value of the cloud region can therefore be defined by the superposition model of Eq. (2), where $G$ is the true grayscale value of the cloud region; $G_b$ is the grayscale value of the sky background in the ideal state, which can be obtained from images taken on different dates at the same time of day, because the shooting time of each image is known; $G_c$ is the grayscale value of the cloud in the ideal state; and $\alpha$ is the weight of the contribution of the sky background to the actual grayscale value of the pixel in the ideal state.
During ASC shooting, the sky background is blocked by clouds. As the cloud thickens, the contribution of the sky background to the grayscale value of the cloud region gradually decreases, and this decrease is most pronounced in the initial stage of thickening. Once the cloud thickness reaches a certain level, the grayscale value of the cloud region is contributed almost entirely by the cloud, so the relationship between $\alpha$ and $G$ is approximately a monotonically decreasing function that drops steeply at first and then levels off. After extensive experimental verification, this relationship is derived in this paper as Eq. (3), and $G_c$ is then calculated from Eqs. (2) and (3) as Eq. (4).
To verify the validity of the cloud grayscale value obtained by the superposition model, we processed the grayscale image of a local cloud region according to Eq. (4); the result is shown in Fig. 4. Figure 4a shows the original grayscale image, and Fig. 4b shows the ideal-state grayscale image obtained with the superposition model. The grayscale values of the processed image are markedly lower in the thin cloud parts, because the influence of the sky background on the cloud grayscale value has been removed, so the cloud contour is reflected better. The grayscale image processed by the superposition model thus reflects the grayscale values and the number of cloud pixels more realistically.
Based on the above derivation, we propose "cloud weight" to indicate the thickness of clouds. Since clouds and sky differ greatly in color and brightness in ASC images, this section defines the cloud weight by exploring the relationship between pixel grayscale value, cloud reflectivity, light intensity and cloud thickness.
In a grayscale image, the grayscale value represents the brightness of a pixel, which is determined by the reflectivity of the object and the intensity of the incident light. The grayscale value of a cloud pixel is therefore given by Eq. (5), where $G_c$ is the grayscale value of the pixel in the ideal state, $R$ is the reflectivity of the cloud and $I$ is the intensity of the incident light. Generally speaking, the reflectivity of a cloud is proportional to its thickness $T$: the thicker the cloud, the higher the reflectivity, so that approximately $R = k_1 T$ (Eq. 6), where $k_1$ is the correlation coefficient. The incident light intensity $I$ is expressed by Eq. (7), where $k_2$ is the positive correlation coefficient. Combining Eqs. (5) to (7) gives the cloud weight in Eq. (8).
Since the constant factor in Eq. (8) does not affect the classification result, we omit it in the subsequent calculations. The remaining value $W$ then represents the relative size of the cloud weight, which approximately reflects the thickness of the cloud. To facilitate subsequent calculation and comparison, we normalize $W$ as

$$W_n = \frac{W - \min(W)}{\max(W) - \min(W)} \tag{9}$$

where $W_n$ is the normalized cloud weight, and $\min(W)$ and $\max(W)$ are the minimum and maximum cloud weights in the cloud region of the image, respectively.
The cloud weight defined in this section from the grayscale value reflects the thickness of the cloud. Physically, it describes the difference between the brightness of the cloud and that of the sky background, which approximates the thickness of the cloud region. Because the cloud weight is computed entirely from grayscale images, cloud thickness can be evaluated automatically, which promotes the automatic processing of ASC images.
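The min-max normalization of Eq. (9) can be sketched in NumPy as below. It assumes that the raw per-pixel cloud weight $W$ from Eq. (8) is already available as an array and that a boolean mask marks the cloud region; the function name and the handling of the degenerate case (all cloud pixels with the same weight, where Eq. (9) divides by zero) are hypothetical choices, not taken from the paper.

```python
import numpy as np


def normalize_cloud_weight(weight: np.ndarray, cloud_mask: np.ndarray) -> np.ndarray:
    """Min-max normalise cloud weights over the cloud region (Eq. 9).

    Non-cloud pixels are set to 0. If every cloud pixel has the same weight,
    the denominator of Eq. (9) is zero; this sketch then assigns 1.0, which
    is an assumption, not something the paper specifies.
    """
    out = np.zeros(weight.shape, dtype=float)
    vals = weight[cloud_mask].astype(float)
    if vals.size == 0:  # no cloud pixels at all
        return out
    lo, hi = vals.min(), vals.max()
    out[cloud_mask] = 1.0 if hi == lo else (vals - lo) / (hi - lo)
    return out


w = np.array([[2.0, 4.0], [6.0, 9.0]])
cloud = np.array([[True, True], [True, False]])
w_norm = normalize_cloud_weight(w, cloud)  # [[0.0, 0.5], [1.0, 0.0]]
```

The minimum and maximum are taken over cloud pixels only, so the value 9.0 outside the cloud mask does not influence the result.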

Cloud area ratio
At present, common cloud area ratio calculation methods include ISCCP (Evan et al., 2007), CLAVR-1 (Wang et al., 2013) and CLAVR-X (Kim et al., 2016). ISCCP divides pixels into non-cloud and cloud pixels, assigns them weights of 0 and 1, respectively, and takes the ratio of the weighted sum to the total number of pixels as the result. CLAVR-1 divides the pixels into non-cloud, mixed cloud and cloud and assigns weights of 0, 0.5 and 1, respectively, to calculate the cloud area ratio. CLAVR-X classifies the image pixels into non-cloud, thin cloud, medium cloud and thick cloud according to cloud thickness and adjusts the weights accordingly. All of these models divide the pixels into a limited number of categories and assign a weight to each. Although they treat cloud thickness and cloud coverage area as two cloud cover features, their division of cloud thickness is rather coarse. In the previous section, we obtained a more precise numerical representation of cloud thickness through the cloud weight, which allows the weights of cloud pixels to be set more accurately.
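The discrete weighting schemes of ISCCP and CLAVR-1 described above can be illustrated with a lookup table indexed by a per-pixel category label. The integer encodings and names below are hypothetical; CLAVR-X is omitted because its thickness-dependent weights are not specified here.

```python
import numpy as np

# Hypothetical label encodings for the per-pixel categories.
ISCCP_WEIGHTS = np.array([0.0, 1.0])        # 0 = non-cloud, 1 = cloud
CLAVR1_WEIGHTS = np.array([0.0, 0.5, 1.0])  # 0 = non-cloud, 1 = mixed, 2 = cloud


def discrete_cloud_fraction(labels: np.ndarray, weights: np.ndarray) -> float:
    """Weighted sum of per-pixel category weights divided by the pixel count."""
    return float(weights[labels].sum() / labels.size)


isccp = discrete_cloud_fraction(np.array([0, 1, 1, 0]), ISCCP_WEIGHTS)    # 0.5
clavr1 = discrete_cloud_fraction(np.array([0, 1, 2, 2]), CLAVR1_WEIGHTS)  # 0.625
```

Each scheme maps every pixel to one of a few fixed weights, which is the coarseness that a continuous per-pixel cloud weight is meant to remove.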
This work refines the weight of each pixel in the ASC image and redefines the calculation model of the cloud area ratio. We use the cloud weight derived in the previous section as the weight of each cloud pixel and define the cloud area ratio (CAR) as the ratio of the weighted sum of all cloud pixels to the effective area of the image:

$$\mathrm{CAR} = \frac{1}{M} \sum_{i=1}^{N} W_i$$

where $N$ is the total number of cloud pixels in the image, $W_i$ is the cloud weight of the $i$-th cloud pixel and $M$ is the total number of pixels in the effective area of the ASC image.
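A minimal NumPy sketch of the cloud area ratio, assuming that per-pixel cloud weights, a boolean cloud mask and a boolean mask of the effective area are available (the function name is hypothetical):

```python
import numpy as np


def cloud_area_ratio(cloud_weight: np.ndarray, cloud_mask: np.ndarray,
                     valid_mask: np.ndarray) -> float:
    """CAR = (sum of cloud weights W_i over the N cloud pixels) / M.

    M is the number of pixels in the effective (valid) area; cloud pixels
    outside the valid area are ignored.
    """
    m = int(valid_mask.sum())
    if m == 0:
        raise ValueError("effective area is empty")
    return float(cloud_weight[cloud_mask & valid_mask].sum() / m)


weight = np.array([[0.5, 1.0], [0.0, 0.25]])
cloud = np.array([[True, True], [False, True]])
valid = np.ones((2, 2), dtype=bool)
car = cloud_area_ratio(weight, cloud, valid)  # (0.5 + 1.0 + 0.25) / 4 = 0.4375
```

If every cloud pixel had weight 1, this would reduce to the plain ISCCP-style area ratio; thin clouds with small weights contribute proportionally less.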

Cloud dispersion
Cloud dispersion (CD) is proposed mainly to quantify how strongly a cloud affects astronomical observation results depending on its distance from the center of the ASC image: the closer the cloud is to the center, the greater its influence. Generally speaking, cloud dispersion has three determinants: cloud area, cloud thickness and the distance between the cloud and the image center. We therefore divide the image into $n$ regions $\Omega_i$ ($i = 1, 2, \ldots, n$); the area of $\Omega_i$ is denoted $S_i$, its average cloud thickness $T_i$, and the absolute distance between the center of $\Omega_i$ and the center of the ASC image $d_i$. To simplify the calculation, we convert the absolute distance to the relative distance $d_i^* = d_i / r$, where $r$ is the radius of the effective area of the image. The cloud dispersion of an ASC image is then given by Eq. (12), where $f(d_i^*)$ is an influence-degree function of $d_i^*$. The smaller $d_i^*$, the greater the influence on the observation results and the larger $f(d_i^*)$; $f$ therefore decreases monotonically with $d^*$, and the influence on the observation results is greatest when the cloud is located at the image center. Since the classification of the ASC images in this paper is based on the TMT classification criteria, it is necessary to …
In actual calculations, we take each pixel as the calculation unit, which makes the measurement of cloud dispersion accurate to the pixel level. Each region $\Omega_i$ is then an individual image pixel of area 1, and its cloud thickness is expressed by the cloud weight defined in Section 3.1.1. The pixel-level cloud dispersion then follows from Eqs. (12) and (13).
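Because the exact forms of Eqs. (12) and (13) are not reproduced here, the following is only a sketch under stated assumptions: it combines the three determinants as a sum of products, CD = Σ S_i · T_i · f(d_i*), at pixel level (S_i = 1, T_i = cloud weight), and uses a hypothetical linear influence function f(d*) = 1 − d*, which satisfies the stated requirement of decreasing monotonically from the image center. Both the combination rule and `linear_influence` are assumptions, not the paper's definitions.

```python
from typing import Callable

import numpy as np


def linear_influence(d_rel: np.ndarray) -> np.ndarray:
    """Hypothetical influence function f(d*) = 1 - d*: 1 at the centre, 0 at the edge."""
    return 1.0 - d_rel


def cloud_dispersion(cloud_weight: np.ndarray, cloud_mask: np.ndarray,
                     radius: float,
                     influence: Callable[[np.ndarray], np.ndarray] = linear_influence) -> float:
    """Pixel-level cloud dispersion, ASSUMING CD = sum_i S_i * T_i * f(d_i*).

    Each cloud pixel is a region with area S_i = 1 and thickness T_i equal to
    its cloud weight; d_i* is its distance from the image centre divided by
    the effective radius.
    """
    h, w = cloud_weight.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    d_rel = np.hypot(yy - cy, xx - cx) / radius
    sel = cloud_mask & (d_rel <= 1.0)  # only cloud pixels inside the effective area
    return float(np.sum(cloud_weight[sel] * influence(d_rel[sel])))


weight = np.ones((3, 3))
centre = np.zeros((3, 3), dtype=bool)
centre[1, 1] = True
edge = np.zeros((3, 3), dtype=bool)
edge[1, 2] = True
cd_centre = cloud_dispersion(weight, centre, radius=1.5)  # 1.0
cd_edge = cloud_dispersion(weight, edge, radius=1.5)      # 1 - 1/1.5, about 0.333
```

The same cloud pixel contributes more when it sits at the center than near the edge, which is the property the influence function is meant to encode; any other monotonically decreasing `influence` can be passed in.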

Pre-processing
Since the cloud cover features in this paper are extracted from grayscale images of the cloud region, the cloud region must first be extracted from the ASC images. Sunlight has a high grayscale value and is easily mistaken for cloud during cloud detection in ASC images, so removing its influence is essential.
We use the difference method to filter out sunlight. For a given ASC image, the shooting time and the latitude and longitude of the shooting location are known, so an image taken on a different date at the same time can serve as a clear-sky background image with the same sun elevation angle as the original image. The two images are first converted to grayscale, and the background is subtracted to remove the sun. Cloud detection results are then obtained by binarization, and finally the cloud region is extracted by image multiplication.
Figure 6 illustrates the steps of cloud region extraction. Figure 6a is the acquired original ASC image and Fig. 6b is the clear-sky background image with the same sun elevation angle. Figures 6c and 6d are the grayscale images of Figs. 6a and 6b, respectively, and have very similar brightness distributions. Figure 6e is the result of subtracting Fig. 6d from Fig. 6c, in which the influence of the sun has been completely eliminated. The cloud region shown in Fig. 6f is then obtained by binarization, and the final extracted cloud region (Fig. 6g) is obtained by multiplying Fig. 6f by Fig. 6c. Traditional methods generally detect clouds with relatively low accuracy around the sun and near the horizon, whereas the method used in this paper achieves better results in all regions and also identifies thin cloud regions accurately. In addition, bright noise caused by light refraction is excluded, because it appears at the same position with similar brightness in Figs. 6c and 6d.
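The difference, binarization and multiplication steps of Fig. 6 can be sketched as follows. Several details are assumptions because the text does not specify them: the RGB-to-grayscale conversion uses ITU-R BT.601 luma weights, negative differences are clipped to zero, and binarization uses a fixed, hypothetical threshold (an adaptive method such as Otsu's could be substituted).

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """RGB -> grayscale with ITU-R BT.601 luma weights (assumed; not stated in the paper)."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])


def extract_cloud_region(image_rgb: np.ndarray, clear_sky_rgb: np.ndarray,
                         threshold: float = 30.0):
    """Difference -> binarisation -> multiplication, following Fig. 6.

    clear_sky_rgb is a clear-sky image taken at the same time of day on a
    different date. The threshold value is hypothetical.
    """
    gray = to_gray(image_rgb)                      # Fig. 6c
    background = to_gray(clear_sky_rgb)            # Fig. 6d
    diff = np.clip(gray - background, 0.0, None)   # Fig. 6e; negatives clipped (assumption)
    cloud_mask = diff > threshold                  # Fig. 6f: binary cloud map
    cloud_region = gray * cloud_mask               # Fig. 6g: grayscale kept only on cloud pixels
    return cloud_mask, cloud_region


# Toy 3x3 example: the sun at (0, 0) appears in both images and cancels out.
clear = np.full((3, 3, 3), 10, dtype=np.uint8)
clear[0, 0] = 250
image = clear.copy()
image[1, 1] = 180  # a cloud pixel
mask, region = extract_cloud_region(image, clear)
```

In the toy example, only the cloud pixel survives: the bright sun pixel is present in both images, so its difference is zero, mirroring how static bright noise is suppressed.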

Extraction of cloud cover features
In this paper, we proposed three cloud cover features, cloud weight, cloud area ratio and cloud dispersion, which together capture the coverage area, thickness and location of clouds. The computational models of the three features are defined separately and should, in theory, match the TMT classification criteria. To improve the accuracy of ASC image classification, we extract five features from each image: cloud area ratio (CAR), cloud dispersion (CD), inner-circle cloud weight ($W_{in}$), outer-circle cloud weight ($W_{out}$) and global cloud weight ($W_g$). $W_{in}$ is the average cloud weight within the inner circle, $W_{out}$ is the average cloud weight between the inner and outer circles, and $W_g$ is the average cloud weight within the outer circle. The effective range of both CAR and CD is the area within the outer circle. The steps of the feature extraction algorithm are shown in Algorithm 1.
Table 2 gives example values of the cloud cover features for each class of ASC image, normalized for ease of comparison. As the table shows, the cloud weight is determined by the thickness of the cloud in the specified area, so it does not differ markedly between classes. However, it strongly influences CAR and CD, and the cloud weight in the inner and outer circles affects the classification of ASC images under the TMT criteria, so $W_{in}$, $W_{out}$ and $W_g$ are used as classification features in this paper. CAR and CD are derived from the cloud weight, and the two are positively correlated overall. Moreover, their values differ markedly between classes. Figure 7 shows the distributions of CAR and CD for the four classes of ASC images.

Algorithm 1: Extraction of cloud cover features
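A minimal sketch of per-image feature extraction, returning the five features [CAR, CD, W_in, W_out, W_g], is given below. It rests on assumptions not stated in the text: the image circle is centred and fills the crop, zenith angles map to radii under an equidistant projection, the "average cloud weight" is taken over cloud pixels only, and CD uses the hypothetical influence function f(d*) = 1 − d* with the outer circle as the effective radius. Values are left unnormalized, whereas Table 2 reports normalized ones.

```python
import numpy as np

# Zenith angles of the TMT circles mapped to radius fractions, ASSUMING an
# equidistant fisheye projection with the horizon on the image circle.
INNER_FRAC = 44.7 / 90.0
OUTER_FRAC = 65.0 / 90.0


def _mean_or_zero(values: np.ndarray) -> float:
    return float(values.mean()) if values.size else 0.0


def extract_features(weight: np.ndarray, cloud_mask: np.ndarray) -> np.ndarray:
    """Return [CAR, CD, W_in, W_out, W_g] for one cropped ASC image.

    weight: normalised cloud weight per pixel; cloud_mask: True on cloud pixels.
    """
    h, w = weight.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - cy, xx - cx)
    horizon = min(h, w) / 2.0
    r_in, r_out = INNER_FRAC * horizon, OUTER_FRAC * horizon
    inner = r <= r_in
    outer = r <= r_out  # effective range of CAR and CD
    ring = outer & ~inner
    sel = cloud_mask & outer
    car = float(weight[sel].sum() / outer.sum())
    d_rel = r / r_out  # relative distance inside the outer circle
    cd = float(np.sum(weight[sel] * (1.0 - d_rel[sel])))  # hypothetical f(d*) = 1 - d*
    w_in = _mean_or_zero(weight[cloud_mask & inner])
    w_out = _mean_or_zero(weight[cloud_mask & ring])
    w_g = _mean_or_zero(weight[sel])
    return np.array([car, cd, w_in, w_out, w_g])


size = 370
clear_sky = extract_features(np.zeros((size, size)), np.zeros((size, size), dtype=bool))
overcast = extract_features(np.ones((size, size)), np.ones((size, size), dtype=bool))
```

A cloud-free image yields all-zero features, while a uniformly overcast image with unit weights yields CAR = 1 and all three average cloud weights equal to 1.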
As Fig. 7 illustrates, the feature values of each class of ASC image are concentrated within a certain range. CAR and CD separate "Clear" and "Covered" clearly, whereas "Inner" and "Outer" overlap. The distribution ranges of CAR and CD are relatively similar for each class, but the overlap between "Inner" and "Outer" is smaller for CD than for CAR, which indicates that taking the location of clouds into account distinguishes the image classes more effectively. CAR and CD can therefore be used as classification features.
Table 2. Examples of cloud cover features for each class of ASC image.

Selection of classifier and training sample
The above features are combined into a feature set that is fed to the classifier for training. To test the effectiveness of the proposed cloud cover features for classification, we selected four classifiers: decision tree (DT), support vector machine (SVM), K-nearest neighbor (KNN) and random forest (RF). Of the 7328 ASC images in the dataset, 5863 (about 80%) were used to train the classifiers and the remaining 1465 to test them.
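The classifier comparison can be sketched with scikit-learn. This is an illustration only: the hyperparameters are library defaults or common choices (not taken from the paper), the split is assumed to be stratified, SVM and KNN are given standardized features because they are distance-based, and a synthetic five-feature table stands in for the real CAR, CD, W_in, W_out and W_g values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X: np.ndarray, y: np.ndarray, test_size=0.2, seed: int = 0) -> dict:
    """Train DT, SVM, KNN and RF on one stratified split; return (accuracy, macro F1) per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    models = {
        "DT": DecisionTreeClassifier(random_state=seed),
        # SVM and KNN are distance-based, so the features are standardised first.
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred, average="macro"))
    return results


# Synthetic stand-in for the 5-feature table: 4 well-separated classes.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 100)
X = rng.normal(size=(400, 5)) + 5.0 * y[:, None]
scores = compare_classifiers(X, y)
```

With the real feature table, passing the integer `test_size=1465` would reproduce the 5863/1465 split sizes, whereas the fractional 0.2 rounds the test set up to 1466 images.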

Results and discussion
In this section we evaluate the effectiveness of cloud cover features for the classification of ASC images. Then we analyze and discuss the classification results.

Experiment results
The classifiers are trained and tested on the dataset as described in the previous section. To evaluate the classification performance comprehensively, we use Accuracy, Precision and Recall as evaluation metrics. Treating each class in turn as the positive class and the remaining classes as negative, the metrics are calculated as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. In addition, we use the F1_score, the harmonic mean of precision and recall:

$$F1\_score = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

The final test results of the four classifiers are shown in Table 3. The average accuracy of every classifier exceeds 95%, indicating that the proposed cloud weight, cloud area ratio and cloud dispersion are effective cloud cover features for classifying ASC images according to the TMT classification criteria, which greatly promotes the automatic processing of the images. The precision and recall of "Clear" and "Covered" exceed 95%, while those of "Outer" and "Inner" are below 95%, indicating that "Outer" and "Inner" are easily misclassified. Among the four classifiers, RF performs best, with an average accuracy of 97.28% and an F1_score of 96.97%.
To analyze the misclassifications further, we computed the confusion matrix of the RF classifier, shown in Table 4. The classification accuracy of "Covered" and "Clear" is higher than that of "Outer" and "Inner". The non-zero off-diagonal elements of the table represent the misclassification rates between classes. Inspection of the misclassified images shows that some "Outer" images are misclassified as "Clear" or "Inner": some "Outer" images contain extremely thin clouds in the inner circle and are therefore classified as "Inner", while others contain only scattered thin clouds in the outer circle and are classified as "Clear". "Inner" images are likewise misclassified as "Outer" or "Covered".
Some "Inner" images contain clouds in the inner circle but are misclassified because of the cloud thickness, while in others the clouds are widely distributed but thin, which makes misjudgement likely. Figure 8 displays some misclassified ASC images. The misclassifications arise because the thickness of some cloud regions is incorrectly identified. Although the proposed cloud cover features take cloud thickness into account, there is still room for further improvement.
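The metrics above can be computed from a confusion matrix as sketched below (NumPy only; the function name is hypothetical). Since the text does not state how per-class values are averaged into the figures of Table 3, the sketch reports both the overall accuracy (trace divided by total) and the one-vs-rest accuracy (TP + TN)/(TP + TN + FP + FN) of each class.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes: int) -> dict:
    """Confusion matrix plus one-vs-rest precision, recall, F1 and accuracy per class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class
    total = cm.sum()
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp  # belonging to the class but missed
    tn = total - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return {
        "confusion": cm,
        "overall_accuracy": tp.sum() / total,  # trace / total
        "class_accuracy": (tp + tn) / total,   # one-vs-rest (TP + TN) / all
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


m = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

In this toy example the confusion matrix is [[1, 1, 0], [0, 2, 0], [1, 0, 1]]; the off-diagonal entries are exactly the misclassifications discussed above, and the macro F1 is the mean of the per-class `f1` values.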
Table 4. Confusion matrix of the classification results using random forest.

Conclusions
This paper proposes three cloud cover features according to the TMT classification criteria, namely cloud weight, cloud area ratio and cloud dispersion, and classifies ASC images based on these features. The cloud weight indicates the thickness of the clouds, the cloud area ratio represents the extent of cloud coverage and the cloud dispersion reflects the degree to which clouds influence astronomical observation results. We quantify these features and then use a classifier to identify the class of each ASC image. The method was evaluated on a large dataset of ASC images taken by the all-sky camera located in Xinjiang, China (38.19° N, 74.53° E). The experimental results show that, with the cloud cover features, the highest classification accuracy is 97.28% and the F1_score is 96.97%. With this method, experts in astronomical observatory site selection can greatly reduce the time needed to classify ASC images and improve the efficiency of image processing; with comprehensive statistical data, they can then choose the best site.