Improving Cloud Type Classification of Ground-Based Images Using Region Covariance Descriptors

Cloud types are important indicators of cloud characteristics and are valuable for short-term weather forecasting. Meteorological researchers can benefit from automatic cloud type recognition applied to the massive number of images captured by ground-based imagers. However, designing a sufficiently discriminative classifier for cloud categorization remains a major challenge. To tackle this difficulty, we present an improved method based on region covariance descriptors (RCovDs) and the Riemannian Bag-of-Feature (BoF). RCovDs model the correlations among different feature dimensions, which allows for a more discriminative representation. BoF is extended from Euclidean space to the Riemannian manifold by k-means clustering, in which the Stein divergence is adopted as the similarity metric. A histogram feature is extracted by encoding the RCovDs of the cloud image blocks with the BoF-based codebook. A multi-class support vector machine (SVM) is used to recognize the cloud types. Experiments on ground-based cloud image datasets validate the proposed method and show competitive performance against state-of-the-art methods.

In recent years, convolutional neural networks (CNNs) have been applied to many image recognition tasks and have achieved remarkable performance (Krizhevsky et al., 2012). Unlike hand-crafted features, CNNs extract hierarchical features that include both low-level details and high-level semantic information. Recently, several works (Shi et al., 2017; Ye et al., 2017) have obtained encouraging results by extracting the cloud signature from pre-trained CNNs such as AlexNet (Krizhevsky et al., 2012) and VGGNet (Simonyan and Zisserman, 2015). In addition, attempts have been made to use end-to-end CNN models directly for cloud categorization (Li et al., 2020; Liu et al., 2019; Zhang et al., 2018b). However, the shortage of labelled samples can make it hard for the network to converge during training.
The main challenges of the ground-based cloud image classification task can be ascribed to the following reasons: (1) A single feature cannot effectively describe different types of clouds, so textural, structural, and statistical features need to be extracted simultaneously. (2) The scale of clouds varies greatly, so the extracted features should be robust to illumination changes and nonrigid motion. (3) Different cloud types may have similar local characteristics, and thus global features need to be considered. To address these issues, we use region covariance descriptors (RCovDs) to encode the features of the cloud image blocks and, with the aid of Bag-of-Feature (BoF), aggregate those local descriptors into a global cloud image feature for cloud type classification.
RCovDs (Tuzel et al., 2006) have shown superior performance on object detection (Carreira et al., 2015; Guo et al., 2010; Li et al., 2013; Pang et al., 2008) and classification tasks (Fang et al., 2018; Li et al., 2013; Wang et al., 2012). As second-order statistics of the image features, RCovDs provide rich and compact context representations. Noise is largely filtered out by subtracting the mean values of the features. RCovDs are also scale and rotation invariant, irrespective of the pixel positions and the number of sample points. Despite these attractive properties, directly adopting RCovDs for cloud type classification is still difficult because of their non-Euclidean geometry. RCovDs are Symmetric Positive Definite (SPD) matrices and naturally reside on a Riemannian manifold; therefore, machine learning algorithms designed for Euclidean space must be adapted for automatic cloud image recognition.
In Euclidean space, BoF describes an image as a vector built from a set of local descriptors (Jégou et al., 2012), aggregating the local features to obtain a global representation. Inspired by the work of Faraki et al. (2015a), we encode the RCovDs of the local image blocks into a histogram by using the Riemannian counterpart of the conventional BoF, taking the geodesic distance of the underlying manifold as the metric.
https://doi.org/10.5194/amt-2020-189 Preprint. Discussion started: 7 August 2020. © Author(s) 2020. CC BY 4.0 License.
In this paper, we extend our previous work (Luo et al., 2018) and propose an improved cloud type classification method based on RCovDs. The diagram is shown in Fig. 1. In the first step, we extract multiple pixel-level features such as intensity, color and gradients from the cloud image blocks to form RCovDs. In the second step, the RCovDs are encoded by the Riemannian BoF to output the histogram representation. In the last step, the histogram is fed to the multi-class SVM for cloud type prediction.
The main contributions of this paper are:
- The RCovD is introduced for the first time to characterize the local patterns of cloud images, and the Riemannian BoF is applied to encode RCovDs into an image-level histogram;
- The impacts of the Riemannian BoF codebook size and the image block size on cloud type classification accuracy are investigated;
- For small training datasets, the proposed algorithm offers better performance than the state-of-the-art approaches.
The remainder of this paper is organized as follows. Section 2 introduces the ground-based cloud image datasets and details the proposed cloud type classification method. Experimental results and comparisons with other methods are presented in Section 3. Section 4 summarizes our contributions and discusses future work.
[...] clear sky, patterned clouds, thick-dark clouds, thick-white clouds, and veil clouds. Figure 2 shows sample images from each category; the images have a dimension of 125 × 125 pixels.
(2) Zenithal dataset: This dataset was acquired by the whole-sky infrared cloud-measuring system (WSIRCMS), which is located in Nanjing, China. The zenithal dataset contains 500 sky/cloud images in five categories: cirriform clouds, clear skies, cumuliform clouds, stratiform clouds and waveform clouds (Liu et al., 2013).

Region Covariance Descriptors
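Let $f$ be the $W \times H \times d$ feature map extracted from the cloud image $I$. For a given rectangular region $R$ of size $w \times w$, let $\{f_i\}_{i=1}^{n}$ denote the $d$-dimensional feature vectors of the $n = w^2$ pixels in $R$. The RCovD is defined by a $d \times d$ symmetric covariance matrix $C_R$:
$$C_R = \frac{1}{n-1} \sum_{i=1}^{n} (f_i - \mu)(f_i - \mu)^T, \tag{1}$$
where $\mu$ is the mean of the feature vectors in $R$. To keep $C_R$ strictly positive definite, a small regularization term is added, $C_R \leftarrow C_R + \epsilon E$, where $\epsilon$ is a small positive constant and $E$ is the identity matrix (Huang et al., 2018; Wang et al., 2012; Wang et al., 2018a).
RCovDs belong to the SPD manifold, which forms a Riemannian manifold when it is endowed with a Riemannian metric. Based on the metric, a geodesic distance can be induced to measure the similarity of the image features. The geodesic distance is the length of the shortest curve between two SPD matrices on the SPD Riemannian manifold. The most common distance is the Affine Invariant Riemannian Metric (AIRM):
$$\delta_R(X, Y) = \left\| \log\left( X^{-1/2} Y X^{-1/2} \right) \right\|_F, \tag{2}$$
where $\|\cdot\|_F$ is the Frobenius matrix norm and $\log(\cdot)$ denotes the matrix logarithm. The matrix logarithm can be calculated by eigenvalue decomposition, which coincides with the singular-value decomposition (SVD) for SPD matrices. Let $A = U \Sigma U^T$ be the eigenvalue decomposition of a symmetric matrix; the logarithm of $A$ is given by
$$\log(A) = U \log(\Sigma) U^T. \tag{3}$$
However, AIRM is computationally demanding. Driven by such computational concerns, in this paper we adopt the Stein divergence (Sra, 2012) as the Riemannian distance metric, which is defined as
$$\delta_S(X, Y) = \log\left| \frac{X + Y}{2} \right| - \frac{1}{2} \log\left| X Y \right|, \tag{4}$$
where $|\cdot|$ denotes the determinant.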
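The definitions in this section can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions and not the authors' implementation: the function names and the regularization constant `eps` are hypothetical, and the covariance, AIRM and Stein divergence follow their standard forms.

```python
import numpy as np


def region_covariance(features, eps=1e-6):
    """Covariance descriptor of one region.

    features: (n, d) array, one d-dimensional feature vector per pixel.
    eps is an assumed regularization constant that keeps the result SPD.
    """
    n, d = features.shape
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (n - 1)
    return cov + eps * np.eye(d)


def spd_log(a):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.log(vals)) @ vecs.T


def airm_distance(x, y):
    """Affine Invariant Riemannian Metric between two SPD matrices."""
    vals, vecs = np.linalg.eigh(x)
    x_inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.T
    middle = x_inv_sqrt @ y @ x_inv_sqrt
    middle = (middle + middle.T) / 2.0  # remove round-off asymmetry
    return np.linalg.norm(spd_log(middle), "fro")


def stein_divergence(x, y):
    """Stein divergence: log det of the midpoint minus half the log dets."""
    _, logdet_mid = np.linalg.slogdet((x + y) / 2.0)
    _, logdet_x = np.linalg.slogdet(x)
    _, logdet_y = np.linalg.slogdet(y)
    return logdet_mid - 0.5 * (logdet_x + logdet_y)
```

The Stein divergence needs only three determinants, whereas AIRM needs two eigendecompositions per pair, which is the computational saving the text refers to.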

Feature Extraction
The features for cloud type recognition should be representative and discriminative. In this paper, for the zenithal dataset, 7 features are extracted, including the image intensity $I(x, y)$, the norms of the first- and second-order derivatives of $I(x, y)$ in both the x and y directions, and the norm of the gradient. The zenithal cloud image is thus mapped to a 7-dimensional feature space.
We divide the cloud image into image blocks and then compute the SPD matrices with the feature maps defined in Eq. (5) and Eq. (6). With the Riemannian BoF, those local feature descriptors in the form of SPD matrices are converted into a histogram feature vector, which is used for cloud type classification.
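The block-wise descriptor computation can be sketched as follows. This is a hedged illustration, not the feature maps of Eq. (5) or Eq. (6): it builds only the six channels named explicitly in the text (intensity, absolute first and second derivatives in x and y, and gradient norm), it assumes central differences via `np.gradient`, and it assumes non-overlapping blocks. The function names and `eps` are hypothetical.

```python
import numpy as np


def derivative_feature_map(image):
    """Stack intensity and derivative channels into an (H, W, 6) feature map."""
    img = image.astype(np.float64)
    iy, ix = np.gradient(img)        # axis 0 is y (rows), axis 1 is x (columns)
    iyy = np.gradient(iy, axis=0)    # second derivative in y
    ixx = np.gradient(ix, axis=1)    # second derivative in x
    grad_norm = np.sqrt(ix ** 2 + iy ** 2)
    channels = [img, np.abs(ix), np.abs(iy), np.abs(ixx), np.abs(iyy), grad_norm]
    return np.stack(channels, axis=-1)


def block_descriptors(feature_map, w, eps=1e-6):
    """One regularized covariance matrix per non-overlapping w x w block."""
    height, width, d = feature_map.shape
    descriptors = []
    for top in range(0, height - w + 1, w):
        for left in range(0, width - w + 1, w):
            block = feature_map[top:top + w, left:left + w].reshape(-1, d)
            centered = block - block.mean(axis=0)
            cov = centered.T @ centered / (block.shape[0] - 1)
            descriptors.append(cov + eps * np.eye(d))
    return np.stack(descriptors)
```

Each image then yields a stack of d × d SPD matrices whose size depends only on the number of feature channels, not on the block size w.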

Riemannian Bag-of-Feature
BoF requires a codebook with k codewords, which are usually obtained by clustering local descriptors. To extend the conventional BoF from Euclidean space to the SPD Riemannian manifold, two issues should be considered: (1) [...]
An alternative way to learn a codebook is to apply the conventional k-means to vectorized RCovDs in the tangent space (Faraki et al., 2015b); however, this neglects the non-Euclidean geometric structure of SPD matrices. Taking the Riemannian geometry of SPD matrices into consideration, a possible way is to compute the cluster centers with the Karcher mean (Pennec, 2006):
$$C_j = \arg\min_{C} \sum_{i} \delta_S(X_i, C), \tag{7}$$
where $\delta_S$ is the Stein divergence, used to measure the geodesic distance between $X_i$ and the cluster center $C_j$, and the sum runs over the RCovDs $X_i$ assigned to the $j$-th cluster. Given the training set of RCovDs, the codebook is initialized by randomly selecting k RCovDs from it, and the cluster centers are iteratively updated using Eq. (7) until the average distance of each point $X_i$ to its nearest cluster center is minimized. The procedure is summarized in Algorithm 1. We choose the number of codewords empirically by considering the trade-off between classification accuracy and computational cost, which will be detailed in Section 3.
[...] images of each cloud type from the remaining images in the SWIMCAT dataset to construct a set of RCovDs for testing, and assign each RCovD to the nearest codeword to obtain the RCovD histogram of each cloud type. As shown in Fig. 4, RCovDs from different cloud types have clearly separable codeword distributions. The RCovD distributions of clear sky, patterned and thick-dark clouds are relatively concentrated, while the distributions of thick-white and veil clouds are slightly scattered. In particular, the RCovDs of veil clouds and clear sky are assigned to almost the same codewords, which makes the categorization of these two types challenging. Overall, our proposed Riemannian BoF provides a vectorized discriminative representation for the cloud classification task.
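The codebook learning and histogram encoding described in this section can be sketched as below. This is a minimal illustration under stated assumptions, not Algorithm 1 itself: the cluster mean under the Stein divergence is approximated with a standard fixed-point iteration, both loops run for a fixed number of iterations instead of a convergence test, and the histogram is normalized to sum to one. All function names and default values are hypothetical.

```python
import numpy as np


def pairwise_stein(mats, centers):
    """Stein divergence between every matrix (n, d, d) and every center (k, d, d)."""
    _, logdet_m = np.linalg.slogdet(mats)
    _, logdet_c = np.linalg.slogdet(centers)
    mid = (mats[:, None] + centers[None]) / 2.0      # shape (n, k, d, d)
    _, logdet_mid = np.linalg.slogdet(mid)
    return logdet_mid - 0.5 * (logdet_m[:, None] + logdet_c[None, :])


def stein_mean(mats, n_iter=20):
    """Fixed-point iteration for the minimizer of the summed Stein divergence."""
    c = mats.mean(axis=0)                            # start from the arithmetic mean
    for _ in range(n_iter):
        c = np.linalg.inv(np.mean(np.linalg.inv((mats + c) / 2.0), axis=0))
        c = (c + c.T) / 2.0                          # keep the iterate symmetric
    return c


def learn_codebook(mats, k, n_iter=10, seed=0):
    """k-means on SPD matrices with the Stein divergence as dissimilarity."""
    rng = np.random.default_rng(seed)
    centers = mats[rng.choice(len(mats), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = pairwise_stein(mats, centers).argmin(axis=1)
        for j in range(k):
            members = mats[labels == j]
            if len(members) > 0:                     # leave empty clusters unchanged
                centers[j] = stein_mean(members)
    return centers


def encode_histogram(mats, centers):
    """Assign each descriptor to its nearest codeword and count occurrences."""
    labels = pairwise_stein(mats, centers).argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

In use, `learn_codebook` would be run once on the RCovDs of the training blocks, and `encode_histogram` would then turn the RCovDs of each image into a k-dimensional vector for the classifier.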

Classification
SVM performs well in classification tasks, since it establishes an input-output relationship directly from the training dataset and does not require any prior assumptions or specific preprocessing phases. Another merit is that, once the training procedure is finished, the classification is obtained in real time with a strong reduction in computation (Taravat et al., 2015).
For m-class classification tasks, there are several ways to build SVM classifiers. In this paper, the "one-against-one" method is adopted, in which $m(m-1)/2$ binary classifiers are constructed and each classifier distinguishes one cloud type from another. We use a voting strategy to assign the cloud image to the category with the maximum number of votes (Chang and Lin, 2007; Hsu and Lin, 2002; Knerr et al., 1990; Kreßel, 1999). The proposed algorithm is summarized in Algorithm 2, in which SVM is implemented with the LIBSVM toolbox (Chang and Lin, 2007).
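The one-against-one voting scheme can be sketched in a few lines. This is only an illustration of the voting logic: the trained binary SVMs are replaced by a hypothetical stand-in, `nearest_prototype`, which picks the closer of two made-up class prototypes, and ties are broken by class order as an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(m):
    """Number of binary classifiers for m classes: m(m-1)/2."""
    return m * (m - 1) // 2


def one_against_one_predict(x, classes, pairwise_decide):
    """Let every pair of classes vote and return the class with the most votes.

    pairwise_decide(a, b, x) must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    # Ties go to the class listed first (an assumption of this sketch).
    return max(classes, key=lambda c: votes[c])


# Stand-in for trained binary SVMs: pick the closer of two scalar prototypes.
prototypes = {"clear": 0.0, "veil": 5.0, "thick": 10.0}


def nearest_prototype(a, b, x):
    return a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b


predicted = one_against_one_predict(4.0, list(prototypes), nearest_prototype)
```

With five cloud categories, as in both datasets, this scheme requires ten binary classifiers.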

Experiments and discussion
To demonstrate the performance of our proposed cloud type classification method, we conduct several experiments on the SWIMCAT and zenithal datasets. We first analyze the effects of the two parameters involved in the proposed algorithm (i.e., the codebook size k and the image block size $w \times w$) on cloud type classification accuracy. Then, we design an empirical validation with various training/test partitions. Finally, we quantitatively evaluate and compare the best results of different methods, i.e., WLBP (Liu et al., 2015), BC (Cheng and Yu, 2015), and Luo's method (Luo et al., 2018).

Parameter Configuration Analysis
To assess the impacts of the codebook size, i.e., the number of centroids k, and the image block size $w \times w$ on cloud classification accuracy, we conduct a sensitivity analysis on the SWIMCAT and zenithal datasets. In our experiments, k ranges from 5 to 40 with an interval of 5 and w ranges from 8 to 120 with a step size of 4. For a given w, the $W \times H \times d$ feature map is [...] the better performance on both datasets. However, we observe that the improvement is not statistically significant after k exceeds 20, while the computing burden increases noticeably. In fact, the complexity of the Riemannian BoF is mainly determined by the number of cluster centers. We note that as the block size w increases, the classification accuracy first increases and then degrades beyond its peak; this trend is especially evident on the zenithal dataset. The reason is that larger blocks capture more abundant texture information, while local details might be ignored. Therefore, in the following experiments, considering the trade-off between classification accuracy and efficiency, we set k = 30 and w = 24 for the SWIMCAT dataset, and k = 35 and w = 52 for the zenithal dataset.
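The size of the parameter sweep described above can be made concrete with a short sketch. The ranges are taken from the text; the `grid_search` helper and its `evaluate` callback are hypothetical placeholders for training and scoring the full pipeline at one (k, w) setting.

```python
from itertools import product

# Parameter ranges as stated in the text.
codebook_sizes = list(range(5, 41, 5))   # k = 5, 10, ..., 40
block_sizes = list(range(8, 121, 4))     # w = 8, 12, ..., 120


def grid_search(evaluate):
    """Return the (k, w) pair with the highest score from evaluate(k, w)."""
    return max(product(codebook_sizes, block_sizes), key=lambda kw: evaluate(*kw))


n_configurations = len(codebook_sizes) * len(block_sizes)
```

The sweep covers 8 codebook sizes and 29 block sizes, so 232 configurations per dataset.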

Evaluation on Dataset with Small Sample Size
In machine learning tasks, suitably annotated data samples for classifier training and testing are in short supply and costly to obtain. Since manual labeling requires considerable effort, it is of great significance to reduce the dependence of the classification model on the labeled dataset. To evaluate the performance of the proposed method comprehensively, we randomly extract different proportions of training images from each dataset and take the remaining images as the test set. To guarantee the stability of the classification results, each experiment was repeated five times and the average was taken as the final classification result. Figure 6 shows that, in the small-sample setting, the proposed method achieves an accuracy of more than 90% on the SWIMCAT test set with only 3% of the images (i.e., 24/784) used as the training set. The accuracy improves by at least 5% when the training set accounts for 9% of the images (i.e., 72/784). As for the zenithal dataset, our method obtains more than 90% classification accuracy on the test set when we randomly select 6% of the images (i.e., 30/500) as the training set, and achieves more than 95% accuracy when the proportion of training images increases to 10%. Overall, our proposed method achieves high classification accuracy with small training sets. This is remarkable, considering that the method combines only RCovDs and the Riemannian BoF. In conclusion, the proposed method requires only a few manually labeled samples to achieve high cloud type recognition accuracy.

Comparison with state-of-the-art methods
Iterated cross validation is chosen as an effective scheme to verify the performance of the classifier. This strategy estimates the performance by randomly choosing a part of the samples for training, testing the model on the samples held out from training, and repeating the procedure dozens of times (Beleites et al., 2013). In each experiment, we randomly select the same proportion (i.e., 1/10, 5/10, 9/10) of images from each category as the training set, and the remaining images are used as the test set. Each classification experiment is repeated 50 times, and the average accuracy is taken as the final experimental result.
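The validation protocol can be sketched as a repeated per-category random split. This is a hedged illustration: the function names, the rounding of the per-class training count and the `evaluate` callback (which stands for training the classifier on the training indices and returning accuracy on the test indices) are assumptions of this sketch.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, rng):
    """Draw the same proportion of samples from every class for training."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_train = max(1, round(train_fraction * len(shuffled)))
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


def iterated_validation(labels, train_fraction, evaluate, repeats=50, seed=0):
    """Average evaluate(train, test) over repeated random stratified splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = stratified_split(labels, train_fraction, rng)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

Calling `iterated_validation(labels, 0.1, evaluate)` would correspond to the 1/10 training setting with 50 repetitions.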
We compare the performance of our method with the best results published on the SWIMCAT dataset in Table 1. Notice that our algorithm using RCovDs achieves an accuracy 2.58% higher than the other methods on the SWIMCAT dataset when the training samples account for 1/10 of the total data. When the training samples account for 5/10 and 9/10, the accuracy of the proposed method is slightly higher than that of Luo's method and much higher than that of the other two methods. Figure 7 shows the confusion matrix of the classification results of our proposed method on the SWIMCAT dataset, with 9/10 of the dataset used as the training set. The discrimination rates of clear sky, patterned clouds and thick-dark clouds are 100%, which demonstrates that these three types tend to be easily distinguished among all cloud types since they have the most distinctive features. Figure 8 shows two misclassified examples from the SWIMCAT dataset, where the yellow labels are the ground truth and the red labels are the cloud types predicted by our method. Notice that veil clouds are prone to be misclassified as clear sky, since veil clouds are thin and have high light transmittance. Moreover, some veil clouds are misclassified as thick-white clouds when the camera lens is contaminated and the clouds are too thick. In addition, a small number of thick-white clouds are misclassified as clear sky, patterned clouds or veil clouds.
Figure 7: The confusion matrix of the SWIMCAT dataset classification results using our proposed method. 9/10 of the dataset is used for training and the rest is used for testing; the overall classification accuracy is 98.4%.
As for the zenithal dataset, Table 2 shows that the proposed method attains the highest overall accuracy among the compared approaches. Figure 9 displays the confusion matrix of the classification results of our method on the zenithal dataset when 90% of the dataset is used as the training set. The discrimination rates of clear sky, cumuliform clouds and stratiform clouds reach 100%. Only a small fraction of waveform clouds is misclassified as clear sky or cirriform clouds. In addition, some of the cirriform clouds are misclassified as clear sky or waveform clouds. Figure 10 illustrates the misclassified images of the zenithal dataset: waveform clouds and cirriform clouds are easily categorized as clear sky if the sky area is much larger than the cloud area. Waveform clouds and cirriform clouds are confused with each other because they sometimes have extremely similar textures.
Figure 10: Misclassified images of the zenithal dataset. The yellow labels are the ground truth, and the red labels are the predicted cloud types. Waveform clouds and cirriform clouds are categorized as clear sky because the sky area is much larger than the cloud area, and these two cloud types are easily confused as they share similar local patterns.

Conclusions
To tackle the challenge of automatic cloud type classification for ground-based cloud images, in this paper we present a new classification method with RCovDs as the local feature representation. RCovDs provide a simple way to fuse multiple pixel-level features, which improves the discriminative ability for cloud images. The image-level information is obtained by applying the Riemannian BoF to encode RCovDs into a histogram. Finally, we apply the "one-against-one" multi-class SVM as the classifier.
Even though we choose relatively simple image features to calculate the RCovDs, the performance of the proposed method is still impressive. We conduct a parameter analysis experiment and determine how the block size w and the number of codewords k affect the accuracy of the proposed method. Classification experiments with different training set sizes demonstrate that our method remains effective with small training sets, which can greatly reduce the labeling effort. In the third experiment, we compare our method with three other cloud classification algorithms under different configurations of training/test sets. As the experimental results validate, the proposed method is competitive with state-of-the-art methods on both the SWIMCAT and zenithal datasets.
In future work, features such as LBP or GLCM could be gathered and mapped onto the Riemannian manifold, and a multi-scale block strategy could be considered to achieve higher cloud type categorization accuracy. In addition, complex sky conditions with several cloud types should be investigated in depth to meet application needs.