Cloud detection in all-sky images via multi-scale neighborhood features and multiple supervised learning techniques

Cloud detection is important for providing necessary information such as cloud cover in many applications. Existing cloud detection methods include red-to-blue ratio thresholding and other classification-based techniques. In this paper, we propose to perform cloud detection using supervised learning techniques with multi-resolution features. One of the major contributions of this work is that the features are extracted from local image patches of different sizes to include local structure and multi-resolution information. The cloud models are learned through the training process. We consider classifiers including random forest, support vector machine, and Bayesian classifier. To take advantage of the clues provided by multiple classifiers and various levels of patch sizes, we employ a voting scheme to combine the results and further increase the detection accuracy. Experiments show that the proposed method distinguishes cloud and non-cloud pixels more accurately than existing methods.


Introduction
With the trend of sustainable and green energy, there is a growing demand for solar energy technology. To utilize solar energy effectively, integrated and large-scale photovoltaic systems need to overcome the unstable nature of the solar resource (Gueymard, 2004; Heinemann et al., 2006; Lorenz et al., 2009). The ability to forecast surface solar irradiance is helpful for planning and deployment of electricity generated by different units. Numerical weather prediction information and satellite images are popular sources for wide-range prediction (Marquez and Coimbra, 2011; Perez et al., 2002, 2010; Remund et al., 2008). However, the spatial and temporal resolution of predictions obtained from weather prediction information or satellite cloud images is relatively coarse compared to the resolution desired by photovoltaic grid operators. For more refined spatial and temporal resolution of irradiance prediction, research that analyzes images obtained from devices capturing skies has emerged. Ground-based sky camera systems have been proposed to capture images of the sky (Sabburg and Wong, 1999), allowing researchers to study the relationship between the sun and clouds and the effect of clouds. Devices developed to monitor the sky in some of the pioneering works include the whole sky imager (Kassianov et al., 2005; Li et al., 2004), whole sky camera (Long et al., 2006), all-sky imager (Kubota et al., 2003), and total sky imager (Pfister et al., 2003). More recent commercial products include all-sky cameras by Eko Instruments, Oculus, and SBIG. These devices help compensate for the limited spatial and temporal resolution of satellite cloud observations.
tively correlated under most conditions. In addition to providing cloud coverage information, accurate cloud detection results could further improve cloud type classification accuracy (Cheng and Yu, 2015b). It has been established that employing cloud type information in short-term irradiance prediction could yield more accurate prediction results (Cheng and Yu, 2015a).
Cloud detection in an all-sky image decides whether a pixel belongs to a cloud. Traditionally, the red-to-blue ratio (RBR) of each pixel is used to indicate whether the dominant source of the pixel is clear sky or cloud (Chow et al., 2011; Johnson et al., 1989, 1991; Long et al., 2006; Shields et al., 2007, 2009). A threshold is then applied to the RBR to determine cloud pixels in a sky image. Pixels whose RBRs are lower than the threshold are classified as clear sky, and pixels whose RBRs are higher than the threshold are labeled as clouds. Selecting a good threshold is very important for the RBR method. The work by Long et al. (2006) suggested that different thresholds should be selected depending on the position of the pixel being classified relative to the positions of the sun and the horizon. In addition to pure color characteristics, Roy et al. (2001) tried a neural network approach with a wider range of variables for cloud segmentation. West et al. (2014) also used a neural network to classify pixels. The features they used are colors and the distance of the pixel to the sun. Under lower-visibility conditions, aerosol and thin clouds tend to cause errors in cloud determination. To improve the accuracy of the single threshold method, Huo and Lu (2009) proposed an integrated method for cloud determination under low-visibility conditions. The integrated cloud-determination algorithm uses fast Fourier transform, symmetrical image features, and self-adaptive thresholds. Li et al. (2011) proposed a hybrid thresholding algorithm (HYTA) for cloud detection on ground-based color images, aiming at complementing fixed thresholding and adaptive thresholding algorithms. HYTA identifies the ratio image as either unimodal or bimodal according to its standard deviation. Then, the unimodal and bimodal images are handled by fixed and minimum cross entropy (MCE) thresholding algorithms, respectively. Kazantzidis et al. (2012) tuned multiple heuristic thresholds on RGB (red, green, blue) color components to detect clouds. The abovementioned works mostly consider the features extracted from each single pixel but not the local image patch and structure around the pixel. Bernecker et al. (2013) used color and texture as features. After applying deep belief networks to learn the structure of the features, a random forest classifier is used to classify image patches into three classes: sky, cloud, and thick cloud. Bernecker et al. (2013) thus proposed to utilize the information of image patches. However, they used fixed-size patches for training and classification without considering multi-resolution information. Patches that are too large would include features from both sky and clouds. In contrast, patches that are too small might not include enough information to represent the appearance of the clouds.
In this paper, we propose to perform cloud detection by extracting features from local image patches of various sizes. Patches of different sizes extract information at different levels of resolution. For classification, we utilize multiple supervised learning techniques. We regard the cloud detection problem as a two-class classification problem. In other words, we classify each pixel in the image as cloud or non-cloud. The cloud models are learned through the training process. We consider classifiers including support vector machine (SVM), random forest, and Bayesian classifier. To extract features from each pixel, we calculate the RBR as well as the color components of various color models including RGB, HSV (hue, saturation, value), and YCbCr. To take advantage of the clues provided by multiple classifiers and multi-level resolution, we employ a scheme to combine multiple classification results to further increase the cloud detection accuracy. The methodology, including the features and the classifiers, is elaborated in Sect. 2. In Sect. 3, the proposed system framework is validated using a set of experimental images with manually labeled ground truth. The experimental results using different classifiers are demonstrated and discussed. Finally, conclusions are drawn in Sect. 4.

Methodology
The proposed system framework is illustrated in Fig. 1. For each all-sky image, the Hough line transform is performed first to detect the vertical line of the sun, which is caused by the CCD device when capturing all-sky images. The pixels on this line often have bright intensities and could be confused with cloud pixels. After detecting and eliminating the vertical line of the sun, the rest of the pixels in the image are classified as cloud or non-cloud. The input images are RGB color images. For each all-sky image, the color components in various color spaces are computed. The color models considered in this work include RGB, HSV, and YCbCr. In addition to the abovementioned color components, the RBR of each pixel is also calculated and considered as a feature. To perform pixelwise classification, all the color components and the RBR of the local image patches around a pixel are collected and concatenated as a feature vector for the pixel. Training samples are obtained from manually labeled ground truth images.

Hough line transform and sun position detection
The Hough transform (Shapiro and George, 2001) is used to detect the vertical line of the sun in an all-sky image. The procedure of detecting lines can be regarded as finding the coefficients of the line equations using a voting mechanism. Voting in the parameter space is achieved by dividing the parameter space into grids. Because all the pixels satisfying a certain line equation vote for the same grid cell, a high vote count appears in the corresponding cell of the parameter space. The Hough transform re-parameterizes the line equation as $x\cos\theta + y\sin\theta = \rho$ to avoid using the slope parameter of the line equation $y = mx + b$. Because possible values for the slope parameter $m$ range from minus infinity to infinity, it would be infeasible to find $m$ via grid search. After re-parameterizing the line equation, the range of the parameter $\rho$ can be set according to the width and height of the image. The range of the parameter $\theta$ is from −180° to 180°. Figure 2 displays an example of Hough line detection on an image. After detecting the vertical line, the sun position is determined by accumulating the intensities of the pixels along the x direction in a window with width $w_1$. The position with the highest accumulated intensity is the center of the sun. The pixels in the line window with a fixed width $w_2$ are eliminated from the image. The pixels within the sun position and the line window with width $w_2$ are determined as non-cloud pixels and do not have to go through the subsequent classification steps. The values of $w_1$ and $w_2$ are determined depending on the size of the all-sky images. In our experiments, we set $w_1$ and $w_2$ to 60 and 12 pixels, respectively.
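The voting procedure and the sun localization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 1° grid over θ from 0° to 179° with signed ρ (which covers every line once), and the reading of "accumulating along the x direction" as a row-wise sum inside a window of width $w_1$ centred on the detected line are all assumptions made here.

```python
import numpy as np

def strongest_line(edge_mask):
    """Return (rho, theta_deg) of the most-voted line x*cos(theta) + y*sin(theta) = rho."""
    h, w = edge_mask.shape
    max_rho = int(np.ceil(np.hypot(h, w)))        # |rho| cannot exceed the image diagonal
    acc = np.zeros((2 * max_rho + 1, 180), dtype=np.int64)
    ys, xs = np.nonzero(edge_mask)
    for j in range(180):                          # assumed theta grid: 0, 1, ..., 179 degrees
        t = np.deg2rad(j)
        rhos = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(np.int64)
        # Each edge pixel casts one vote in the (rho, theta) cell it is consistent with.
        acc[:, j] = np.bincount(rhos + max_rho, minlength=2 * max_rho + 1)
    r_idx, j = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r_idx) - max_rho, int(j)

def sun_row(gray, x_line, w1=60):
    """Row whose intensities, summed across a window of width w1 around the line, are largest."""
    half = w1 // 2
    lo, hi = max(0, x_line - half), min(gray.shape[1], x_line + half + 1)
    return int(np.argmax(gray[:, lo:hi].sum(axis=1)))

def line_window_mask(shape, x_line, w2=12):
    """Boolean mask of the w2-pixel-wide line window that is excluded from classification."""
    mask = np.zeros(shape, dtype=bool)
    half = w2 // 2
    mask[:, max(0, x_line - half):x_line + half] = True
    return mask
```

For a vertical line at column c, the peak is expected at θ = 0° and ρ = c, so the returned ρ can be passed directly as `x_line`.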

Color models
RGB is a very common color model used in most computer systems. It is an additive color model based on trichromatic theory. RGB is easy to implement. However, it is nonlinear with respect to visual perception, and the specification of colors is semi-intuitive. HSV is a color model that describes colors in terms of hue, saturation, and value components (Gonzalez and Woods, 2002). Hue is expressed as an angle from 0° to 360°. The hue component of red starts at 0°, green starts at 120°, and blue starts at 240°. Saturation is the amount of gray in the color, and the value component describes the brightness or intensity of the color. YCbCr is a color space used in video and digital photography systems. Y is the luminance component, and Cb and Cr are the blue-difference and red-difference chroma components. HSV and YCbCr color components can be obtained from RGB color components using color model transformation equations (Gonzalez and Woods, 2002; Poynton, 2003). Although the color models are not independent and including color components from different color models may introduce redundancy in the feature vector, considering various color models still provides the classifier with more information that is beneficial for classification.

Feature vector construction for local image patches of various sizes
For each pixel, local image patches of various sizes are used to extract features. The size of the image patch at level $i$ is $L_i \times L_i$, $i = 1, \ldots, \ell$, where $\ell$ denotes the total number of levels. For each local image patch, the color components and the RBR of all the pixels in the patch are concatenated to form a feature vector. Each pixel contributes 10 values (three components each from RGB, HSV, and YCbCr, plus the RBR). Consequently, the dimension of each feature vector is $L_i \times L_i \times 10$. There are $\ell$ feature vectors constructed for each pixel.
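A minimal sketch of this construction is given below. The paper does not list its conversion equations or border handling, so the full-range BT.601 YCbCr coefficients, the use of Python's `colorsys` for HSV (hue scaled to [0, 1] rather than degrees), the guard against a zero blue channel, and edge-replication padding at image borders are assumptions, as are the function names.

```python
import colorsys
import numpy as np

def pixel_features(rgb_image):
    """Return an (H, W, 10) array: R, G, B, H, S, V, Y, Cb, Cr, RBR for every pixel."""
    rgb = rgb_image.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # HSV via the standard library, pixel by pixel (slow but dependency-free).
    hsv = np.array([[colorsys.rgb_to_hsv(*(px / 255.0)) for px in row] for row in rgb])
    # Full-range BT.601 YCbCr (assumed variant).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    rbr = r / np.maximum(b, 1.0)                 # assumed guard against division by zero
    return np.dstack([r, g, b, hsv[..., 0], hsv[..., 1], hsv[..., 2], y, cb, cr, rbr])

def patch_feature_vector(features, row, col, size):
    """Concatenate the 10 features of every pixel in the size x size patch around (row, col)."""
    half = size // 2
    padded = np.pad(features, ((half, half), (half, half), (0, 0)), mode="edge")
    patch = padded[row:row + size, col:col + size, :]
    return patch.reshape(-1)                     # length size * size * 10
```

Calling `patch_feature_vector` once per level with sizes such as 5, 10, 15, 20, and 25 yields one feature vector per level for the pixel.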

Dimension reduction
We apply principal component analysis (PCA) (Duda et al., 2001) on the feature vectors to reduce their dimensions.
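As a rough illustration of this step, the sketch below projects column-wise samples onto the leading eigenvectors and keeps the smallest number of components whose cumulative eigenvalue share exceeds a threshold. It is a sketch under stated assumptions rather than the authors' code: the samples are mean-centred first, the $D_1 \times D_1$ scatter matrix $XX^T$ of the centred data is decomposed, and the names `pca_reduce` and `thr_pca` are hypothetical.

```python
import numpy as np

def pca_reduce(X, thr_pca=0.95):
    """Reduce X of shape (D1, N), one sample per column, to D2 dimensions.

    D2 is the smallest number of leading eigenvalues whose share of the
    total eigenvalue sum is larger than thr_pca.
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # centre each variable
    scatter = Xc @ Xc.T                           # (D1, D1), proportional to the covariance
    eigvals, eigvecs = np.linalg.eigh(scatter)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # re-order to descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    d2 = int(np.searchsorted(ratio, thr_pca, side="right")) + 1
    d2 = min(d2, len(eigvals))
    W = eigvecs[:, :d2]                           # principal components as columns
    return W.T @ Xc, W, d2
```

The returned matrix `W` would also be applied to (centred) test samples so that training and test features share the same basis.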
Based on the assumption that the importance of the features lies in the variability of the data, PCA chooses principal components along the directions with the largest variance of the data distribution first. The principal components are a set of new orthogonal bases that can be used to re-express the data in order to reduce the correlation among different variables. Suppose that the original dataset has $N_{\mathrm{Samples}}$ samples and each sample has $D_1$ variables. The data matrix $X$ is established with each sample as a column vector. Therefore the data matrix $X$ has $N_{\mathrm{Samples}}$ columns and $D_1$ rows. If we would like to reduce the feature dimension to $D_2$, then we need to select $D_2$ principal components. PCA constructs the matrix $XX^T$, which is proportional to the sample covariance matrix of the dataset when the samples are mean-centered. The $D_2$ eigenvectors of $XX^T$ with the largest corresponding eigenvalues are chosen as principal components. To determine the desired dimensionality $D_2$, we check the eigenvalue ratio $R_{\mathrm{eigenvalue}}$:

$$R_{\mathrm{eigenvalue}} = \frac{\sum_{k=1}^{D_2} \lambda_k}{\sum_{k=1}^{D_1} \lambda_k}. \tag{1}$$

In Eq. (1), $\lambda_k$ denotes the $k$th largest eigenvalue of $XX^T$. The first $D_2$ eigenvectors are preserved so that $R_{\mathrm{eigenvalue}}$ is larger than a threshold $\mathrm{Thr}_{\mathrm{PCA}}$. The selection of $\mathrm{Thr}_{\mathrm{PCA}}$ is discussed in the experiments in Sect. 3.

Classifiers

Random forest
Classification and Regression Tree (CART) is a systematic procedure for learning decision trees proposed by Breiman et al. (1984). The splitting rules of the tree include an attribute value test at each node of the tree. Starting from the root node, all training data are used to split the root node. The tree is then built recursively. Considering all the possible splitting rules, CART constructs the tree by selecting the splitting rule that maximizes the impurity drop when a node is added. The impurity measures how mixed the class labels are at each node. The goal is to make the class labels at each node as "pure" as possible. The splitting process stops when all the samples in a node have the same class label or when the purity at the child nodes cannot be improved compared with the parent node. After a decision tree is built, it might need to be pruned using a cross-validation procedure. The reason for pruning is that some branches of the tree might overfit the training data. In our experiment, we use 10-fold cross validation. Instead of growing a single decision tree, random forest grows an ensemble of trees and lets them vote for the most popular class label. In this work, we adopt random split selection (Dietterich, 2000) to build the ensemble of trees. At each node, the split is selected at random from the K best splits. The features for the split rules are randomly selected. This reduces the correlation between the trees and improves the efficiency of training.

Support vector machine
The SVM learns a set of hyperplanes that maximize the margins between the hyperplanes and the training samples in order to lower the classification error on unknown testing samples. The motivation of SVM is that an ideal decision boundary should have the largest distance to the nearest training sample of all the classes. However, it might be infeasible to separate data samples using linear hyperplanes in practice. Therefore, soft margins and kernel functions are applied in the SVM. We apply SVM with radial basis functions as one of the classifiers in this work. For the details of SVM, please refer to the work by Cristianini and Shawe-Taylor (2000).

Bayesian classifier
The Bayesian classifier aims at minimizing the probability of misclassification by assigning a sample $x$ to the class $\omega_k$ with the largest posterior probability $P(\omega_k \mid x)$. Since the posterior probability $P(\omega_k \mid x)$ itself is unknown, we need to transform the problem using probabilities that can be obtained from training samples. The Bayesian classifier uses Bayes' theorem to re-express the posterior probability as

$$P(\omega_k \mid x) = \frac{P(x \mid \omega_k)\,P(\omega_k)}{P(x)}. \tag{2}$$

In Eq. (2), $P(\omega_k)$ denotes the prior probability, which is independent of the testing sample. In other words, $P(\omega_k)$ states how likely a pixel is to belong to cloud or non-cloud regardless of its observed feature vector. It is possible to use meteorological conditions and weather forecast reports to determine different prior probabilities $P(\omega_k)$ for each day. However, we use the same prior probability for both the cloud and non-cloud classes for simplicity, so no meteorological information is required as prior knowledge in our decision process. The class conditional probability $P(x \mid \omega_k)$ in Eq. (2) can be learned from the training samples. We use Gaussian distributions

$$P(x \mid \omega_k) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right) \tag{3}$$

to model the class conditional probability $P(x \mid \omega_k)$ for each class. In Eq. (3), $\mu_k$ denotes the mean vector, $\Sigma_k$ denotes the covariance matrix, and $p$ is the dimensionality of $x$ and $\mu_k$, i.e., $x \in \mathbb{R}^p$ and $\mu_k \in \mathbb{R}^p$. To learn the parameters of the Gaussian functions, training samples from each class are used to calculate the sample mean vector $\mu_k$ and the sample covariance matrix $\Sigma_k$ for the class. The probability of the sample $P(x)$ in Eq. (2) does not depend on the class label and can be neglected in the decision process.
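The decision rule can be sketched as below: with equal priors and $P(x)$ dropped, a sample is labeled cloud when its Gaussian log-likelihood under the cloud model exceeds that under the non-cloud model. The function names are hypothetical, and working in the log domain is a numerical convenience assumed here, not something the paper specifies.

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate the mean vector and covariance matrix from samples of shape (n, p)."""
    mu = samples.mean(axis=0)
    sigma = np.atleast_2d(np.cov(samples, rowvar=False))
    return mu, sigma

def log_gaussian(x, mu, sigma):
    """Log of the multivariate Gaussian density at x (shape (p,) or (n, p))."""
    p = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu), evaluated per sample.
    maha = np.einsum("...i,ij,...j->...", diff, np.linalg.inv(sigma), diff)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)

def classify_cloud(x, cloud_model, noncloud_model):
    """True where the cloud class has the larger posterior under equal priors."""
    return log_gaussian(x, *cloud_model) > log_gaussian(x, *noncloud_model)
```

In use, `fit_gaussian` would be called once on the (dimension-reduced) training vectors of each class, and `classify_cloud` on the test vectors.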

Combining results of multiple-level neighborhoods and classifiers
The concept of a multiple expert system is to take advantage of the clues provided by multiple classifiers. Instead of majority voting, we use a different voting scheme to combine the results of multiple-level patches and classifiers. The voting is performed in a multi-scale neighborhood, which is inspired by the works of Lowe (2004) and Bay et al. (2008).
As shown in Fig. 3, considering a 3 × 3 neighborhood around a target pixel $p$ at level $i$, its previous level $i-1$, and its next level $i+1$, voting is performed in the scale space of its 3 × 3 × 3 neighborhood. That is, we consider the classifier results of the target pixel $p$ itself and its eight neighbors in the 3 × 3 region at the current level $i$, the pixel $p$ and its eight neighbors in the 3 × 3 region at the previous level $i-1$, and the pixel $p$ and its eight neighbors in the 3 × 3 region at the next level $i+1$. For the pixels in level $i-1$ in Fig. 3a, the size of the local image patch used for feature vector construction is $L_{i-1} \times L_{i-1}$ in Fig. 3b. Similarly, image patches of size $L_i \times L_i$ and $L_{i+1} \times L_{i+1}$ are used for level $i$ and level $i+1$, respectively. The voting scheme takes into account the classification results from four classifiers: RBR thresholding, SVM, random forest, and Bayesian classifier. In other words, there are 27 × 4 votes for the pixel at each level. Let $V_{\mathrm{cloud}}(x_{\mathrm{Level}\,i})$ denote the number of votes in the neighborhood classified as cloud for pixel $x$ at level $i$. The decision for a pixel at level $i$ is determined by $V_{\mathrm{cloud}}(x_{\mathrm{Level}\,i}) > N_v$. In other words, if there are more than $N_v$ cloud votes in the 3 × 3 × 3 neighborhood of a pixel at level $i$, the pixel is classified as a cloud pixel at this level. Considering the example illustrated in Fig. 3c, the numbers represent the votes in the 3 × 3 × 3 neighborhood of pixel $p$ at level $i$. Summing up the numbers in Fig. 3c, we obtain $V_{\mathrm{cloud}}(x_{\mathrm{Level}\,i}) = 61$. If the threshold $N_v$ equals 57, then pixel $p$ is classified as cloud at level $i$. The boundary conditions arise at level 1 and at the last level $\ell$: there is no level $i-1$ at level 1, and there is no level $i+1$ at level $\ell$. There are therefore 18 × 4 votes for the pixels at these two levels. When performing voting for pixels at level 1 and level $\ell$, as long as the number of cloud votes for a pixel exceeds the threshold $N_v$, the pixel is still classified as cloud at that level in our implementation.
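The vote counting in the 3 × 3 × 3 scale-space neighborhood can be sketched as follows. The array layout (levels, classifiers, height, width), the function names, and the zero padding at image borders (so border pixels simply receive fewer votes) are assumptions made for illustration; the handling of the first and last level follows the text, where only two levels contribute.

```python
import numpy as np

def neighborhood_votes(votes):
    """Count cloud votes in the scale-space neighborhood of every pixel.

    votes: boolean array of shape (levels, classifiers, H, W), True = cloud.
    Returns an integer array of shape (levels, H, W).
    """
    per_level = votes.sum(axis=1).astype(np.int64)         # votes of all classifiers per pixel
    n_levels, h, w = per_level.shape
    padded = np.pad(per_level, ((0, 0), (1, 1), (1, 1)))   # zero padding at image borders
    spatial = np.zeros_like(per_level)
    for dy in range(3):                                    # sum over the 3 x 3 spatial window
        for dx in range(3):
            spatial += padded[:, dy:dy + h, dx:dx + w]
    total = np.zeros_like(spatial)
    for i in range(n_levels):                              # add previous and next level if present
        lo, hi = max(0, i - 1), min(n_levels, i + 2)
        total[i] = spatial[lo:hi].sum(axis=0)
    return total

def classify_levels(votes, n_v=57):
    """Per-level decision: cloud where the vote count exceeds n_v."""
    return neighborhood_votes(votes) > n_v
```

With four classifiers, an interior pixel at an intermediate level can collect at most 27 × 4 = 108 votes, and at most 18 × 4 = 72 at the first or last level.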
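To combine the decisions at different levels, the probability $P(x \in \mathrm{cloud} \mid \mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}))$ is computed, where $\ell$ is the total number of levels. This probability is expressed as Eq. (4) using the Bayesian rule of conditional probability.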

Experimental results
In this work, the device used to capture the all-sky images is the all-sky camera manufactured by the Santa Barbara Instrument Group (SBIG). The field of view is 185°, the focal length is 1.44 mm, and the focal ratio range is f/1.4-f/16. The resolution of the bitmap images is 640 × 480. We manually marked the ground truth of cloud pixels in 250 images for training and testing. These images were collected from January to June 2014 at the National Central University, Taiwan. With the ground truth labels of the images, we are able to calculate the detection accuracy at pixel level. We adopt 10-fold cross validation to calculate the average detection accuracy.
In this work, the RBR thresholding method proposed by Long et al. (2006) is used as the baseline method for comparison. In Long's work, an RBR threshold is recommended for the whole sky camera and several thresholds are suggested for the total sky imager. Since the desired threshold varies with the device and the weather conditions, we perform an experiment to find the best threshold for our all-sky camera. Also, to avoid false positive detection in highlighted regions around the sun, we employ an upper bound threshold. Therefore, two thresholds, $\mathrm{Thr}_{\mathrm{upper}}$ and $\mathrm{Thr}_{\mathrm{lower}}$, are used in the experiments. A pixel is classified as cloud if its RBR is higher than $\mathrm{Thr}_{\mathrm{lower}}$ and lower than $\mathrm{Thr}_{\mathrm{upper}}$. We perform experiments on several thresholds to select the best thresholds for our dataset. In Fig. 4, we can observe the trade-off between precision and recall. As the thresholds become stricter, the precision increases and the recall drops. Precision and recall cannot be used alone to measure the accuracy, since precision does not consider false negatives and recall does not consider false positives. Therefore, accuracy defined in Eq. (5) is used as the conclusive metric to measure the performance. As shown in Fig. 4, we have observed that $\mathrm{Thr}_{\mathrm{lower}} = 0.8$ and $\mathrm{Thr}_{\mathrm{upper}} = 0.9$ yield the best detection accuracy for our dataset. In the rest of the experiments, we use RBR thresholding with $\mathrm{Thr}_{\mathrm{lower}} = 0.8$ and $\mathrm{Thr}_{\mathrm{upper}} = 0.9$ as the baseline method for comparison. However, even with the best selected RBR thresholds, the cloud detection result is not satisfactory. The thresholds $\mathrm{Thr}_{\mathrm{lower}} = 0.8$ and $\mathrm{Thr}_{\mathrm{upper}} = 0.9$ might cause false positives for certain images while causing false negatives for other images. Therefore, neither raising nor lowering the thresholds could improve the detection results by thresholding.
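The baseline and its evaluation can be sketched as below. Eq. (5) is not reproduced in this text, so the sketch assumes the standard pixel-level definition, accuracy = (TP + TN) / (TP + FP + FN + TN), alongside the usual precision and recall; the function names and the guard against a zero blue channel are likewise assumptions.

```python
import numpy as np

def rbr_cloud_mask(rgb, thr_lower=0.8, thr_upper=0.9):
    """Cloud where thr_lower < R/B < thr_upper (band thresholding on the red-to-blue ratio)."""
    r = rgb[..., 0].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    rbr = r / np.maximum(b, 1.0)                 # assumed guard against division by zero
    return (rbr > thr_lower) & (rbr < thr_upper)

def detection_scores(pred, truth):
    """Pixel-level precision, recall, and accuracy of a boolean cloud mask."""
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```

Sweeping `thr_lower` and `thr_upper` over a grid and recording the three scores for each pair gives the kind of trade-off curve discussed for Fig. 4.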
To observe the classification results of different classifiers, the detection accuracy of different classifiers based on single pixel color information is plotted in Fig. 5. Compared with other classifiers, RBR thresholding with $\mathrm{Thr}_{\mathrm{lower}} = 0.8$ and $\mathrm{Thr}_{\mathrm{upper}} = 0.9$ has the lowest detection accuracy. Majority voting of the four detection methods can yield better accuracy. We also compare with the classification accuracy obtained using only the RGB color model as the feature vector, to validate that adding other color models to the feature vector yields better classification results. With voting schemes that combine the information from multiple classifiers, the accuracy can be enhanced compared with individual classifiers. However, utilizing only single pixel color information is not sufficient to give satisfactory detection accuracy. Applying features extracted from local image patches further enhances the detection results.
When applying the proposed cloud detection method, we use five levels of local image patches with different sizes, i.e., the total number of levels is $\ell = 5$. The size at each level is $L_1 = 5$, $L_2 = 10$, $L_3 = 15$, $L_4 = 20$, $L_5 = 25$. To observe the effect of the parameter $\mathrm{Thr}_{\mathrm{PCA}}$ for dimension reduction at each level, we perform an experiment using the feature vector constructed at each single level with SVM as the classifier for different settings of $\mathrm{Thr}_{\mathrm{PCA}}$. The value of $\mathrm{Thr}_{\mathrm{PCA}}$ is typically between 90 and 99 % and is selected empirically. Typically, the accuracy of classification would increase as the value of $\mathrm{Thr}_{\mathrm{PCA}}$ goes from 100 % (which means no dimensionality reduction at all) to 99 %. The accuracy of classification would continue increasing until $\mathrm{Thr}_{\mathrm{PCA}}$ reaches a certain value, owing to the benefit of dimensionality reduction. After that, the accuracy of classification would start to decrease due to too much information loss. We plot the cross-validated detection accuracy in Fig. 6. From Fig. 6, we can observe that the detection accuracy at a single level using SVM is highest for $\mathrm{Thr}_{\mathrm{PCA}} = 97\,\%$ at levels $L_1$ and $L_2$. At levels $L_3$, $L_4$, and $L_5$, the parameter $\mathrm{Thr}_{\mathrm{PCA}} = 95\,\%$ yields better results. Therefore, for levels $L_1$ and $L_2$, $\mathrm{Thr}_{\mathrm{PCA}} = 97\,\%$ is selected; for levels $L_3$, $L_4$, and $L_5$, $\mathrm{Thr}_{\mathrm{PCA}} = 95\,\%$ is selected.
To combine the results of multiple-level patches and classifiers, the voting threshold $N_v$ needs to be determined. The detection accuracy of combining the results using different $N_v$ settings is plotted in Fig. 7. As shown in Fig. 7, the detection accuracy is higher when $N_v$ ranges from 50 to 70. We select $N_v = 57$ for the proposed method. To test the number of levels required to yield better detection results, we plot the detection accuracy using different numbers of levels in Fig. 8. Note that for the sixth and seventh levels, the sizes of the local image patches are $L_6 = 30$ and $L_7 = 35$. We can observe that using four or five levels results in better detection accuracy. When levels with patch sizes that are too large are involved, the detection accuracy drops.
Selected cloud detection results are shown in Fig. 9b. The proposed method using features from multi-scale local image patches can accurately detect clouds in the all-sky images. The pixels within the vertical line and the solar disk are not detected as clouds even though their intensities are high, because the Hough line detection and sun position detection eliminate those pixels before classification. Compared with the detection results of RBR thresholding with thresholds 0.8 and 0.9 in Fig. 9c, the proposed method detects cloud pixels with satisfactory accuracy thanks to the multi-level local patch feature extraction mechanism and the combination of multiple expert decisions.
To summarize, the detection accuracy of various methods is plotted in Fig. 10. We compare the proposed method with ANN (Roy et al., 2001) and HYTA (Li et al., 2011). ANN utilizes a feed-forward backpropagation neural network to perform detection. HYTA employs dynamic thresholding based on MCE when necessary. The ANN and HYTA methods outperform traditional RBR thresholding. Nevertheless, the accuracy of ANN and HYTA still has room for improvement. Using the single pixel color components described in Sect. 2.2 with SVM as the classifier yields slightly improved accuracy compared with ANN and HYTA. Incorporating the feature vector extracted from a single-level 15 × 15 neighborhood patch further improves the accuracy compared with using only information from a single pixel. The proposed method utilizing features extracted from multi-level neighborhoods yields the best accuracy since multi-scale information is considered.

Conclusions
With the development of all-sky cameras, the cloud conditions in the sky can be monitored and useful information can be extracted for solar irradiance prediction with refined spatial and temporal resolutions. Clouds play a critical role in affecting the amount of solar irradiance penetrating the atmosphere. With more accurate cloud detection schemes, subsequent prediction modules that forecast solar irradiance could benefit from the enhanced detection results. In this work, supervised learning methods are utilized to train various classifiers that can distinguish cloud pixels from non-cloud pixels in all-sky images. The classifiers implemented in this work include RBR thresholding, SVM, random forest, and Bayesian classifier. We propose to use features extracted from multi-level local image patches of different sizes to include local structure and multi-resolution information. The final decision is made according to the multi-level classification results of the various classifiers. A challenging dataset with ground truth labels is used to validate the detection schemes. Experiments have shown that the proposed detection method yields better results than both fixed and dynamic RBR thresholding. Combining the information of multiple classifiers using voting can improve the detection accuracy. It is also validated that using color information in a multi-level local neighborhood instead of only a single pixel is very helpful for improving the detection accuracy. To apply the proposed method to different all-sky cameras, images captured by various cameras can be added to the training set to enhance the robustness of the detector. For the selection of the parameters $\mathrm{Thr}_{\mathrm{PCA}}$ and $N_v$ for different devices and sites, if the number of levels and the feature length are fixed, the desired parameters should not be seriously affected even if the training samples are changed.

Figure 2. Hough line transform and sun position detection.

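The probability $P(x \in \mathrm{cloud} \mid \mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}))$ states the probability of pixel $x$ belonging to cloud given the number of levels at which the pixel is determined as cloud. Here $\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud})$ denotes the number of levels at which pixel $x$ is determined as cloud among all levels $i = 1$ to $\ell$, where $\ell$ is the total number of levels. If $\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud})$ is 0, the pixel is not classified as cloud at any level. If it is $\ell$, the pixel is classified as cloud at all levels. If $P(x \in \mathrm{cloud} \mid \mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}))$ is larger than $P(x \in \mathrm{noncloud} \mid \mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}))$, the final decision classifies the pixel as a cloud pixel. Using the Bayesian rule of conditional probability, the probability can be expressed as

$$P(x \in \mathrm{cloud} \mid \mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud})) = \frac{P(\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}) \mid x \in \mathrm{cloud})\,P(x \in \mathrm{cloud})}{P(\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}))}. \tag{4}$$

In Eq. (4), the term $P(\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}))$ is independent of the class label and does not affect the decision. The prior probabilities $P(x \in \mathrm{cloud})$ and $P(x \in \mathrm{noncloud})$ are assumed to be equal as stated in Sect. 2.5.3. The likelihood term $P(\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud}) \mid x \in \mathrm{cloud})$ is learned from the training dataset by constructing the normalized histogram of $\mathrm{Num}_{i=1\sim\ell}(x_{\mathrm{Level}\,i} \in \mathrm{cloud})$ using all ground truth cloud pixels.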
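The cross-level decision can be sketched as follows. The likelihood for cloud pixels is the normalized histogram described above; learning the non-cloud likelihood in the same way from ground truth non-cloud pixels is an assumption made here, as are the function names and the toy counts in the usage example.

```python
from collections import Counter

def learn_likelihood(level_counts, n_levels):
    """Normalized histogram of the number of levels voting cloud, for counts 0..n_levels."""
    hist = Counter(level_counts)
    total = len(level_counts)
    return [hist[k] / total for k in range(n_levels + 1)]

def is_cloud(num_cloud_levels, lik_cloud, lik_noncloud, prior_cloud=0.5):
    """Compare the two posteriors; the shared denominator P(Num) cancels out."""
    post_cloud = lik_cloud[num_cloud_levels] * prior_cloud
    post_noncloud = lik_noncloud[num_cloud_levels] * (1.0 - prior_cloud)
    return post_cloud > post_noncloud

# Toy usage with made-up counts for five levels (illustrative values only).
lik_c = learn_likelihood([5, 5, 4, 5, 3, 4], n_levels=5)
lik_n = learn_likelihood([0, 0, 1, 0, 2, 1], n_levels=5)
decision = is_cloud(4, lik_c, lik_n)
```

With equal priors the rule reduces to picking the class whose histogram assigns the larger probability to the observed count.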

Figure 3. Voting in the scale space of a 3 × 3 × 3 neighborhood: (a) structure of the scale space neighborhood; (b) size of the local image patch at different levels; (c) number of votes in the scale space neighborhood.

Figure 5. Comparisons of detection accuracy using different classifiers with single pixel color information.

Figure 6. Detection accuracy with different Thr PCA settings in Eq. (5) at each level using SVM.

Figure 7. Detection accuracy with different N v settings.

Figure 8. Detection accuracy using different number of levels.