Introduction
Understanding cloud physical mechanisms is essential for understanding
climate and meteorological processes. On climate scales, clouds are
recognized as a major source of uncertainty in atmospheric models, whether
for the energy balance or the water cycle. Yet, many aspects of the cloud
life cycle are still not understood by the scientific community, hence the
need for measurement tools allowing cloud monitoring, particularly in a
Lagrangian sense.
At present, the instruments most frequently used for remote sensing of clouds
from the ground are ceilometers, lidars and cloud radars. Ceilometers and
lidars estimate the base height and thickness of several cloud layers. Cloud
radars have the same capacities, but are also able to obtain information on
the nature of the condensed elements in the cloud (crystals, droplets), as
well as their vertical velocities. These ground-based remote sensing
instruments are generally oriented towards the zenith and have a narrow field
of view. Cloud radars rotate to reconstruct the cloud field; however, the
minimum period needed to complete a 360° sweep is a limiting factor for
following a cloud field in real time.
Stereophotogrammetry based on a network of all-sky cameras yields
three-dimensional information by matching points across stereo images and
using triangulation. These techniques provide an inexpensive method to study
the three-dimensional organization of a cloud field. The use of all-sky
cameras makes it possible to widen the field of view.
Stereophotogrammetry applications in meteorology have existed since
the beginning of analogue photography, and more recently digital cameras
have been used. In recent years, several technological advances have been
made in camera lenses, image resolution, network communications,
computational power and cost reduction. Moreover, major improvements have
been made in computer vision algorithms, especially in multi-view
reconstruction methods (e.g., the OpenCV library). It is now possible to
achieve automatic 3-D cloud reconstruction by stereophotogrammetry
relatively cheaply.
Recent studies on this topic generally use conventional or wide-angle lenses
to calculate macroscopic characteristics of a cloud field, such as cloud base
heights and cloud layer horizontal velocities. One study uses a pair of
wide-angle cameras spaced 800 m apart and pointing to the zenith to
calculate the height of the cloud base; the orientation of the cameras is
determined using the stars, and the errors obtained are about 5 % for
mid-altitude clouds at 4000 m a.s.l. Another study uses conventional cameras
spaced 1.5 km apart and oriented towards mountains to study the
three-dimensional organization of orographic convection, with the
orientation of the cameras achieved using elements of the landscape. A
further study focuses on the height of maritime clouds, with cameras spaced
about 900 m apart and oriented towards the horizon. The authors obtained
an error in cloud base height of 2 % for low-layer clouds and 8 %
for cirrocumulus by comparison with lidar measurements. They also calculate a
horizontal velocity field that they compare to radiosonde data; in their
case, the orientation of the cameras is achieved by using the position of
the sun and the horizon line. In all these previous publications,
triangulation is based on the matching of corresponding pixels across the
stereo images by manual or automatic methods. In another work, the cloud
ceiling information for VFR (visual flight rules) operations is calculated by
matching a zenith-centered sub-part of the initial stereo images. The authors
use low-cost consumer cameras oriented towards the zenith and spaced about
30 m apart, with the orientation of the cameras determined using the stars.
For clouds under 1500 m a.g.l., which are of prime interest for VFR
applications, results at the zenith point show good agreement with lidar
measurements in single cloud layer situations.
The first study using all-sky cameras in stereophotogrammetry for
meteorological purposes was performed to calculate the cloud base height,
but temporal synchronization constraints did not allow usable information to
be obtained. More recently, in order to forecast intra-hour solar
irradiance, one group used their own high-resolution all-sky cameras,
providing a very precise equisolid projection. The cameras are spaced
1230 m apart and the authors use the position of the solar disk to
determine the orientation. Clouds are filtered in the images using the
saturation value, and the cloud base height is determined by plane-sweeping
across the stereo images. The results are compared to a ceilometer over an
8 h time series. A residual mean square deviation of 7 % is obtained for
the cloud base height at 5000 m a.s.l. A three-dimensional reconstruction
is also performed, and the height distribution of triangulated pixels is
compared to the ceilometer time series, showing good agreement. Recently,
a dense 3-D reconstruction was performed from a pair of fisheye HD cameras
spaced 300 m apart. The relative orientation of the cameras is estimated
using the positions of the stars. This estimation is then refined by an
algorithm which automatically matches corresponding stereo pixels. The method
is validated by comparison with the data of a ceilometer, a lidar and a cloud
radar for a cloud layer of altocumulus stratiformis at about 3000 m. The
results show cloud base height relative errors of less than 5 %. The
method is then applied to a 3-D reconstruction of a developing cumulus
mediocris.
In this paper, we use all-sky stereophotogrammetry to geolocate
individual elements of a cloud field in order to follow individual clouds in
a Lagrangian way and to estimate their morphological characteristics and
their evolution in real time. This also allows cloud geolocation to be used
for airborne cloud measurements. For example, in the case of instrumented
unmanned aerial vehicles (UAVs), the GPS coordinates of the target cloud may
be communicated in real time to the autopilot. In addition, installing a
camera network for a measurement campaign poses additional challenges: it
may be difficult, time-consuming, or sometimes impossible to use landscape
elements or the positions of the stars. The methodology, developed in
Sect. , is therefore based on the principles of simplicity of
implementation, auto-calibration, and portability.
Stereophotogrammetry is based on triangulation: knowing the distance between
two cameras, their orientation and the angles of incidence of the optical
rays emitted by a physical point, it is possible to find the 3-D coordinates
of the physical point in a given frame. Several indispensable steps are thus
needed. The calibration of each camera encompasses a mathematical description
of the projection of an incident optical ray onto the image. This step is
carried out in a laboratory using a test pattern. In our case, we use a
generic method developed for all-sky camera calibration. The calibration of
the stereo system consists in determining the distance between the cameras
and the relative orientation of each camera. This step is performed once the
cameras are installed in the experimental field. In our methodology,
positioning and orientation are achieved with GPS, leveling instruments and
visual sighting, with no obstacles between the two cameras. The precise
relative orientation between the cameras is determined by automatically
matching feature points across the stereo images. This is achieved with the
SIFT algorithm. The 3-D reconstruction step consists in finding, for each
pixel of the left stereo image, its correspondent in the right stereo image.
Three-dimensional information is then calculated by triangulation, using the
previously calculated camera internal geometry and orientation. In this
work, a dense 3-D reconstruction is performed by using a block-matching
method on rectified stereo images (undistortion and misalignment
corrections). Additionally, the velocity field is estimated by tracking
sub-parts of the initial image through two successive images and combining
this information with the cloud height map.
In the case where clouds are sufficiently separated to be considered as
identifiable objects, we implement image segmentation for individual cloud
georeferencing. We use a color filter to extract the cloud contours in the
image and a segmentation algorithm to identify cloud objects. Most of the
methodology relies on algorithms implemented in open-source software
libraries: OcamCalib for camera calibration, and OpenCV for the other steps.
The accuracy depends on the quality of the cameras and the algorithms used,
as well as on the distance between the cameras, with the following
trade-off: the greater the distance between the cameras, the better the
accuracy, but the more difficult pixel matching becomes.
In Sect. , we present results comparing cloud base heights to traditional
methods, as well as the georeferencing of individual cloud elements and the
calculation of the velocity field. The method is applied with
cameras spaced 150 m apart, for two validation cases. The first validation
case is carried out in the context of a moderately convective situation with
isolated cumulus clouds with a cloud base height at 1500 m a.g.l. The
second validation case is carried out in a situation where two cloud layers
overlap: a layer of altocumulus stratiformis with a base height of
2300 m a.g.l. and a layer of cumulus fractus with a base height at
1000 m a.g.l. The height uncertainty is estimated by comparison with a
Vaisala CL31 ceilometer located on the site. The uncertainty on the
horizontal coordinates is theoretically quantified by using the experimental
uncertainties on the height and uncertainties on the orientation of the
cameras. In the cumulus case, a segmentation of the image as well as an
estimation of the horizontal positions of the cloud centers is carried out.
The results are then discussed in Sect. .
Material and methods
Material
In this work, we use two VIVOTEK FE8391-V network fisheye cameras designed
for outdoor video surveillance applications (see Fig. ).
The focal length is 1.5 mm and the field of view is 180∘. The
digital sensor is a 12-megapixel CMOS, providing in its full resolution a
2944 px × 2944 px image. The images are transmitted to a computer by a
WiFi local network using two directional antennas (TP-Link 2.4 GHz 24 dBi) with
several hundred meters of range. Horizontal leveling is achieved by the use
of a bubble level (accuracy ca. 1∘). The respective positions of both
cameras in the Earth frame are evaluated by using GPS, and inter-camera
alignment is achieved with vertical sights on the camera housing.
(a) VIVOTEK FE8391-V fisheye camera and installation
structure. (b) Vertical sights on the camera housing allow visual
inter-camera alignment in the horizontal plane.
Camera projection model, calibration and image undistortion
In the camera frame, the projection of an optical ray towards a pixel of the
image is generally described by a model which depends on intrinsic camera
parameters. The camera optical system approximates, more or less precisely,
different types of projections, among which the most commonly encountered
are the stereographic, equidistant, equisolid, and orthographic
projections. In the case of non-scientific cameras, these simple theoretical
models are far from sufficient. It is then necessary to use models allowing a
better description of the projection by taking imperfections into account
(e.g., distortions, offset between optical axis and center of the image,
digitization effects). In this article, we use the model proposed by
to calibrate the cameras. This model was introduced to
generically simulate omnidirectional cameras with the property of the single
point of view (property generally well approached by a fisheye lens). The
intrinsic parameters associated with this model are determined by a
calibration step. This calibration is carried out by taking several shots of
a flat 2-D chessboard pattern. This flexible technique is adapted in the
OcamCalib toolbox for the Scaramuzza model. Among the advantages of this
calibration method are its ease of implementation and its accuracy (see,
e.g., comparative benchmarks between several calibration methods for
omnidirectional cameras).
Pinhole projection. Physical point M is projected onto (u,v) in
the image plane (Ω,U,V). The camera coordinate system is
defined by axes X and Y, which are collinear with U and
V, and by axis Z, which is the optical axis.
The principal point (u0,v0) is the projection of the optical center O onto
the image. The radial projected distance on the image is denoted r′.
Pinhole camera model
The starting point is to consider the simplest camera, that is, the
pinhole camera. It is a box that allows the light rays to pass
through a small hole pierced on one side. On the opposite side of the hole,
the inverted scene is projected onto a plate. In order to simplify the way in
which this projection is represented, a central symmetry is applied to have a
situation in which the image plane and the scene are on the same side with
respect to the optical center (Fig. ). The rectangular
image plane has an orthonormal coordinate system (Ω,U,V), where U is the horizontal axis of the image and V
the vertical axis of the image. The origin Ω is located at the upper
left corner of the image. The camera reference frame is defined by the
orthonormal frame (O,X,Y,Z), where Z
corresponds to the optical axis directed to the observed scene and X
and Y correspond to the U and V axes of the image.
The point of intersection of the optical axis with the image is called
principal point. It does not necessarily coincide with the center of
the image, which is especially the case for non-scientific cameras. In this
configuration, if (u′, v′) denotes the centered coordinates of a pixel
with respect to the principal point, the projection of a physical point M(x,y,z) is given by the following equation:
$\left(u', v'\right) = \left( f\tan(\phi)\,\frac{x}{r},\; f\tan(\phi)\,\frac{y}{r} \right),$
where $r = \sqrt{x^2 + y^2}$ denotes the distance from the physical point to the
optical axis and $\phi = \arctan(r/z)$ denotes the angle of incidence
of the optical ray. The parameter f is the pinhole camera focal length
(expressed in pixels in the case of a digital camera). Thus, if (u,v)
denotes the pixel associated with the M(x,y,z) point in the frame of the
image, the projection is defined by
$(u, v, 1)^T = \begin{pmatrix} 1 & 0 & u_0 \\ 0 & 1 & v_0 \\ 0 & 0 & 1 \end{pmatrix} (u', v', 1)^T,$
where $(u_0, v_0)$ are the coordinates of the principal point. We
denote $G_{\mathrm{perspective}}\{f, u_0, v_0\}$ the projection function of
parameters $\{f, u_0, v_0\}$, which maps a physical point M(x,y,z) to a
pixel (u,v). The reciprocal projection is denoted
$G^{-1}_{\mathrm{perspective}}\{f, u_0, v_0\}$. It maps a pixel (u,v) to an
optical ray $\{\lambda(x, y, 1),\ \lambda \in \mathbb{R}\}$.
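As a minimal illustration of the pinhole model above, the following Python sketch implements the projection and its reciprocal; the parameter values and function names are ours, not part of any library.

```python
import numpy as np

# Sketch of the centered perspective (pinhole) projection: f is the focal
# length in pixels and (u0, v0) the principal point (assumed values).

def g_perspective(point_xyz, f, u0, v0):
    """Project a physical point M(x, y, z), z > 0, onto a pixel (u, v)."""
    x, y, z = point_xyz
    u_c = f * x / z          # u' = f * tan(phi) * x / r, with tan(phi) = r / z
    v_c = f * y / z          # v' = f * tan(phi) * y / r
    return u_c + u0, v_c + v0

def g_perspective_inv(u, v, f, u0, v0):
    """Back-project a pixel (u, v) to the direction of its optical ray (x, y, 1)."""
    return np.array([(u - u0) / f, (v - v0) / f, 1.0])
```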
Radial distortion modelization for omnidirectional cameras. The incident
angle ϕ and the projected radial distance r′ are related by
$\tan\phi = -r'/p(r')$. The polynomial function p is represented by the red
curve. The case where there is no distortion (i.e., the pinhole projection
$r' = f\tan\phi$) corresponds to a constant polynomial function $p = a_0$,
represented by the green line.
Omnidirectional Scaramuzza model
Under the axisymmetric assumption, and if r′ denotes the distance between
(u,v) and the principal point, Eq. () can be
generalized to
$\left(u', v'\right) = \left( r'(\phi)\,\frac{x}{r},\; r'(\phi)\,\frac{y}{r} \right).$
The distance r′ in pixels depends on ϕ and characterizes the radial
distortions. These distortions are preponderant in a fisheye lens. This is
the reason why the function $r'(\phi)$ is called the representation function
of the fisheye lens. In the Scaramuzza model, this function is
implicitly defined by the relation $\tan\phi = -r'/p(r')$, where $p$ is a
polynomial function $p(r') = a_0 + a_1 r' + \ldots + a_n r'^n$ (Fig. 3). The tangential
distortions are taken into account linearly by an additional correction step
(parameters c, d and e). Thus, if (u,v) denotes the pixel associated
with the (x,y,z) point in the frame of the image, the projection is
defined by
$(u, v, 1)^T = M (u', v', 1)^T = \begin{pmatrix} 1 & e & u_0 \\ d & c & v_0 \\ 0 & 0 & 1 \end{pmatrix} (u', v', 1)^T.$
We denote $G_{\mathrm{fisheye}}\{M, a_0, \ldots, a_n\}$ the fisheye
projection function of intrinsic parameters $\{M, a_0, \ldots, a_n\}$.
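The back-projection of this model (pixel to optical ray) follows directly from the relations above. The sketch below is an illustration under our conventions, not the OcamCalib implementation; the intrinsic parameters are assumed to come from a calibration such as the one in Table ().

```python
import numpy as np

# Sketch of the inverse Scaramuzza projection: pixel (u, v) -> ray direction
# in the camera frame. poly = [a0, a1, ..., an] are the radial polynomial
# coefficients of p(r'); c, d, e, u0, v0 are the affine correction parameters.

def fisheye_pixel_to_ray(u, v, poly, c, d, e, u0, v0):
    # Invert the affine correction applied to the centered coordinates (u', v')
    A = np.array([[1.0, e], [d, c]])
    u_c, v_c = np.linalg.solve(A, np.array([u - u0, v - v0]))
    r_p = np.hypot(u_c, v_c)                 # radial projected distance r'
    # tan(phi) = -r'/p(r')  =>  the ray direction is proportional to (u', v', -p(r'))
    z = -np.polyval(poly[::-1], r_p)         # polyval expects highest degree first
    ray = np.array([u_c, v_c, z])
    return ray / np.linalg.norm(ray)
```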
Calibration procedure using multiple views of the same chessboard. The
procedure is automated by using algorithmic corner detection. The camera
projection function is estimated with the OcamCalib toolbox
following the modelization described above.
Camera calibration method
The camera calibration determines the camera intrinsic parameters
{M, a0,…,an}. To do this, we use N shots of a
chessboard with K1×K2 corners (intersections between black and
white tiles – Fig. ). We denote by
Rchessboard a coordinate system such that the origin is
located on one of these corners and whose horizontal axes coincide with
the chessboard lines. For each shot i and for each chessboard corner
$(x_j, y_j, 0)^T_{\mathcal{R}_{\mathrm{chessboard}}}$, we have the relation
$(u_{ij}, v_{ij})^T = G_{\mathrm{fisheye}}\{M, a_0, \ldots, a_n\}\left( R_i\,(x_j, y_j, 0)^T_{\mathcal{R}_{\mathrm{chessboard}}} + T_i \right), \quad i = 1, \ldots, N, \; j = 1, \ldots, K_1 \times K_2,$
where $(u_{ij}, v_{ij})$ denotes the corner positions on the image,
$R_i$ the rotation from the camera frame to
$\mathcal{R}_{\mathrm{chessboard}}$, and $T_i$ the translation between the
optical center of the camera and the origin of
$\mathcal{R}_{\mathrm{chessboard}}$. The calibration is based on the following
steps using the toolbox OcamCalib:
For each shot i, corners are automatically detected in the image using
the specific intensity gradient signal and the pattern of the board
(Fig. bottom). This process gives (uij,vij)
values.
In the nonlinear system (Eq. ), the values of
$(u_{ij}, v_{ij})$ and $(x_j, y_j)$ are known and the system is
overdetermined for sufficiently large values of N and $K_1 \times K_2$.
The parameters $\{M, a_0, \ldots, a_n\}$ and $\{R_i, T_i,\ \forall i = 1 \ldots N\}$ are determined by using a Levenberg–Marquardt method (an illustrative sketch of the corner-detection step is given after this list).
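OcamCalib performs the corner detection in MATLAB; as an illustration only, an analogous automatic detection of chessboard corners with OpenCV might look like the following. The file name, pattern size and refinement parameters are assumptions, and detection may be less reliable on strongly distorted fisheye views.

```python
import cv2

# Illustrative sketch (not the OcamCalib implementation): automatic detection
# of the inner chessboard corners in one calibration shot.
img = cv2.imread("chessboard_view_01.png", cv2.IMREAD_GRAYSCALE)
pattern_size = (9, 6)                       # assumed inner corners per row / column
found, corners = cv2.findChessboardCorners(img, pattern_size)
if found:
    # Refine the detected corner positions to sub-pixel accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
    corners = cv2.cornerSubPix(img, corners, (11, 11), (-1, -1), criteria)
```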
Undistortion
In order to produce undistorted images, the scene is reprojected according to
a conventional centered perspective projection of focal length f. During
this reprojection, we move from a circular fisheye image to a square image of
size Npx×Npx. The intensity of each pixel of the
undistorted image is calculated according to the relationship
$\mathrm{RGB}_{\mathrm{undistorted}}\left(u_{\mathrm{undistorted}}, v_{\mathrm{undistorted}}\right) = \mathrm{RGB}_{\mathrm{fisheye}}\left( G_{\mathrm{fisheye}}\{M, a_0, \ldots, a_n\} \cdot G^{-1}_{\mathrm{perspective}}\{f, N_{\mathrm{px}}/2, N_{\mathrm{px}}/2\}\left(u_{\mathrm{undistorted}}, v_{\mathrm{undistorted}}\right) \right).$
In this transformation, the peripheral areas are mapped from a given region
of the fisheye image onto a larger projection area in the rectified image,
producing a blur effect. Note that the values of f and $N_{\mathrm{px}}$
can be freely chosen. The field of view of the undistorted images,
$\mathrm{FOV}_{\mathrm{undistorted}} = 2\arctan\left(\frac{N_{\mathrm{px}}}{2f}\right)$,
depends on these values. The smaller the value of f, the larger the field of
view, but the larger the portion of the image occupied by interpolated areas.
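In practice this undistortion can be implemented as a look-up-table remapping. The sketch below is one possible realization under our assumptions; `fisheye_ray_to_pixel` stands for a hypothetical helper implementing $G_{\mathrm{fisheye}}$ (it would require solving $\tan\phi = -r'/p(r')$ for $r'$), and the remapping itself uses OpenCV.

```python
import cv2
import numpy as np

# Sketch of the undistortion as a remapping; f and n_px are freely chosen.
def build_undistortion_maps(f, n_px, fisheye_ray_to_pixel):
    map_u = np.zeros((n_px, n_px), dtype=np.float32)
    map_v = np.zeros((n_px, n_px), dtype=np.float32)
    for v_und in range(n_px):
        for u_und in range(n_px):
            # G_perspective^-1 {f, N_px/2, N_px/2}: undistorted pixel -> optical ray
            ray = np.array([(u_und - n_px / 2) / f, (v_und - n_px / 2) / f, 1.0])
            # G_fisheye: optical ray -> source pixel in the fisheye image
            map_u[v_und, u_und], map_v[v_und, u_und] = fisheye_ray_to_pixel(ray)
    return map_u, map_v

# undistorted = cv2.remap(fisheye_img, map_u, map_v, cv2.INTER_LINEAR)
```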
Ideal camera configuration. The camera coordinate systems are frontally
aligned, with the optical axes $z_{1,2}$ oriented towards the zenith. The
optical centers $O_{1,2}$ are in the same altitude plane. The baseline
distance is denoted b and the North bearing of the $O_1O_2$ axis is denoted
β. In this ideal configuration, assuming identical centered pinhole
cameras, the corresponding pixels $(u^M_1, v^M_1)$ and $(u^M_2, v^M_2)$ are
row-aligned on the imagers, i.e., $v^M_1 = v^M_2$.
Orientation, stereo calibration and
rectification
At the end of the previous step, we are able to produce two undistorted
stereo images. They are square images of the same size Npx×Npx, for which the center of the image and the principal point
coincide, and which would have been taken by two pinhole cameras with the
same focal length f. The next step consists in orienting them with respect
to each other as accurately as possible. To achieve this, some authors use
landscape features or the horizon line, while others use the positions of
the stars, which allows the orientation of each camera to be determined. In
addition, they add an algorithmic correction step based on the SIFT
stereo-pixel-matching algorithm. In our work, we develop a visual orientation
method assuming that there is no visual obstacle between the two cameras. As
in previous studies, this initial orientation is refined by an algorithmic
step.
Orientation and stereo calibration
The cameras' optical axes are oriented towards the zenith. The image planes
are at the same altitude, and the horizontal axes of the undistorted images
are aligned. This theoretical orientation of the all-sky stereo system is
called frontally aligned (Fig. ). From the GPS
coordinates, the distance $b = O_1O_2$ between the cameras and the angle of
deviation from North $\beta = \widehat{NO_1O_2}$ are calculated with the
haversine formula. The initial orientation of the cameras was previously
described in Fig. and gives an orientation of the cameras
close to the ideal frontally aligned orientation. However, this procedure is
not sufficient to perform an accurate 3-D reconstruction which needs row
alignment of corresponding stereo pixels in the stereo images (see
Fig. ). A refining algorithmic step to calculate the
precise relative orientation of the cameras and consequently rectify the
stereo images is then required. This procedure is usually referred to as
stereo calibration and consists of calculating the components of the
relative rotation R and the relative translation $T = \overrightarrow{O_1O_2}$
between the camera frames such that
$(x, y, z)^T_{\mathcal{R}_1} = R\,(x, y, z)^T_{\mathcal{R}_2} + T.$
Stereo calibration is based on the concepts and theorems of epipolar geometry. In particular, in the case of pinhole cameras with the same focal
length f, there exists a constant 3×3 matrix of rank 2, denoted
E and called the essential matrix. This matrix depends only
on R and T and verifies the following constraint:
$\left( u'^M_2, v'^M_2, 1 \right) E \left( u'^M_1, v'^M_1, 1 \right)^T = 0,$
for all pixels $(u^M_1, v^M_1)$ from the left stereo image and
$(u^M_2, v^M_2)$ from the right stereo image representing the same
physical point M. We use the following stereo calibration methodology (a
code sketch illustrating steps 1-3 is given after the list):
From the undistorted stereo images, retrieve a set of stereo
matching pixels with the SIFT algorithm.
Using the pairings of step 1, solve the overdetermined system
(Eq. ) whose unknowns are the coefficients of the matrix
E. We use a least median of squares (LMEDS) regression, which
avoids being affected by outliers. The matrix E is determined to
within a scalar factor.
Calculate R and T. For this purpose,
the following equations are used:
$[T_\times]^2 = -EE^T \quad \text{with} \quad [T_\times] = \begin{pmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{pmatrix},$
which give two opposite solutions T+ and T- and
$R = \frac{1}{\|T\|^2}\left( \begin{bmatrix} E_2 \times E_3 & E_3 \times E_1 & E_1 \times E_2 \end{bmatrix} \pm [T_\times]E \right),$
where Ek denotes the kth column of the E matrix. The
uniqueness of the solution is obtained by requiring the scene to be located
in front of the cameras as well as the constraint ‖T‖=b.
Corrective rotations R1 and R2 are defined by
using R and T such that
$R_1 = R_{\mathrm{rect}}^{-1} \quad \text{and} \quad R_2 = R_{\mathrm{rect}}^{-1} R,$
where $R_{\mathrm{rect}} = (e_1, e_2, e_3)$ is
a rotation matrix such that $e_1$ is oriented in the same direction as
T, and $e_2$ is orthogonal to $e_1$ and to the left
camera optical axis.
Consistency step: the initial visual orientation of the cameras is designed
to be as close as possible to the frontally aligned relative orientation
(i.e., $R \simeq \mathrm{Id}$ and $T \simeq (b, 0, 0)^T$;
see Sect. ). In our algorithm, several estimations of the
essential matrix E, and consequently of R and T,
are computed in order to avoid incorrect solutions caused by erroneous or
imprecise matches in the SIFT procedure. These estimations of E
are obtained by using several subsets of the matching pixel set given by the
SIFT procedure. Estimations of the E matrix which are not coherent
with the $R \simeq \mathrm{Id}$ and $T \simeq (b, 0, 0)^T$
hypothesis are then rejected. Among the coherent estimations, we choose the
one that leads to minimal corrective rotations.
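The following sketch illustrates steps 1-3 with OpenCV on a pair of undistorted stereo images. It is only an illustration under assumed values (focal length, image size, ratio-test threshold); note that it uses cv2.recoverPose, which resolves the sign ambiguity by a cheirality check rather than by the explicit decomposition of Eq. ().

```python
import cv2
import numpy as np

# Assumed inputs: undistorted grayscale stereo images and pinhole intrinsics.
img1 = cv2.imread("left_undistorted.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right_undistorted.png", cv2.IMREAD_GRAYSCALE)
f, n_px = 800.0, 2048          # assumed focal length (px) and image size

# Step 1: SIFT feature detection and matching (ratio test to reject outliers)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Step 2: essential matrix with a robust LMEDS estimator
K = np.array([[f, 0, n_px / 2], [0, f, n_px / 2], [0, 0, 1]])
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.LMEDS)

# Step 3: recover R and T (T is defined up to scale; ||T|| = b fixes the scale)
_, R, T, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
```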
Rectification
We use R1 and R2 to produce undistorted rectified
images; that is, the images that would have been produced by perfectly
aligned pinhole cameras. These images are produced from all-sky original
images by the following transformation:
$\mathrm{RGB}^{\mathrm{CAM}_{1,2}}_{\mathrm{rectified}}\left(u_{\mathrm{rectified}}, v_{\mathrm{rectified}}\right) = \mathrm{RGB}^{\mathrm{CAM}_{1,2}}_{\mathrm{fisheye}}\left( G_{\mathrm{fisheye}}\{\mathrm{intrinsic\ params\ CAM}_{1,2}\} \cdot G^{-1}_{\mathrm{perspective}}\{f, N_{\mathrm{px}}/2, N_{\mathrm{px}}/2\} \cdot R_{1,2}\left(u_{\mathrm{rectified}}, v_{\mathrm{rectified}}, 1\right)^T \right).$
Three-dimensional reconstruction
Three-dimensional reconstruction is obtained by triangulation from two pixels
$(u^M_1, v^M_1)$ and $(u^M_2, v^M_2)$ which are known
to represent the same physical point M. Indeed, knowing the projection
functions of each camera, their relative orientations, and the distance
between the cameras, it is possible to estimate the point of intersection of
the optical rays in a given reference frame. Working directly with the
rectified images makes this calculation easier because we have a simple
theoretical standard situation: identical pinhole images in a frontally
aligned orientation (Fig. right). In this case, two
matching pixels are located on the same row of the image matrices,
i.e., $v^M_1 = v^M_2$. Then, the coordinates $(x_M, y_M, h_M)$ in the
rectified frame of the left camera are given by
$h_M = \frac{fb}{u^M_2 - u^M_1} = \frac{fb}{\delta_M}, \quad x_M = \frac{h_M\, u'^M_1}{f}, \quad y_M = \frac{h_M\, v'^M_1}{f},$
where $\delta_M = u^M_2 - u^M_1$ is called the disparity and is related to
$h_M$ through the baseline distance b between the cameras and the focal
length f.
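As a worked numerical illustration of this relation (with assumed values of the rectified focal length, baseline and matched pixel coordinates, not the ones of the validation cases):

```python
f_px = 800.0             # assumed focal length of the rectified images (pixels)
b = 150.0                # baseline distance between the cameras (m)
u1, u2 = -40.0, 40.0     # assumed centered column coordinates of a matched pair

delta = u2 - u1          # disparity delta_M = 80 px
h = f_px * b / delta     # height h_M = 800 * 150 / 80 = 1500 m
x = h * u1 / f_px        # x_M = -75 m in the left rectified frame
```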
In addition, a dense 3-D reconstruction of the observed scene assumes that
one is able to generate a dense matching of corresponding pixels across the
stereo images. This is called the dense stereo matching problem. In
the case of rectified images, this problem is greatly simplified by the fact
that v1M=v2M and thus becomes a one-dimensional problem. In this
case, a very common method is the block-matching algorithm ,
which relies on finding maximum correlations between neighborhoods of pixels
across the stereo images. This algorithm is implemented in the OpenCV library
, and is able to describe finely the variations of altitude.
However, it generates noise/speckles in weakly textured image parts, which is a
disadvantage for the type of objects that we consider (clouds, blue sky
background, sun). To avoid this effect, we use several techniques:
Adjusting algorithm parameters:
Disparity range is limited during the pixel-matching process by
setting minimum and maximum bounds for cloud height detection. Note that
disparity bounds are related to height detection bounds through
Eq. (), even if this relationship becomes less relevant
for larger incident angles, for which larger horizontal errors occur.
The correlation window size is adjusted to prevent speckles.
Smoothing the signal by reducing the size of the image while taking
advantage of the subpixel resolution of the algorithm.
Using blue sky filtering: we process the altitude map by filtering
the blue sky areas. We use image conversion to the HSV (hue, saturation,
value) color space. Hue values ranging from 170 to 280° (from cyan to
violet) are filtered. A code sketch of the dense matching and blue sky
filtering is given below.
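The following sketch shows how the block matching and the HSV blue-sky filter could be combined with OpenCV. File names and parameter values (number of disparities, block size, focal length, baseline) are illustrative assumptions, not the ones used in the study.

```python
import cv2
import numpy as np

# Assumed inputs: a pair of rectified images from the stereo system.
left_rgb = cv2.imread("left_rectified.png")
left_gray = cv2.cvtColor(left_rgb, cv2.COLOR_BGR2GRAY)
right_gray = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

# Dense stereo matching; disparity bounds and window size can be adjusted
# through numDisparities / setMinDisparity and blockSize (see text).
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=21)
disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0

f_px, b = 800.0, 150.0                               # assumed focal (px) and baseline (m)
with np.errstate(divide="ignore"):
    height_map = f_px * b / disparity                # h = f * b / delta
height_map[disparity <= 0] = np.nan                  # invalid / unmatched pixels

# Blue-sky filter: discard pixels whose hue lies between cyan and violet.
hsv = cv2.cvtColor(left_rgb, cv2.COLOR_BGR2HSV)
hue_deg = hsv[:, :, 0].astype(np.float32) * 2.0      # OpenCV stores hue as 0-179
height_map[(hue_deg >= 170) & (hue_deg <= 280)] = np.nan
```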
Velocity field
The estimation of the cloud field horizontal velocity is carried out by using
two successive rectified images, $I^{t_1}$ and $I^{t_2}$, coming from the
same camera. Using cross-correlation techniques, the displacement of the
cloud field from one image to another is evaluated in pixel units. This
displacement on the image is converted into velocity by using the previously
calculated height map. In practice, the initial image is divided into
rectangular blocks $I^{t_1}_{k_1,k_2}$ indexed by the subscripts $k_{1,2}$
(Fig. ). The median of heights $h_{k_1,k_2}$ is assigned
to these blocks based on the cloud height map. The translation in number of
pixels of each block between two successive shots is denoted by
$\Delta_{k_1,k_2}$. It is related to the block mean horizontal
velocity $(v^x_{k_1,k_2}, v^y_{k_1,k_2})$ by
$v^x_{k_1,k_2} = \frac{h_{k_1,k_2}}{f}\frac{\Delta^u_{k_1,k_2}}{\Delta t}, \quad v^y_{k_1,k_2} = \frac{h_{k_1,k_2}}{f}\frac{\Delta^v_{k_1,k_2}}{\Delta t},$
where $\Delta t = t_2 - t_1$ is the time between two shots. Calculating
$\Delta_{k_1,k_2}$ amounts to determining the position of the
$I^{t_1}_{k_1,k_2}$ template in the $I^{t_2}$ image. This generic computer
vision problem is called template matching. A method based on the normalized
cross-correlation index allows performing this search with a low algorithmic
cost in simple cases (no rotation, no scaling). This algorithm is available
in the OpenCV library.
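As an illustration of this per-block estimation, the sketch below locates one block of $I^{t_1}$ in $I^{t_2}$ by normalized cross-correlation and converts the pixel displacement into a velocity; the function name and parameters are ours and the block geometry is an assumption.

```python
import cv2

# Sketch of the per-block velocity estimation between two successive left
# rectified images I_t1 and I_t2 (grayscale NumPy arrays, assumed inputs).
def block_velocity(I_t1, I_t2, top, left, size, h_block, f_px, dt):
    """Return the (vx, vy) horizontal velocity of one block in m/s."""
    template = I_t1[top:top + size, left:left + size]
    # Normalized cross-correlation score map over I_t2
    score = cv2.matchTemplate(I_t2, template, cv2.TM_CCORR_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(score)      # best-match upper-left corner
    du = max_loc[0] - left                       # displacement in pixels (u axis)
    dv = max_loc[1] - top                        # displacement in pixels (v axis)
    # v = (h / f) * Delta / Delta_t for each horizontal component
    return h_block * du / (f_px * dt), h_block * dv / (f_px * dt)
```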
Multiblock tracking algorithm for cloud field velocity estimation.
For each block $I_{k_1,k_2}$, the velocity vector is computed by using the
displacement vector $\Delta_{k_1,k_2}$ expressed in pixels and the
median altitude $h_{k_1,k_2}$. The displacement vector is computed by using
the template matching algorithm. Computations are based on two
successive rectified images: in our case we use the left rectified image at
times $t_1$ and $t_2$.
Note that the technique used here is similar to that of previous work,
which evaluates the displacement of a single block centered on the principal
point through two images. In our case, the approach is multiblock, which
generates dispersion but makes it possible to estimate the velocities of
multiple cloud layers.
Segmentation and cloud identification
Segmentation techniques are used in computer vision problems to identify
objects in an image. In our case, the main interest of this technique is to
identify and georeference individual clouds when the situation allows it. The
method that we present here is a contour-based method involving blue sky
filtering which supposes that the clouds are separated (e.g., cumulus cloud
field) and that they do not overlap on the image due to projection (this
would result in merged contours). Segmentation is achieved with the following
steps:
Production of a binarized image from the blue sky filter (Sect. ).
Contour detection and segmentation using the binarized image: we use a
contour finding algorithm implemented in the OpenCV library.
Filtering non-significant/noisy contours: we eliminate contours with a
small inside area and with a low number of inner triangulated pixels.
Filtering the sun: we use a threshold on altitude to remove the sun.
Each segmented region contains pixels that have been triangulated in the 3-D
reconstruction process. This allows assigning (x,y,z) coordinates for each
triangulated pixel. In order to avoid outliers, the center of each segmented
cloud and the cloud base height are estimated with
$x_{\mathrm{center}} = \frac{q_5(\bar{x}) + q_{95}(\bar{x})}{2}, \quad y_{\mathrm{center}} = \frac{q_5(\bar{y}) + q_{95}(\bar{y})}{2}, \quad z_{\mathrm{cloud\ base}} = q_{10}(\bar{z}),$
where $\bar{x}$, $\bar{y}$, $\bar{z}$ correspond to the
coordinates of all triangulated pixels within the segmented region. The
notation $q_r(\bar{x})$ (or $\bar{y}$ and
$\bar{z}$) denotes the rth quantile of the x values (or y and z)
within the segmented region. A code sketch of the segmentation and
georeferencing steps is given below.
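The following sketch illustrates the contour-based segmentation and the quantile-based georeferencing with OpenCV and NumPy. The inputs (`cloud_mask`, the triangulated coordinate maps) and the thresholds are assumptions for illustration.

```python
import cv2
import numpy as np

# cloud_mask: binarized blue-sky filter (uint8, 255 = cloud pixel);
# x3d, y3d, z3d: maps of triangulated coordinates (NaN where not triangulated).
def segment_clouds(cloud_mask, x3d, y3d, z3d, min_area=500, min_pixels=50):
    contours, _ = cv2.findContours(cloud_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    clouds = []
    for cnt in contours:
        if cv2.contourArea(cnt) < min_area:            # drop noisy contours
            continue
        mask = np.zeros_like(cloud_mask)
        cv2.drawContours(mask, [cnt], -1, 255, thickness=cv2.FILLED)
        inside = (mask > 0) & np.isfinite(z3d)
        if inside.sum() < min_pixels:                   # too few triangulated pixels
            continue
        xs, ys, zs = x3d[inside], y3d[inside], z3d[inside]
        clouds.append((
            0.5 * (np.percentile(xs, 5) + np.percentile(xs, 95)),   # x_center
            0.5 * (np.percentile(ys, 5) + np.percentile(ys, 95)),   # y_center
            np.percentile(zs, 10),                                   # z_cloud base
        ))
    return clouds
```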
OcamCalib calibration results for cameras 1 and 2. Parameters are
described in Eqs. () and () in Sect. .

Principal point and radial distortion parameters:

| Camera   | Image size  | u0     | v0     | a0     | a1 | a2            | a3             | a4             |
|----------|-------------|--------|--------|--------|----|---------------|----------------|----------------|
| Camera 1 | 2944 × 2944 | 1467.6 | 1468.0 | -980.6 | 0  | 3.9853 × 10^-4 | -1.0973 × 10^-7 | 1.0861 × 10^-10 |
| Camera 2 | 2944 × 2944 | 1452.5 | 1452.8 | -982.4 | 0  | 3.5975 × 10^-4 | -2.3627 × 10^-8 | 6.2340 × 10^-11 |

Affine distortion parameters and re-projection errors:

| Camera   | c      | d           | e            | RMS    | Max    |
|----------|--------|-------------|--------------|--------|--------|
| Camera 1 | 0.9999 | 3.12 × 10^-4 | -7.55 × 10^-4 | 0.7 px | 5.3 px |
| Camera 2 | 0.9999 | 5.68 × 10^-4 | -9.44 × 10^-4 | 1.0 px | 7.5 px |
Uncertainty estimation
Theoretically, in a frontally aligned pinhole stereo system, the uncertainty
on height $\sigma_h$ can be related to the uncertainty on the position of
corresponding pixels $(u_1, v)$, $(u_2, v)$ through the sensitivity equation
$\sigma_h = \sigma_{|u_1 - u_2|}\frac{h^2}{fb} = \sigma_{|\delta|}\frac{h^2}{fb},$
where $\sigma_{|\delta|} = \sigma_{|u_1 - u_2|}$ represents the
uncertainty on the disparity (Sect. ). This equation shows
that the uncertainty is inversely proportional to the baseline distance b,
up to a distance where the quality of the stereo pixel matching degrades. On
the other hand, $\sigma_h$ increases quadratically with increasing heights.
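As a worked illustration of this sensitivity (with assumed values of the focal length, baseline and disparity uncertainty, chosen only to show the orders of magnitude):

```python
# Height uncertainty from the sensitivity equation, for assumed parameters.
f_px, b = 800.0, 150.0                  # assumed focal length (px) and baseline (m)
sigma_delta = 1.0                       # assumed 1-pixel disparity uncertainty
for h in (1000.0, 1500.0, 2500.0):
    sigma_h = sigma_delta * h**2 / (f_px * b)
    print(f"h = {h:.0f} m  ->  sigma_h = {sigma_h:.0f} m")
# -> roughly 8 m, 19 m and 52 m, growing quadratically with height
```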
In practice, the uncertainty related to the 3-D reconstruction of the
cloud field in the Earth's frame has several components: camera resolution,
intrinsic projection/calibration model, position and orientation of the
cameras/stereo calibration, and pixel matching. We quantify the overall
uncertainty on cloud base height experimentally. In this work, we use a
Vaisala CL31 ceilometer, collocated with the all-sky stereo system, as the
reference instrument. It provides information by measuring the cloud base
height at the zenith and identifies up to three cloud layers. Several
aspects must be identified before comparing ceilometer and all-sky stereo
system results:
There is spatial inter-cloud and intra-cloud variability of the cloud base height.
The all-sky stereo system computes heights coming from the base as well as from the sides of the clouds.
The ceilometer provides a point value at the zenith, while the cameras provide a spatial map of the heights.
The all-sky stereo system can recover multiple cloud layers only if it can see them.
Several methodologies can be used to compare all-sky spatial data to
ceilometer temporal data. A comparison of height measurements at zenith when
the picture is taken allows estimating uncertainty on height σh,
although this method is limited because it does not represent the uncertainty
on the peripheral parts of the image. Another way is to compare the height–frequency histograms obtained by the all-sky stereo system (heights
calculated for a scene) with the distribution of the heights obtained by the
ceilometer (centered time series). The distribution peaks give the
representative height of the cloud base for a given cloud layer. The
thickness associated with these peaks is due to the above-mentioned
uncertainties and cloud base variability. The error is estimated by comparing
the peak positions and the standard deviations of the distributions around
these peaks.
In the Earth frame, the uncertainty on the (x,y) position can be deduced from
the uncertainties on the height $\sigma_h$, the polar angle $\sigma_\phi$, and
the azimuthal angle $\sigma_\theta$. Indeed, in spherical coordinates we have
$x = \rho\cos\theta\sin\phi$, $y = \rho\sin\theta\sin\phi$, $h = \rho\cos\phi$.
Denoting by $r = \sqrt{x^2 + y^2} = h\tan\phi$ the ground-projected distance,
we obtain $x = h\cos\theta\tan\phi$ and $y = h\sin\theta\tan\phi$, such that
$\sigma_x^2 = (\cos\theta\tan\phi)^2\sigma_h^2 + (h\sin\theta\tan\phi)^2\sigma_\theta^2 + (h\cos\theta\cos^{-2}\phi)^2\sigma_\phi^2,$
$\sigma_y^2 = (\sin\theta\tan\phi)^2\sigma_h^2 + (h\cos\theta\tan\phi)^2\sigma_\theta^2 + (h\sin\theta\cos^{-2}\phi)^2\sigma_\phi^2,$
$\sigma_r^2 = \tan^2\phi\,\sigma_h^2 + h^2\cos^{-4}\phi\,\sigma_\phi^2.$
The angle uncertainties are mainly related to the orientation of the cameras:
initial orientation (GPS position and visual sighting) and algorithmic
correction in the rectification process (Sect. ). In our
study, the estimation of σh is calculated experimentally as
mentioned above. The corrective rotations provided by the rectification
algorithm in different configurations allow estimating $\sigma_\theta$ and
$\sigma_\phi$, giving $\sigma_\theta = \sigma_\phi = 2°$.
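As an illustration of the ground-projected uncertainty obtained from the propagation formula above (the 10 % relative height uncertainty and the example geometry are assumptions consistent with the orders of magnitude reported in this study):

```python
import numpy as np

# sigma_r from the propagation formula, with sigma_phi = 2 deg and an assumed
# relative uncertainty on height.
def sigma_r(h, phi_deg, rel_sigma_h=0.10, sigma_phi_deg=2.0):
    phi = np.radians(phi_deg)
    sigma_h = rel_sigma_h * h
    sigma_phi = np.radians(sigma_phi_deg)
    var_r = (np.tan(phi) * sigma_h) ** 2 + (h * sigma_phi / np.cos(phi) ** 2) ** 2
    return np.sqrt(var_r)

# e.g., a cloud element at h = 1500 m seen at phi = 60 deg from zenith:
# sigma_r(1500, 60) is roughly 330 m.
```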
(a) Radial projected distance r′ as a function of the
incident angle ϕ for VIVOTEK camera 1. The function r′(ϕ) is called the
representation function as it characterizes the projection. It is
compared to the most commonly used fisheye parametric representation
functions, setting f to the -a0 value. (b) Difference in pixels between
the representation functions of cameras 1 and 2, as a function of the
incidence angle ϕ.
For each view and each chessboard corner (30 × 48 points in total),
difference between the corner position on the image and the corner position
computed by re-projection, using the OcamCalib calibration results.
Discussion and future work
The results obtained under the configuration described in this study are
relevant for macroscopically characterizing a cloud field up to 2500 m
altitude, as well as for cloud-targeting applications by instrumented UAVs.
Yet,
for precise measurements – morphological parameters of a cloud (width,
vertical extension and variation over time), and precise geolocation (e.g.,
measurements near the base, top, or edges of the cloud) – the all-sky camera
network must be configured to ensure a certain accuracy.
Segmentation and geolocalization results. Positions (x, y) of the cloud
centers are given in the left rectified coordinate system.

| Cloud ID | Estimated cloud base height (m a.g.l.) ±10 % | Position (x, y) of cloud center | r       | σr     |
|----------|----------------------------------------------|---------------------------------|---------|--------|
| 3        | 1440                                         | (-2.69 km, 1.75 km)             | 3.21 km | ±350 m |
| 5        | 1670                                         | (2.41 km, 1.55 km)              | 2.87 km | ±290 m |
| 6        | 1420                                         | (-1.83 km, 1.46 km)             | 2.34 km | ±260 m |
| 7        | 1450                                         | (-1.80 km, -0.23 km)            | 1.81 km | ±170 m |
| 9        | 1430                                         | (-0.68 km, -1.00 km)            | 1.21 km | ±120 m |
| 10       | 1450                                         | (1.35 km, -1.57 km)             | 2.10 km | ±210 m |
| 12       | 1640                                         | (-0.23 km, -2.89 km)            | 2.90 km | ±290 m |

Ceilometer cloud base heights measured during a 30 min time
series: 1420–1450–1530–1350–1560–1550–1630–1620 m a.g.l.
Altocumulus/multilayer case. (a, b) Rectified
image (a) with estimated wind speed and direction (b).
(c, d) Triangulated points projected on x-y left camera plane
with altitude color map (c), and with r-uncertainty
color map (d).
(a, b) Undistorted and rectified left image with
associated height map. (c) Contours produced by blue filtering
segmentation on left rectified image. (d) Segmented image with cloud
identification number and estimated position of center of cloud base (red
dots). Altitude filter: 4000 m a.g.l.
In addition to optimizing the baseline distance between the cameras, several
strategies can be explored to improve the accuracy of the all-sky camera system.
A first strategy is to work on the robustness of the orientation step.
Relative orientation accuracy between stereo cameras plays an important role
in the image rectification process (Sect. ). Indeed,
relative orientation has an impact on 3-D reconstruction accuracy through
the pixel-matching hit score and the uncertainty on disparity, as shown with
Eqs. () and (), and experimentally in
Fig. . Moreover, it is important to ensure that cameras
are correctly oriented in the Earth's frame for accurate geolocalization, as
shown in Eq. ().
In previous studies, the camera orientation is based on identified elements
of the landscape, such as stars, trees, buildings and horizon lines. This
consideration of external elements is adapted to the context of a fixed
installation of a camera system, but becomes less suitable in the context of
a measurement campaign in which the all-sky camera network must be mobile and
rapidly operational. The technique used here to initially orient the camera
network is based on GPS for positioning in the Earth frame, leveling for
horizontal adjustment, and vertical sights on the camera housing for
inter-camera alignment, which is a priori less accurate than using landmarks
or stars to establish the orientation. Improving the initial orientation
accuracy could be accomplished by laser sighting or by using successive
images of a GPS-equipped balloon or UAV loitering in the field of view of the
cameras. In addition, the relative orientation between camera pairs can be
refined by the stereo calibration algorithm using a time series of several
pairs of images, instead of an instantaneous snapshot of a single pair of
images. Moreover, improved accuracy can also be achieved by organizing a
network of several cameras. For example, the
arrangement of the cameras on the ground can be used to increase the number
of triangulations of the same object (e.g., square arrangement with four
cameras). Inter-camera spacing can also be organized to accommodate different
cloud layers (e.g., closely spaced cameras for low clouds and farther apart
for high-altitude clouds).
For dense stereo matching, the block-matching algorithm
yields correct results even in weakly textured areas, provided that smoothing
and filtering techniques are implemented (Sect. ). However, the
smoothing step impacts accuracy when reconstructing cloud edges.
The block-matching algorithm is a standard method, and it would be useful to
carry out a comparative study of the results given by recently developed
dense matching methods. This field of research is very active and there is a
dedicated online benchmark platform. One of the objectives of a future study would
be to use this benchmark to identify and implement methods capable of
accurately characterizing low-textured cloud zones, as well as edges.
In terms of image segmentation (e.g., identification of individual clouds)
and geolocation, the methods and results presented in this article provide an
overview of computer vision techniques to estimate individual cloud positions
and their characteristics in a shallow cumulus cloud field. Segmentation
based on contour detection of neighboring pixels makes it possible to isolate
individual clouds. The cloud segmentation approach used in this study works
well for distinguishable clouds on the image, but its performance is less
reliable if this is not the case. The cloud segmentation method can be
refined by taking into account the altitude map for more complex cloud fields
where different clouds overlap on the image (e.g., multiple cloud layers,
higher cloudiness, or deep convection). We see in Fig. that
the reconstruction algorithm determines low cumulus fractus edges within an
overlapping higher cloud layer. For a stratiform cloud layer with high
cloudiness and less contrast, the segmentation approach would need to be
modified to discern macroscopic differences in the cloud structure. Nonetheless, as
mentioned in the previous paragraph, reconstructing accurate edges in
situations where low-textured objects overlap remains a challenging task in
the computer vision field. The uncertainty with respect to geo-localization
of an individual cloud center position is directly related to uncertainty
estimation on height (Sect. ).
Finally, the use of photogrammetry techniques associated with segmentation
opens the way to the characterization of other parameters of interest in
atmospheric science, such as the width of the cloud base and the vertical
extension of the cloud. The width of the cloud base follows its growth and
dissipation and can be well estimated at low zenith angles. In contrast,
extracting cloud vertical dimensions can be achieved at large zenith angles,
as long as the cloud tops are not hidden in the projection.
Consequently, segmentation makes it possible to track individual clouds
through successive images and follow the evolution of the cloud life cycle by
tracking cloud heights and/or cloud base widths.