
Citation: Mengjie Ye, Zesheng Tang. Registration of Correspondent Points in the Stereo-Pairs of Chang'E-1 Lunar Mission Using SIFT Algorithm. Journal of Earth Science, 2013, 24(3): 371-381. doi: 10.1007/s12583-013-0341-2
Linear array CCD sensors are widely used to acquire panchromatic and multispectral images in push-broom mode for photogrammetric and remote sensing applications. Chang'E-1, China's first lunar mission, was successfully launched on Oct. 24, 2007. The CCD stereo camera of the Chang'E-1 satellite uses the three-line-scanner principle to capture digital image triplets in push-broom mode, simultaneously producing high-resolution (120 m per pixel) images in three viewing directions (nadir, forward, and backward) (Fig. 1) (Li et al., 2010).
Obtaining highly precise attitude data from stereo-pairs with different viewpoints requires knowledge of the location and orientation of the Chang'E-1 stereo camera as well as of the correspondent points in the stereo-pairs of images. The key step for successful and reliable extraction of the lunar digital elevation model is therefore the registration of the Chang'E-1 CCD images.
The registration strategy should take account of the characteristics of the Chang'E-1 three-line-scanner system. Unlike traditional frame-based aerial photos, in which all pixels of an image are exposed simultaneously, each line of a Chang'E-1 CCD image is collected in push-broom mode at a different instant of time, with a different location and orientation of the camera. Therefore, the perspective geometry is only valid within each line, whereas the projection is close to parallel in the along-track direction. Moreover, because of the dynamical mode of image acquisition, there is in principle a different set of (time-dependent) six exterior orientation parameters for each scanning line. Registration algorithms typically assume that the correlation windows from each image are related by a simple translation, but this assumption is invalid for Chang'E-1 CCD images (Ye et al., 2011). The affine deformations caused by the change in viewpoint must therefore be taken into account.
In recent years, many registration algorithms have been proposed. They can be classified into two main approaches: area-based methods and feature-based methods. Area-based methods, sometimes called intensity-based methods or template matching, compare intensity patterns in images via a correlation metric and match entire images or subimages. These methods are sensitive to variations in intensity, contrast, illumination, and image noise. Feature-based methods rely on the extraction of invariant features from the images; significant regions, lines, or points (region corners, line intersections, points on curves with high curvature) are generally selected as features. Feature-based methods are typically fast, since extracting the salient features of the images greatly reduces the quantity of data to be handled, and they are not sensitive to image intensity variation (Zitová and Flusser, 2003; Brown, 1992).
Registration approaches for three-line scanner imagery typically use a coarse-to-fine hierarchical solution (Zhang and Gruen, 2006; Wiedemann et al., 1996). Feature extraction is performed on the highest level of the image pyramid. For each feature point of the reference image, the corresponding point in the target image is ascertained by calculating the correlation between the image windows surrounding the two feature points. If the correlation coefficient lies above a certain threshold, the two points are considered correspondent points. Generally, the accuracy is improved by means of least squares matching and region growing (Otto and Chau, 1988).
Feature extraction is thus a basic step for image matching and image analysis. Local feature extraction can be divided into two steps: extraction of interest points and computation of descriptors. Traditional interest points are commonly called corners, which can be defined as the intersection of two edges. An early approach to corner detection, proposed by Kitchen and Rosenfeld (1982), measures "cornerness" values in gray-level images, so that corners can be detected by thresholding these values without prior segmentation. Other intensity-based corner detectors are the SUSAN, Moravec, and Harris operators (Smith and Brady, 1997; Harris and Stephens, 1988; Moravec, 1981). The Moravec detector considers a square window in the image and determines the average change of intensity resulting from shifting the window by a small amount in the eight principal directions (vertical, horizontal, and diagonal). This operation is repeated for each pixel position, which is assigned an interest value equal to the minimum change measured by taking the sum of squared differences (SSD) between the two patches produced by these shifts. The positions of the local maxima of these interest values are then chosen as feature points. The Moravec operator is not rotationally invariant, as the intensity variation is only calculated at a discrete set of shifts (namely, the eight principal directions). To remove this limitation, Harris and Stephens improved upon the Moravec operator by computing the intensity variation in any direction, approximated by a Taylor expansion, and by using a Gaussian window instead of a square window. However, the Harris operator has poor localization on many features and is very sensitive to changes in image scale.
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing, and signal processing communities, with complementary motivations from physics and biological vision. As introduced by Witkin (1984), the scale-space representation describes an image at different scales as a one-parameter family of images smoothed with Gaussian kernels. It was later shown by Lindeberg (1994), Babaud et al. (1986), and Koenderink (1984) that the Gaussian function is the only possible linear scale-space kernel. Based on scale-space theory, Lowe (1999) proposed the SIFT (scale invariant feature transform) algorithm to achieve scale invariance. This method provides distinctive features while being less sensitive to viewpoint change. In 2004, Lowe presented a number of improvements in stability and in feature invariance to image scale, rotation, addition of noise, and change of illumination (Lowe, 2004).
All the methods mentioned above have advantages and disadvantages. An appropriate registration strategy should make use of all available and explicit knowledge concerning the sensor model, image format, geometry constraints and practical precision requirement.
In this article, a registration strategy combining an area-based method and a feature-based method is introduced. Firstly, a subimage is extracted from the nadir image as the reference image. The size of the subimage can be controlled by the user; the experiments in this article use a size of 1 024×512. Using the area-based method, another subimage, called the target image, is obtained from the backward or forward image, overlapping the same region of the lunar surface as the reference image. Secondly, feature points of each subimage are extracted by the SIFT algorithm. Lastly, for each feature point given in the reference image, the position of its correspondent point in the target image is estimated by making use of the orbital altitude of Chang'E-1. In contrast to the standard SIFT matching algorithm, the method proposed in this article narrows the search space and accelerates the computation while reducing the percentage of false matches.
Feature points extracted by the SIFT algorithm are invariant to image scale and rotation, and are robust to addition of noise, change of illumination, and affine deformation. The algorithm is implemented in four major stages: (1) scale-space extrema detection; (2) feature point localization; (3) orientation assignment; (4) feature point descriptor.
This stage attempts to identify the locations and scales of the features that are identifiable from different views of the same object. This can be efficiently implemented using a scale-space function.
Formally, the linear scale-space representation of an image is constructed as follows. Let I(x, y) be any given image. The scale-space representation L(x, y, σ) is defined by
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{1}$$
where * denotes convolution in x and y; σ is the scale parameter, a natural measure of spatial scale in the smoothed image at scale σ; and G(x, y, σ) is the Gaussian kernel, written as
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/(2\sigma^{2})} \tag{2}$$
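As a minimal illustration of Eqs. (1)-(2), the family L(x, y, σ) can be generated by Gaussian smoothing at a set of scales; a sketch in Python using SciPy, where the example scale values are arbitrary:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(image, sigmas):
    """Scale-space family L(x, y, sigma) of Eq. (1): the image convolved
    with the Gaussian kernel of Eq. (2) at each scale sigma."""
    img = image.astype(np.float32)
    return [gaussian_filter(img, s) for s in sigmas]

# e.g. L = scale_space(I, sigmas=[1.6 * 2 ** (i / 4) for i in range(5)])
```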
Various techniques can be used to detect stable keypoint locations in the scale-space. Lowe proposed a method using the DoG (difference of Gaussian) function, D(x, y, σ), which can be computed from the difference of adjacent image scales in the Gaussian pyramid separated by a constant multiplicative factor k
$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{3}$$
The initial image, I(x, y), is incrementally convolved with Gaussians G(x, y, σ) to produce a Gaussian pyramid with o (o=4) octaves, each divided into s (s=4) intervals. In order to obtain more distinct feature points, the image of the first interval in the first octave is double the size of the initial image. Adjacent intervals in the same octave are separated by the factor k in scale space. Once a complete octave has been processed, the first interval of the next octave is obtained by down-sampling the middle interval of the previous octave (Fig. 2). The DoG pyramid can then be easily computed from the differences of adjacent image scales in the Gaussian pyramid, as shown in Fig. 3; within each octave, the scales of the DoG intervals are the same as those of the Gaussian pyramid.
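The pyramid construction described above can be sketched as follows. The octave and interval counts follow the text; the base scale sigma0=1.6 is Lowe's suggested value, not stated in this article, and each interval is blurred directly from the octave base for simplicity:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramids(image, num_octaves=4, num_intervals=4, sigma0=1.6):
    """Gaussian and DoG pyramids as in Figs. 2 and 3 (a simplified sketch)."""
    k = 2.0 ** (1.0 / num_intervals)                    # factor between intervals
    base = zoom(image.astype(np.float32), 2.0, order=1) # doubled first interval
    gauss_pyr, dog_pyr = [], []
    for _ in range(num_octaves):
        octave = [gaussian_filter(base, sigma0 * k ** i)
                  for i in range(num_intervals + 2)]
        gauss_pyr.append(octave)
        # Eq. (3): DoG images as differences of adjacent intervals
        dog_pyr.append([octave[i + 1] - octave[i] for i in range(len(octave) - 1)])
        base = octave[num_intervals][::2, ::2]          # down-sample a middle interval
    return gauss_pyr, dog_pyr
```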
To detect the local maxima and minima of D(x, y, σ), each sample point is compared to its 8 neighbors in the current image and its 9 neighbors in each of the scales above and below (Fig. 4). If its value is the minimum or maximum of all these points, the point is an extremum.
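A direct implementation of this 26-neighbor test might look like the following sketch (assuming interior sample positions within the octave):

```python
import numpy as np

def is_extremum(dog_octave, i, x, y):
    """True if the DoG sample (x, y) at interval i is the maximum or minimum
    of its 8 neighbors in the current image and 9 in each adjacent scale
    (Fig. 4). `dog_octave` is a list of 2D DoG images in one octave."""
    cube = np.stack([dog_octave[i + d][x - 1:x + 2, y - 1:y + 2]
                     for d in (-1, 0, 1)])
    center = cube[1, 1, 1]
    return center == cube.max() or center == cube.min()
```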
Once a feature point candidate has been found by comparing a pixel to its neighbors, the next step is to fit a 3D quadratic function to the nearby data to determine the sub-pixel location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.
The Taylor expansion (up to the quadratic terms) of the scale-space function D(x, y, σ) about the local extremum (x0, y0, σ0) is
$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x} \tag{4}$$
where $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the local extremum. The location of the refined extremum, $\mathbf{x}'$, is determined by taking the derivative of formula (4) with respect to $\mathbf{x}$ and setting it to zero, giving
$$\mathbf{x}' = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}} \tag{5}$$
If the absolute value of the function at $\mathbf{x}'$, computed from formula (6), is below a threshold (Lowe (2004) suggests 0.03 for image values in [0, 1]), the point is discarded.
$$D(\mathbf{x}') = D + \frac{1}{2}\frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x}' \tag{6}$$
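The refinement of formulas (4)-(6) amounts to one Newton step on the sampled DoG values. A sketch, taking the 3×3×3 neighborhood of the extremum as input; the 0.03 contrast threshold is Lowe's suggestion, assuming image values in [0, 1]:

```python
import numpy as np

def refine_extremum(cube, contrast_thresh=0.03):
    """One Newton step of Eqs. (4)-(6) on a 3x3x3 DoG neighborhood `cube`,
    indexed as [scale, x, y] and centered on the detected extremum."""
    c = cube[1, 1, 1]
    # central-difference gradient dD/dx
    g = 0.5 * np.array([cube[2, 1, 1] - cube[0, 1, 1],
                        cube[1, 2, 1] - cube[1, 0, 1],
                        cube[1, 1, 2] - cube[1, 1, 0]])
    # central-difference Hessian d2D/dx2 (symmetric)
    H = np.empty((3, 3))
    H[0, 0] = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    H[1, 1] = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    H[2, 2] = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    H[0, 1] = H[1, 0] = 0.25 * (cube[2, 2, 1] - cube[2, 0, 1]
                                - cube[0, 2, 1] + cube[0, 0, 1])
    H[0, 2] = H[2, 0] = 0.25 * (cube[2, 1, 2] - cube[2, 1, 0]
                                - cube[0, 1, 2] + cube[0, 1, 0])
    H[1, 2] = H[2, 1] = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0]
                                - cube[1, 0, 2] + cube[1, 0, 0])
    offset = -np.linalg.solve(H, g)       # Eq. (5): sub-pixel offset
    value = c + 0.5 * g.dot(offset)       # Eq. (6): refined DoG value
    return offset, value, abs(value) >= contrast_thresh
```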
The DoG function has a strong response along edges. To eliminate edge responses, note that a poorly defined peak of the DoG function has a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from the eigenvalues of the 2×2 Hessian matrix at the location and scale of the feature point; if the ratio of the largest to the smallest eigenvalue exceeds a threshold, the point is rejected.
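Lowe (2004) implements this test without explicit eigenvalue computation, using the trace and determinant of the Hessian. A sketch, where r=10 is Lowe's suggested curvature-ratio threshold:

```python
def passes_edge_test(D_img, x, y, r=10.0):
    """Edge test on the 2x2 spatial Hessian of a DoG image: keep the point
    only if the ratio of principal curvatures is below r (Lowe, 2004)."""
    dxx = D_img[x + 1, y] - 2 * D_img[x, y] + D_img[x - 1, y]
    dyy = D_img[x, y + 1] - 2 * D_img[x, y] + D_img[x, y - 1]
    dxy = 0.25 * (D_img[x + 1, y + 1] - D_img[x + 1, y - 1]
                  - D_img[x - 1, y + 1] + D_img[x - 1, y - 1])
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r
```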
In order to achieve invariance to image rotation, each feature point is assigned a consistent orientation based on local image properties. For the image L(x, y) at the closest scale, the gradient magnitude, m(x, y), and orientation, θ(x, y), are computed as
$$m(x, y) = \sqrt{L_x^{2} + L_y^{2}}, \qquad \theta(x, y) = \tan^{-1}\left(L_y / L_x\right) \tag{7}$$

where $L_x = L(x+1, y) - L(x-1, y)$ and $L_y = L(x, y+1) - L(x, y-1)$.
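In vectorized form, formula (7) can be evaluated over a whole smoothed image; a sketch, with border pixels omitted and the quadrant-aware arctan2 used for the orientation:

```python
import numpy as np

def gradient_mag_ori(L):
    """Pixel-wise gradient magnitude m and orientation theta of Eq. (7)
    for a Gaussian-smoothed image L (interior pixels only)."""
    Lx = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x+1, y) - L(x-1, y)
    Ly = L[1:-1, 2:] - L[1:-1, :-2]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(Lx ** 2 + Ly ** 2)
    theta = np.arctan2(Ly, Lx)
    return m, theta
```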
At this point, feature points with image location, scale, and orientation have been obtained.
Firstly, the coordinates of the descriptor and the gradient orientations are rotated relative to the feature point orientation to guarantee orientation invariance. As shown in Fig. 5a, the central black point represents the current feature point location; each block represents an image sample in the region around the feature point location at the same scale; the length of each arrow corresponds to the gradient magnitude. Secondly, the gradient magnitude and orientation at each image point are computed and weighted by a Gaussian window, and orientation histograms are created over 4×4 sample regions, as shown in Fig. 5b. The samples are accumulated into orientation histograms summarizing the content of 4×4 subregions. Figure 5 shows a 2×2 descriptor array computed from an 8×8 set of image points; to achieve the best results, however, Lowe suggested using a 4×4 array of histograms with 8 orientation bins, i.e., a 4×4×8=128 element feature vector for each feature point.
In the standard SIFT algorithm, a nearest-neighbor approach is used to identify possible matching points between images. The nearest neighbor is defined as the feature point with minimum Euclidean distance between the invariant descriptors. Owing to the many craters on the lunar surface, Chang'E-1 CCD images consist of very similar, neighboring texture patterns. Generally, these patterns are images of repetitive structures with similar shape, size, and physical properties, which means there are many similar feature points that tend to confuse matching. The percentage of false matches therefore increases if the standard SIFT registration algorithm searches for correspondent points among all candidate features. It is thus necessary to design an approach that improves the correctness of registration.
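For reference, the standard matching baseline, including the distance-ratio test with the 0.8 threshold used later in the experiments, can be sketched as:

```python
import numpy as np

def match_descriptors(desc_ref, desc_tgt, ratio=0.8):
    """Standard SIFT matching: nearest neighbor in Euclidean distance,
    accepted only if clearly closer than the second-nearest (ratio test)."""
    matches = []
    for i, d in enumerate(desc_ref):
        dist = np.linalg.norm(desc_tgt - d, axis=1)
        j1, j2 = np.argsort(dist)[:2]
        if dist[j1] < ratio * dist[j2]:
            matches.append((i, j1))
    return matches
```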
The registration strategy presented in this article is as follows.
(1) Matching area determination: Divide the original nadir image into small image patches as reference images, and determine the rough matching area in the backward or forward image using the area-based method.
(2) Feature point extraction: For each image patch obtained in step (1), extract feature points using the SIFT algorithm.
(3) Search space determination: For each feature point given in the reference image, estimate the position range of its correspondent point in the target image by making use of the orbital altitude of Chang'E-1. Registration is then implemented by searching for the correspondent point within this small estimated region.
The Chang'E-1 stereo camera simultaneously produces high-resolution images in three viewing directions (nadir, forward, and backward). The CCD images of the three view angles theoretically overlap 100% in the flight direction, but there is an offset between the viewing directions. An illustration of a Chang'E-1 CCD image is shown in Fig. 6.
Because a swath of raw CCD imagery is too large for registration, it is divided into smaller subimages. In this article, 1 024 rows of pixels (but not limited to this) are extracted from the nadir image as the reference image, with a size of 1 024×512. The next step is to use the area-based method to find a matching image from the backward image overlapping the same region of the lunar surface as the reference image. The sum of squared differences (SSD) between the two patches is used as the measurement, given as
$$\mathrm{SSD}(i, j) = \sum_{x}\sum_{y}\left[f(i+x, j+y) - t(x, y)\right]^{2} \tag{8}$$
where t represents the reference image extracted from the nadir image; f represents the backward image; i, j represent the row and column of the backward image, respectively; and the sum runs over all pixels (x, y) of the reference image.
In general, starting the search from the first row of the backward image will find the best match, with the minimum value of SSD(i, j). To reduce the search time, however, the offset between the nadir and backward images can be pre-computed by making use of the parameters of the orbiter; that is, the range of i can be estimated.
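A sketch of this restricted search, where the column offset j is taken as zero since the subimages share the swath's full width, and `i_range` is the pre-computed range of candidate row offsets:

```python
import numpy as np

def ssd_search(f, t, i_range):
    """Find the row offset of template t (the nadir subimage) inside the
    backward image f by minimizing Eq. (8) over the estimated range of i."""
    rows, cols = t.shape
    best_i, best_ssd = None, np.inf
    for i in i_range:                     # e.g. around the predicted offset
        diff = f[i:i + rows, :cols].astype(np.float64) - t
        ssd = np.sum(diff * diff)
        if ssd < best_ssd:
            best_i, best_ssd = i, ssd
    return best_i, best_ssd
```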
As shown in Fig. 7, the backward image overlaps the nadir image after a time Δt; let Δs represent the scanning distance within this period. Then
$$\Delta s = v_s \Delta t = H \tan\theta \tag{9}$$

$$\mathrm{offset} = f_s \Delta t = \frac{H f_s \tan\theta}{v_s} \tag{10}$$
where vs represents the speed of satellite flight (vs=1.424 9 km/s for Chang'E-1); θ represents the backward-nadir view angle; and H, fs represent the orbital altitude and scan line frequency, respectively. According to the parameters of the Chang'E-1 lunar orbiter, the offset between the nadir and backward images can be easily computed from formula (10), and the search space can be narrowed to this range to reduce the computation.
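Numerically, with the parameters given in the text (vs, and H=180 km from the experiment below) plus an assumed view angle of 16.7° and a line frequency implied by the 120 m pixel size, neither of which is stated explicitly in this article, formula (10) reproduces the 450-line offset:

```python
import math

v_s   = 1.4249                # satellite flight speed, km/s (from the text)
H     = 180.0                 # orbital altitude, km (from the text)
theta = math.radians(16.7)    # assumed backward-nadir view angle
f_s   = v_s / 0.120           # line frequency implied by 120 m/line sampling, Hz

ds     = H * math.tan(theta)  # Eq. (9): along-track ground distance, km
offset = ds * f_s / v_s       # Eq. (10): offset in scan lines
print(round(offset))          # ~450 lines, matching the experiment below
```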
Using the area-based method, two subimages overlapping the same region of the lunar surface have been obtained from the nadir and backward images. If the image strips were acquired under ideal conditions, that is, if the trajectory were a straight line at constant flying height and all three images were taken with perfect parallel projection to the standard sphere along the flight direction, the feature points of the nadir and backward images would correspond one to one in pixel coordinates. However, due to orbital changes and undulating terrain, a position offset between the nadir and backward images exists. This step therefore attempts to determine the position range of the correspondent point in the backward or forward image for a given feature point in the nadir image.
Consider an image triplet of Chang'E-1, as shown in Fig. 8. Given a feature point in the nadir image, an image ray connecting the instantaneous perspective center and this image point can be determined. Let A be the corresponding lunar surface point and A0 its projection onto the reference sphere. The height approximation, h, can be computed by interpolation from the LAM (laser altimeter) data of the Chang'E-1 lunar mission. Let a represent the correspondent point in the backward image. The displacement of a caused by the height h, which bounds the width of the search space, can be approximated by
$$d \approx \frac{f\, h \tan\theta}{H} \tag{11}$$
where f represents the focal length of the stereo camera, and θ, H represent the stereo view angle and orbital altitude, respectively.
So, for a given feature point, with the height h interpolated from the LAM data, the width of the search space can be computed from formula (11). According to the experimental results shown in Fig. 11, a width of 10 pixels is selected.
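The practical effect of formula (11) can be gauged in pixels: at the nadir-image scale, the displacement is h·tanθ divided by the 120 m ground sampling distance. A sketch, where the 16.7° view angle is an assumption:

```python
import math

def parallax_pixels(h_km, theta_rad=math.radians(16.7), gsd_km=0.120):
    """Approximate nadir-backward displacement (in pixels) of a point with
    height h above the reference sphere; the geometry behind Eq. (11),
    expressed via the ground sampling distance instead of f and H."""
    return h_km * math.tan(theta_rad) / gsd_km

# e.g. a 2 km elevation difference shifts the correspondent point by ~5 px,
# so a 10-pixel search window comfortably covers typical lunar relief
print(parallax_pixels(2.0))
```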
In the experiments, the 159th swath of Chang'E-1 CCD imagery, with nadir and backward view directions (the method applies equally to forward images), is used. The scanning time of the images is from 2007/11/20 07:38:21 to 2007/11/20 09:46:03, and the size is 27 114×512 pixels.
Firstly, rows 1–1 024 of pixels were extracted from the nadir image to construct a subimage as the reference image. The orbital altitude H is 180 km. According to the method discussed above, the offset between the nadir and backward images is computed as 450, so the matching image from the backward swath covers rows 450–1 473.
Feature points extracted using the SIFT algorithm are shown in Fig. 9: 1 604 feature points in the nadir image and 1 267 in the backward image. With the ratio set to 0.8, the standard SIFT matching algorithm registered 1 013 pairs in total, of which 12 pairs are false matches, as shown in Fig. 10.
The position offset was then obtained using the method discussed above, with the elevation of each point approximated by interpolation from the LAM data of the Chang'E-1 lunar mission. With the width of the search space set to 10 pixels, 1 001 matching pairs were obtained with no false matches, as shown in Fig. 11.
In order to obtain dense feature points with an even distribution, the initial images were enhanced using a Laplacian filter, making sharpened image edges and fine detail much more obvious. After enhancement, there are 3 867 feature points in the nadir image and 4 032 in the backward image, and 2 546 matching pairs were obtained with no false matches, as shown in Fig. 12.
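The Laplacian enhancement can be sketched as follows; a minimal version assuming 8-bit input, with an illustrative filter weight:

```python
import numpy as np
from scipy.ndimage import laplace

def laplacian_sharpen(image, weight=1.0):
    """Sharpen edges and fine detail by subtracting the Laplacian,
    as used before feature extraction (a minimal sketch)."""
    img = image.astype(np.float32)
    sharpened = img - weight * laplace(img)
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```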
Figure 13 shows the Apollo 15 landing site area in the 534th swath of Chang'E-1 CCD imagery, with a size of 1 450×512. The scanning time of the images is from 2007/12/23 13:20:31 to 2007/12/23 13:23:55.
A novel registration approach combining an area-based method and a feature-based method has been introduced. Firstly, the rough matching area is obtained using the area-based method. Secondly, feature points of each subimage are extracted by the SIFT algorithm. Lastly, for each feature point given in the reference image, the position of its correspondent point in the target image is estimated according to the parameters of the Chang'E-1 lunar orbiter. In contrast to the standard SIFT matching algorithm, the method proposed in this article narrows the search space and accelerates the computation while reducing the percentage of false matches.
The proposed method can provide dense, precise, and reliable correspondent points in the stereo-pairs. Depending on requirements, Laplacian filter enhancement can be used to obtain many more feature points. However, in some poorly textured areas of the CCD images, the correspondent points are distributed unevenly. This problem should be taken into account in future work.
Babaud, J., Baudin, M., Witkin, A., et al., 1986. Uniqueness of the Gaussian Kernel for Scale-Space Filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1): 26–33
Brown, L. G., 1992. A Survey of Image Registration Techniques. ACM Computing Surveys, 24(4): 325–376. doi: 10.1145/146370.146374
Harris, C., Stephens, M., 1988. A Combined Corner and Edge Detector. In: Proceedings of the 4th Alvey Vision Conference, Manchester. 147–151
Kitchen, L., Rosenfeld, A., 1982. Gray-Level Corner Detection. Pattern Recognition Letters, 1(2): 95–102. doi: 10.1016/0167-8655(82)90020-4
Koenderink, J. J., 1984. The Structure of Images. Biological Cybernetics, 50(5): 363–370. doi: 10.1007/BF00336961
Lindeberg, T., 1994. Scale-Space Theory: A Basic Tool for Analyzing Structures at Different Scales. Journal of Applied Statistics, 21(2): 225–270
Li, C. L., Liu, J. J., Ren, X., et al., 2010. The Global Image of the Moon Obtained by the Chang'E-1: Data Processing and Lunar Cartography. Sci. China Earth Sci., 53(8): 1091–1102. doi: 10.1007/s11430-010-4016-x
Lowe, D. G., 1999. Object Recognition from Local Scale-Invariant Features. In: Proceedings of the International Conference on Computer Vision, Corfu. 1150–1157
Lowe, D. G., 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2): 91–110. doi: 10.1023/B:VISI.0000029664.99615.94
Moravec, H., 1981. Rover Visual Obstacle Avoidance. In: Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver. 785–790
Otto, G. P., Chau, T. K. W., 1988. A Region-Growing Algorithm for Matching of Terrain Images. In: Proceedings of the 4th Alvey Vision Conference. 123–128
Smith, S. M., Brady, J. M., 1997. SUSAN-A New Approach to Low Level Image Processing. International Journal of Computer Vision, 23(1): 45–78. doi: 10.1023/A:1007963824710
Witkin, A. P., 1984. Scale-Space Filtering: A New Approach to Multi-Scale Description. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 150–153
Wiedemann, C., Tang, L., Ohlhof, T., 1996. A New Matching Approach for Three-Line Scanner Imagery. International Archives of Photogrammetry and Remote Sensing, 31: 940–945
Ye, M. J., Li, J., Liang, Y. Y., et al., 2011. Automatic Seamless Stitching Method for CCD Images of Chang'E-1 Lunar Mission. Journal of Earth Science, 22(5): 610–618. doi: 10.1007/s12583-011-0212-7
Zhang, L., Gruen, A., 2006. Multi-Image Matching for DSM Generation from IKONOS Imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 60(3): 195–211
Zitová, B., Flusser, J., 2003. Image Registration Methods: A Survey. Image and Vision Computing, 21(11): 977–1000. doi: 10.1016/S0262-8856(03)00137-9