This section presents recent machine learning tools for the analysis of hyperspectral images from Raman, infrared, fluorescence, or any other type of spectroscopy. The machine learning methods are split into two groups: supervised and unsupervised. In hyperspectral data, unsupervised methods are the most common because labeled datasets are scarce. However, as imaging speed increases, new developments and experiments based on supervised and even deep learning methods are expected to appear.
PCA
VCA
Vertex component analysis. This is an unmixing algorithm that finds the endmembers of multidimensional datasets. VCA determines the vertexes of a geometrical simplex (an extension of the triangle to higher dimensions). VCA first reduces the dimension of the dataset by principal component analysis or singular value decomposition, depending on the signal-to-noise ratio of the input data. The number of endmembers is defined by the user. After computing the endmembers, the concentrations are determined by an abundance-estimation algorithm, for example, non-negative least squares. Fig. 1 shows a representation of endmembers.
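The dimension-reduction step can be sketched as follows (a minimal sketch: the data array and the choice of p = 3 endmembers are made up for illustration):

```python
import numpy as np

# Made-up hyperspectral data: 100 pixels x 50 spectral bands
rng = np.random.default_rng(0)
data = rng.random((100, 50))

# Reduce to p dimensions (p = number of endmembers chosen by the user)
# via singular value decomposition of the mean-centered data
p = 3
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[:p].T  # projected scores, shape (100, 3)
```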
Figure 1 Representation of mixing matrix (green) and endmembers (A, B and C) at two bands.
A linear mixing model is defined as eq. 1 shows, where $r$ is the observed pixel spectrum, $M$ is the endmember matrix, $\alpha$ is the abundance (concentration) vector, and $n$ is the noise. VCA assumes the presence of pure endmember pixels and linear mixing behavior.
$r = M\alpha + n$ (1)
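Once the endmembers are known, the abundances in eq. 1 can be estimated per pixel with non-negative least squares. A minimal sketch, with a made-up endmember matrix and simulated pixel:

```python
import numpy as np
from scipy.optimize import nnls

# Made-up endmember matrix M: 4 spectral bands x 3 endmembers (A, B, C)
M = np.array([[0.9, 0.1, 0.2],
              [0.4, 0.8, 0.1],
              [0.1, 0.3, 0.9],
              [0.2, 0.6, 0.5]])

# Simulate a mixed pixel following eq. 1: r = M @ alpha + n
alpha_true = np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(0)
r = M @ alpha_true + rng.normal(0.0, 0.01, size=4)

# Recover the abundances subject to alpha >= 0
alpha_est, residual = nnls(M, r)
```

In practice this is solved once per pixel of the hyperspectral image, giving one concentration map per endmember.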
HDBSCAN
Hierarchical density-based spatial clustering of applications with noise (HDBSCAN). It clusters points that share similar density conditions within a defined number of neighbors k. Fig. 3 illustrates DBSCAN, which defines a fixed distance (d) and therefore a fixed density. The density is calculated as 1/d, so noisy points get low density while many points within d give high densities. HDBSCAN instead uses a dynamic distance called the mutual reachability distance $d_{reach}$. $d_{reach}$ takes the maximum of the core distance at $X_i$, the core distance at $X_j$, and the distance between $X_i$ and $X_j$. Fig. 3 shows HDBSCAN with $d_{reach}$ calculated from $X_i$ and the neighbor point $X_j$; in this example, the largest of the three distances is d. As the number of neighbors increases, more points with large d values (noisy points) are included. When k is set to 1, the algorithm behaves like hierarchical clustering with single linkage (highly sensitive to noise). By computing the density, the noisy points obtain low density. The dynamic calculation of density allows the definition of a hierarchy, as Fig. 4 displays.
After calculating $d_{reach}$, a hierarchy of connected components is obtained by varying the density threshold levels. An efficient way to build it is via the minimum spanning tree of the mutual reachability graph; HDBSCAN thereby creates a robust single linkage. For suitable cluster extraction, a condensing algorithm is applied to the resulting tree by choosing a minimum cluster size (the smallest size to be considered a cluster). Finally, the algorithm extracts the stable clusters from the condensed tree. Fig. 3 shows the definition of $k(X_i)$ and $k(X_j)$, which are the core distances considering the k-th nearest neighbors, and d is the distance between the core points $X_i$ and $X_j$.
Figure 3 DBSCAN and HDBSCAN
Figure 4 Hierarchy and Smoothing
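The mutual reachability distance described above can be written directly (a minimal sketch: the helper names `core_distance` and `mutual_reachability` and the three-point example are my own):

```python
import numpy as np

def core_distance(X, i, k):
    # Distance from point X[i] to its k-th nearest neighbor
    d = np.linalg.norm(X - X[i], axis=1)
    return np.sort(d)[k]  # index 0 is the zero self-distance

def mutual_reachability(X, i, j, k):
    # d_reach = max(core_k(Xi), core_k(Xj), d(Xi, Xj))
    dij = np.linalg.norm(X[i] - X[j])
    return max(core_distance(X, i, k), core_distance(X, j, k), dij)

# Three collinear points: the isolated third point inflates the distance
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(mutual_reachability(X, 0, 1, k=1))  # 1.0 (raw distance = core distances)
print(mutual_reachability(X, 0, 2, k=1))  # 3.0 (raw distance dominates)
```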
Advantages:
- HDBSCAN does not require specifying the number of expected clusters.
- It can discover clusters of any shape.
- It defines a noise cluster and is robust to outliers.
- It requires only two parameters: the number of neighbors and the minimum cluster size.
- It overcomes the limitation of DBSCAN by finding clusters with different densities.
Disadvantages:
- It is not entirely deterministic.
To try these methods, go to the GitHub repository https://github.com/darksiders123/spectramap, download it, and copy the example data files into your Jupyter working directory.
References
https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html
J. M. P. Nascimento and J. M. B. Dias, "Vertex component analysis: A fast algorithm to unmix hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 4, pp. 898–910, 2005, doi: 10.1109/TGRS.2005.844293.