Version 0.1.10 of `largeVis`

adds two features that were not part of the original `LargeVis`

paper:

```
* The SGD phase of the LargeVis algorithm can be accelerated using momentum training.
* The method for calculating weights in the negative sampling phase of the algorithm is now a parameter.
```

In addition, that package includes implementations of the `DBSCAN`

, `OPTICS`

, and `HDBSCAN`

clustering algorithms optomized to take advantage of the neighbor data generated by `largeVis`

.

During the stochastic gradient descent phase of `largeVis`

, nodes that do not have edges to the current point are sampled along with each edge. In the original paper, the negative nodes are selected by sampling weighted according to their degree raised to the \(\frac{3}{4}\) power. In the reference implementation, however, the nodes are weighted according to \(P_n(j) \propto (\sum_{i \iff j \in N_i} w_{ij})^{0.75}\).

Versions of `largeVis`

prior to 0.1.8 (i.e., before the reference implementation became available) used degree. These results may be more aesthetically pleasing to many people. Version 0.1.10 therefore allows this to be set as a parameter. The default is to use edge weights, as in the reference implementation.

The difference is that using edge weights, the resulting clusters tend to be more plainly convex. Using degree encourages a wider variation of irregular shapes. The effect is imperceptible at small data sizes; is subtle noticeable at around 50,000 nodes; but becomes pronounced on datasets of greater than 1 million nodes.

Momentum training is method for optimizing stochastic gradient descent to encourage more rapid convergence. The technique is common in deep learning.

Momentum alters the gradient so that in each iteration the algorithm calculates the update for each parameter taking into account prior update on the same parameter. If the gradient for a parameter has the same direction on one iteration as the previous, momentum will cause it to move further in that direction. If the gradient reverses, the update will be smaller than it would be otherwise.

Without momentum, each iteration updates the low dimensional embedding according to the equation \(\theta_{t + 1} = \theta_t + \rho\frac{dO}{d\theta_t}\)

With momentum, this becomes: \(\theta_{t + 1} = \theta_t + \mu_t\) where \(\mu_t = \rho\frac{dO}{d\theta_t} + \lambda\mu_{t - 1}\)

The momentum paramter, \(\lambda\), controls the rate of decay.

Adding momentum can make it possible to reduce the number of sgd batches, in some cases substantially, without reducing the visual quality of the result.

`largeVis`

also includes three clustering algorithms: `DBSCAN`

, `OPTICS`

, and `HDBSCAN`

.

The implementations here are not intended for general use. Rather, the `LargeVis`

algorithm calculates, and `largeVis`

retains, nearest neighbor information for inputs. Much of the computation time of the three clustering algorithms is spent identifying neighbors and calculating distances. The intent of these implementations is, by resuing the neighbor data generated by `largeVis`

, to make it practical to apply these very computationally expensive algorithms to very large datasets.

All three algorithms attempt to cluster points by linking a point to its nearest neighbors, using a modified definition of distance. The `core distance`

of a point is defined as the distance to its Kth nearest neighbor. It is an (inverse) measure of the density of space around a point. The `reachability distance`

between two points is the greater of the distance between them and `core distances`

. When the neighbors of a point, \(p\), are identified to link it with its nearest neighbors, no point \(q\) can be closer to \(p\) than \(p\)’s `core distance`

. Thus, `reachability distance`

combines both distance and density. (Differences in the definition of `core distance`

and `reachability distance`

among the algorithms are not addressed here.)

`DBSCAN`

(Ester et al. 1996) attempts to achieve a flat clustering using a consistent density threshold, \(\epsilon\). Points with more than `minPts`

neighbors in their \(\epsilon\)-neighborhoods are `core points`

. Where the \(\epsilon\) neighborhoods of corepoints overlap, the neighborhoods are merged into a single cluster. `DBSCAN`

thus takes two hyper-parameters, \(\epsilon\) and `minPts`

.

The objects returned by the `largeVis`

`DBSCAN`

implementation are compatible with the `dbscan`

objects producted by package `dbscan`

. (Hahsler 2016)

The following chart illustrates the effect of the \(\epsilon\) and `minPts`

parameters on the performance of `DBSCAN`

.

`OPTICS`

(Ankerst et al. 1999) applies similar logic as `DBSCAN`

to build a hierarchical clustering. Beginning with a seed point, `OPTICS`

finds the point with the lowest reachability distance (up to the cutoff, \(\epsilon\)) to any point already in the graph, and adds it. The result is an ordering of points by decreasing density. When there are no points with a `reachability distance`

to the graph less than \(\epsilon\), `OPTICS`

begins again with a new seed point.

This implementation of `OPTICS`

has a hyperparameter `useQueue`

which controls the order in which points are processed. In effect, this controls what points will be the first point in newly discovered clusters. The reachability graph above used `useQueue = FALSE`

, which is used by the `dbscan`

package and the `ELKI`

implementation. If `useQueue`

is set to `TRUE`

, then the algorithm will use points in denser regions of the space as the seeds for new clusters.

This is illustrated in the following `reachability plots`

for the spiral dataset:

`OPTICS`

can be thought of as a hierarchical generalization of `DBSCAN`

. `OPTICS`

clusterings are readily convertible to `DBSCAN`

clusterings by cutting clusters when the `reachability distance`

exceeds a threshold. This can be done using the `extractDBSCAN`

function of the `dbscan`

package. The resulting clusters are identical to what would be obtained using `DBSCAN`

with the same parameters, except in assignment of noise points.

The `dbscan`

package has other functions for cutting and visualizing `OPTICS`

clusterings as well. The `optics`

objects produced by `largeVis`

are compatible with the ones produced by the `dbscan`

packag.

`HDBSCAN`

(Campello, Moulavi, and Sander 2013) aims to make the `DBBSCAN`

and `OPTICS`

approach flexible for graphs with different densities in different regions, by building a hierarchical density-bsaed clustering from which flat clusters are extracted. `HDBSCAN`

does not need the \(\epsilon\) hypterparameter, which can be difficult to set. Rather, `HDBSCAN`

, in effect, determines different values for \(\epsilon\) for different parts of the dataset.

The `largeVis`

implementation of `HDBSCAN`

produces flat and hierarchical clusterings simultaneously. The hiearchical clusterings are compatible with `hclust`

and other R hierarchical clustering packages and tools by means of an `as.dendrogram.hdbscan`

function, which converts them to standard R dendrograms. The package also includes a function for visualizing the clustering.

Ankerst, Mihael, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. 1999. “OPTICS: Ordering Points to Identify the Clustering Structure.” *SIGMOD Rec.* 28 (2). New York, NY, USA: ACM: 49–60. doi:10.1145/304181.304187.

Campello, Ricardo J. G. B., Davoud Moulavi, and Joerg Sander. 2013. “Density-Based Clustering Based on Hierarchical Density Estimates.” In *Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, Pakdd 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part Ii*, edited by Jian Pei, Vincent S. Tseng, Longbing Cao, Hiroshi Motoda, and Guandong Xu, 160–72. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-37456-2_14.

Ester, Martin, Hans-peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In, 226–31. AAAI Press.

Hahsler, Michael. 2016. *Dbscan: Density Based Clustering of Applications with Noise (Dbscan) and Related Algorithms*. https://CRAN.R-project.org/package=dbscan.