The other vignette reproduces a single clustering workflow and assumes that the number of clusters has already been decided. Because the app also includes a few options for evaluating clusters, some of those functions are made available in the package as well. The output of the clustering functions can also be used with other packages.
library(visxhclust)
library(dplyr)
numeric_data <- iris %>% select(Sepal.Length, Sepal.Width, Petal.Width)
dmat <- compute_dmat(numeric_data, "euclidean", TRUE)
clusters <- compute_clusters(dmat, "complete")
For the Gap statistic, the optimal number of clusters depends on the method used to compare cluster solutions. The cluster package includes the function cluster::maxSE() to help with that.
gap_results <- compute_gapstat(scale(numeric_data), clusters)
optimal_k <- cluster::maxSE(gap_results$gap, gap_results$SE.sim)
line_plot(gap_results, "k", "gap", xintercept = optimal_k)
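To illustrate how the choice of method can change the selected k, here is a minimal sketch that uses only the cluster package rather than the visxhclust helpers. It assumes a Gap statistic computed with cluster::clusGap() on the same three iris columns. The wrapper hclust_k and the settings K.max = 8 and B = 50 are illustrative choices, not taken from this vignette.

```r
library(cluster)

set.seed(42)  # clusGap() relies on random reference data sets
x <- scale(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")])

# clusGap() expects a function of (data, k) that returns a list
# with a `cluster` component. This wrapper is a hypothetical helper.
hclust_k <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "complete"), k = k))
}

gap <- clusGap(x, FUNcluster = hclust_k, K.max = 8, B = 50, verbose = FALSE)
tab <- gap$Tab  # matrix with columns logW, E.logW, gap, SE.sim

# Apply every rule that maxSE() supports to the same Gap curve
methods <- c("firstSEmax", "Tibs2001SEmax", "globalSEmax",
             "firstmax", "globalmax")
k_by_method <- sapply(methods, function(m) {
  maxSE(tab[, "gap"], tab[, "SE.sim"], method = m)
})
k_by_method
```

The rules may or may not agree on a given data set. When they disagree, plotting the Gap curve is usually a better guide than any single rule.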
The Shiny app also includes the option to compute average silhouette widths or the Dunn index. The function compute_metric works similarly to compute_gapstat, whereas optimal_score is similar to maxSE. However, optimal_score only offers a choice between the first and the global minimum or maximum.
res <- compute_metric(dmat, clusters, "dunn")
optimal_k <- optimal_score(res$score)
line_plot(res, "k", "score", optimal_k)
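The difference between a first and a global optimum can be sketched in base R. The helpers first_max() and first_min() below are hypothetical and are not part of visxhclust, the score vector is made up for illustration, and the sketch assumes that scores are ordered by k starting at k = 1.

```r
# Illustrative scores for k = 1, ..., 5 (made-up values)
scores <- c(0.2, 0.5, 0.4, 0.7, 0.6)

# Index of the first local maximum: the first position after which
# the score drops. Falls back to the last index if it never drops.
first_max <- function(scores) {
  drops <- which(diff(scores) < 0)
  if (length(drops) == 0) length(scores) else drops[1]
}

# Index of the first local minimum: the first position after which
# the score rises. Falls back to the last index if it never rises.
first_min <- function(scores) {
  rises <- which(diff(scores) > 0)
  if (length(rises) == 0) length(scores) else rises[1]
}

first_max(scores)   # first local maximum
which.max(scores)   # global maximum
first_min(scores)   # first local minimum
which.min(scores)   # global minimum
```

For this vector the first and the global maximum fall at different positions (2 and 4), which is the situation where the choice of rule matters.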