<- makeGraphsAndClusters(sce, k = c(0.1, 0.25, 0.5, 0.75, 1),
sce dr = "PCA_Harmony", ndims = 20,
sweep_on = "clustering", method = "louvain",
weighting_scheme = "jaccard", prefix = "SNN_",
verbose = TRUE)
Clustering
cellula
offers a wrapper around clustering functions. For now, only SNN-based Louvain[11] and Leiden[12] clustering are implemented. The makeGraphsAndClusters()
function allows users to do parameter sweeps along either the number of neighbors or the resolution of the clustering.
In this example we sweep along the value of the resolution
parameter for a Louvain clustering. For the SNN graph constructions, edges are weighted according to the jaccard index of their shared neighbors, mimicking the Seurat
graph construction and clustering procedure:
If another integration method has been run on the same object (e.g. Seurat integration), then the clustering can be performed on that integrated space by specifying the dr
argument (in this case, "PCA_Seurat"
):
<- makeGraphsAndClusters(sce, k = c(0.1, 0.25, 0.5, 0.75, 1),
sce dr = "PCA_Seurat", ndims = 20,
sweep_on = "clustering", method = "louvain",
weighting_scheme = "jaccard", prefix = "Seurat_SNN_",
verbose = TRUE)
The default value for space
is NULL
and will use the "PCA"
slot from the reducedDim()
accessor.
Plotting clustering results
You can visualize clustering results on the UMAP using the plot_UMAP()
function, adding labels if desired:
plot_UMAP(sce, umap_slot = "UMAP_Harmony", color_by = "SNN_0.5", label_by = "SNN_0.5")
If using the clustree
package[13], the clustering tree can be visualized by using the same prefix defined in makeGraphsAndClusters()
:
library(clustree)
clustree(sce, prefix = "SNN_")
The default arguments to the clustering wrapper include the generation of modularity and approximate silhouette scores for every clustering round. These will be stored in the metadata
of the SCE, named according to the prefix, the resolution, and the "modularity_"
and "silhouette_"
prefixes. Silhouette and modularity can be visualized by using the dedicated functions:
plotSilhouette(sce, "SNN_0.5")
plotModularity(sce, "SNN_0.5")
As seen in the Plotting section, you can plot cluster markers using the presto
package and the plot_dots()
function:
# Install {presto}
::install_github("immunogenomics/presto")
remotes
# Quick and dirty marker calculation
= presto::wilcoxauc(sce, group_by = "SNN_0.5")
markers = split(markers, markers$group)
markerlist
for(i in seq_len(length(markerlist))) {
$deltapct = markerlist[[i]]$pct_in - markerlist[[i]]$pct_out
markerlist[[i]]= markerlist[[i]][order(markerlist[[i]]$deltapct, decreasing = TRUE),]
markerlist[[i]]
}
= lapply(markerlist, function(x) x$feature[1:5])
top5 = Reduce(union, top5)
markergenes
plot_dots(sce, genes = top5, group_by = "SNN_0.5")
Metaclusters
Aditionally, metaclusters can also be identified. A metacluster is a cluster of clusters obtained by different clustering methods. Clusters across methods are linked acording to how many cells they share, and these links become edges of a graph. Then, Louvain clustering is run on the graph and the communities that are identified are metaclusters. These metaclusters show the relationship between clustering methods. Moreover, they can be used to understand cluster stability along different parameters and/or integration methods. A cell can belong to different clusters according to the clustering method (i.e. to the resolution or to the space that was used). If a cell belongs to clusters that are consistently included in a metacluster, then that cell belongs to a “stable” cluster. If instead the cell belongs to clusters that have different metacluster assignments, then it’s in an “unstable” position, meaning it may be clustered differently according to integration methods and/or resolutions.
The metaClusters()
function takes a clusters
argument, which is a vector of column names from the colData of the SCE where clustering results are stored. In this example it is easy to isolate by using grep()
and searching for the prefix “SNN_”.
<- colnames(colData(sce))[grep("SNN_", colnames(colData(sce)))]
clusterlabels <- metaCluster(sce, clusters = clusterlabels) sce
Every cell will belong to a series of clusters, which in turn belong to a metacluster. For every cell, we count how many times they are assigned to a particular metacluster, and the maximum metacluster is assigned, together with a “metacluster score” (i.e. the frequency of assignment to the maximum metacluster) and whether this score is above or below a certain threshold (0.5 by default). These columns are saved in the colData
slot of the SCE.
plot_UMAP(sce, umap_slot = "UMAP_Harmony", color_by = "metacluster_score", label_by = "SNN_0.5")