= scRNAseq::NestorowaHSCData()
sce2
= cellula(sce2, name = "nestorowa", do_qc = FALSE)
sce2
= makeGraphsAndClusters(sce2, space = reducedDim(sce2, "PCA")[,1:20])
sce2
plot_UMAP(sce2, umap_slot = "UMAP", color_by = "SNN_0.64", label_by = "SNN_0.64")
Trajectory inference
At the time of writing cellula
only implements a wrapper around the slingshot
method[20] and the monocle3
method [21] for pseudotemporal trajectory inference, and the testPseudotime()
method from TSCAN
[22] for differential expression along a trajectory. More trajectory inference and differential expression methods will be implemented in the future.
The monocle3
method has been originally developed to work on 2D embeddings such as UMAP, but given the distortion introduced by these embeddings this function allows users to input any embedding (such as PCA). Moreover, the monocle3
method originally allows partitions to be specified. In the current implementation partitions have been disabled.
The findTrajectories()
function takes a SingleCellExperiment
object as input, and requires the user to specify the method (one of "slingshot"
or "monocle"
), the cluster label which will be used as an input to the MST creation in slingshot
, or to identify the starting node in monocle3
.
Other important parameters are:
dr
- the reduced dimensional embedding in which trajectories will be estimated (default is “PCA”)start
- the starting cluster for trajectory estimation, defaulting to “auto” for an entropy-based methodomega
-slingshot
only. Whether or not to use a synthetic cluster to estimate disjointed trajectories, as detailed in?slingshot::getLineages
Monocle_lg_control
-monocle
only. A list of control parameters for thelearn_graph()
function frommonocle3
.invert
-monocle
only. Whether or not to invert the direction of the pseudotime vector.dr_embed
- an additional 2D dimensional reduction slot in which to embed the trajectories (slingshot
curves ormonocle
principal graph). Yields slightly different results according to the method.doDE
- whether or not to perform differential expression on every trajectory.
The output is the same object used in the input, with some additional fields.
the
colData
slot will contain a column containing theslingPseudotime_N
pseudotime values for each lineage, where N is the number of the lineage (slingshot
) or just a column namedmonoclePseudotime
with pseudotime values for each cell (monocle
).the
metadata
slot will several new elements depending on the method.For
slingshot
:Slingshot_lineages
: the output ofslingLineages()
, i.e. the result of the MST constructionSlingshot_embedded_curves
: the output ofembedCurves()
, i.e. the projection of low-dimensional principal curves onto a 2D embedding such as UMAP. Optional and only created if the parameterdr_embed
is an available dimensionality reduction slot in the object.Slingshot_MST
: the MST used to determine lineages byslingshot
. Optional and only added if the parameteradd_metadata
is set to TRUE.Slingshot_curves
: the principal curves constructed byslingshot
. Optional and only added if the parameteradd_metadata
is set to TRUE.Slingshot_weights
: the weights assigned per cell within each lineage byslingshot
. Optional and only added if the parameteradd_metadata
is set to TRUE.Slingshot_params
: the params used in theslingshot
call. Optional and only added if the parameteradd_metadata
is set to TRUE.pseudotime_DE
: the results of the differential expression test. This is a list where each lineage is tested separately. Optional and only created ifdoDE
is set to TRUE.
For
monocle
:Monocle_embedded_curves
: the projection of the nodes of the principal graph to their nearest neighbor in 2D. Optional and only created if the parameterdr_embed
is an available dimensionality reduction slot in the object.Monocle_principal_graph
: theigraph
object with the principal graph.Monocle_principal_graph_aux
: the rest of theCellDataSet
object with additional information on the principal graph.
In this example we download a dataset that captures early haematopoietic differentiation from Nestorowa et al. [23] and we do a quick processing and clustering.
Then we take an arbitrary starting point (cluster 7) and calculate trajectories with both slingshot
and monocle
using the same space (PCA, first 20 components).
= findTrajectories(sce2, dr = "PCA",
sce2 method = "monocle",
ndims = 20, clusters = "SNN_0.64",
dr_embed = "UMAP", start = 7)
= findTrajectories(sce2, dr = "PCA",
sce2 method = "slingshot",
ndims = 20, clusters = "SNN_0.64",
dr_embed = "UMAP", start = 7)
Now we can visualize results on the UMAP since we specified a dr_embed
parameter. Notice how slingshot
will identify different pseudotimes (one per lineage) while monocle3
will identify a single pseudotime. Moreover, the plot_UMAP()
function can include a trajectory
argument which points to the name of the element in the metadata
slot containing segments to draw trajectories.
plot_UMAP(sce2, umap_slot = "UMAP", color_by = "slingPseudotime_1",
label_by = "SNN_0.64", trajectory = "Slingshot_embedded_curves")
plot_UMAP(sce2, umap_slot = "UMAP", color_by = "monoclePseudotime",
label_by = "SNN_0.64", trajectory = "Monocle_embedded_curves")
Since the 2D embedding of monocle3
PCA-derived trajectories may be hard to understand, given the distortions introduced by UMAP, cellula
includes an additional 2D embedding method, dr_embed = "FR"
, inspired båy the PAGA embedding initialization technique [24].
Briefly, once the principal graph has been calculated, it is laid out in 2D using the Fruchterman-Reingold algorithm. Then each cell is randomly plotted around their closest vertex in the graph, and reordered according to pseudotime value. This semi-random layout is used as an initialization for UMAP, which will optimize the point positions. The resulting layout is more visually pleasing and reflects more accurately the positions of cells with respect to the trajectory (although not necessarily to each other).
This FR-initialized UMAP is stored in a reducedDim
slot named UMAP_FR
.
= findTrajectories(sce2, clusters = "SNN_0.64", method = "monocle",
sce2 dr = "PCA", ndims = 20, start = "7", dr_embed = "FR")
plot_UMAP(sce2, umap_slot = "UMAP_FR", color_by = "monoclePseudotime",
label_by = "SNN_0.64", trajectories = "Monocle_embedded_curves")
It should be noted that this layout is, in a way, optimized for trajectories rather than global cell-cell similarity and as always should only be treated as a visualization tool.
Metacells
In order to speed up calculations and overcome sparsity, cells can be aggregated into metacells using k-means clustering with a high k.
Clustering is carried out in the reduced dimensional space of choice selected through the space
argument, and it is carried out in each level of group
separately.
Rather than selecting the number of clusters k, the function takes an average number of cells per cluster w which is used to determine k (default is w = 10 cells per cluster). Read counts are aggregated by gene across all cells within each cluster, resulting in metacells. These can be used as input to findTrajectories()
or other operations such as clustering, downsampling, signature scoring, etc.
In this example we create metacells aggregating (on average) 10 cells, within each cluster from the “SNN_0.5” clustering results:
<- makeMetacells(sce, group = "SNN_0.5", space = "PCA_Harmony", w = 10) sce_meta