Usage
The pipelines in cellula assume that you are working with the output of CellRanger (or something similar) and that you imported it into a SingleCellExperiment object (hereafter SCE) using the TENxIO package. This matters for gene identifiers, since the rowData slot of the SCE will then have a “Symbol” and an “ID” column.
For demo purposes we can use a publicly available dataset, Segerstolpe et al. 2016[2], which we retrieve using the scRNAseq package:
# BiocManager::install("scRNAseq")
sce <- scRNAseq::SegerstolpePancreasData()
colnames(rowData(sce)) = c("Symbol", "ID")
Assuming that you have an SCE object containing an “individual” column that identifies the different batches, you can run an integration pipeline as follows:
sce <- cellula(sce, name = "myproject", batch = "individual",
               integration_method = "Harmony",
               verbose = TRUE, save_plots = TRUE)
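Before launching the pipeline, it can help to confirm that the object carries the columns the call above relies on. The following is a minimal sketch on a toy object, assuming that the batch variable is stored in colData; `check_input()` and the toy values are hypothetical and not part of cellula.

```r
library(SingleCellExperiment)

# Toy object standing in for a real dataset (hypothetical values)
set.seed(1)
counts <- matrix(rpois(40, lambda = 5), nrow = 4,
                 dimnames = list(paste0("ENSG", 1:4), paste0("cell", 1:10)))
toy <- SingleCellExperiment(
  assays  = list(counts = counts),
  rowData = DataFrame(Symbol = c("INS", "GCG", "SST", "PPY"),
                      ID = rownames(counts)),
  colData = DataFrame(individual = rep(c("H1", "H2"), each = 5))
)

# check_input() is a hypothetical helper, not a cellula function.
# It stops with an error if the expected columns are missing and
# otherwise returns the number of cells per batch.
check_input <- function(sce, batch) {
  stopifnot(
    all(c("Symbol", "ID") %in% colnames(rowData(sce))),
    batch %in% colnames(colData(sce))
  )
  table(colData(sce)[[batch]])
}

batch_sizes <- check_input(toy, batch = "individual")
print(batch_sizes)
```

The same helper could be applied to the real object, e.g. `check_input(sce, batch = "individual")`, to see how many cells each batch contributes.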
The name argument defines the name of the folder that will be created to store files and plots. We set verbose = TRUE to print the progress of the pipeline. Setting save_plots = TRUE will create a few QC plots in the name/plots folder: total UMI, total genes detected, UMI x genes; optionally % MT, % Ribo and % MALAT1, total UMI x doublet class. Plots are separated according to whether the cells were discarded or retained in the filtering step.
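To see which plots were written, the output folder can be listed from R. This is a small sketch that assumes the folder is created relative to the current working directory and that name = "myproject" was used; the file names and formats are not specified here, so none are assumed.

```r
# "myproject" mirrors the name argument used in the text
plot_dir <- file.path("myproject", "plots")

if (dir.exists(plot_dir)) {
  # List whatever the pipeline saved, without assuming names or formats
  qc_plots <- list.files(plot_dir, full.names = TRUE)
  print(qc_plots)
} else {
  qc_plots <- character(0)
  message("No plot folder found; run cellula() with save_plots = TRUE first")
}
```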
The cellula() function is a wrapper around a few modules or sub-pipelines that have different degrees of customization.
There are other independent functions that are not run through cellula() because they need some user input, e.g. findTrajectories() requires the user to specify the starting cluster (obtained through makeGraphsAndClusters()) or the cluster labels to use.
The scheme is:
cellula()
├── Quality Control [QC]
|   ├── run emptyDrops (optional) [QC/EMPTY]
|   ├── score mito/ribo/malat1 subsets (optional)
|   ├── filter out (optional)
|   └── doublet finding (optional) [QC/DBL]
├── Normalization and dimensionality reduction [NOR]
|   ├── pre-clustering
|   ├── computing pooled factors
|   ├── log-normalization (simple or multi-batch)
|   ├── HVG finding (simple or multi-batch)
|   ├── PCA
|   └── UMAP
├── Integration [INT] (optional) - choose one method
|   ├── fastMNN [INT/MNN]
|   |   ├── integration
|   |   └── UMAP
|   ├── Seurat [INT/SEURAT]
|   |   ├── conversion to Seurat
|   |   ├── normalization and HVG finding
|   |   ├── find integration anchors
|   |   ├── integrate data
|   |   ├── scale data
|   |   ├── PCA
|   |   └── UMAP
|   ├── LIGER [INT/LIGER]
|   |   ├── conversion to LIGER
|   |   ├── normalization
|   |   ├── HVG finding
|   |   ├── scale data
|   |   ├── NMF
|   |   ├── quantile normalization
|   |   └── UMAP
|   ├── Harmony [INT/HARMONY]
|   |   ├── Harmony matrix (on PCA)
|   |   └── UMAP
|   ├── Regression [INT/regression]
|   |   ├── regression on logcounts
|   |   ├── PCA
|   |   └── UMAP
|   ├── scMerge2 [INT/scMerge2]
|   |   ├── Pseudobulk and RUV
|   |   └── UMAP
|   └── STACAS [INT/STACAS]
|       ├── conversion to Seurat
|       ├── normalization and HVG finding
|       ├── STACAS integration
|       ├── PCA
|       └── UMAP
└── Cell type annotation [ANNO] (optional) - choose one method
    ├── Seurat AddModuleScore
    ├── ssGSEA
    ├── UCell
    ├── AUCell
    └── Jaitin
makeGraphsAndClusters()
└── Multi-resolution clustering [CLU]
    ├── sweep on Louvain/Leiden resolution or SNN neighbor numbers
    ├── calculate modularity (optional)
    └── calculate silhouette (optional)
findTrajectories()
└── Trajectory estimation [TRAJ]
    ├── slingshot [TRAJ/slingshot]
    |   ├── get lineages
    |   ├── calculate principal curves
    |   ├── embed in 2D (optional)
    |   └── calculate per-lineage DE (optional)
    └── monocle3 [TRAJ/monocle]
        ├── convert to CellDataSet
        ├── learn graph
        └── embed in 2D (optional)
            ├── populate FR layout (if dr_embed = "FR")
            └── UMAP on FR layout (if dr_embed = "FR")
Most of the user-facing choices concern the integration method. cellula implements 7 methods: fastMNN[3],[4] and regressBatches from the batchelor package, Harmony[5], the CCA-based Seurat[6] method, non-negative matrix factorization (NMF) from LIGER[7] through the rliger and RcppPlanc packages, the pseudobulk and RUV-based scMerge2 method from the scMerge package[8], and the Seurat-based STACAS integration method from the eponymous package[9].
The LIGER and Seurat integration methods require an intermediate step in which package-specific objects are created and some pre-processing steps are repeated according to the best practices published by the authors of those packages.
Each step of the pipeline can be called independently on the object:
sce <- doQC(sce, name = "segerstolpe", batch = "individual", save_plots = TRUE)
sce <- doNormAndReduce(sce, name = "segerstolpe", batch = "individual")
sce <- integrateSCE(sce, batch = "individual", method = "Seurat")
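After each step it can be informative to check what was added to the object. The sketch below uses only standard SingleCellExperiment accessors on a toy object; `summarise_sce()` is a hypothetical helper, and the names that cellula actually writes to the assays, reduced dimensions and cell metadata are not assumed here.

```r
library(SingleCellExperiment)

# Toy object with one reduced dimension, standing in for a processed SCE
set.seed(1)
counts <- matrix(rpois(60, lambda = 5), nrow = 6)       # 6 genes x 10 cells
toy <- SingleCellExperiment(assays = list(counts = counts))
reducedDim(toy, "PCA") <- matrix(rnorm(20), nrow = 10)  # 10 cells x 2 PCs

# summarise_sce() is a hypothetical helper, not a cellula function.
# It collects the names of the slots that a pipeline step may populate.
summarise_sce <- function(sce) {
  list(
    assays        = assayNames(sce),
    reduced_dims  = reducedDimNames(sce),
    cell_metadata = colnames(colData(sce))
  )
}

res <- summarise_sce(toy)
str(res)
```

Calling the helper on the real object before and after a step, and comparing the two results, shows which slots that step populated.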
Doublet identification is carried out through the scDblFinder package[10] using its default settings.
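For reference, a direct call to scDblFinder with default settings looks roughly like the sketch below. It runs on the simulated dataset that ships with scDblFinder; how cellula invokes the package internally (for instance, whether the batch column is passed on as the sample identifier) is not stated here and is not assumed.

```r
library(scDblFinder)

# Simulated dataset with artificial doublets, provided by scDblFinder
set.seed(123)
mock <- mockDoubletSCE()

# Default settings. For multi-sample data, scDblFinder() also accepts a
# `samples` argument naming the colData column that identifies each sample.
mock <- scDblFinder(mock)

# Per-cell calls ("singlet" or "doublet") are stored in colData
dbl_calls <- table(mock$scDblFinder.class)
print(dbl_calls)
```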