Plotting

cellula implements a few simple plotting functions for exploratory data analysis.

- plot_UMAP() to plot a UMAP (or any other 2D dimensionality reduction)
- plot_Coldata() to plot data from the colData slot as a boxplot, scatterplot or confusion matrix
- plot_dots() to plot a dot plot of gene expression

`plot_UMAP()`

You can set point color with the color_by argument and facet the plot with the group_by argument. You can also set point shape with shape_by and place labels on the plot with label_by. Note that the shape, group, and label variables must be categorical (i.e. factors), whereas the color variable can also be numeric. The color palette is generated automatically, but it can be set by the user through the color_palette argument.

plot_UMAP(sce, umap_slot = "UMAP_Harmony", color_by = "individual", group_by = "disease")

plot_UMAP(sce, umap_slot = "UMAP_Harmony", color_by = "sum")
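
Because shape_by, group_by and label_by expect categorical variables, numeric or character columns may need to be converted to factors first. Below is a minimal sketch of that conversion, using a plain data.frame as a stand-in for colData(sce). The column names and values are invented for illustration and are not taken from the dataset used above.

```r
# Toy stand-in for colData(sce); all values are invented for illustration
cd <- data.frame(
  individual = c("P1", "P2", "P1", "P3"),
  cluster    = c(1, 2, 1, 3),           # numeric cluster labels
  sum        = c(1200, 800, 1500, 950)  # numeric QC metric, left as is
)

# Convert the columns intended for shape_by / group_by / label_by to factors
cd$individual <- factor(cd$individual)
cd$cluster    <- factor(cd$cluster)

# Check which columns are now categorical
is_categorical <- vapply(cd, is.factor, logical(1))
print(is_categorical)
```

With a SingleCellExperiment object, the same conversion can be written directly on the colData column, e.g. sce$cluster <- factor(sce$cluster).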

`plot_Coldata()`

Takes x and y as column names from colData(sce), with optional color_by and group_by (faceting) arguments.

This function returns different plots depending on the class of the two colData columns selected:

- if y is numeric and x is categorical (character or factor), it returns a combined violin-boxplot with one plot per level of x.

plot_Coldata(sce, x = "individual", y = "sum") + scale_y_log10()

Additionally, if the color_by argument specifies another column, every level of x will be split by the levels of color_by. With appropriate use of the x, color_by and group_by variables, one can look at 3 different groupings of y at once.

plot_Coldata(sce, x = "individual", y = "sum", color_by = "disease", group_by = "cell type") + scale_y_log10()

- if y and x are both categorical, it returns a heatmap of the confusion matrix in which every value is the pairwise Jaccard index between the sets of cells for a given pair of levels (this is mostly useful to check for differences between clusterings/annotations)
- if y and x are both numeric, it returns a scatterplot with an optional 2D kernel density contour plot overlaid.

plot_Coldata(sce, x = "sum", y = "detected") + scale_x_log10()
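
To make the categorical case more concrete, here is a rough base R sketch of a pairwise Jaccard index matrix between two labelings of the same cells. The helper jaccard_matrix() and the toy labels are hypothetical and only illustrate the idea; this is not cellula's internal code.

```r
# Hypothetical helper (not part of cellula): Jaccard index for every pair of
# levels of two categorical vectors describing the same cells
jaccard_matrix <- function(x, y) {
  # Intersection sizes: number of cells carrying each (x level, y level) pair
  inter <- unclass(table(x, y))
  # Union sizes: |A| + |B| - |A and B|
  uni <- outer(rowSums(inter), colSums(inter), "+") - inter
  inter / uni
}

# Toy labels for five cells (invented for illustration)
clustering_a <- c("A", "A", "B", "B", "C")
clustering_b <- c("1", "1", "1", "2", "2")

jm <- jaccard_matrix(clustering_a, clustering_b)
print(round(jm, 2))
```

Values close to 1 mean that two levels contain nearly the same cells, and values close to 0 mean that they barely overlap.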

`plot_dots()`

You can also use the plot_dots() function to draw the popular dot plot for marker genes.

This function takes in a SingleCellExperiment object, together with a vector of genes (matched to the rownames of the object), and a grouping variable specified by the group_by argument. Additionally, dots can be ordered by hierarchical clustering on either genes, groups, or both (set cluster_genes and/or cluster_groups to TRUE, which is the default). Colors can also be customized via the color_palette argument. Finally, the user can choose whether they want genes to be columns (format = "wide", the default) or rows (format = "tall").

plot_dots(sce, genes = top5, group_by = "SNN_0.5")
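
As an illustration of what ordering by hierarchical clustering means, the sketch below reorders a small genes-by-groups matrix with base R's hclust(). The matrix values are invented, and the default Euclidean distance and complete linkage are assumptions made here; the distance and linkage that cellula uses may differ.

```r
# Toy genes x groups matrix of average expression (invented values)
avg <- rbind(
  GeneA = c(5.0, 4.8, 0.1),
  GeneB = c(0.2, 0.1, 4.0),
  GeneC = c(5.1, 5.0, 0.3),
  GeneD = c(0.0, 0.3, 4.2)
)
colnames(avg) <- c("cl1", "cl2", "cl3")

# Cluster rows (genes) and columns (groups) separately;
# dist() defaults to Euclidean distance, hclust() to complete linkage
gene_order  <- hclust(dist(avg))$order
group_order <- hclust(dist(t(avg)))$order

# Reorder the matrix so that similar genes and similar groups sit together
ordered <- avg[gene_order, group_order]
print(ordered)
```

In this toy example GeneA and GeneC end up next to each other, as do GeneB and GeneD, because their profiles across groups are similar.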

Color palettes

cellula has a few standard color palettes for data visualization, which were lifted from different packages. All plotting functions allow the user to input their custom palettes.

Standard qualitative

The standard qualitative palette comes from iterations of the qualpalr package in the default implementation of qualpal(). These palettes are optimized for maximal color differences in a perceptual space, i.e. by finding the N farthest points in the DIN99 color space. Since the number of colors will change the coordinates, using 2, 3, 4, … N colors will create different palettes.

This palette is selected by default when the data is categorical, e.g. for clusters, samples, etc. It can be selected manually by specifying color_palette = "Qualpal" where possible.

CVD-adjusted qualitative

There are two qualitative palettes from qualpalr adjusted for Color Vision Deficiency (CVD) using a severity of 0.5. One is adjusted for protanopia and one for tritanopia.

They can be selected by specifying color_palette = "Protan" or color_palette = "Tritan" where possible.

Other qualitative palettes

cellula offers a few alternative qualitative palettes out of the box, with different sizes.

Tableau: the Tableau palette contains 10 colors. This palette is not optimized for CVD and is not maximally separated, just pleasant to look at. The Tableau palette comes from base R.

Pear: the Pear18 palette contains 18 colors. This palette is not optimized for CVD and is not maximally separated, just pleasant to look at. The Pear18 palette comes from a subset of the original Pear36 palette on Lospec by user PineappleOnPizza.
Polychrome: the Polychrome24 palette contains 24 colors. This palette is not optimized for CVD and is not maximally separated, just pleasant to look at. The Polychrome24 palette comes from a subset of the Polychrome36 palette from base R.
Polylight: the Polylight palette contains 24 colors. This palette is not optimized for CVD and is not maximally separated, just pleasant to look at. This is the Polychrome24 palette lightened by a factor of 0.4 using colorspace::lighten().
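
For reference, the base R palettes mentioned above can be retrieved with grDevices::palette.colors() (R 4.0.0 or later). This is only a sketch of where the colors come from: which 24 of the 36 Polychrome colors cellula keeps is not stated here, so the first 24 are taken purely for illustration, and the lightening step runs only if the colorspace package is installed.

```r
# Tableau 10 and Polychrome 36 as shipped with base R (grDevices)
tableau10    <- palette.colors(palette = "Tableau 10")
polychrome36 <- palette.colors(palette = "Polychrome 36")

# Assumption for illustration only: take the first 24 colors.
# The actual subset used for Polychrome24 may be different.
poly24_demo <- polychrome36[1:24]

# Lighten by a factor of 0.4, as described for Polylight
# (guarded, since colorspace is not part of base R)
if (requireNamespace("colorspace", quietly = TRUE)) {
  polylight_demo <- colorspace::lighten(poly24_demo, amount = 0.4)
  print(head(polylight_demo))
}

print(tableau10)
```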

Quantitative palettes

The Sunset, Heat and truncated Yellow-Green-Blue (YlGnBu) quantitative palettes come from the colorspace package, through the sequential_hcl() function, using 25 colors (Sunset and Heat) or 40 colors truncated to the last 30 (YlGnBu).
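
The construction described above can be approximated without extra dependencies. The sketch below uses grDevices::hcl.colors() from base R (3.6.0 or later), which offers palettes with the same names, as a stand-in for colorspace::sequential_hcl(). Whether the resulting colors match cellula's palettes exactly is an assumption, as is reading "the last 30" as elements 11 to 40.

```r
# Stand-in for colorspace::sequential_hcl(): base R exposes palettes
# with the same names through hcl.colors()
sunset <- hcl.colors(25, palette = "Sunset")
heat   <- hcl.colors(25, palette = "Heat")

# YlGnBu: generate 40 colors, then keep the last 30 (elements 11 to 40)
ylgnbu_full  <- hcl.colors(40, palette = "YlGnBu")
ylgnbu_trunc <- ylgnbu_full[11:40]

print(length(sunset))
print(length(ylgnbu_trunc))
```

Truncating a sequential palette in this way drops one end of the ramp, which narrows the range of lightness that the remaining colors cover.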

The Parula and Turbo quantitative palettes come from the pals package, using the parula() and turbo() functions respectively, both using 25 colors.

Sunset is the default palette for quantitative data in dimensionality reductions. YlGnBu is the default palette for quantitative data in dot plots and heatmaps. Heat 2 is the palette for kernel densities on scatterplots. Parula and Turbo are included only for testing purposes, but they can be chosen using color_palette = "Parula" or color_palette = "Turbo" where possible.