Get started with Pseudotimecascade
Pseudotimecascade_tutorial.Rmd
Example: Pseudotimecascade Analysis on human bone marrow samples
This tutorial introduces the Pseudotimecascade
R
package, a toolkit for modeling gene expression dynamics along
pseudotime trajectories in single-cell RNA-seq data. The method
identifies genes with switch-like temporal expression patterns and
supports downstream biological interpretation through GO enrichment
analysis.
We demonstrate the complete workflow starting from a Seurat object
with clustering and dimensionality reduction. The key steps include: -
Computing pseudotime using TSCAN
or other tools
- Fitting gene trajectories with fitData()
- Classifying gene patterns with genePattern()
- Visualizing dynamic genes with heatmaps
- Performing enrichment analysis (group-based and bin-based)
- Integrating multi-sample results to assess reproducibility
The pipeline is modular and compatible with any pseudotime method, as
long as cells are assigned a numeric pseudotime value. While we
illustrate the process using TSCAN
and specific marker
genes from hematopoietic lineages, the same framework can be applied to
other systems and datasets.
All steps shown here are directly reproducible using your own Seurat object. Replace file names and cluster IDs as needed to fit your biological context.
Let’s get started.
Step 1: Load data and generate pseudotime for cells
In this tutorial, we start from a processed Seurat object that contains gene expression, and dimensionality reduction (e.g., PCA, UMAP). From this object, users can apply a trajectory inference method such as TSCAN, Monocle3, Slingshot, or RNA velocity to obtain a biologically meaningful ordering of cells along pseudotime. The only requirement is that each cell receives a numeric pseudotime value, which defines its position along the trajectory. Using this ordering, the expression matrix can be arranged so that rows correspond to genes and columns correspond to cells ordered by pseudotime, with expression values log-normalized and scaled for comparability across genes. Here, we include a pre-processed example object with pseudotime values, but in practice users can start from their own scRNA-seq data and apply the same workflow with any pseudotime inference method.
Step 2: Fit pseudotime expression using
Pseudotimecascade
With cells ordered by pseudotime and the corresponding expression
matrix prepared, we next call the core fitting function
fitData()
. Here expr_df
is obtained from the
RNA@data slot of the Seurat
object, with columns restricted to the cells that have been sorted by
their pseudotime values. This gives a gene-by-cell expression matrix
where each row is a gene and each column is a cell ordered along the
trajectory, effectively capturing the pseudo-temporal dynamics of gene
expression.
The pseudotime vector pt
provides the numeric position
of each cell along this trajectory and must be aligned with the same
cell order used in expr_df
. The argument
new_data
specifies the pseudotime grid on which predictions
will be made, and mc.cores sets the number of parallel processes. The
output fit_data_list
contains the fitted expression
trajectories, estimated switch points, and statistical metrics that form
the basis for downstream pattern classification and enrichment
analysis.
library(Pseudotimecascade)
# Ensure cells are ordered by pseudotime
cells_order <- rownames(obj@meta.data[order(obj$tscan_pseudotime), ])
expr_df <- obj@assays$RNA@data[, cells_order]
# Fit gene curves
fit_data_list <- fitData(
as.matrix(expr_df),
pt = obj$tscan_pseudotime[cells_order],
new_data = data.frame(pt = seq(1, nrow(obj@meta.data))),
mc.cores = 4
)
Tip: Because this step involves fitting curves for thousands of genes, it can be computationally intensive; for example, running with mc.cores = 4 typically requires around three hours for one thousand genes.
Step 3: Classify gene patterns
Once smooth trajectories have been fitted, the next step is to
identify the major temporal expression patterns across genes. The
function genePattern()
takes the fitted expression matrix
from fitData()
and classifies each gene into a discrete
category, such as increasing, decreasing, or biphasic.
These categories provide an intuitive summary of how genes behave along pseudotime, highlighting switch-like dynamics or more complex expression changes. The output is a data frame where each row corresponds to a gene and columns provide its assigned pattern, the estimated switch point (if applicable), and a ranking statistic for visualization.
In this tutorial, we store the fitted expression matrix, the list of
fit results, and the gene-level pattern assignments together in a single
object (pseudo_list
), which will serve as the input for
downstream heatmaps and enrichment analyses.
gene_group <- genePattern(as.data.frame(fit_data_list[["data"]]))
pseudo_list <- list(
expr_df = expr_df,
fit_data = fit_data_list,
gene_group = gene_group
)
Step 4: Select genes and plot Pseudotimecascade heatmap
To make the visualization clearer and computationally efficient, we do not plot all genes at once. Instead, we select the top 1,000 most significant genes based on their q-values from the fitting step, and then subsample cells by keeping every tenth cell along pseudotime. This produces a reduced expression matrix that still preserves the global dynamics but avoids overplotting.
In addition to these filtered genes, we also highlight a set of manually chosen marker genes relevant for hematopoietic differentiation. These marker genes are annotated on the heatmap, making it easier to track known regulators and to interpret the overall expression trends in a biological context.
The function PseudotimeHeatmap()
automatically orders
genes by their assigned expression pattern and estimated switch point
location, and produces a heatmap where each row is a gene and each
column a pseudotime-sampled cell. This view provides a compact summary
of dynamic gene expression programs along the trajectory.
library(Pseudotimecascade)
pseudo_list <- readRDS(
system.file("extdata", "pseudo_list.rds", package = "Pseudotimecascade")
)
# Match and sort gene pattern labels
hsc_genes <- c('ERG', 'HOXA5', 'HOXA9', 'HOXA10', 'LCOR', 'RUNX1', 'SPI1', "CD34")
cmp_genes <- c('GATA2', 'CEBPA', 'GATA1', 'SPI1', 'EKLF', 'FLI1','ZFPM1',
'TAL1', 'GFI1', 'JUN', 'EGR1', 'EGR2', 'NAB2')
ery_genes <- c('GATA1', 'TAL1', 'KLF1', 'LDB1', 'ZFPM1', 'ZBTB7A', 'MYB', "HBB", "HBA1", "HBA2")
mon_genes <- c('SPI1', 'IRF8', 'KLF4', 'ERG1', 'JUN', 'JUNB', 'STAT1', 'STAT3', 'CEBPB')
marked_genes <- unique(c(hsc_genes, cmp_genes, ery_genes))
# Plot heatmap
p <- PseudotimeHeatmap(x = pseudo_list$fit_data,
gl = marked_genes,
annotation = as.matrix(pseudo_list$gene_group)[, "pattern"])
p
Step 5: Enrichment analysis
We identify enriched biological processes for pseudotime-dynamic genes using two complementary approaches. Group-based enrichment applies GO analysis to genes grouped by temporal expression pattern (e.g., “I”, “D”, “ID”), while bin-based enrichment uses a sliding window along switch points to detect transient functional signals. Both approaches are run on the same set of top-ranked genes (e.g., top 1000 by q-value), ordered by pattern and switch point.
5.1: Group-Based Enrichment
Group-based enrichment in Pseudotimecascade
is carried
out using the enrichPattern()
function, which is
specifically designed for temporal gene expression analysis. Given a
gene grouping table produced by genePattern()
, users can
call enrichPattern()
to test for functional
overrepresentation within one specific pattern (e.g., “I” or “D”) or
across all patterns at once.
This function automatically extracts the genes belonging to the
chosen pattern and performs GO enrichment against a user-defined
background (the universe
). In practice, the universe is set
to all genes detected in the dataset after preprocessing, ensuring that
enrichment is interpreted relative to the expressed gene set. This
analysis links temporal gene expression dynamics to biological
processes, highlighting enriched functions associated with distinct
pseudotime patterns.
library(dplyr)
# Order gene pattern labels
ggene_group <- pseudo_list[["gene_group"]][rownames(pseudo_list[["fit_data"]][["data"]]), ]
gene_group <- gene_group[order(gene_group$pattern, gene_group$rank_point), ]
# Perform GO enrichment for each pattern
enrich_group_list <- enrichPattern(
gene.group = gene_group,
species = "human",
universe = universe,
ont = "BP"
)
# Save results
saveRDS(enrich_group_list, "pseudo_group_enrichment.rds")
Tip: You may later visualize these results as shown in Step 6.1.
5.2: Bin-Based Enrichment
Bin-based enrichment is implemented in Pseudotimecascade
through the function compareEnrichBin()
. Unlike group-based
enrichment, which aggregates all genes in a pattern, this method applies
a sliding window along pseudotime within each pattern. The window is
defined by two parameters: bin.width (the size of each window along
pseudotime) and stride (the step size between windows). This allows us
to detect biological processes that are transiently enriched at specific
points in the trajectory.
In the example below we demonstrate how to run
compareEnrichBin()
on genes assigned to the
"I"
pattern. The background gene set
(universe
) is the same as in the group-based analysis.
# Example: perform bin-based enrichment on "I" pattern genes
pattern <- "I"
bin.width <- 0.2
stride <- 0.1
# Run bin-based enrichment
genes_bin_enrich <- compareEnrichBin(
gene_group,
pattern = pattern,
bin.width = bin.width,
stride = stride,
species = "human",
ont = "BP",
universe = universe
)
# Save results
saveRDS(genes_bin_enrich, "pseudo_bin_enrichment.rds")
Tip: While we demonstrate bin-based enrichment using the
"I"
pattern here, the full analysis can be performed across
all expression patterns. Visualization of bin-based enrichment in
Pattern "I"
is shown in Step 6.2.
Step 6: Visualization of GO Enrichment Results
After identifying gene patterns using Pseudotimecascade
,
we visualize enriched GO terms associated with each pattern. Here we
demonstrate both group-based and
bin-based enrichment results.
6.1: Group-Based Enrichment Visualization
Group enrichment analyzes the overrepresentation of GO terms among
genes from a specific pattern (e.g., "I"
, "D"
,
"ID"
, etc.). In Pseudotimecascade
, this is
implemented through the enrichment functions described in Step 5, and
the results can be visualized to highlight key GO terms through the
function plotEnrichGroup()
.
Here we demonstrate how to visualize the enrichment results for the
"I"
pattern, where users can either supply a predefined
list of GO terms to highlight specific processes or rely on top-ranked
terms returned by the enrichment analysis.
# Load enrichment result
obj_enrich <- readRDS(system.file("extdata", "pseudo_group_enrichment.rds", package = "Pseudotimecascade"))
# Pattern of interest (e.g., "I" or "D")
group <- "I"
# Select GO terms
terms <- c("GO:0048821", "GO:0030218", "GO:0030099", "GO:0020027", "GO:0043249", "GO:0070482")
# Visualize group-based enrichment result
p <- plotEnrichGroup(obj_enrich[[group]],
terms = terms)
p
This plot shows GO enrichment for genes with increasing expression along
pseudotime. Dot size indicates the number of genes (Count), the x-axis
shows the fraction of increasing genes annotated to each GO term
(GeneRatio), and color reflects enrichment significance (q-value).
Tip: In addition to manual selection, users may also automatically display the top N enriched GO terms ranked by q-value for unbiased exploration.
6.2: Bin-Based Enrichment Visualization
In addition to group-wise enrichment, Pseudotimecascade
supports bin-based enrichment through the function
compareEnrichBin()
, which evaluates how functional
categories appear at different pseudotime windows within each expression
pattern. The results can then be visualized using the companion function
plotEnrichBin()
, which generates a bubble plot of enriched
terms across bins. This workflow allows users to detect transient
biological processes that may be enriched only within specific
pseudotime windows, complementing the broader insights from group-based
enrichment.
# Load bin-based enrichment result
genes_bin_enrich <- readRDS(system.file("extdata", "pseudo_bin_enrichment.rds", package = "Pseudotimecascade"))
# Pattern of interest (e.g., "I" or "D")
pattern <- "I"
n <- 5
qval <- 0.05
p <- plotEnrichBin(genes_bin_enrich[[pattern]], n = n, qval_cutoff = qval)
p
The x-axis shows pseudotime bins (clusters of genes grouped by their switch point location), and the y-axis lists GO terms that are significantly enriched within each bin. Dot size reflects the number of genes (Count) annotated to each GO term, while dot color indicates enrichment significance (q-value). For clarity, only the top 5 GO terms per bin with q-value ≤ 0.05 are shown. This visualization highlights biological processes that are transiently enriched at different stages along pseudotime.
Tip: You can adjust pattern, n, and qval_cutoff to explore different enrichment structures or other gene dynamics.
Step 7: Multi-sample Pseudotimecascade Analysis
In this section, we demonstrate how to integrate
Pseudotimecascade
results across multiple samples to
identify reproducible gene patterns and switch point trends. This allows
robust functional inference across donors or replicates.
We first merge the gene-level trends across samples, resulting in a
list containing average pseudotime expression (mean_expr
),
consensus pattern labels (mean_pattern
), and sample-wise
switching intervals (df_switch_point
). These outputs are
used for enrichment analysis and heatmap visualization.
Below we visualize selected lineage marker genes from the top 1000
most significant genes, using the PseudotimeHeatmapMS()
function.
# Load Pseudotimecascade results from multi-sample integration
gene_mean_list <- readRDS(system.file("extdata", "pseudo_list_multi_sample.rds", package = "Pseudotimecascade"))
# Define marker genes
hsc_genes <- c('ERG', 'HOXA5', 'HOXA9', 'HOXA10', 'LCOR', 'RUNX1', 'SPI1', "CD34")
cmp_genes <- c('GATA2', 'CEBPA', 'GATA1', 'SPI1', 'EKLF', 'FLI1','ZFPM1',
'TAL1', 'GFI1', 'JUN', 'EGR1', 'EGR2', 'NAB2')
ery_genes <- c('GATA1', 'TAL1', 'KLF1', 'LDB1', 'ZFPM1', 'ZBTB7A', 'MYB', "HBB", "HBA1", "HBA2")
marked_genes <- unique(c(hsc_genes, cmp_genes, ery_genes))
# Draw heatmap
p <- PseudotimeHeatmapMS(
x = gene_mean_list[["mean_expr"]],
gl = marked_genes,
annotation = as.matrix(gene_mean_list[["mean_pattern"]])[, "pattern"],
interval = gene_mean_list[["df_switch_point"]],
use_raster = FALSE
)
p
Enrichment analysis can also be applied to the multi-sample results
in the same way as for a single sample (see Step 5). Specifically, both
group-based enrichment (using enrichPattern()
) and
bin-based enrichment (using compareEnrichBin()
) can be
applied to the mean_pattern matrix.
For visualization of enriched GO terms, we recommend reusing the approaches from Step 6. Together, these visualizations will reveal how functional categories are enriched in specific patterns or transiently emerge at distinct pseudotime windows, providing a dynamic view of biological processes along the trajectory.