Trajectory Inference with scTour

Trajectory Inference with scTour#

Using endocrine pancreas development as an example, this tutorial demonstrates scTour latent-time learning from raw UMI counts and trajectory modeling with neural ordinary differential equations.

Method background#

See the scTour documentation and the original Genome Biology paper.

import scanpy as sc
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import omicverse as ov
ov.plot_set(font_path='Arial')

%load_ext autoreload
%autoreload 2

🔬 Starting plot initialization...
Using already downloaded Arial font from: /var/folders/rv/3jnfbs0d6r7d0c5bfj7ft5k00000gn/T/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
✅ Apple Silicon MPS detected
    • [MPS] Apple Silicon GPU - Metal Performance Shaders available

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.1.3rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

adata = ov.datasets.pancreatic_endocrinogenesis()

⚠️ File ./data/endocrinogenesis_day15.h5ad already exists
 Loading data from ./data/endocrinogenesis_day15.h5ad
✅ Successfully loaded: 3696 cells × 27998 genes

adata = ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=3000)
adata.raw = adata
adata = adata[:, adata.var.highly_variable_features]
ov.pp.scale(adata)
ov.pp.pca(adata, layer='scaled', n_pcs=50)

🔍 [2026-05-12 15:49:23] Running preprocessing in 'cpu' mode...
Begin robust gene identification
    After filtration, 17750/27998 genes are kept.
    Among 17750 genes, 16426 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson

🔍 Count Normalization:
   Target sum: 500000.0
   Exclude highly expressed: True
   Max fraction threshold: 0.2
⚠️ Excluding 1 highly-expressed genes from normalization computation
   Excluded genes: ['Ghrl']

✅ Count Normalization Completed Successfully!
   ✓ Processed: 3,696 cells × 16,426 genes
   ✓ Runtime: 0.07s

🔍 Highly Variable Genes Selection (Experimental):
   Method: pearson_residuals
   Target genes: 3,000
   Theta (overdispersion): 100
✅ Experimental HVG Selection Completed Successfully!
   ✓ Selected: 3,000 highly variable genes out of 16,426 total (18.3%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'highly_variable_rank': Float vector (adata.var)
     • 'highly_variable_nbatches': Int vector (adata.var)
     • 'highly_variable_intersection': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'variances': Float vector (adata.var)
     • 'residual_variances': Float vector (adata.var)
    Time to analyze data in cpu: 0.46 seconds.
✅ Preprocessing completed successfully.
    Added:
        'highly_variable_features', boolean vector (adata.var)
        'means', float vector (adata.var)
        'variances', float vector (adata.var)
        'residual_variances', float vector (adata.var)
        'counts', raw counts layer (adata.layers)
    End of size normalization: shiftlog and HVGs selection pearson

╭─ SUMMARY: preprocess ──────────────────────────────────────────────╮
│  Duration: 0.5675s                                                 │
│  Shape:    3,696 x 27,998 -> 3,696 x 16,426                        │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
│  ● VAR    │ ✚ highly_variable (bool)                               │
│           │ ✚ highly_variable_features (bool)                      │
│           │ ✚ highly_variable_rank (float)                         │
│           │ ✚ means (float)                                        │
│           │ ✚ n_cells (int)                                        │
│           │ ✚ percent_cells (float)                                │
│           │ ✚ residual_variances (float)                           │
│           │ ✚ robust (bool)                                        │
│           │ ✚ variances (float)                                    │
│                                                                    │
│  ● UNS    │ ✚ REFERENCE_MANU                                       │
│           │ ✚ _ov_provenance                                       │
│           │ ✚ history_log                                          │
│           │ ✚ hvg                                                  │
│           │ ✚ log1p                                                │
│           │ ✚ status                                               │
│           │ ✚ status_args                                          │
│                                                                    │
│  ● LAYERS │ ✚ counts (sparse matrix, 3696x16426)                   │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
╭─ SUMMARY: scale ───────────────────────────────────────────────────╮
│  Duration: 0.2826s                                                 │
│  Shape:    3,696 x 3,000 (Unchanged)                               │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
│  ● LAYERS │ ✚ scaled (array, 3696x3000)                            │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
computing PCA🔍
    with n_comps=50
   🖥️ Using sklearn PCA for CPU computation
   🖥️ sklearn PCA backend: CPU computation
   📊 PCA input data type: ArrayView, shape: (3696, 3000), dtype: float64
🔧 PCA solver used: covariance_eigh
    finished✅ (8.68s)

╭─ SUMMARY: pca ─────────────────────────────────────────────────────╮
│  Duration: 8.6833s                                                 │
│  Shape:    3,696 x 3,000 (Unchanged)                               │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
│  ● UNS    │ ✚ scaled|original|cum_sum_eigenvalues                  │
│           │ ✚ scaled|original|pca_var_ratios                       │
│                                                                    │
│  ● OBSM   │ ✚ scaled|original|X_pca (array, 3696x50)               │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯

We first inspect the variance explained by principal components to choose a practical PC range for neighbor graph construction.

ov.utils.plot_pca_variance_ratio(adata, n_pcs=15)

../../../_images/8a3e9adcafabf41714666852c0a3ff53663af1962b773f163734045a2109cc59.png

ov.pl.umap(
    adata,
    color='clusters'
)

X_umap converted to UMAP to visualize and saved to adata.obsm['UMAP']
if you want to use X_umap, please set convert=False

../../../_images/354001f7eb832d53ea8686de6661e19b9c0c363bc5ca9699a4c61145cced1157.png

scTour#

scTour models developmental pseudotime, vector fields, and latent space in one framework to characterize cellular dynamics from multiple perspectives. We now train the scTour model. The default loss_mode is negative binomial conditioned likelihood (nb), which expects raw UMI counts in AnnData.X. By default, scTour normalizes the counts internally when needed.

adata.X=adata.layers['counts'].copy()
sc.pp.calculate_qc_metrics(adata, percent_top=None, log1p=False, inplace=True)

Traj=ov.single.TrajInfer(
    adata,basis='X_umap',
    groupby='clusters',
    use_rep='scaled|original|X_pca',
    n_comps=50
)
Traj.inference(
    method='sctour',
    alpha_recon_lec=0.5,
    alpha_recon_lode=0.5
)

ov.pl.embedding(
    adata,
    basis='X_umap',
    color=['clusters','sctour_pseudotime'],
    frameon='small',
    cmap='Reds'
)

../../../_images/1af16219cc38509ea7581b33bf3be083439e705677bb99cf31bb6c1dc930d040.png

adata.obs['sctour_pseudotime']=1-adata.obs['sctour_pseudotime']

ov.pl.embedding(
    adata,basis='X_umap',
    color=['clusters','sctour_pseudotime'],
    frameon='small',
    cmap='Reds'
)

../../../_images/9998ebcaa0b1f4d67714a96ddd17cf6897298f5a809c7f7a07cb2af19ecafd93.png

import os

os.makedirs('data', exist_ok=True)
adata.write('data/traj_tutorial.h5ad')

adata = ov.read('data/traj_tutorial.h5ad')
adata

AnnData object with n_obs × n_vars = 3696 × 3000
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'n_genes_by_counts', 'total_counts', 'sctour_pseudotime'
    var: 'highly_variable_genes', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'means', 'variances', 'residual_variances', 'highly_variable_rank', 'highly_variable', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
    uns: 'REFERENCE_MANU', '_ov_provenance', 'clusters_coarse_colors', 'clusters_colors', 'clusters_sizes', 'day_colors', 'history_log', 'hvg', 'log1p', 'neighbors', 'paga', 'paga_graph', 'pca', 'scaled|original|cum_sum_eigenvalues', 'scaled|original|pca_var_ratios', 'status', 'status_args'
    obsm: 'UMAP', 'X_TNODE', 'X_VF', 'X_pca', 'X_umap', 'scaled|original|X_pca'
    varm: 'PCs', 'scaled|original|pca_loadings'
    layers: 'counts', 'scaled', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'

Branch-aware pseudotime stream plot#

ov.pl.branch_streamplot only needs pseudotime and cell-state labels, so it can also be used for this trajectory method. Ribbon width shows where each cell type is enriched along pseudotime, and the branch center lines help show where endocrine fates separate.

fig, ax = ov.pl.branch_streamplot(
    adata,
    group_key='clusters',
    pseudotime_key='sctour_pseudotime',
    show=False,
)
plt.show()

../../../_images/c09e554ea434cc46a914a98a335369b4056aa1ae14d073ec95e33fd177dbccdf.png

Analyze scTour pseudotime with `dynamic_features` / `dynamic_trends`#

The same GAM trend layer can be applied to sctour_pseudotime. We first draw global marker trends with raw points colored by cluster, then compare late Alpha and Beta branches to show branch-associated expression changes.

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

sctour_genes = ['Sox9', 'Neurog3', 'Fev', 'Gcg', 'Arx', 'Pax4', 'Ins2', 'Pdx1', 'Sst', 'Hhex']

sctour_dyn = ov.single.dynamic_features(
    adata,
    genes=sctour_genes,
    pseudotime='sctour_pseudotime',
    use_raw=adata.raw is not None,
    distribution='normal',
    link='identity',
    n_splines=8,
    store_raw=True,
    raw_obs_keys=['clusters'],
)

🔍 Dynamic feature analysis:
   Views: 1 | Features: 10
   Pseudotime: sctour_pseudotime
   Stored raw obs keys: ['clusters']
   Expression source: adata.raw
   GAM: normal-identity | splines=8
✅ Dynamic feature analysis completed!
   ✓ Successful fits: 10/10
   ✓ Fitted rows: 2000
   ✓ Raw observations stored: 36960

Single-line Global Trends#

Each gene is fitted with one global trend line while raw points are colored by cell annotation. This view separates the overall pseudotime expression pattern from the cell states contributing the observations.

fig, _ = ov.pl.dynamic_trends(
    sctour_dyn,
    genes=['Sox9', 'Neurog3', 'Gcg', 'Ins2'],
    add_point=True,
    point_color_by='clusters',
    figsize=(5, 3.5),
    legend_loc='right margin',
    legend_fontsize=8,
    ncols=2,
    return_fig=True,
)
display(fig)
plt.close(fig)

🔍 Dynamic trend plotting:
   Features: 4 | Groups: 1
   compare_features=False | compare_groups=False
✅ Dynamic trend plotting completed!

../../../_images/4df6dc769dfccad3508b5b2dbbf55c47897145ca29e3188953e7cd448f896bad.png

Multi-marker Trend Comparison#

Multiple marker curves are overlaid on one pseudotime axis to compare activation and decay order across programs.

fig, _ = ov.pl.dynamic_trends(
    sctour_dyn,
    genes=['Sox9', 'Neurog3', 'Gcg', 'Ins2'],
    compare_features=True,
    add_point=True,
    point_color_by='clusters',
    line_style_by='features',
    figsize=(7, 4),
    linewidth=2.2,
    legend_loc='right margin',
    legend_fontsize=8,
    return_fig=True,
)
display(fig)
plt.close(fig)

🔍 Dynamic trend plotting:
   Features: 4 | Groups: 1
   compare_features=True | compare_groups=False
✅ Dynamic trend plotting completed!

../../../_images/55ad9dbb388a7511b62964cd2444ca5ea9d7234a4669d82a7cf91f05fa8ae1b0.png

branch_clusters = ['Alpha', 'Beta']
split_mask = adata.obs['clusters'].astype(str).isin(['Ngn3 high EP', 'Pre-endocrine'])

sctour_branch_dyn = ov.single.dynamic_features(
    adata,
    genes=['Gcg', 'Ins2', 'Pax4', 'Sox9'],
    pseudotime='sctour_pseudotime',
    groupby='clusters',
    groups=branch_clusters,
    use_raw=adata.raw is not None,
    distribution='normal',
    link='identity',
    n_splines=8,
    store_raw=True,
)
split_time = float(np.nanmedian(adata.obs.loc[split_mask, 'sctour_pseudotime'])) if split_mask.any() else float(np.nanmedian(adata.obs['sctour_pseudotime']))

🔍 Dynamic feature analysis:
   Views: 2 | Features: 4
   Pseudotime: sctour_pseudotime
   Grouping: clusters
   Expression source: adata.raw
   GAM: normal-identity | splines=8
✅ Dynamic feature analysis completed!
   ✓ Successful fits: 8/8
   ✓ Fitted rows: 1600
   ✓ Raw observations stored: 4288

fig, _ = ov.pl.dynamic_trends(
    sctour_branch_dyn,
    genes=['Gcg', 'Ins2', 'Pax4'],
    compare_groups=True,
    split_time=split_time,
    shared_trunk=True,
    add_point=True,
    point_color_by='group',
    figsize=(4.2, 3),
    ncols=3,
    linewidth=2.2,
    legend_loc='right margin',
    legend_fontsize=8,
    title='scTour branch-aware marker trends',
    return_fig=True,
)
display(fig)
plt.close(fig)

🔍 Dynamic trend plotting:
   Features: 3 | Groups: 2
   compare_features=False | compare_groups=True
✅ Dynamic trend plotting completed!

../../../_images/7eac2ad33bf3a566036b4908d78ee5d8ad391bf4db39622d3dd7c6bed3cf954d.png

Summarize scTour marker programs with `dynamic_heatmap`#

ov.pl.dynamic_heatmap compresses marker programs into a pseudotime-ordered heatmap, allowing progenitor, Alpha, Beta, and Delta programs to be checked for the expected order along scTour pseudotime.

sctour_marker = {
    'Endocrine progenitor': ['Sox9', 'Neurog3', 'Fev'],
    'Alpha fate': ['Gcg', 'Arx'],
    'Beta fate': ['Pax4', 'Ins2', 'Pdx1'],
    'Delta fate': ['Sst', 'Hhex'],
}

g = ov.pl.dynamic_heatmap(
    adata,
    var_names=sctour_marker,
    pseudotime='sctour_pseudotime',
    use_raw=adata.raw is not None,
    use_cell_columns=False,
    cell_annotation='clusters',
    cell_bins=180,
    smooth_window=17,
    fitted_window=31,
    figsize=(5, 4),
    standard_scale='var',
    cmap='RdBu_r',
    use_fitted=True,
    border=False,
    show=False,
)

🔍 Dynamic heatmap:
   Candidate features: 10
   Pseudotime: sctour_pseudotime
   Cell annotation: clusters
   use_fitted=True | cell_bins=180 | cmap=RdBu_r
✅ Dynamic heatmap completed!
   ✓ Matrix shape: 10 features × 180 columns

../../../_images/ed629df72f73df864690314807dff1915fc2cbc2126284009c05c12b18081fed.png