Trajectory Inference with Palantir#

Using endocrine pancreas development as an example, this tutorial shows how to infer pseudotime with Palantir, inspect branch structure, summarize gene trends, and draw branch-aware pseudotime stream plots with ov.pl.branch_streamplot.

Method background#

See the Palantir documentation and the original Nature Biotechnology paper.

import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import omicverse as ov
ov.plot_set(font_path='Arial')

%load_ext autoreload
%autoreload 2
🔬 Starting plot initialization...
Using already downloaded Arial font from: /var/folders/rv/3jnfbs0d6r7d0c5bfj7ft5k00000gn/T/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
✅ Apple Silicon MPS detected
    • [MPS] Apple Silicon GPU - Metal Performance Shaders available

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.1.3rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.
adata = ov.datasets.pancreatic_endocrinogenesis()
⚠️ File ./data/endocrinogenesis_day15.h5ad already exists
 Loading data from ./data/endocrinogenesis_day15.h5ad
✅ Successfully loaded: 3696 cells × 27998 genes
adata = ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=3000)
adata.raw = adata
adata = adata[:, adata.var.highly_variable_features]
ov.pp.scale(adata)
ov.pp.pca(adata, layer='scaled', n_pcs=50)
🔍 [2026-05-12 15:48:18] Running preprocessing in 'cpu' mode...
Begin robust gene identification
    After filtration, 17750/27998 genes are kept.
    Among 17750 genes, 16426 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson

🔍 Count Normalization:
   Target sum: 500000.0
   Exclude highly expressed: True
   Max fraction threshold: 0.2
⚠️ Excluding 1 highly-expressed genes from normalization computation
   Excluded genes: ['Ghrl']

✅ Count Normalization Completed Successfully!
   ✓ Processed: 3,696 cells × 16,426 genes
   ✓ Runtime: 0.08s

🔍 Highly Variable Genes Selection (Experimental):
   Method: pearson_residuals
   Target genes: 3,000
   Theta (overdispersion): 100
✅ Experimental HVG Selection Completed Successfully!
   ✓ Selected: 3,000 highly variable genes out of 16,426 total (18.3%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'highly_variable_rank': Float vector (adata.var)
     • 'highly_variable_nbatches': Int vector (adata.var)
     • 'highly_variable_intersection': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'variances': Float vector (adata.var)
     • 'residual_variances': Float vector (adata.var)
    Time to analyze data in cpu: 0.49 seconds.
✅ Preprocessing completed successfully.
    Added:
        'highly_variable_features', boolean vector (adata.var)
        'means', float vector (adata.var)
        'variances', float vector (adata.var)
        'residual_variances', float vector (adata.var)
        'counts', raw counts layer (adata.layers)
    End of size normalization: shiftlog and HVGs selection pearson

╭─ SUMMARY: preprocess ──────────────────────────────────────────────╮
  Duration: 0.5872s                                                 
  Shape:    3,696 x 27,998 -> 3,696 x 16,426                        
                                                                    
  CHANGES DETECTED                                                  
  ────────────────                                                  
   VAR    highly_variable (bool)                               
 highly_variable_features (bool)                      
 highly_variable_rank (float)                         
 means (float)                                        
 n_cells (int)                                        
 percent_cells (float)                                
 residual_variances (float)                           
 robust (bool)                                        
 variances (float)                                    
                                                                    
   UNS    REFERENCE_MANU                                       
 _ov_provenance                                       
 history_log                                          
 hvg                                                  
 log1p                                                
 status                                               
 status_args                                          
                                                                    
   LAYERS counts (sparse matrix, 3696x16426)                   
                                                                    
╰────────────────────────────────────────────────────────────────────╯
╭─ SUMMARY: scale ───────────────────────────────────────────────────╮
  Duration: 0.3006s                                                 
  Shape:    3,696 x 3,000 (Unchanged)                               
                                                                    
  CHANGES DETECTED                                                  
  ────────────────                                                  
   LAYERS scaled (array, 3696x3000)                            
                                                                    
╰────────────────────────────────────────────────────────────────────╯
computing PCA🔍
    with n_comps=50
   🖥️ Using sklearn PCA for CPU computation
   🖥️ sklearn PCA backend: CPU computation
   📊 PCA input data type: ArrayView, shape: (3696, 3000), dtype: float64
🔧 PCA solver used: covariance_eigh
    finished✅ (8.85s)

╭─ SUMMARY: pca ─────────────────────────────────────────────────────╮
  Duration: 8.849s                                                  
  Shape:    3,696 x 3,000 (Unchanged)                               
                                                                    
  CHANGES DETECTED                                                  
  ────────────────                                                  
   UNS    scaled|original|cum_sum_eigenvalues                  
 scaled|original|pca_var_ratios                       
                                                                    
   OBSM   scaled|original|X_pca (array, 3696x50)               
                                                                    
╰────────────────────────────────────────────────────────────────────╯

We first inspect the variance explained by principal components to choose a practical PC range for neighbor graph construction.

ov.utils.plot_pca_variance_ratio(adata, n_pcs=15)
ov.pl.umap(
    adata,
    color='clusters'
)
X_umap converted to UMAP to visualize and saved to adata.obsm['UMAP']
if you want to use X_umap, please set convert=False
../../../_images/354001f7eb832d53ea8686de6661e19b9c0c363bc5ca9699a4c61145cced1157.png

Palantir#

Palantir needs an approximate early cell. It can infer terminal states automatically; in this dataset the major terminal states are known, so we specify them through terminal_states. Here ov.single.TrajInfer is used to build the analysis object.

Traj=ov.single.TrajInfer(
    adata,
    basis='X_umap',
    groupby='clusters',
    use_rep='scaled|original|X_pca',
    n_comps=50
)
Traj.set_origin_cells('Ductal')
Traj.set_terminal_cells(["Alpha","Beta","Delta","Epsilon"])
Traj.inference(method='palantir',num_waypoints=500)
**finished identifying marker genes by COSG**
Sampling and flocking waypoints...
Time for determining waypoints: 0.00039654970169067383 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.1240071177482605 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9998
Correlation at iteration 2: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...
<omicverse.external.palantir.presults.PResults at 0x16573cd10>

Palantir results can be projected onto tSNE or UMAP with plot_palantir_results.

Traj.palantir_plot_pseudotime(
    embedding_basis='X_umap',
    cmap='RdBu_r',
    s=3,
    n_cols=4,
    figsize=(8, 6),
)

After selecting branch cells, it is useful to map them back to the pseudotime trajectory and confirm that the intended branch was selected. plot_branch_selection performs this check.

Traj.palantir_cal_branch(
    eps=0,
    plot_kwargs={
        'figsize': (6, 4),
        'selected_color': '#1f77b4',
        'deselected_color': '#d9d9d9',
        's': 4,
    },
)
ov.external.palantir.plot.plot_trajectory(
    adata,
    "Alpha",
    cell_color="palantir_entropy",
    n_arrows=10,
    color="red",
    scanpy_kwargs=dict(cmap="RdBu_r"),
)
[2026-05-12 15:48:53,998] [INFO    ] Using sparse Gaussian Process since n_landmarks (50) < n_samples (805) and rank = 1.0.
[2026-05-12 15:48:53,998] [INFO    ] Using covariance function Matern52(ls=1.262711524963379).
[2026-05-12 15:48:54,026] [INFO    ] Computing 50 landmarks with k-means clustering (random_state=42).
[2026-05-12 15:48:55,108] [INFO    ] Sigma interpreted as element-wise standard deviation.
<Axes: title={'center': 'Branch: Alpha'}, xlabel='UMAP1', ylabel='UMAP2'>
../../../_images/10ae6fa8a30c95f4a2cfcb6dcb832060a65abf06430c8af5fbbdd743eb3db7fa.png

Branch-oriented pseudotime stream plot#

After computing palantir_pseudotime, cluster occupancy along pseudotime can be smoothed into KDE ribbons and mapped onto a compact branch skeleton. This provides a trajectory-level overview suitable for figures.

fig, ax = ov.pl.branch_streamplot(
    adata,
    group_key='clusters',
    pseudotime_key='palantir_pseudotime',
    show=False,
)
plt.show()

Palantir uses the Mellon Function Estimator to fit gene-expression trends along lineages. The following code computes marker trends for all lineages; a subset can also be selected with the lineages argument.

adata.layers['lognorm'] = adata.X.copy()
# MAGIC currently conflicts with the NumPy version in the dev environment,
# so we keep a stable smoothed-expression placeholder layer for downstream trends/heatmaps.
adata.layers['MAGIC_imputed_data'] = adata.layers['lognorm'].copy()
gene_trends = Traj.palantir_cal_gene_trends(
    layers="MAGIC_imputed_data",
)
Delta
[2026-05-12 15:48:57,000] [INFO    ] Using sparse Gaussian Process since n_landmarks (500) < n_samples (624) and rank = 1.0.
[2026-05-12 15:48:57,001] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-05-12 15:48:58,026] [INFO    ] Sigma interpreted as element-wise standard deviation.
Beta
[2026-05-12 15:48:58,241] [INFO    ] Using sparse Gaussian Process since n_landmarks (500) < n_samples (939) and rank = 1.0.
[2026-05-12 15:48:58,241] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-05-12 15:48:58,667] [INFO    ] Sigma interpreted as element-wise standard deviation.
Alpha
[2026-05-12 15:48:58,794] [INFO    ] Using sparse Gaussian Process since n_landmarks (500) < n_samples (805) and rank = 1.0.
[2026-05-12 15:48:58,794] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-05-12 15:48:59,159] [INFO    ] Sigma interpreted as element-wise standard deviation.
Epsilon
[2026-05-12 15:48:59,282] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (500) >= n_samples (324) and rank = 1.0.
[2026-05-12 15:48:59,282] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-05-12 15:48:59,684] [INFO    ] Sigma interpreted as element-wise standard deviation.
Traj.palantir_plot_gene_trends(
    ['Pax4', 'Ins2'],
    layers='MAGIC_imputed_data',
    figsize=(4.5, 3),
    compare_groups=True,
    linewidth=2.2,
)
plt.show()
🔍 Dynamic feature analysis:
   Views: 4 | Features: 2
   Pseudotime: palantir_pseudotime
   Layer: MAGIC_imputed_data
   GAM: normal-identity | splines=8
✅ Dynamic feature analysis completed!
   ✓ Successful fits: 8/8
   ✓ Fitted rows: 1600

🔍 Dynamic trend plotting:
   Features: 2 | Groups: 4
   compare_features=False | compare_groups=True
✅ Dynamic trend plotting completed!
../../../_images/13a6e986c7cbe5f9198ac99c5c0bbfb0b8f54338d658ade7fad1f5ee3e0b7db1.png
Traj.palantir_plot_gene_trends(
    ['Pax4', 'Ins2'],
    lineages=['Beta', 'Alpha'],
    layers='MAGIC_imputed_data',
    figsize=(4.5, 3),
    compare_groups=True,
    linewidth=2.2,
)
plt.show()
🔍 Dynamic feature analysis:
   Views: 2 | Features: 2
   Pseudotime: palantir_pseudotime
   Layer: MAGIC_imputed_data
   GAM: normal-identity | splines=8
✅ Dynamic feature analysis completed!
   ✓ Successful fits: 4/4
   ✓ Fitted rows: 800

🔍 Dynamic trend plotting:
   Features: 2 | Groups: 2
   compare_features=False | compare_groups=True
✅ Dynamic trend plotting completed!
../../../_images/a6d6972e08b7f482accdd22d8e3be0926e5fb21733311e706fb0e9d9dd4ec259.png