Trajectory Inference with PAGA or Palantir¶
Diffusion maps were introduced by Ronald Coifman and Stephane Lafon, and the underlying idea is to assume that the data are samples from a diffusion process.
Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process where stem cells differentiate to terminally differentiated cells by a series of steps through a low dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination.
Note that both methods require the input of cells in their initial state, and we will introduce other methods that do not require the input of artificial information, such as pyVIA, in subsequent analyses.
Preprocess data¶
As an example, we apply differential kinetic analysis to dentate gyrus neurogenesis, which comprises multiple heterogeneous subpopulations.
import scanpy as sc
import scvelo as scv
import matplotlib.pyplot as plt
import omicverse as ov
ov.plot_set()
Global seed set to 0
____ _ _ __ / __ \____ ___ (_)___| | / /__ _____________ / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ / /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/ \____/_/ /_/ /_/_/\___/ |___/\___/_/ /____/\___/ Version: 1.5.10, Tutorials: https://omicverse.readthedocs.io/
import scvelo as scv
adata=scv.datasets.dentategyrus()
adata
AnnData object with n_obs × n_vars = 2930 × 13913 obs: 'clusters', 'age(days)', 'clusters_enlarged' uns: 'clusters_colors' obsm: 'X_umap' layers: 'ambiguous', 'spliced', 'unspliced'
adata=ov.pp.preprocess(adata,mode='shiftlog|pearson',n_HVGs=3000,)
adata.raw = adata
adata = adata[:, adata.var.highly_variable_features]
ov.pp.scale(adata)
ov.pp.pca(adata,layer='scaled',n_pcs=50)
Begin robust gene identification After filtration, 13264/13913 genes are kept. Among 13264 genes, 13189 genes are robust. End of robust gene identification. Begin size normalization: shiftlog and HVGs selection pearson normalizing counts per cell The following highly-expressed genes are not considered during normalization factor computation: ['Hba-a1', 'Malat1', 'Ptgds', 'Hbb-bt'] finished (0:00:00) extracting highly variable genes --> added 'highly_variable', boolean vector (adata.var) 'highly_variable_rank', float vector (adata.var) 'highly_variable_nbatches', int vector (adata.var) 'highly_variable_intersection', boolean vector (adata.var) 'means', float vector (adata.var) 'variances', float vector (adata.var) 'residual_variances', float vector (adata.var) Time to analyze data in cpu: 1.541149377822876 seconds. End of size normalization: shiftlog and HVGs selection pearson ... as `zero_center=True`, sparse input is densified and may lead to large memory consumption
Let us inspect the contribution of single PCs to the total variance in the data. This gives us information about how many PCs we should consider in order to compute the neighborhood relations of cells. In our experience, often a rough estimate of the number of PCs does fine.
Trajectory inference with diffusion map¶
Here, we used ov.single.TrajInfer
to construct a Trajectory Inference object.
Traj=ov.single.TrajInfer(adata,basis='X_umap',groupby='clusters',
use_rep='scaled|original|X_pca',n_comps=50,)
Traj.set_origin_cells('nIPC')
Traj.inference(method='diffusion_map')
computing neighbors finished: added to `.uns['neighbors']` `.obsp['distances']`, distances for each pair of neighbors `.obsp['connectivities']`, weighted adjacency matrix (0:00:05) computing Diffusion Maps using n_comps=15(=n_dcs) computing transitions finished (0:00:00) eigenvalues of transition matrix [1. 0.9997221 0.99916786 0.9990551 0.99901944 0.99856067 0.9955311 0.9947327 0.9926525 0.9906649 0.97991765 0.97849923 0.9777416 0.97599846 0.9694986 ] finished: added 'X_diffmap', diffmap coordinates (adata.obsm) 'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:00) computing neighbors finished: added to `.uns['neighbors']` `.obsp['distances']`, distances for each pair of neighbors `.obsp['connectivities']`, weighted adjacency matrix (0:00:00) drawing single-cell graph using layout 'fa' WARNING: Package 'fa2' is not installed, falling back to layout 'fr'.To use the faster and better ForceAtlas2 layout, install package 'fa2' (`pip install fa2`). finished: added 'X_draw_graph_fr', graph_drawing coordinates (adata.obsm) (0:00:08) computing Diffusion Pseudotime using n_dcs=10 finished: added 'dpt_pseudotime', the pseudotime (adata.obs) (0:00:00) computing neighbors finished: added to `.uns['neighbors']` `.obsp['distances']`, distances for each pair of neighbors `.obsp['connectivities']`, weighted adjacency matrix (0:00:00)
PAGA graph abstraction has benchmarked as top-performing method for trajectory inference. It provides a graph-like map of the data topology with weighted edges corresponding to the connectivity between two clusters.
Here, PAGA is extended by neighbor directionality.
ov.utils.cal_paga(adata,use_time_prior='dpt_pseudotime',vkey='paga',
groups='clusters')
running PAGA using priors: ['dpt_pseudotime'] finished added 'paga/connectivities', connectivities adjacency (adata.uns) 'paga/connectivities_tree', connectivities subtree (adata.uns) 'paga/transitions_confidence', velocity transitions (adata.uns)
ov.utils.plot_paga(adata,basis='umap', size=50, alpha=.1,title='PAGA LTNN-graph',
min_edge_width=2, node_size_scale=1.5,show=False,legend_loc=False)
Trajectory inference with Slingshot¶
Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.
Traj=ov.single.TrajInfer(adata,basis='X_umap',groupby='clusters',
use_rep='scaled|original|X_pca',n_comps=50)
Traj.set_origin_cells('nIPC')
#Traj.set_terminal_cells(["Granule mature","OL","Astrocytes"])
If you only need the proposed timing and not the lineage of the process, then you can leave the debug_axes parameter unset.
Traj.inference(method='slingshot',num_epochs=10)
Reversing from leaf to root Averaging branch @2 with lineages: [0, 1] [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51160>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51df0>] Averaging branch @5 with lineages: [0, 1, 2, 3] [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51d90>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51dc0>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833c2ecd0>] Averaging branch @13 with lineages: [0, 1, 2, 3, 4] [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833c2ec40>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51e50>] Shrinking branch @13 with curves: [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833c2ec40>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51e50>] Shrinking branch @5 with curves: [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51d90>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51dc0>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833c2ecd0>] Shrinking branch @2 with curves: [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51160>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833e51df0>]
else, you can set debug_axes
to visualize the lineage
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))
Traj.inference(method='slingshot',num_epochs=1,debug_axes=axes)
Lineages: [Lineage[13, 9, 5, 2, 8], Lineage[13, 9, 5, 2, 7, 3], Lineage[13, 9, 5, 4, 1], Lineage[13, 9, 5, 6, 11, 10], Lineage[13, 12, 0]]
0%| | 0/1 [00:00<?, ?it/s]
Reversing from leaf to root Averaging branch @2 with lineages: [0, 1] [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca8a00>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833eca790>] Averaging branch @5 with lineages: [0, 1, 2, 3] [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833a4ce80>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca8a30>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca89d0>] Averaging branch @13 with lineages: [0, 1, 2, 3, 4] [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833a15490>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca8a90>] Shrinking branch @13 with curves: [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833a15490>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca8a90>] Shrinking branch @5 with curves: [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833a4ce80>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca8a30>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca89d0>] Shrinking branch @2 with curves: [<pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833ca8a00>, <pcurvepy2.pcurve.PrincipalCurve object at 0x7fa833eca790>]
sc.pp.neighbors(adata,use_rep='scaled|original|X_pca')
ov.utils.cal_paga(adata,use_time_prior='slingshot_pseudotime',vkey='paga',
groups='clusters')
computing neighbors finished: added to `.uns['neighbors']` `.obsp['distances']`, distances for each pair of neighbors `.obsp['connectivities']`, weighted adjacency matrix (0:00:00) running PAGA using priors: ['slingshot_pseudotime'] finished added 'paga/connectivities', connectivities adjacency (adata.uns) 'paga/connectivities_tree', connectivities subtree (adata.uns) 'paga/transitions_confidence', velocity transitions (adata.uns)
ov.utils.plot_paga(adata,basis='umap', size=50, alpha=.1,title='PAGA Slingshot-graph',
min_edge_width=2, node_size_scale=1.5,show=False,legend_loc=False)
Trajectory inference with Palantir¶
Palantir can be run by specifying an approxiate early cell.
Palantir can automatically determine the terminal states as well. In this dataset, we know the terminal states and we will set them using the terminal_states parameter
Here, we used ov.single.TrajInfer
to construct a Trajectory Inference object.
Traj=ov.single.TrajInfer(adata,basis='X_umap',groupby='clusters',
use_rep='scaled|original|X_pca',n_comps=50)
Traj.set_origin_cells('nIPC')
Traj.set_terminal_cells(["Granule mature","OL","Astrocytes"])
Traj.inference(method='palantir',num_waypoints=500)
Palantir results can be visualized on the tSNE or UMAP using the plot_palantir_results function
Once the cells are selected, it's often helpful to visualize the selection on the pseudotime trajectory to ensure we've isolated the correct cells for our specific trend. We can do this using the plot_branch_selection function:
ov.palantir.plot.plot_trajectory(adata, "Granule mature",
cell_color="palantir_entropy",
n_arrows=10,
color="red",
scanpy_kwargs=dict(cmap="RdBu_r"),
)
Palantir uses Mellon Function Estimator to determine the gene expression trends along different lineages. The marker trends can be determined using the following snippet. This computes the trends for all lineages. A subset of lineages can be used using the lineages parameter.
gene_trends = Traj.palantir_cal_gene_trends(
layers="MAGIC_imputed_data",
)
Granule mature [2024-05-09 00:28:06,193] [INFO ] Using covariance function Matern52(ls=1.0). Astrocytes [2024-05-09 00:28:07,603] [INFO ] Using covariance function Matern52(ls=1.0). OL [2024-05-09 00:28:08,414] [INFO ] Using covariance function Matern52(ls=1.0).
We can also use paga to visualize the cell stages
ov.utils.cal_paga(adata,use_time_prior='palantir_pseudotime',vkey='paga',
groups='clusters')
running PAGA using priors: ['palantir_pseudotime'] finished added 'paga/connectivities', connectivities adjacency (adata.uns) 'paga/connectivities_tree', connectivities subtree (adata.uns) 'paga/transitions_confidence', velocity transitions (adata.uns)
ov.utils.plot_paga(adata,basis='umap', size=50, alpha=.1,title='PAGA LTNN-graph',
min_edge_width=2, node_size_scale=1.5,show=False,legend_loc=False)