Cell interaction with CellPhoneDB¶

CellPhoneDB is a publicly available repository of curated receptors, ligands and their interactions in HUMAN. CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own single-cell transcriptomics data.

Paper: Single-cell reconstruction of the early maternal–fetal interface in humans

Code: https://github.com/ventolab/CellphoneDB

Colab_Reproducibility：https://colab.research.google.com/drive/1SH-GgG6pxGYyk1I4niSQXWGxwwZcsMec?usp=sharing

This notebook will demonstrate how to use CellPhoneDB on scRNA data and visualize it.

In [1]:

Copied!





import scanpy as sc
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import omicverse as ov
import anndata
import os

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=80, facecolor='white')
import scanpy as sc
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import omicverse as ov
import anndata
import os

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=80, facecolor='white')

The EVT Data¶

Th EVT data have finished the celltype annotation, it can be download from the tutorial of CellPhoneDB.

Download: https://github.com/ventolab/CellphoneDB/blob/master/notebooks/data_tutorial.zip

In [2]:

Copied!

adata=sc.read('data/data_tutorial/normalised_log_counts.h5ad')
adata
adata=sc.read('data/data_tutorial/normalised_log_counts.h5ad')
adata

Out[2]:

AnnData object with n_obs × n_vars = 3312 × 30800
    obs: 'n_genes', 'n_counts', 'cell_labels'
    var: 'gene_ids', 'feature_types'
    uns: 'neighbors_scVI_n_latent_14_sample_n_layers_3', 'neighbors_scVI_n_latent_20_sample_n_layers_3', 'umap'
    obsm: 'X_scVI_n_latent_14_sample_n_layers_3', 'X_scVI_n_latent_20_sample_n_layers_3', 'X_umap', 'X_umap_scVI_n_latent_14_sample_n_layers_3', 'X_umap_scVI_n_latent_20_sample_n_layers_3'
    obsp: 'neighbors_scVI_n_latent_14_sample_n_layers_3_connectivities', 'neighbors_scVI_n_latent_14_sample_n_layers_3_distances', 'neighbors_scVI_n_latent_20_sample_n_layers_3_connectivities', 'neighbors_scVI_n_latent_20_sample_n_layers_3_distances'

In [3]:

Copied!

adata.X.max()
adata.X.max()

Out[3]:

7.7138753

We can clearly see that the maximum value of the data is a floating point number less than 10. The fact that the maximum value is not an integer means that it has been normalised, and the fact that it is less than 10 means that it has been logarithmised. Note that our data cannot be scaled.

Export the anndata object¶

As the input to CellPhoneDB only requires the expression matrix and cell type, we extracted only the expression matrix and cell type from adata for the next step of analysis

In [4]:

Copied!





sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata1=anndata.AnnData(adata.X,obs=pd.DataFrame(index=adata.obs.index),
                          var=pd.DataFrame(index=adata.var.index))
adata1.write_h5ad('data/data_tutorial/norm_log.h5ad',compression='gzip')
adata1
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata1=anndata.AnnData(adata.X,obs=pd.DataFrame(index=adata.obs.index),
                          var=pd.DataFrame(index=adata.var.index))
adata1.write_h5ad('data/data_tutorial/norm_log.h5ad',compression='gzip')
adata1

filtered out 6989 genes that are detected in less than 3 cells

Out[4]:

AnnData object with n_obs × n_vars = 3312 × 23811

Export the meta info of cells¶

we construct a DataFrame object to export the meta info of cells. In EVT adata object, the celltypes were stored in the obs['cell_labels']

In [5]:

Copied!





#meta导出
df_meta = pd.DataFrame(data={'Cell':list(adata[adata1.obs.index].obs.index),
                             'cell_type':[ i for i in adata[adata1.obs.index].obs['cell_labels']]
                            })
df_meta.set_index('Cell', inplace=True)
df_meta.to_csv('data/data_tutorial/meta.tsv', sep = '\t')
#meta导出
df_meta = pd.DataFrame(data={'Cell':list(adata[adata1.obs.index].obs.index),
                             'cell_type':[ i for i in adata[adata1.obs.index].obs['cell_labels']]
                            })
df_meta.set_index('Cell', inplace=True)
df_meta.to_csv('data/data_tutorial/meta.tsv', sep = '\t')

Cell interaction analysis¶

Now, we prepare the meta info of cells meta.tsv and matrix of scRNA-eq norm_log.h5ad, we can use the method of CellPhoneDB to calculate the interaction of each celltype in scRNA-seq data.

Importantly, to avoid a series of bugs, we set the absolute path for CellPhoneDB analysis. we use os.getcwd() to get the path now analysis.

In [6]:

Copied!

import os
os.getcwd()
import os
os.getcwd()

Out[6]:

'/Users/fernandozeng/Desktop/Pyomic/tutorial'

Another thing to note is that we need to download the cellphonedb.zip file from cellphonedb-data for further analysis. I have placed it in the data/CPDB directory, but you can place it in any path you are interested in

Downloads: https://github.com/ventolab/cellphonedb-data

In [7]:

Copied!





cpdb_file_path = os.getcwd()+'/data/CPDB/cellphonedb.zip'
meta_file_path = os.getcwd()+'/data/data_tutorial/meta.tsv'
counts_file_path = os.getcwd()+'/data/data_tutorial/norm_log.h5ad'
microenvs_file_path = None
out_path =os.getcwd()+'/data/data_tutorial/test_cellphone'
cpdb_file_path = os.getcwd()+'/data/CPDB/cellphonedb.zip'
meta_file_path = os.getcwd()+'/data/data_tutorial/meta.tsv'
counts_file_path = os.getcwd()+'/data/data_tutorial/norm_log.h5ad'
microenvs_file_path = None
out_path =os.getcwd()+'/data/data_tutorial/test_cellphone'

Now we run cpdb_statistical_analysis_method to predicted the cell interaction in scRNA-seq

In [8]:

Copied!





from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
    microenvs_file_path = microenvs_file_path,       # optional (default: None): defines cells per microenvironment.
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 4,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug randome seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometri sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of componets to subsample via geometric skectching (dafault: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output files by a user defined string in the  (default: None).
    )
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
    meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
    counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
    microenvs_file_path = microenvs_file_path,       # optional (default: None): defines cells per microenvironment.
    iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 4,                                     # number of threads to use in the analysis.
    debug_seed = 42,                                 # debug randome seed. To disable >=0.
    result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
    pvalue = 0.05,                                   # P-value threshold to employ for significance.
    subsampling = False,                             # To enable subsampling the data (geometri sketching).
    subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                        # Number of componets to subsample via geometric skectching (dafault: 100).
    subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                          # Path to save results.
    output_suffix = None                             # Replaces the timestamp in the output files by a user defined string in the  (default: None).
    )

Reading user files...
The following user files were loaded successfully:
/Users/fernandozeng/Desktop/Pyomic/tutorial/data/data_tutorial/norm_log.h5ad
/Users/fernandozeng/Desktop/Pyomic/tutorial/data/data_tutorial/meta.tsv
[ ][CORE][02/04/23-01:01:30][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:4 Precision:3
[ ][CORE][02/04/23-01:01:30][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][02/04/23-01:01:31][INFO] Running Real Analysis
[ ][CORE][02/04/23-01:01:31][INFO] Running Statistical Analysis

100%|███████████████████████████████████████| 1000/1000 [00:44<00:00, 22.69it/s]

[ ][CORE][02/04/23-01:02:15][INFO] Building Pvalues result

[ ][CORE][02/04/23-01:02:17][INFO] Building results
Saved deconvoluted to /Users/fernandozeng/Desktop/Pyomic/tutorial/data/data_tutorial/test_cellphone/statistical_analysis_deconvoluted_04_02_2023_01:02:18.txt
Saved means to /Users/fernandozeng/Desktop/Pyomic/tutorial/data/data_tutorial/test_cellphone/statistical_analysis_means_04_02_2023_01:02:18.txt
Saved pvalues to /Users/fernandozeng/Desktop/Pyomic/tutorial/data/data_tutorial/test_cellphone/statistical_analysis_pvalues_04_02_2023_01:02:18.txt
Saved significant_means to /Users/fernandozeng/Desktop/Pyomic/tutorial/data/data_tutorial/test_cellphone/statistical_analysis_significant_means_04_02_2023_01:02:18.txt

Network of celltype analysis¶

It is worth noting that we will be using ov for all downstream analysis, starting with cell network analysis, where we provide the ov.single.cpdb_network_cal function to extract interactions, and the ov.single.cpdb_plot_network function for very elegant visualization

In [9]:

Copied!

interaction=ov.single.cpdb_network_cal(adata = adata,
        pvals = pvalues,
        celltype_key = "cell_labels",)
interaction=ov.single.cpdb_network_cal(adata = adata,
        pvals = pvalues,
        celltype_key = "cell_labels",)

In [10]:

Copied!

interaction['interaction_edges'].head()
interaction['interaction_edges'].head()

Out[10]:

	SOURCE	TARGET	COUNT
0	B_cells	B_cells	16
1	B_cells	DC	32
2	B_cells	EVT_1	13
3	B_cells	EVT_2	26
4	B_cells	Endo_F	37

In [11]:

Copied!





ov.single.cpdb_plot_network(adata=adata,
                  interaction_edges=interaction['interaction_edges'],
                  celltype_key='cell_labels',
                  nodecolor_dict=None,title='EVT Network',
                  edgeswidth_scale=25,nodesize_scale=10,
                  pos_scale=1,pos_size=10,figsize=(10,10),
                  legend_ncol=3,legend_bbox=(0.8,0.2),legend_fontsize=10)
ov.single.cpdb_plot_network(adata=adata,
                  interaction_edges=interaction['interaction_edges'],
                  celltype_key='cell_labels',
                  nodecolor_dict=None,title='EVT Network',
                  edgeswidth_scale=25,nodesize_scale=10,
                  pos_scale=1,pos_size=10,figsize=(10,10),
                  legend_ncol=3,legend_bbox=(0.8,0.2),legend_fontsize=10)

Out[11]:

<AxesSubplot: title={'center': 'EVT Network'}>

No description has been provided for this image

Sometimes, the whole network you don't want to use for analysis, the sub-network is useful for analysis. we can exacted the sub-network from it.

We need to exacted the sub-interaction first, we assumed that the five celltypes ['EVT_1','EVT_2','dNK1','dNK2','dNK3'] which is interested

In [12]:

Copied!

sub_i=interaction['interaction_edges']
sub_i=sub_i.loc[sub_i['SOURCE'].isin(['EVT_1','EVT_2','dNK1','dNK2','dNK3'])]
sub_i=sub_i.loc[sub_i['TARGET'].isin(['EVT_1','EVT_2','dNK1','dNK2','dNK3'])]
sub_i=interaction['interaction_edges']
sub_i=sub_i.loc[sub_i['SOURCE'].isin(['EVT_1','EVT_2','dNK1','dNK2','dNK3'])]
sub_i=sub_i.loc[sub_i['TARGET'].isin(['EVT_1','EVT_2','dNK1','dNK2','dNK3'])]

Then, we exacted the sub-anndata object

In [13]:

Copied!

sub_adata=adata[adata.obs['cell_labels'].isin(['EVT_1','EVT_2','dNK1','dNK2','dNK3'])]
sub_adata
sub_adata=adata[adata.obs['cell_labels'].isin(['EVT_1','EVT_2','dNK1','dNK2','dNK3'])]
sub_adata

Out[13]:

View of AnnData object with n_obs × n_vars = 336 × 23811
    obs: 'n_genes', 'n_counts', 'cell_labels'
    var: 'gene_ids', 'feature_types', 'n_cells'
    uns: 'neighbors_scVI_n_latent_14_sample_n_layers_3', 'neighbors_scVI_n_latent_20_sample_n_layers_3', 'umap'
    obsm: 'X_scVI_n_latent_14_sample_n_layers_3', 'X_scVI_n_latent_20_sample_n_layers_3', 'X_umap', 'X_umap_scVI_n_latent_14_sample_n_layers_3', 'X_umap_scVI_n_latent_20_sample_n_layers_3'
    obsp: 'neighbors_scVI_n_latent_14_sample_n_layers_3_connectivities', 'neighbors_scVI_n_latent_14_sample_n_layers_3_distances', 'neighbors_scVI_n_latent_20_sample_n_layers_3_connectivities', 'neighbors_scVI_n_latent_20_sample_n_layers_3_distances'

Now we plot the sub-interaction network between the cells in scRNA-seq

In [14]:

Copied!





ov.single.cpdb_plot_network(adata=sub_adata,
                  interaction_edges=sub_i,
                  celltype_key='cell_labels',
                  nodecolor_dict=None,title='Sub-EVT Network',
                  edgeswidth_scale=25,nodesize_scale=1,
                  pos_scale=1,pos_size=10,figsize=(5,5),
                  legend_ncol=3,legend_bbox=(0.8,0.2),legend_fontsize=10)
ov.single.cpdb_plot_network(adata=sub_adata,
                  interaction_edges=sub_i,
                  celltype_key='cell_labels',
                  nodecolor_dict=None,title='Sub-EVT Network',
                  edgeswidth_scale=25,nodesize_scale=1,
                  pos_scale=1,pos_size=10,figsize=(5,5),
                  legend_ncol=3,legend_bbox=(0.8,0.2),legend_fontsize=10)

Out[14]:

<AxesSubplot: title={'center': 'Sub-EVT Network'}>

The ligand-receptor exacted¶

We can set EVT as ligand or receptor to exacted the ligand-receptor proteins from the result of CellPhoneDB.

EVT as ligand¶

The most important step is that we need to extract the results of the analysis with eEVT as the ligand, and here we use ov's function ov.single.cpdb_submeans_exacted to do this

In [17]:

Copied!

sub_means=ov.single.cpdb_submeans_exacted(means,cell_names='eEVT',cell_type='ligand')
sub_means=ov.single.cpdb_submeans_exacted(means,cell_names='eEVT',cell_type='ligand')

In [18]:

Copied!





ax=ov.single.cpdb_plot_interaction(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11", 
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
    genes = None,
    keep_significant_only=True,
    figsize = (2,8),
    title = "eEVT as Ligand",
    max_size = 2,
    highlight_size = 0.75,
    standard_scale = True,
    cmap_name='Reds',
    ytickslabel_fontsize=6,xtickslabel_fontsize=8,title_fontsize=10
)
ax=ov.single.cpdb_plot_interaction(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11", 
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
    genes = None,
    keep_significant_only=True,
    figsize = (2,8),
    title = "eEVT as Ligand",
    max_size = 2,
    highlight_size = 0.75,
    standard_scale = True,
    cmap_name='Reds',
    ytickslabel_fontsize=6,xtickslabel_fontsize=8,title_fontsize=10
)

Sometimes we want to analyse ligand-receptor pathway enrichment or function, so we need to extract ligand-receptor pairs from the significant ligand-receptors filtered out above, and ov provides an easy function ov.single.cpdb_interaction_filtered to do this here

In [19]:

Copied!





cp=ov.single.cpdb_interaction_filtered(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11",
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
)
cp[:10]
cp=ov.single.cpdb_interaction_filtered(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11",
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
)
cp[:10]

Out[19]:

['SEMA4C-PLXNB2',
 'COL4A1-ADGRG6',
 'NRXN1-CLSTN2',
 'FN1-integrin-a4b1-complex',
 'COL27A1-integrin-a1b1-complex',
 'COL4A2-integrin-a1b1-complex',
 'COL4A2-ADGRG6',
 'LPAR2-ADGRE5',
 'COL19A1-integrin-a1b1-complex',
 'AGRN-PTPRS']

EVT as Receptor¶

In [20]:

Copied!

sub_means=ov.single.cpdb_submeans_exacted(means,cell_names='eEVT',cell_type='receptor')
sub_means=ov.single.cpdb_submeans_exacted(means,cell_names='eEVT',cell_type='receptor')

In [21]:

Copied!





ax=ov.single.cpdb_plot_interaction(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11",
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
    genes = None,
    keep_significant_only=True,
    figsize = (2,8),
    title = "eEVT as Receptor",
    max_size = 2,
    highlight_size = 0.75,
    standard_scale = True,
    cmap_name='Blues',
    ytickslabel_fontsize=6,xtickslabel_fontsize=8,title_fontsize=10
)
ax=ov.single.cpdb_plot_interaction(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11",
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
    genes = None,
    keep_significant_only=True,
    figsize = (2,8),
    title = "eEVT as Receptor",
    max_size = 2,
    highlight_size = 0.75,
    standard_scale = True,
    cmap_name='Blues',
    ytickslabel_fontsize=6,xtickslabel_fontsize=8,title_fontsize=10
)

In [22]:

Copied!





cp=ov.single.cpdb_interaction_filtered(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11",
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
)
cp[:10]
cp=ov.single.cpdb_interaction_filtered(
    adata = adata,
    cell_type1 = "eEVT",
    cell_type2 = "PV MYH11|PV STEAP4|PV MMPP11",
    means = sub_means,
    pvals = pvalues,
    celltype_key = "cell_labels",
)
cp[:10]

Out[22]:

['Glutamate-byGLS-and-SLC1A1-GRM8',
 'GDF11-TGFR-AVR2A',
 'SEMA4C-PLXNB2',
 'GAS6-MERTK',
 'COL4A1-ADGRG6',
 'COL4A5-ADGRG6',
 'APOA1-ABCA1',
 'COL14A1-integrin-a1b1-complex',
 'CSF1-CSF1R',
 'NECTIN1-NECTIN4']

In [23]:

Copied!





ligand_receptor_li=[]
for i in cp:
    ligand_receptor_li.append(i.split('-')[0])
    ligand_receptor_li.append(i.split('-')[1])
ligand_receptor_li=list(set(ligand_receptor_li))
len(ligand_receptor_li)
ligand_receptor_li=[]
for i in cp:
    ligand_receptor_li.append(i.split('-')[0])
    ligand_receptor_li.append(i.split('-')[1])
ligand_receptor_li=list(set(ligand_receptor_li))
len(ligand_receptor_li)

Out[23]:

In [ ]: