WGCNA (Weighted gene co-expression network analysis) analysis¶
Weighted gene co-expression network analysis (WGCNA) is a systems biology approach to characterize gene association patterns between different samples and can be used to identify highly synergistic gene sets and identify candidate biomarker genes or therapeutic targets based on the endogeneity of the gene sets and the association between the gene sets and the phenotype.
Paper: WGCNA: an R package for weighted correlation network analysis
Code: Reproduce by Python. Raw is http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA
Colab_Reproducibility:https://colab.research.google.com/drive/1EbP-Tq1IwYO9y1_-zzw23XlPbzrxP0og?usp=sharing
Here, you will be briefly guided through the basics of how to use omicverse to perform wgcna anlysis. Once you are set
import omicverse as ov
ov.utils.ov_plot_set()
/Users/fernandozeng/miniforge3/envs/scbasset/lib/python3.8/site-packages/phate/__init__.py
Load the data¶
The analysis is based on the in-built WGCNA tutorial data.
import pandas as pd
data=ov.utils.read_csv(filepath_or_buffer='https://raw.githubusercontent.com/Starlitnightly/ov/master/sample/LiverFemale3600.csv',
index_col=0)
data.head()
F2_2 | F2_3 | F2_14 | F2_15 | F2_19 | F2_20 | F2_23 | F2_24 | F2_26 | F2_37 | ... | F2_324 | F2_325 | F2_326 | F2_327 | F2_328 | F2_329 | F2_330 | F2_332 | F2_355 | F2_357 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
substanceBXH | |||||||||||||||||||||
MMT00000044 | -0.01810 | 0.0642 | 0.000064 | -0.0580 | 0.04830 | -0.151974 | -0.00129 | -0.23600 | -0.0307 | -0.02610 | ... | 0.047700 | -0.0488 | 0.0168 | -0.0309 | 0.02740 | -0.031 | 0.0660 | -0.0199 | -0.0146 | 0.065000 |
MMT00000046 | -0.07730 | -0.0297 | 0.112000 | -0.0589 | 0.04430 | -0.093800 | 0.09340 | 0.02690 | -0.1330 | 0.07570 | ... | -0.049200 | -0.0350 | -0.0738 | -0.1730 | -0.07380 | -0.201 | -0.0820 | -0.0939 | 0.0192 | -0.049900 |
MMT00000051 | -0.02260 | 0.0617 | -0.129000 | 0.0871 | -0.11500 | -0.065026 | 0.00249 | -0.10200 | 0.1420 | -0.10200 | ... | 0.000612 | 0.1210 | 0.0996 | 0.1090 | 0.02730 | 0.120 | -0.0629 | -0.0395 | 0.1090 | 0.000253 |
MMT00000076 | -0.00924 | -0.1450 | 0.028700 | -0.0439 | 0.00425 | -0.236100 | -0.06900 | 0.01440 | 0.0363 | -0.01820 | ... | -0.270000 | 0.0803 | 0.0424 | 0.1610 | 0.05120 | 0.241 | 0.3890 | 0.0251 | -0.0348 | 0.114000 |
MMT00000080 | -0.04870 | 0.0582 | -0.048300 | -0.0371 | 0.02510 | 0.085043 | 0.04450 | 0.00167 | -0.0680 | 0.00567 | ... | 0.113000 | -0.0859 | -0.1340 | 0.0639 | 0.00731 | 0.124 | -0.0212 | 0.0870 | 0.0512 | 0.024300 |
5 rows × 135 columns
Correlation matrix calculate¶
We can calculate the direct correlation matrix by each gene
gene_wgcna=ov.bulk.pyWGCNA(data,save_path='result')
gene_wgcna.calculate_correlation_direct(method='pearson',save=False)
...correlation coefficient matrix is being calculated
In pyWGCNA module, we need to trans the direct correlation matrix to indirect correlation matrix to calculate the soft threshold, Soft thresholds can help us convert the original correlation network into a scale-free network
gene_wgcna.calculate_correlation_indirect(save=False)
...indirect correlation matrix is being calculated
gene_wgcna.calculate_soft_threshold(save=False)
...soft_threshold is being calculated ...appropriate soft_thresholds: 5
beta | r2 | meank | |
---|---|---|---|
1 | 1 | 0.001434 | 866.510550 |
2 | 2 | 0.223522 | 373.883499 |
3 | 3 | 0.434748 | 217.589775 |
4 | 4 | 0.655388 | 153.295645 |
5 | 5 | 0.889141 | 123.977608 |
6 | 6 | 0.916398 | 111.535289 |
7 | 7 | 0.897792 | 109.220186 |
8 | 8 | 0.852653 | 114.496535 |
9 | 9 | 0.859802 | 126.781878 |
10 | 10 | 0.838656 | 146.668526 |
11 | 11 | 0.838382 | 175.697242 |
The left vertical coordinate is the evaluation metric r2 for a scale-free network. the closer r2 is to 1, the closer the network is to a scale-free network, usually requiring >0.8 or 0.9. the right vertical coordinate is the average connectivity, which decreases as β increases. Combining the two graphs, the β value is usually chosen when r^2 first reaches 0.8 or 0.9 or more. With the β value one can convert the correlation matrix into an adjacency matrix according to Eq.
Then we can construct the Topological Overlap Matrix
gene_wgcna.calculate_corr_matrix()
Building a network of co-expressions¶
We use the dynamicTree to build the co-expressions module
gene_wgcna.calculate_distance()
gene_wgcna.calculate_geneTree()
gene_wgcna.calculate_dynamicMods()
module=gene_wgcna.calculate_gene_module()