Batch correction with scPoli (scArches conditional VAE)#
scPoli (De Donno et al., Nat Methods 2023) is a conditional VAE from the scArches ecosystem. It represents each batch (condition) with a learnable embedding and, when given cell-type labels (cell_type_keys), also learns cell-type prototypes in the latent space. The prototypes anchor each cell type across batches, which can improve batch removal and enables prototype-based label transfer to query datasets.
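To make the prototype idea concrete, here is a minimal nearest-centroid sketch. It is not scPoli's implementation: scPoli learns its prototypes during training, whereas this toy version simply averages the latent vectors of each labelled cell type and assigns query cells to the closest mean. The helper names fit_prototypes and transfer_labels are made up for this illustration.

```python
import numpy as np

def fit_prototypes(latent, labels):
    """Return (sorted label names, one mean latent vector per label)."""
    labels = np.asarray(labels)
    names = np.unique(labels)
    protos = np.stack([latent[labels == name].mean(axis=0) for name in names])
    return names, protos

def transfer_labels(query_latent, names, protos):
    """Assign each query cell the label of its nearest prototype (Euclidean)."""
    # Pairwise distances, shape (n_query, n_prototypes).
    dists = np.linalg.norm(query_latent[:, None, :] - protos[None, :, :], axis=2)
    return names[dists.argmin(axis=1)]

# Toy reference: two well-separated "cell types" in a 2-D latent space.
ref_latent = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
ref_labels = ["B cell", "B cell", "T cell", "T cell"]
names, protos = fit_prototypes(ref_latent, ref_labels)

# Two unlabelled query cells, one near each reference group.
query_latent = np.array([[0.1, 0.3], [4.8, 5.1]])
predicted = transfer_labels(query_latent, names, protos)
print(predicted)
```

Because the prototypes live in the batch-corrected latent space, a query cell from a new batch can be labelled by distance alone, provided the correction has aligned it with the reference.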
This is one of the omicverse batch-correction zoo tutorials. See batch/index for the overview / decision tree, or ../t_single_batch for the side-by-side comparison of every backend on a real benchmark.
Load a 3-batch demo from the NeurIPS 2021 dataset#
We use three real donor batches (s1d3, s2d1, s3d7) from the NeurIPS 2021 multimodal hematopoiesis dataset, which ships with a pre-annotated cell_type column and raw counts in layers['counts']. The concatenated data is subsampled to 6,000 cells so the notebook stays self-contained and fast. For a full benchmark with [scib-metrics] scoring, see ../t_single_batch.
import omicverse as ov
ov.style()
import anndata as ad
import numpy as np
import pandas as pd
# NeurIPS 2021 multimodal hematopoiesis dataset — 3 real donor batches
# (s1d3, s2d1, s3d7), pre-annotated `cell_type` and raw `layers['counts']`.
# Same datasets used by the overview notebook ../t_single_batch.ipynb.
adata1 = ov.datasets.get_adata(
'https://figshare.com/ndownloader/files/41932005',
filename='neurips2021_s1d3.h5ad',
)
adata2 = ov.datasets.get_adata(
'https://figshare.com/ndownloader/files/41932008',
filename='neurips2021_s2d1.h5ad',
)
adata3 = ov.datasets.get_adata(
'https://figshare.com/ndownloader/files/41932011',
filename='neurips2021_s3d7.h5ad',
)
adata = ad.concat([adata1, adata2, adata3], merge='same')
adata.var_names_make_unique()
adata.obs_names_make_unique()
adata.obs['batch'] = adata.obs['batch'].astype('category')
# Subsample to 6,000 cells in total across the 3 batches so the demo
# runs quickly; the same preamble keeps the CPU backends (harmony /
# combat / scanorama / cca) well under a minute. Skip the next three
# lines for a full-resolution run.
_rng = np.random.default_rng(0)
_sel = _rng.choice(adata.n_obs, 6000, replace=False)
adata = adata[_sel].copy()
adata
🔬 Starting plot initialization...
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 1
• [CUDA 0] NVIDIA H100 80GB HBM3
Memory: 79.1 GB | Compute: 9.0
____ _ _ __
/ __ \____ ___ (_)___| | / /__ _____________
/ / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \
/ /_/ / / / / / / / /__ | |/ / __/ / (__ ) __/
\____/_/ /_/ /_/_/\___/ |___/\___/_/ /____/\___/
🔖 Version: 2.2.1rc1 📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.
⚠️ File ./data/neurips2021_s1d3.h5ad already exists
Loading data from ./data/neurips2021_s1d3.h5ad
✅ Successfully loaded: 5935 cells × 13953 genes
⚠️ File ./data/neurips2021_s2d1.h5ad already exists
Loading data from ./data/neurips2021_s2d1.h5ad
✅ Successfully loaded: 10258 cells × 13953 genes
⚠️ File ./data/neurips2021_s3d7.h5ad already exists
Loading data from ./data/neurips2021_s3d7.h5ad
✅ Successfully loaded: 11230 cells × 13953 genes
AnnData object with n_obs × n_vars = 6000 × 13953
obs: 'GEX_n_genes_by_counts', 'GEX_pct_counts_mt', 'GEX_size_factors', 'GEX_phase', 'ADT_n_antibodies_by_counts', 'ADT_total_counts', 'ADT_iso_count', 'cell_type', 'batch', 'ADT_pseudotime_order', 'GEX_pseudotime_order', 'Samplename', 'Site', 'DonorNumber', 'Modality', 'VendorLot', 'DonorID', 'DonorAge', 'DonorBMI', 'DonorBloodType', 'DonorRace', 'Ethnicity', 'DonorGender', 'QCMeds', 'DonorSmoker', 'is_train'
var: 'feature_types', 'gene_id'
obsm: 'ADT_X_pca', 'ADT_X_umap', 'ADT_isotype_controls', 'GEX_X_pca', 'GEX_X_umap'
layers: 'counts'
Preprocess + PCA#
The same pipeline is shared across every backend in the zoo: QC, shift-log normalisation with Pearson-residual HVG selection, then PCA. No clustering is needed here, because the dataset's own cell_type annotation is copied to celltype, which scANVI / scPoli can use as a prototype anchor.
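As a rough illustration of the shift-log step, the sketch below scales every cell to a common total and applies log(1 + x). The target sum of 500,000 matches the value the preprocessing log reports; everything else is a simplification. In particular, the real ov.pp.preprocess call also excludes highly expressed genes when computing size factors, which this toy function (shiftlog is a hypothetical name) does not do.

```python
import numpy as np

def shiftlog(counts, target_sum=500_000.0):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # keep empty cells at zero instead of dividing by 0
    return np.log1p(counts / totals * target_sum)

# Two cells with the same composition but a 10x difference in depth.
counts = np.array([[1, 3, 0],
                   [10, 30, 0]])
normed = shiftlog(counts)
print(normed)
```

Both rows come out identical, which is the point of size normalisation: sequencing depth is removed before any batch-correction method sees the data.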
# Standard omicverse preprocess (QC → HVG-via-pearson → log-norm → PCA).
# QC thresholds are loose because the NeurIPS data is already filtered.
adata = ov.pp.qc(adata, tresh={'mito_perc': 0.2, 'nUMIs': 200,
'detected_genes': 100})
ov.utils.store_layers(adata, layers='counts')
adata = ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=2000,
batch_key=None)
adata.raw = adata
adata = adata[:, adata.var.highly_variable_features].copy()
ov.pp.scale(adata)
ov.pp.pca(adata, layer='scaled', n_pcs=30)
# Neighbours graph for the pre-correction UMAP.
ov.pp.neighbors(adata, use_rep='scaled|original|X_pca', n_neighbors=15)
# The NeurIPS adata already carries a real `cell_type` annotation —
# rename it to `celltype` for the wrapper's expected schema. No Leiden
# needed; the labels are pre-annotated by the dataset authors.
adata.obs['celltype'] = adata.obs['cell_type'].astype('category')
adata
🖥️ Using CPU mode for QC...
Auto-detected mitochondrial prefix: 'MT-'
📊 Step 1: Calculating QC Metrics
✓ Gene Family Detection:
┌──────────────────────────────┬────────────────────┬────────────────────┐
│ Gene Family │ Genes Found │ Detection Method │
├──────────────────────────────┼────────────────────┼────────────────────┤
│ Mitochondrial │ 13 │ Auto (MT-) │
├──────────────────────────────┼────────────────────┼────────────────────┤
│ Ribosomal │ 94 │ Auto (RPS/RPL) │
├──────────────────────────────┼────────────────────┼────────────────────┤
│ Hemoglobin │ 11 │ Auto (regex) │
└──────────────────────────────┴────────────────────┴────────────────────┘
✓ QC Metrics Summary:
┌─────────────────────────┬────────────────────┬─────────────────────────┐
│ Metric │ Mean │ Range (Min - Max) │
├─────────────────────────┼────────────────────┼─────────────────────────┤
│ nUMIs │ 20693 │ 1999 - 627052 │
├─────────────────────────┼────────────────────┼─────────────────────────┤
│ Detected Genes │ 1346 │ 104 - 5543 │
├─────────────────────────┼────────────────────┼─────────────────────────┤
│ Mitochondrial % │ 6.6% │ 0.0% - 20.0% │
├─────────────────────────┼────────────────────┼─────────────────────────┤
│ Ribosomal % │ 24.5% │ 0.1% - 63.4% │
├─────────────────────────┼────────────────────┼─────────────────────────┤
│ Hemoglobin % │ 10.1% │ 0.0% - 96.1% │
└─────────────────────────┴────────────────────┴─────────────────────────┘
📈 Original cell count: 6,000
🔧 Step 2: Quality Filtering (SEURAT)
Thresholds: mito≤0.2, nUMIs≥200, genes≥100
📊 Seurat Filter Results:
• nUMIs filter (≥200): 0 cells failed (0.0%)
• Genes filter (≥100): 0 cells failed (0.0%)
• Mitochondrial filter (≤0.2): 1 cells failed (0.0%)
✓ Filters applied successfully
✓ Combined QC filters: 1 cells removed (0.0%)
🎯 Step 3: Final Filtering
Parameters: min_genes=200, min_cells=3
Ratios: max_genes_ratio=1, max_cells_ratio=1
✓ Final filtering: 44 cells, 0 genes removed
🔍 Step 4: Doublet Detection
💡 Running pyscdblfinder (Python port of R scDblFinder)
🔍 Running scdblfinder detection...
[ScDblFinder] wrote scDblFinder_score + scDblFinder_class — threshold=0.040
✓ scDblFinder completed: 169 doublets removed (2.8%)
╭─ SUMMARY: qc ──────────────────────────────────────────────────────╮
│ Duration: 27.1331s │
│ Shape: 6,000 x 13,953 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● OBS │ ✚ cell_complexity (float) │
│ │ ✚ detected_genes (int) │
│ │ ✚ hb_perc (float) │
│ │ ✚ mito_perc (float) │
│ │ ✚ nUMIs (float) │
│ │ ✚ n_counts (float) │
│ │ ✚ n_genes (int) │
│ │ ✚ n_genes_by_counts (int) │
│ │ ✚ passing_mt (bool) │
│ │ ✚ passing_nUMIs (bool) │
│ │ ✚ passing_ngenes (bool) │
│ │ ✚ pct_counts_hb (float) │
│ │ ✚ pct_counts_mt (float) │
│ │ ✚ pct_counts_ribo (float) │
│ │ ✚ ribo_perc (float) │
│ │ ✚ total_counts (float) │
│ │
│ ● VAR │ ✚ hb (bool) │
│ │ ✚ mt (bool) │
│ │ ✚ ribo (bool) │
│ │
╰────────────────────────────────────────────────────────────────────╯
......The X of adata have been stored in counts
🔍 [2026-05-29 05:07:25] Running preprocessing in 'cpu' mode...
Begin robust gene identification
After filtration, 13953/13953 genes are kept.
Among 13953 genes, 13953 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson
🔍 Count Normalization:
Target sum: 500000.0
Exclude highly expressed: True
Max fraction threshold: 0.2
⚠️ Excluding 6 highly-expressed genes from normalization computation
Excluded genes: ['IGKC', 'HBB', 'MALAT1', 'HBA2', 'IGLC2', 'IGLC3']
✅ Count Normalization Completed Successfully!
✓ Processed: 5,786 cells × 13,953 genes
✓ Runtime: 0.26s
🔍 Highly Variable Genes Selection (Experimental):
Method: pearson_residuals
Target genes: 2,000
Theta (overdispersion): 100
✅ Experimental HVG Selection Completed Successfully!
✓ Selected: 2,000 highly variable genes out of 13,953 total (14.3%)
✓ Results added to AnnData object:
• 'highly_variable': Boolean vector (adata.var)
• 'highly_variable_rank': Float vector (adata.var)
• 'highly_variable_nbatches': Int vector (adata.var)
• 'highly_variable_intersection': Boolean vector (adata.var)
• 'means': Float vector (adata.var)
• 'variances': Float vector (adata.var)
• 'residual_variances': Float vector (adata.var)
Time to analyze data in cpu: 1.99 seconds.
✅ Preprocessing completed successfully.
Added:
'highly_variable_features', boolean vector (adata.var)
'means', float vector (adata.var)
'variances', float vector (adata.var)
'residual_variances', float vector (adata.var)
'counts', raw counts layer (adata.layers)
End of size normalization: shiftlog and HVGs selection pearson
╭─ SUMMARY: preprocess ──────────────────────────────────────────────╮
│ Duration: 2.1775s │
│ Shape: 5,786 x 13,953 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● VAR │ ✚ highly_variable (bool) │
│ │ ✚ highly_variable_features (bool) │
│ │ ✚ highly_variable_rank (float) │
│ │ ✚ means (float) │
│ │ ✚ n_cells (int) │
│ │ ✚ percent_cells (float) │
│ │ ✚ residual_variances (float) │
│ │ ✚ robust (bool) │
│ │ ✚ variances (float) │
│ │
│ ● UNS │ ✚ history_log │
│ │ ✚ hvg │
│ │ ✚ log1p │
│ │
╰────────────────────────────────────────────────────────────────────╯
╭─ SUMMARY: scale ───────────────────────────────────────────────────╮
│ Duration: 0.1515s │
│ Shape: 5,786 x 2,000 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● LAYERS │ ✚ scaled (array, 5786x2000) │
│ │
╰────────────────────────────────────────────────────────────────────╯
computing PCA🔍
with n_comps=30
🖥️ Using sklearn PCA for CPU computation
🖥️ sklearn PCA backend: CPU computation
📊 PCA input data type: ArrayView, shape: (5786, 2000), dtype: float64
🔧 PCA solver used: covariance_eigh
finished✅ (2.41s)
╭─ SUMMARY: pca ─────────────────────────────────────────────────────╮
│ Duration: 2.409s │
│ Shape: 5,786 x 2,000 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● UNS │ ✚ pca │
│ │ └─ params: {'zero_center': True, 'use_highly_variable': Tr...│
│ │ ✚ scaled|original|cum_sum_eigenvalues │
│ │ ✚ scaled|original|pca_var_ratios │
│ │
│ ● OBSM │ ✚ X_pca (array, 5786x30) │
│ │ ✚ scaled|original|X_pca (array, 5786x30) │
│ │
╰────────────────────────────────────────────────────────────────────╯
🖥️ Using Scanpy CPU to calculate neighbors...
🔍 K-Nearest Neighbors Graph Construction:
Mode: cpu
Neighbors: 15
Method: umap
Metric: euclidean
Representation: scaled|original|X_pca
🔍 Computing neighbor distances...
🔍 Computing connectivity matrix...
💡 Using UMAP-style connectivity
✓ Graph is fully connected
✅ KNN Graph Construction Completed Successfully!
✓ Processed: 5,786 cells with 15 neighbors each
✓ Results added to AnnData object:
• 'neighbors': Neighbors metadata (adata.uns)
• 'distances': Distance matrix (adata.obsp)
• 'connectivities': Connectivity matrix (adata.obsp)
╭─ SUMMARY: neighbors ───────────────────────────────────────────────╮
│ Duration: 7.7794s │
│ Shape: 5,786 x 2,000 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● UNS │ ✚ neighbors │
│ │ └─ params: {'n_neighbors': 15, 'method': 'umap', 'random_s...│
│ │
│ ● OBSP │ ✚ connectivities (sparse matrix, 5786x5786) │
│ │ ✚ distances (sparse matrix, 5786x5786) │
│ │
╰────────────────────────────────────────────────────────────────────╯
AnnData object with n_obs × n_vars = 5786 × 2000
obs: 'GEX_n_genes_by_counts', 'GEX_pct_counts_mt', 'GEX_size_factors', 'GEX_phase', 'ADT_n_antibodies_by_counts', 'ADT_total_counts', 'ADT_iso_count', 'cell_type', 'batch', 'ADT_pseudotime_order', 'GEX_pseudotime_order', 'Samplename', 'Site', 'DonorNumber', 'Modality', 'VendorLot', 'DonorID', 'DonorAge', 'DonorBMI', 'DonorBloodType', 'DonorRace', 'Ethnicity', 'DonorGender', 'QCMeds', 'DonorSmoker', 'is_train', 'nUMIs', 'mito_perc', 'ribo_perc', 'hb_perc', 'detected_genes', 'cell_complexity', 'n_counts', 'total_counts', 'n_genes', 'n_genes_by_counts', 'pct_counts_mt', 'pct_counts_ribo', 'pct_counts_hb', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'predicted_doublet', 'doublet_score', 'scdblfinder_doublet', 'scdblfinder_score', 'celltype'
var: 'feature_types', 'gene_id', 'mt', 'ribo', 'hb', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'means', 'variances', 'residual_variances', 'highly_variable_rank', 'highly_variable'
uns: 'status', 'status_args', 'REFERENCE_MANU', '_ov_provenance', 'layers_counts', 'history_log', 'log1p', 'hvg', 'pca', 'scaled|original|pca_var_ratios', 'scaled|original|cum_sum_eigenvalues', 'neighbors'
obsm: 'ADT_X_pca', 'ADT_X_umap', 'ADT_isotype_controls', 'GEX_X_pca', 'GEX_X_umap', 'X_pca', 'scaled|original|X_pca'
varm: 'PCs', 'scaled|original|pca_loadings'
layers: 'counts', 'scaled'
obsp: 'distances', 'connectivities'
Uncorrected baseline#
The donor-driven batch effect should be visible in the uncorrected UMAP:
# Pre-correction UMAP, stored under its own key for later comparison.
ov.pp.umap(adata, min_dist=0.3)
adata.obsm['X_umap_uncorrected'] = adata.obsm['X_umap'].copy()
ov.pl.embedding(adata, basis='X_umap_uncorrected',
color=['batch', 'celltype'],
frameon='small', wspace=0.5)
🔍 [2026-05-29 05:07:38] Running UMAP in 'cpu' mode...
🖥️ Using Scanpy CPU UMAP...
🔍 UMAP Dimensionality Reduction:
Mode: cpu
Method: umap
Components: 2
Min distance: 0.3
{'n_neighbors': 15, 'method': 'umap', 'random_state': 0, 'metric': 'euclidean', 'use_rep': 'scaled|original|X_pca'}
🔍 Computing UMAP parameters...
🔍 Computing UMAP embedding (classic method)...
✅ UMAP Dimensionality Reduction Completed Successfully!
✓ Embedding shape: 5,786 cells × 2 dimensions
✓ Results added to AnnData object:
• 'X_umap': UMAP coordinates (adata.obsm)
• 'umap': UMAP parameters (adata.uns)
✅ UMAP completed successfully.
╭─ SUMMARY: umap ────────────────────────────────────────────────────╮
│ Duration: 0.9129s │
│ Shape: 5,786 x 2,000 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● UNS │ ✚ umap │
│ │ └─ params: {'a': np.float64(0.9921756195894755), 'b': np.f...│
│ │
│ ● OBSM │ ✚ X_umap (array, 5786x2) │
│ │
╰────────────────────────────────────────────────────────────────────╯
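The visual impression from the UMAP can be backed by a rough number. The sketch below computes, for each cell, the fraction of its k nearest neighbours that come from the same batch. It is a simple stand-in for illustration, not one of the scib-metrics scores, and same_batch_fraction is a hypothetical helper. Values near 1 indicate batches that sit apart; values near the batch proportions indicate good mixing.

```python
import numpy as np

def same_batch_fraction(embedding, batches, k=15):
    """Mean fraction of each cell's k nearest neighbours that share its batch."""
    embedding = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    # Brute-force squared Euclidean distances; only suitable for small demos.
    sq = (embedding ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dists, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return float((batches[neighbours] == batches[:, None]).mean())

rng = np.random.default_rng(0)
batches = np.array(["A"] * 100 + ["B"] * 100)

# Batch B shifted far away from batch A: a strong batch effect.
apart = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2))])
# Both batches drawn from the same distribution: no batch effect.
mixed = rng.normal(0, 1, (200, 2))

separated_score = same_batch_fraction(apart, batches)
mixed_score = same_batch_fraction(mixed, batches)
print(separated_score, mixed_score)
```

In the notebook the same function could be applied to adata.obsm['X_umap_uncorrected'] with adata.obs['batch'].to_numpy(), although the full distance matrix for several thousand cells takes a few hundred megabytes.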
Run ov.single.batch_correction(methods='scPoli')#
As with the scvi-tools family backends, the wrapper routes any extra **kwargs to one of two destinations: the model’s __init__ (architecture) or its .train() call (optimisation). See the Key parameters section below.
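One plausible way to implement this kind of routing is to inspect the two signatures and sort the keyword arguments accordingly. The sketch below shows the idea; it is an assumption about the general technique, not omicverse's actual code. ToyModel, split_kwargs and the default values are invented for the example, with parameter names borrowed from the Key parameters section.

```python
import inspect

def split_kwargs(init_fn, train_fn, **kwargs):
    """Split kwargs by which callable's signature names each one."""
    init_names = set(inspect.signature(init_fn).parameters)
    train_names = set(inspect.signature(train_fn).parameters)
    init_kwargs, train_kwargs, unknown = {}, {}, {}
    for name, value in kwargs.items():
        if name in init_names:
            init_kwargs[name] = value
        elif name in train_names:
            train_kwargs[name] = value
        else:
            unknown[name] = value  # surfaced so typos are not silently dropped
    return init_kwargs, train_kwargs, unknown

class ToyModel:
    """Hypothetical stand-in for a model class; defaults are arbitrary."""
    def __init__(self, latent_dim=10, recon_loss="nb"):
        self.latent_dim = latent_dim
        self.recon_loss = recon_loss

    def train(self, n_epochs=100, eta=1.0):
        self.n_epochs = n_epochs
        self.eta = eta

init_kw, train_kw, unknown = split_kwargs(
    ToyModel.__init__, ToyModel.train, latent_dim=20, n_epochs=50, typo_arg=1
)
model_demo = ToyModel(**init_kw)
model_demo.train(**train_kw)
print(init_kw, train_kw, unknown)
```

Returning the unmatched arguments separately is a design choice worth copying in any wrapper of this kind, since a misspelled keyword would otherwise leave the defaults in place without warning.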
# Workaround: scArches 0.6.1 scPoli.get_latent passes a torch bool
# tensor to scipy sparse indexing, which scipy >=1.16 rejects with
# AttributeError. Densify .X (HVG-only, ~46 MB for this demo)
# before training — fixes the latent extraction call.
import scipy.sparse as sp
if sp.issparse(adata.X):
adata.X = adata.X.toarray()
model = ov.single.batch_correction(
adata,
batch_key='batch',
methods='scPoli',
# Optional but recommended — enables prototype learning:
cell_type_keys='celltype',
# All scPoli architecture + training kwargs left at defaults.
)
model
...Begin using scPoli to correct batch effect
Embedding dictionary:
Num conditions: [3]
Embedding dim: [10]
Encoder Architecture:
Input Layer in, out and cond: 2000 45 10
Mean/Var Layer in/out: 45 10
Decoder Architecture:
First Layer in, out and cond: 10 45 10
Output Layer in/out: 45 2000
Initializing dataloaders
Starting training
|--------------------| 1.0% - val_loss: 2640.22 - val_cvae_loss: 2640.22
|--------------------| 2.0% - val_loss: 2455.53 - val_cvae_loss: 2455.53
|--------------------| 3.0% - val_loss: 2413.36 - val_cvae_loss: 2413.36
|--------------------| 4.0% - val_loss: 2357.41 - val_cvae_loss: 2357.41
|█-------------------| 5.0% - val_loss: 2302.52 - val_cvae_loss: 2302.52
|█-------------------| 6.0% - val_loss: 2279.78 - val_cvae_loss: 2279.78
|█-------------------| 7.0% - val_loss: 2258.94 - val_cvae_loss: 2258.94
|█-------------------| 8.0% - val_loss: 2245.30 - val_cvae_loss: 2245.30
|█-------------------| 9.0% - val_loss: 2209.81 - val_cvae_loss: 2209.81
|██------------------| 10.0% - val_loss: 2200.75 - val_cvae_loss: 2200.75
|██------------------| 11.0% - val_loss: 2173.53 - val_cvae_loss: 2173.53
|██------------------| 12.0% - val_loss: 2167.76 - val_cvae_loss: 2167.76
|██------------------| 13.0% - val_loss: 2157.01 - val_cvae_loss: 2157.01
|██------------------| 14.0% - val_loss: 2151.53 - val_cvae_loss: 2151.53
|███-----------------| 15.0% - val_loss: 2134.89 - val_cvae_loss: 2134.89
|███-----------------| 16.0% - val_loss: 2121.27 - val_cvae_loss: 2121.27
|███-----------------| 17.0% - val_loss: 2120.40 - val_cvae_loss: 2120.40
|███-----------------| 18.0% - val_loss: 2115.19 - val_cvae_loss: 2115.19
|███-----------------| 19.0% - val_loss: 2115.03 - val_cvae_loss: 2115.03
|████----------------| 20.0% - val_loss: 2101.47 - val_cvae_loss: 2101.47
|████----------------| 21.0% - val_loss: 2095.83 - val_cvae_loss: 2095.83
|████----------------| 22.0% - val_loss: 2077.96 - val_cvae_loss: 2077.96
|████----------------| 23.0% - val_loss: 2095.88 - val_cvae_loss: 2095.88
|████----------------| 24.0% - val_loss: 2084.09 - val_cvae_loss: 2084.09
|█████---------------| 25.0% - val_loss: 2079.87 - val_cvae_loss: 2079.87
|█████---------------| 26.0% - val_loss: 2084.62 - val_cvae_loss: 2084.62
|█████---------------| 27.0% - val_loss: 2049.43 - val_cvae_loss: 2049.43
|█████---------------| 28.0% - val_loss: 2060.75 - val_cvae_loss: 2060.75
|█████---------------| 29.0% - val_loss: 2050.24 - val_cvae_loss: 2050.24
|██████--------------| 30.0% - val_loss: 2050.57 - val_cvae_loss: 2050.57
|██████--------------| 31.0% - val_loss: 2058.36 - val_cvae_loss: 2058.36
|██████--------------| 32.0% - val_loss: 2045.13 - val_cvae_loss: 2045.13
|██████--------------| 33.0% - val_loss: 2052.00 - val_cvae_loss: 2052.00
|██████--------------| 34.0% - val_loss: 2051.32 - val_cvae_loss: 2051.32
|███████-------------| 35.0% - val_loss: 2039.11 - val_cvae_loss: 2039.11
|███████-------------| 36.0% - val_loss: 2039.33 - val_cvae_loss: 2039.33
|███████-------------| 37.0% - val_loss: 2031.23 - val_cvae_loss: 2031.23
|███████-------------| 38.0% - val_loss: 2027.71 - val_cvae_loss: 2027.71
|███████-------------| 39.0% - val_loss: 2037.33 - val_cvae_loss: 2037.33
|████████------------| 40.0% - val_loss: 2032.83 - val_cvae_loss: 2032.83
|████████------------| 41.0% - val_loss: 2035.49 - val_cvae_loss: 2035.49
|████████------------| 42.0% - val_loss: 2033.66 - val_cvae_loss: 2033.66
|████████------------| 43.0% - val_loss: 2031.99 - val_cvae_loss: 2031.99
|████████------------| 44.0% - val_loss: 2016.61 - val_cvae_loss: 2016.61
|█████████-----------| 45.0% - val_loss: 2014.81 - val_cvae_loss: 2014.81
|█████████-----------| 46.0% - val_loss: 2014.30 - val_cvae_loss: 2014.30
|█████████-----------| 47.0% - val_loss: 2017.34 - val_cvae_loss: 2017.34
|█████████-----------| 48.0% - val_loss: 2005.31 - val_cvae_loss: 2005.31
|█████████-----------| 49.0% - val_loss: 2010.43 - val_cvae_loss: 2010.43
|██████████----------| 50.0% - val_loss: 2001.22 - val_cvae_loss: 2001.22
|██████████----------| 51.0% - val_loss: 2008.68 - val_cvae_loss: 2008.68
|██████████----------| 52.0% - val_loss: 2003.48 - val_cvae_loss: 2003.48
|██████████----------| 53.0% - val_loss: 1975.68 - val_cvae_loss: 1975.68
|██████████----------| 54.0% - val_loss: 1984.75 - val_cvae_loss: 1984.75
|███████████---------| 55.0% - val_loss: 1996.85 - val_cvae_loss: 1996.85
|███████████---------| 56.0% - val_loss: 2003.41 - val_cvae_loss: 2003.41
|███████████---------| 57.0% - val_loss: 1981.81 - val_cvae_loss: 1981.81
|███████████---------| 58.0% - val_loss: 1999.58 - val_cvae_loss: 1999.58
|███████████---------| 59.0% - val_loss: 2000.44 - val_cvae_loss: 2000.44
|████████████--------| 60.0% - val_loss: 2004.10 - val_cvae_loss: 2004.10
|████████████--------| 61.0% - val_loss: 1991.64 - val_cvae_loss: 1991.64
|████████████--------| 62.0% - val_loss: 1999.72 - val_cvae_loss: 1999.72
|████████████--------| 63.0% - val_loss: 1982.52 - val_cvae_loss: 1982.52
|████████████--------| 64.0% - val_loss: 1988.75 - val_cvae_loss: 1988.75
|█████████████-------| 65.0% - val_loss: 1975.21 - val_cvae_loss: 1975.21
|█████████████-------| 66.0% - val_loss: 1995.25 - val_cvae_loss: 1995.25
|█████████████-------| 67.0% - val_loss: 1987.07 - val_cvae_loss: 1987.07
|█████████████-------| 68.0% - val_loss: 1975.46 - val_cvae_loss: 1975.46
|█████████████-------| 69.0% - val_loss: 1981.28 - val_cvae_loss: 1981.28
|██████████████------| 70.0% - val_loss: 1967.91 - val_cvae_loss: 1967.91
|██████████████------| 71.0% - val_loss: 1978.26 - val_cvae_loss: 1978.26
|██████████████------| 72.0% - val_loss: 1989.08 - val_cvae_loss: 1989.08
|██████████████------| 73.0% - val_loss: 1987.24 - val_cvae_loss: 1987.24
|██████████████------| 74.0% - val_loss: 1969.58 - val_cvae_loss: 1969.58
|███████████████-----| 75.0% - val_loss: 1969.91 - val_cvae_loss: 1969.91
|███████████████-----| 76.0% - val_loss: 1983.17 - val_cvae_loss: 1983.17
|███████████████-----| 77.0% - val_loss: 1966.93 - val_cvae_loss: 1966.93
|███████████████-----| 78.0% - val_loss: 1976.18 - val_cvae_loss: 1976.18
|███████████████-----| 79.0% - val_loss: 1962.97 - val_cvae_loss: 1962.97
|████████████████----| 80.0% - val_loss: 1966.84 - val_cvae_loss: 1966.84
|████████████████----| 81.0% - val_loss: 1972.58 - val_cvae_loss: 1972.58
|████████████████----| 82.0% - val_loss: 1980.91 - val_cvae_loss: 1980.91
|████████████████----| 83.0% - val_loss: 1972.36 - val_cvae_loss: 1972.36
|████████████████----| 84.0% - val_loss: 1962.76 - val_cvae_loss: 1962.76
|█████████████████---| 85.0% - val_loss: 1972.33 - val_cvae_loss: 1972.33
|█████████████████---| 86.0% - val_loss: 1970.84 - val_cvae_loss: 1970.84
|█████████████████---| 87.0% - val_loss: 1949.76 - val_cvae_loss: 1949.76
|█████████████████---| 88.0% - val_loss: 1955.17 - val_cvae_loss: 1955.17
|█████████████████---| 89.0% - val_loss: 1950.95 - val_cvae_loss: 1950.95
|██████████████████--| 90.0% - val_loss: 1970.00 - val_cvae_loss: 1970.00
|██████████████████--| 91.0% - val_loss: 1997.65 - val_cvae_loss: 1977.33 - val_prototype_loss: 20.32 - val_labeled_loss: 20.32
|██████████████████--| 92.0% - val_loss: 1982.34 - val_cvae_loss: 1963.86 - val_prototype_loss: 18.48 - val_labeled_loss: 18.48
|██████████████████--| 93.0% - val_loss: 1976.19 - val_cvae_loss: 1961.00 - val_prototype_loss: 15.19 - val_labeled_loss: 15.19
|██████████████████--| 94.0% - val_loss: 1983.09 - val_cvae_loss: 1968.92 - val_prototype_loss: 14.17 - val_labeled_loss: 14.17
|███████████████████-| 95.0% - val_loss: 1987.37 - val_cvae_loss: 1974.02 - val_prototype_loss: 13.35 - val_labeled_loss: 13.35
|███████████████████-| 96.0% - val_loss: 1975.96 - val_cvae_loss: 1963.42 - val_prototype_loss: 12.54 - val_labeled_loss: 12.54
|███████████████████-| 97.0% - val_loss: 1981.71 - val_cvae_loss: 1969.68 - val_prototype_loss: 12.03 - val_labeled_loss: 12.03
|███████████████████-| 98.0% - val_loss: 1969.35 - val_cvae_loss: 1958.46 - val_prototype_loss: 10.89 - val_labeled_loss: 10.89
|███████████████████-| 99.0% - val_loss: 1970.54 - val_cvae_loss: 1959.50 - val_prototype_loss: 11.04 - val_labeled_loss: 11.04
|████████████████████| 100.0% - val_loss: 1990.40 - val_cvae_loss: 1979.16 - val_prototype_loss: 11.24 - val_labeled_loss: 11.24
╭─ SUMMARY: batch_correction ────────────────────────────────────────╮
│ Duration: 33.0495s │
│ Shape: 5,786 x 2,000 (Unchanged) │
│ │
│ CHANGES DETECTED │
│ ──────────────── │
│ ● OBS │ ✚ conditions_combined (category) │
│ │
│ ● OBSM │ ✚ X_scPoli (array, 5786x10) │
│ │
╰────────────────────────────────────────────────────────────────────╯
<scarches.models.scpoli.scpoli_model.scPoli at 0x7f2745795450>
Corrected embedding#
Every backend writes its corrected representation to a stable obsm key; for scPoli it is adata.obsm['X_scPoli'], a 5,786 × 10 latent matrix in this run. Projecting it with ov.utils.mde gives a lightweight UMAP-style 2-D display.
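A small helper can keep the projected coordinates next to the latent matrix under their own key. The sketch below is written so that it runs without omicverse: it takes the projector as an argument and is demonstrated on a stand-in object with a placeholder projector. The function name store_projection and the key 'X_mde_scPoli' are hypothetical, and ov.utils.mde is assumed to accept an (n, d) array and return an (n, 2) array.

```python
from types import SimpleNamespace

import numpy as np

def store_projection(adata, projector, src_key="X_scPoli", dst_key="X_mde_scPoli"):
    """Project adata.obsm[src_key] to 2-D and store the result under dst_key."""
    latent = adata.obsm[src_key]
    coords = np.asarray(projector(latent))
    if coords.shape != (latent.shape[0], 2):
        raise ValueError(f"expected an (n_cells, 2) array, got {coords.shape}")
    adata.obsm[dst_key] = coords
    return dst_key

# Stand-in for an AnnData object: only the .obsm mapping is needed here.
toy = SimpleNamespace(
    obsm={"X_scPoli": np.random.default_rng(0).normal(size=(50, 10))}
)

# Placeholder projector (first two latent dimensions). In the notebook the
# projector would be ov.utils.mde, under the signature assumption stated above.
basis = store_projection(toy, projector=lambda z: z[:, :2])
print(basis, toy.obsm[basis].shape)
```

With the real adata, the returned key can be passed as basis to ov.pl.embedding with color=['batch', 'celltype'], mirroring the call used for the uncorrected baseline.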
Key parameters#
Optional (recommended):
cell_type_keys: obs column(s) for prototype learning.
Architecture (→ scPoli.__init__):
embedding_dims, latent_dim, recon_loss ∈ {'nb', 'zinb', 'mse'}.
Optimisation (→ scPoli.train):
n_epochs, pretraining_epochs, eta, early_stopping_kwargs.
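The training log shows how pretraining_epochs and eta interact: up to the 90% line only val_cvae_loss is reported, and from the 91% line onward a prototype term is added. The sketch below reproduces that arithmetic under explicit assumptions that the source does not confirm: that the run used 100 epochs with 90 pretraining epochs, and that the prototype loss enters the total as eta times its value with eta equal to 1. The function validation_loss is illustrative only.

```python
def validation_loss(epoch, cvae_loss, prototype_loss, pretraining_epochs, eta=1.0):
    """Total loss for a 1-based epoch: reconstruction only during pretraining."""
    if epoch <= pretraining_epochs:
        return cvae_loss
    # After pretraining, the prototype term is added (weighting by eta is assumed).
    return cvae_loss + eta * prototype_loss

# Values copied from the 90% and 91% lines of the training log.
before = validation_loss(90, 1970.00, 0.0, pretraining_epochs=90)
after = validation_loss(91, 1977.33, 20.32, pretraining_epochs=90)
print(before, round(after, 2))
```

Under these assumptions the second value matches the logged val_loss of 1997.65. Raising eta would weight cell-type anchoring more heavily relative to reconstruction, and lowering pretraining_epochs would switch the prototype term on earlier.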