Batch-correction: backend zoo

This zoo holds one tutorial per ov.single.batch_correction(methods=...) backend. Every tutorial follows the same template — load → preprocess → call → embedding plot → key params → related — so you can swap methods by changing one line.
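The one-line swap described above might look like the sketch below. Only the `methods=` keyword is confirmed by this page; the `batch_key` and `n_pcs` keyword names, the `"batch"` column name, and the `run_backend` helper itself are assumptions made for illustration, so check them against the installed omicverse version before relying on them.

```python
def run_backend(adata, method, batch_key="batch", n_pcs=50):
    """Call one batch-correction backend on a preprocessed AnnData object.

    Hypothetical convenience wrapper: `batch_key` and `n_pcs` are assumed
    keyword names, and "batch" is an assumed obs column.
    """
    # Imported lazily so this sketch can be read and loaded without omicverse.
    import omicverse as ov

    return ov.single.batch_correction(
        adata,
        batch_key=batch_key,
        methods=method,  # the one argument that changes between tutorials
        n_pcs=n_pcs,
    )


# Swapping methods is then a one-token change (adata must already be
# loaded and preprocessed as in the tutorials):
# run_backend(adata, "harmony")
# run_backend(adata, "scanorama")
```

Keeping the backend name as the only varying argument is what lets the load, preprocess, and plotting cells stay identical across notebooks.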

All ten notebooks ship with executed outputs rendered against real data; there is no code-only mode. Demos run on a ~6 000-cell subsample of the NeurIPS 2021 multi-batch hematopoiesis dataset (3 real donors, pre-annotated cell_type), except totalVI, which uses scvi.data.pbmcs_10x_cite_seq because it needs real protein counts.

CPU-friendly backends

Each of these backends runs on CPU in under a minute on the demo dataset.

| Method | Tutorial | Family | Strength |
| --- | --- | --- | --- |
| Harmony | t_batch_harmony | embedding (iterative clustering) | Fast default; out-of-core; atlas-scale. |
| ComBat | t_batch_combat | empirical-Bayes (matrix-level) | Returns a corrected expression matrix. |
| Scanorama | t_batch_scanorama | MNN panorama-stitch | Differing compositions across batches. |
| Seurat-CCA | t_batch_cca | Canonical correlation analysis | Two-batch pairwise; Seurat parity, no R / rpy2. |

Variance-decomposition backend

| Method | Tutorial | Optional dep | Family | Notes |
| --- | --- | --- | --- | --- |
| CellANOVA | t_batch_cellanova | cellanova | Variance decomposition | Requires control_dict={pool_name: [batch_labels]} mapping cells expected to be biologically homogeneous across batches. |
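As a rough illustration of the control_dict shape that CellANOVA expects, the sketch below builds a {pool_name: [batch_labels]} dictionary and checks that every label actually occurs in the data. The helper `build_control_dict`, the pool name `control_pool`, and the donor labels are all hypothetical; how the dictionary is then passed to the backend is covered in t_batch_cellanova and is not shown here.

```python
def build_control_dict(batch_labels, pools):
    """Return {pool_name: [batch_labels]} after validating the labels.

    batch_labels: per-cell batch labels (e.g. the values of an obs column).
    pools: mapping of pool name -> batches assumed biologically homogeneous.
    """
    observed = set(batch_labels)
    control_dict = {}
    for pool_name, members in pools.items():
        members = list(members)
        # A label that never occurs in the data is almost certainly a typo.
        missing = [b for b in members if b not in observed]
        if missing:
            raise ValueError(
                f"pool {pool_name!r} references unknown batches: {missing}"
            )
        control_dict[pool_name] = members
    return control_dict


# Hypothetical per-cell batch labels for three donors.
batches = ["donor1", "donor2", "donor3", "donor1"]
control_dict = build_control_dict(
    batches, {"control_pool": ["donor1", "donor2", "donor3"]}
)
```

Validating the labels up front turns a misspelled batch name into an immediate error instead of a silently empty control pool.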

For the side-by-side comparison of every backend on the same dataset with scib-metrics scoring at the end, see ../t_single_batch.

Architecture

Every backend writes its corrected representation to a stable obsm slot:

adata.obsm['X_pca_harmony']    # methods='harmony'
adata.obsm['X_combat']         # methods='combat'
adata.obsm['X_scanorama']      # methods='scanorama'
adata.obsm['X_scVI']           # methods='scVI'
adata.obsm['X_scANVI']         # methods='scANVI'
adata.obsm['X_totalVI']        # methods='totalVI'
adata.obsm['X_scPoli']         # methods='scPoli'
adata.obsm['X_cellanova']      # methods='CellANOVA'
adata.obsm['X_concord']        # methods='Concord'
adata.obsm['X_cca']            # methods='cca' / 'seurat_cca'

The mapping lives in omicverse.single._batch._BATCH_OBSM and drives both the per-method tutorials and the tracked decorator’s diagnostic-viz auto-attachment. Downstream tools (cluster, UMAP, CCC) consume any backend’s output via this schema — no if method == ... branching in your downstream code.
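To make the "no branching" point concrete, here is a minimal sketch of a method-to-slot lookup. The dictionary below is a hand-copied stand-in built from the listing above; it is not the real `_BATCH_OBSM` object, whose exact shape is not confirmed here, and the helpers `obsm_key_for` and `corrected_embedding` are hypothetical names.

```python
# Stand-in for omicverse.single._batch._BATCH_OBSM (assumed shape: a plain
# dict of method name -> obsm key, copied from the listing on this page).
BATCH_OBSM = {
    "harmony": "X_pca_harmony",
    "combat": "X_combat",
    "scanorama": "X_scanorama",
    "scVI": "X_scVI",
    "scANVI": "X_scANVI",
    "totalVI": "X_totalVI",
    "scPoli": "X_scPoli",
    "CellANOVA": "X_cellanova",
    "Concord": "X_concord",
    "cca": "X_cca",
    "seurat_cca": "X_cca",
}


def obsm_key_for(method):
    """Resolve a backend name to the obsm slot it writes to."""
    try:
        return BATCH_OBSM[method]
    except KeyError:
        known = ", ".join(sorted(BATCH_OBSM))
        raise ValueError(
            f"unknown method {method!r}; expected one of: {known}"
        ) from None


def corrected_embedding(obsm, method):
    """Fetch a backend's output from any mapping shaped like adata.obsm."""
    key = obsm_key_for(method)
    if key not in obsm:
        raise KeyError(f"{key!r} not found; has {method!r} been run yet?")
    return obsm[key]
```

With a real AnnData object you would pass `adata.obsm` as the first argument, or hand the resolved key to a downstream step that accepts a representation name, such as the `use_rep` argument of scanpy's `sc.pp.neighbors`.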