zoom.core.ZOOM_SC

class zoom.core.ZOOM_SC(adata: PathLike | anndata.AnnData, expression: PathLike | pandas.DataFrame, SBP: PathLike | pandas.DataFrame, SBP_perm: PathLike | pandas.DataFrame, best_comp: int | None = None, cv: int = 10, processed: bool = False, DS: PathLike | pandas.DataFrame | None = None, QC: bool = True, min_genes: int = 250, min_cells: int = 50, d: int = 50, gss_limit: int = 200, seed: int = 123, n_jobs: int = -1)

The ZOOM_SC class extends the traditional imaging-transcriptomics paradigm (the ZOOM class) by linking SBP-relevant gene sets with single-cell RNA sequencing (scRNA-seq) datasets.

Specifically, this class provides:

- Preprocessing of scRNA-seq data during initialization.
- Calculation of single-cell enrichment scores for SBP-relevant gene sets, with statistical significance inferred at single-cell resolution.
- Downstream analyses based on single-cell enrichment scores:
  - Differentially expressed gene analysis.
  - Region enrichment analysis.

adata

Path to .h5ad file or AnnData object, where scRNA-seq dataset is stored.

Type: str or AnnData

SBP_scores

Single-cell enrichment scores of SBP-relevant gene sets.

Type: pd.DataFrame

DS

Gene-level differential stability (DS).

Type: pd.DataFrame

processed

If False, preprocess the scRNA-seq data; otherwise, skip the preprocessing pipeline.

Type: bool

QC

If True, filter low-quality genes and cells through scanpy:

- sc.pp.filter_cells(adata, min_genes=min_genes)
- sc.pp.filter_genes(adata, min_cells=min_cells)

Type: bool

d

Number of nearest neighbors for each cell.

Type: int

gss_limit

Maximum allowed GSS value, used to avoid over-representation.

Type: int
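The QC attribute above names two scanpy filters. As a rough illustration of what such thresholds do, the following pure-Python sketch applies the same two rules, in the same order, to a toy count matrix. The helper name `qc_filter` and the toy data are hypothetical; the class itself is documented as calling scanpy, not this code.

```python
from typing import List, Tuple


def qc_filter(
    counts: List[List[int]], min_genes: int, min_cells: int
) -> Tuple[List[int], List[int]]:
    """Return indices of the cells and genes that pass both QC thresholds.

    `counts` is a cells x genes matrix of raw counts (hypothetical toy input).
    """
    # Step 1: keep cells in which at least `min_genes` genes are detected.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for x in row if x > 0) >= min_genes
    ]
    # Step 2: among the kept cells, keep genes detected in at least `min_cells` cells.
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


toy_counts = [
    [3, 0, 1, 0],  # cell 0: 2 genes detected
    [0, 0, 2, 0],  # cell 1: 1 gene detected
    [1, 4, 5, 0],  # cell 2: 3 genes detected
    [2, 0, 7, 1],  # cell 3: 3 genes detected
]
cells, genes = qc_filter(toy_counts, min_genes=2, min_cells=2)
print(cells, genes)
```

With the class defaults (min_genes=250, min_cells=50) the same logic applies at realistic scale; the tiny thresholds here only keep the example readable.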

References

[1] Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572-1580 (2022).
[2] Song, L., Chen, W., Hou, J., Guo, M. & Yang, J. Spatially resolved mapping of cells associated with human complex traits. Nature 641, 932-941 (2025).
[3] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).

This class also builds on the references cited in the ZOOM class. If you use functions from the ZOOM class, please cite those references as well.

GSEA(gene_sets: Dict[str, Dict[str, List[str]] | List[str]], min_size: int = 30, max_size: int = 500, one_sided: bool = True) → None

Implementation of spatial permutation test-based GSEA.

Parameters:

gene_sets (dict) – Gene sets for enrichment analysis, organized either as {‘Term1’: [Gene1, Gene2,…],…} or as a dict of such gene-set dicts.
min_size, max_size (int) – Minimum and maximum size of a gene set to be included in GSEA.
one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.

Returns:

Results of this function are stored on self.gsea_res.

Return type:

None

References

[1] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).
[2] Martins, D. et al. Imaging transcriptomics: convergent cellular, transcriptomic, and molecular neuroimaging signatures in the healthy adult human brain. Cell Rep. 37, 110173 (2021).
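To make the GSEA parameters more concrete, here is a minimal sketch of the two accepted gene_sets layouts, a size filter in the spirit of min_size and max_size, and an empirical permutation p-value with a one_sided switch. All helper names are hypothetical. The prefixed term names, the "+1" correction, and the absolute-value form of the two-sided test are assumptions, not a description of the library's internals.

```python
from typing import Dict, List, Sequence, Union

GeneSets = Dict[str, Union[List[str], Dict[str, List[str]]]]


def flatten_gene_sets(gene_sets: GeneSets) -> Dict[str, List[str]]:
    """Flatten {'Term': [...]} or {'Library': {'Term': [...]}} into one flat dict."""
    flat: Dict[str, List[str]] = {}
    for name, value in gene_sets.items():
        if isinstance(value, dict):
            # Nested layout: prefix each term with its library name (assumption).
            for term, genes in value.items():
                flat[f"{name}:{term}"] = list(genes)
        else:
            flat[name] = list(value)
    return flat


def filter_by_size(
    flat: Dict[str, List[str]], min_size: int = 30, max_size: int = 500
) -> Dict[str, List[str]]:
    """Keep gene sets whose number of unique genes lies in [min_size, max_size]."""
    return {t: g for t, g in flat.items() if min_size <= len(set(g)) <= max_size}


def perm_pvalue(observed: float, null: Sequence[float], one_sided: bool = True) -> float:
    """Empirical p-value of `observed` against permutation-derived null scores."""
    if one_sided:
        hits = sum(1 for x in null if x >= observed)
    else:
        # Two-sided via absolute values; assumes a null centred near zero.
        hits = sum(1 for x in null if abs(x) >= abs(observed))
    return (hits + 1) / (len(null) + 1)


toy_sets: GeneSets = {
    "A": ["g1", "g2"],
    "lib": {"B": ["g1"], "C": ["g1", "g2", "g3"]},
}
kept = filter_by_size(flatten_gene_sets(toy_sets), min_size=2, max_size=3)

toy_null = [0.1, -0.4, 0.3, 0.8, -0.9]
p_one = perm_pvalue(0.5, toy_null, one_sided=True)
p_two = perm_pvalue(0.5, toy_null, one_sided=False)
print(sorted(kept), p_one, p_two)
```

In the actual method the null scores come from the spatially permuted SBPs, so the p-value accounts for spatial autocorrelation rather than for random gene labels.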

cv_PLSR(ncomps: List[int] | numpy.ndarray = range(1, 16), repeats_cv: int = 30, repeats_pred: int = 101) → None

Class method implementing cross-validated (CV) partial least squares regression (PLS-R). This function proceeds in 3 stages:

- Evaluate the optimal number of components if it was not previously provided.
- Evaluate PLS-R prediction performance under the optimal parameter.
- Infer statistical significance of prediction performance through a spatial permutation test.

Parameters:

ncomps ({List[int], np.ndarray}, optional) – Candidate numbers of components to evaluate.
repeats_cv (int) – Number of times to repeat the optimal-component evaluation.
repeats_pred (int) – Number of times to repeat the model-performance evaluation.

Returns:

Results of this function are stored on self.PLS_performance, self.PLS_r, self.PLS_Q2 and self.PLS_p_perm.

Return type:

None

References

Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).
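The first stage described above (repeated CV over candidate component numbers) can be sketched without any modelling library. The snippet below only illustrates the bookkeeping: generating repeated K-fold splits and choosing the candidate with the best mean score. The function names, the "higher is better" score, and the tie-break toward fewer components are assumptions; the PLS-R fitting itself is omitted.

```python
import random
from statistics import mean
from typing import Dict, Iterator, List


def repeated_kfold(
    n_samples: int, cv: int = 10, repeats: int = 30, seed: int = 123
) -> Iterator[List[List[int]]]:
    """Yield `repeats` random partitions of sample indices into `cv` folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Striding over the shuffled indices gives `cv` disjoint, near-equal folds.
        yield [idx[i::cv] for i in range(cv)]


def select_best_ncomp(cv_scores: Dict[int, List[float]]) -> int:
    """Pick the candidate with the highest mean score across repeats.

    Ties are broken in favour of fewer components (an assumption).
    """
    return min(cv_scores, key=lambda k: (-mean(cv_scores[k]), k))


splits = list(repeated_kfold(n_samples=25, cv=5, repeats=3))

# Hypothetical per-repeat scores for three candidate component numbers.
toy_scores = {
    1: [0.10, 0.12, 0.11],
    2: [0.20, 0.18, 0.22],
    3: [0.19, 0.21, 0.17],
}
best = select_best_ncomp(toy_scores)
print(best)
```

Averaging over repeats reduces the dependence of the selected component number on any single random fold assignment, which is presumably why repeats_cv defaults to 30.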

downstream_ans(flag_DEG: bool = True, alpha: float = 0.1, min_score: float = 3.0, group: str | None = None, rank_method: str = 'logreg', max_iter: int = 10000, flag_region: bool = True, region_col: str | None = None, batch_col: str | None = None, dataset: List | None = None, indvd_col: str | None = None) → None

Perform downstream analyses based on single-cell enrichment scores.

Parameters:

flag_DEG (bool) – Whether to perform differential expression analysis.
alpha (float) – Significance threshold for enrichment scores.
min_score (float) – Minimum enrichment score threshold.
group (str) – Column name of grouping variable for DEG analysis, must be present in self.adata.obs.
rank_method (str) – Ranking method for DEG analysis (default: “logreg”).
max_iter (int) – Maximum iterations for DEG ranking.
flag_region (bool) – Whether to perform region enrichment analysis.
region_col (str, optional) – Column specifying region identity, must be present in self.adata.obs.
batch_col (str, optional) – Column specifying batch identity, must be present in self.adata.obs.
dataset (list, optional) – Dataset identifiers for region enrichment, must be present in self.adata.obs[batch_col].
indvd_col (str, optional) – Column specifying individual identity, must be present in self.adata.obs.

Returns:

Updates self.adata.

Return type:

None

References

[1] Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
[2] Yang, L. et al. Projection-TAGs enable multiplex projection tracing and multi-modal profiling of projection neurons. Nat. Commun. 16, 5557 (2025).
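One plausible reading of alpha and min_score is that they jointly select the cells carried into the downstream analyses. The sketch below flags cells on that basis and tabulates the flagged fraction per region. It is an illustration only: the field names `fdr`, `norm_score` and `region`, the comparison directions, and the per-region summary are all assumptions, and the method's actual DEG and region enrichment procedures are not reproduced here.

```python
from collections import Counter
from typing import Dict, List


def flag_cells(
    cells: List[Dict], alpha: float = 0.1, min_score: float = 3.0
) -> List[bool]:
    """Flag cells whose adjusted p-value is below alpha and whose score is high enough."""
    return [c["fdr"] < alpha and c["norm_score"] >= min_score for c in cells]


def flagged_fraction_by_region(cells: List[Dict], flags: List[bool]) -> Dict[str, float]:
    """Fraction of flagged cells within each region label."""
    total = Counter(c["region"] for c in cells)
    hits = Counter(c["region"] for c, f in zip(cells, flags) if f)
    return {region: hits[region] / n for region, n in total.items()}


# Hypothetical per-cell records standing in for rows of self.adata.obs.
toy_cells = [
    {"region": "A", "fdr": 0.01, "norm_score": 4.2},
    {"region": "A", "fdr": 0.50, "norm_score": 3.5},
    {"region": "B", "fdr": 0.05, "norm_score": 2.0},
    {"region": "B", "fdr": 0.02, "norm_score": 3.1},
    {"region": "B", "fdr": 0.03, "norm_score": 5.0},
]
flags = flag_cells(toy_cells, alpha=0.1, min_score=3.0)
fractions = flagged_fraction_by_region(toy_cells, flags)
print(flags, fractions)
```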

get_SBP_score(direction: bool, gene_size: int, ctrl_match_key: str = 'gss_max', weight_opt: str = 'DS', n_genebin: int = 20, return_ctrl_raw_score: bool = False, return_ctrl_norm_score: bool = False, fdr_method: str = 'fdr_bh', pval: str = 'pval', alpha: float = 0.1, group: str | None = None) → None

Calculate single-cell SBP-relevant enrichment scores and infer statistical significance.

Parameters:

direction (bool) – If True, use the gene set relevant to the positive direction of the given SBP; otherwise, use the negative direction.
gene_size (int) – Number of genes used for scoring.
ctrl_match_key (str) – Gene-level statistic used for matching control and SBP-relevant genes, must be present in adata.uns[‘GENE_STATS’].
weight_opt (str) – Gene-level statistic used for re-weighting SBP-relevant genes, must be present in adata.uns[‘GENE_STATS’].
n_genebin (int) – Number of bins for dividing genes by ctrl_match_key.
return_ctrl_raw_score (bool) – If True, return raw scores for control gene sets.
return_ctrl_norm_score (bool) – If True, return normalized scores for control gene sets.
fdr_method (str) – Method for multiple testing correction.
pval (str) – Column name indicating the cell-level p-values, must be present in self.SBP_scores.
alpha (float) – Significance level for multiple testing correction.
group (str, optional) – Column name indicating the cell groups within which p-values are adjusted; must be present in self.adata.obs. Only used for group_bh.

Returns:

Results stored in self.SBP_scores.

Return type:

None
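The ctrl_match_key and n_genebin parameters suggest that control genes are drawn from the same bins of a gene-level statistic as the SBP-relevant genes. A minimal sketch of that idea follows, assuming rank-based bins of near-equal size and one control gene drawn per target gene from the same bin. The helper names and toy statistic are hypothetical, and the library's actual sampling scheme may differ.

```python
import random
from collections import Counter
from typing import Dict, List


def assign_bins(stat: Dict[str, float], n_genebin: int) -> Dict[str, int]:
    """Rank genes by `stat` and split the ranking into `n_genebin` near-equal bins."""
    ranked = sorted(stat, key=lambda g: (stat[g], g))
    n = len(ranked)
    return {g: (rank * n_genebin) // n for rank, g in enumerate(ranked)}


def sample_matched_controls(
    target: List[str], stat: Dict[str, float], n_genebin: int = 20, seed: int = 123
) -> List[str]:
    """Draw one control gene per target gene from the same bin, excluding targets."""
    rng = random.Random(seed)
    bins = assign_bins(stat, n_genebin)
    target_set = set(target)
    pool: Dict[int, List[str]] = {}
    for gene in sorted(stat):
        if gene not in target_set:
            pool.setdefault(bins[gene], []).append(gene)
    needed = Counter(bins[g] for g in target)
    controls: List[str] = []
    for b, k in sorted(needed.items()):
        candidates = pool.get(b, [])
        # Sample without replacement; cap at the pool size if a bin is small.
        controls.extend(rng.sample(candidates, min(k, len(candidates))))
    return controls


# Hypothetical gene-level statistic (standing in for a column such as 'gss_max').
toy_stat = {f"g{i}": float(i) for i in range(20)}
toy_target = ["g1", "g2", "g17"]
toy_bins = assign_bins(toy_stat, n_genebin=4)
controls = sample_matched_controls(toy_target, toy_stat, n_genebin=4)
print(controls)
```

Repeating the draw with different seeds gives a collection of control gene sets whose scores form the null distribution for each cell.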

References

[1] Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572-1580 (2022).
[2] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).
[3] Hu, J. X., Zhao, H. & Zhou, H. H. False discovery rate control with groups. J. Am. Stat. Assoc. 105, 1215-1227 (2010).
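The default fdr_method, 'fdr_bh', shares its name with the Benjamini-Hochberg option in statsmodels; whether the class delegates to that library is not stated here. As a reference for what the correction computes, the following is a minimal pure-Python sketch of the standard Benjamini-Hochberg adjustment. The function name and toy p-values are hypothetical, and the grouped variant (group_bh) is not covered.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_pvals = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(toy_pvals)
significant = [q < 0.1 for q in adj]  # alpha = 0.1, as in the method's default
print(adj, significant)
```

Because the adjustment depends on the number of tests, adjusting within groups of cells (as the group parameter allows) can give different results from adjusting over all cells at once.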

get_gene_contrib(metric: str = 'VIP', n_boot: int | None = 1000, one_sided: bool = 'True') → None

Compute gene-level contribution to PLS-R prediction and infer gene-level statistical significance against spatial autocorrelation.

Parameters:

metric (str) – The statistical metric to be used. Must be one of {“PLS1”, “RC”, “VIP”}: PLS1 (PLS1 weights), RC (regression coefficient), or VIP (variable importance in projection).
n_boot (int or None, default=1000) – Number of bootstrap iterations to perform if the metric is PLS1.
one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.

Returns:

Results of this function are stored on self.PLS_report, self.weight_perm and self.sign_perm.

Return type:

None

References

[1] Whitaker, K. J., Vértes, P. E., Romero-Garcia, R. & Bullmore, E. T. Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc. Natl Acad. Sci. USA 113, 9105-9110 (2016).
[2] Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).
[3] Mahieu, B., Qannari, E. M. & Jaillais, B. Extension and significance testing of Variable Importance in Projection (VIP) indices in Partial Least Squares regression and Principal Components Analysis. Chemom. Intell. Lab. Syst. 242, 104986 (2023).
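For readers unfamiliar with the VIP metric, the sketch below computes it from PLS weights using the textbook definition: for gene j, VIP_j = sqrt(p * sum_a(SS_a * w_ja^2 / ||w_a||^2) / sum_a(SS_a)), where p is the number of genes and SS_a is the sum of squares of the response explained by component a. The function name and toy inputs are hypothetical, and the class may use an extended variant such as the one in reference [3].

```python
from math import sqrt
from typing import List


def vip_scores(weights: List[List[float]], ss: List[float]) -> List[float]:
    """Variable importance in projection from PLS weights.

    weights: p genes x A components matrix of PLS weights.
    ss: sum of squares of the response explained by each component (length A).
    """
    p = len(weights)
    n_comp = len(ss)
    # Squared norm of each component's weight vector.
    norms_sq = [sum(weights[j][a] ** 2 for j in range(p)) for a in range(n_comp)]
    total_ss = sum(ss)
    return [
        sqrt(
            p
            * sum(ss[a] * weights[j][a] ** 2 / norms_sq[a] for a in range(n_comp))
            / total_ss
        )
        for j in range(p)
    ]


# Toy example: 3 genes, 2 components.
toy_weights = [[0.6, 0.0], [0.8, 0.6], [0.0, 0.8]]
toy_ss = [3.0, 1.0]
vip = vip_scores(toy_weights, toy_ss)
print(vip)
```

By construction the mean of the squared VIP scores equals 1, which is why a value above 1 is commonly read as an above-average contribution.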