zoom.core.ZOOM_SC
- class zoom.core.ZOOM_SC(adata: PathLike | anndata.AnnData, expression: PathLike | pandas.DataFrame, SBP: PathLike | pandas.DataFrame, SBP_perm: PathLike | pandas.DataFrame, best_comp: int | None = None, cv: int = 10, processed: bool = False, DS: PathLike | pandas.DataFrame | None = None, QC: bool = True, min_genes: int = 250, min_cells: int = 50, d: int = 50, gss_limit: int = 200, seed: int = 123, n_jobs: int = -1)
The ZOOM_SC class extends the traditional imaging-transcriptomics paradigm (the ZOOM class) by linking SBP-relevant gene sets with single-cell RNA sequencing (scRNA-seq) datasets.
Specifically, this class preprocesses scRNA-seq data during initialization, calculates single-cell enrichment scores of SBP-relevant gene sets, and infers their statistical significance at single-cell resolution.
Downstream analyses based on the single-cell enrichment scores include differentially expressed gene (DEG) analysis and region enrichment analysis.
- adata
Path to .h5ad file or AnnData object, where scRNA-seq dataset is stored.
- Type:
str or AnnData
- SBP_scores
Single-cell enrichment scores of SBP-relevant gene sets.
- Type:
pd.DataFrame
- DS
Gene-level differential stability (DS).
- Type:
pd.DataFrame
- processed
If False, preprocess the scRNA-seq data; otherwise, skip the preprocessing pipeline.
- Type:
bool
- QC
If True, filter low-quality genes and cells through scanpy, using sc.pp.filter_cells(adata, min_genes=min_genes) and sc.pp.filter_genes(adata, min_cells=min_cells).
- Type:
bool
- d
Number of nearest neighbors for each cell.
- Type:
int
- gss_limit
Allowed maximum GSS value to avoid over-representation.
- Type:
int
References
- [1] Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations
of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572-1580 (2022).
- [2] Song, L., Chen, W., Hou, J., Guo, M. & Yang, J. Spatially resolved mapping
of cells associated with human complex traits. Nature 641, 932-941 (2025).
- [3] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive
gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).
This class also builds on the references listed for the ZOOM class. If you use functions from the ZOOM class, please cite them as well.
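Example

The QC step described above can be illustrated without scanpy. The sketch below is not the class's implementation: it reproduces, on a plain list-of-lists count matrix, what the two scanpy filters do (keep cells expressing at least min_genes genes, keep genes detected in at least min_cells cells). The helper name qc_filter, the toy matrix, and the cells-then-genes order are assumptions made for illustration.

```python
def qc_filter(counts, min_genes, min_cells):
    """Filter a cells-by-genes count matrix given as a list of rows.

    Returns (kept_cell_indices, kept_gene_indices).
    Hypothetical helper; not part of zoom or scanpy.
    """
    n_genes = len(counts[0]) if counts else 0

    # Step 1: keep cells that express at least `min_genes` genes.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for x in row if x > 0) >= min_genes
    ]

    # Step 2: among the remaining cells, keep genes detected in at
    # least `min_cells` cells (assumed order: cells first, then genes).
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


# Toy matrix: 4 cells x 3 genes.
counts = [
    [5, 0, 1],
    [0, 0, 2],
    [3, 1, 4],
    [0, 0, 0],
]
cells, genes = qc_filter(counts, min_genes=2, min_cells=2)
print(cells, genes)
```

With the class defaults (min_genes=250, min_cells=50) the same logic applies at realistic scale; the small thresholds here only keep the toy example readable.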
- GSEA(gene_sets: Dict[str, Dict[str, List[str]] | List[str]], min_size: int = 30, max_size: int = 500, one_sided: bool = True) None
Implementation of spatial permutation test-based GSEA.
- Parameters:
gene_sets (dict) – Gene sets for enrichment analysis. Must be organized as {'Term1': [Gene1, Gene2, …], …}, or as a dict of such gene-set dicts.
min_size, max_size (int) – Minimum and maximum size of a gene set to be included in GSEA.
one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.
- Returns:
Results of this function are stored on self.gsea_res.
- Return type:
None
References
- [1] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive
gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).
- [2] Martins, D. et al. Imaging transcriptomics: convergent cellular,
transcriptomic, and molecular neuroimaging signatures in the healthy adult human brain. Cell Rep. 37, 110173 (2021).
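Example

The two accepted gene_sets layouts and the effect of min_size and max_size can be sketched as follows. This is an illustration only: the helper filter_gene_sets, the term and gene names, and the choice to measure set size after intersecting with a background gene list are assumptions, not confirmed behavior of GSEA().

```python
def filter_gene_sets(gene_sets, background, min_size=30, max_size=500):
    """Keep terms whose overlap with `background` lies within the size bounds.

    Hypothetical helper; sizing by overlap is an assumption.
    """
    background = set(background)
    kept = {}
    for term, genes in gene_sets.items():
        overlap = [g for g in genes if g in background]
        if min_size <= len(overlap) <= max_size:
            kept[term] = overlap
    return kept


# Layout 1: a flat dict mapping each term to its gene list.
flat = {
    "TermA": ["G1", "G2", "G3"],
    "TermB": ["G1"],
    "TermC": ["G2", "G3", "G9"],
}
# Layout 2: a dict of such dicts, keyed by a (made-up) library name.
nested = {"LibraryX": flat}

background = ["G1", "G2", "G3", "G4"]
kept_flat = filter_gene_sets(flat, background, min_size=2, max_size=3)
kept_nested = {
    lib: filter_gene_sets(sets, background, min_size=2, max_size=3)
    for lib, sets in nested.items()
}
print(kept_flat)
```

TermB is dropped because only one of its genes is usable, and TermC is kept with the two genes that overlap the background.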
- cv_PLSR(ncomps: List[int] | numpy.ndarray = range(1, 16), repeats_cv: int = 30, repeats_pred: int = 101) None
Cross-validated (CV) partial least squares regression (PLS-R). This method runs in three stages: (1) evaluate the optimal number of components if it was not provided beforehand; (2) evaluate PLS-R prediction performance with the optimal number of components; (3) infer the statistical significance of the prediction performance through a spatial permutation test.
- Parameters:
ncomps ({List[int], np.ndarray}, optional) – Candidate numbers of components to evaluate.
repeats_cv (int) – Number of times the optimal-component evaluation is repeated.
repeats_pred (int) – Number of times the model performance evaluation is repeated.
- Returns:
Results of this function are stored on self.PLS_performance, self.PLS_r, self.PLS_Q2 and self.PLS_p_perm.
- Return type:
None
References
Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).
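Example

The first two stages can be pictured with two small helpers. The sketch below is a generic illustration, not the code behind cv_PLSR(): it computes a cross-validated Q² as 1 - PRESS / TSS and picks the candidate component count with the highest mean score across repeats. The function names, the toy numbers, and the "highest mean" selection rule are all assumptions.

```python
from statistics import mean


def q2(y_true, y_pred):
    """Cross-validated Q^2 = 1 - PRESS / TSS for held-out predictions."""
    y_bar = mean(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - press / tss


def pick_best_comp(scores_by_comp):
    """Return the component count with the highest mean score.

    `scores_by_comp` maps a component count to its scores over CV repeats.
    """
    return max(scores_by_comp, key=lambda k: mean(scores_by_comp[k]))


# Toy held-out predictions for one CV run.
score = q2([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])

# Toy scores for three candidate component counts, two repeats each.
scores = {1: [0.20, 0.30], 2: [0.50, 0.40], 3: [0.45, 0.35]}
best = pick_best_comp(scores)
print(round(score, 3), best)
```

Repeating the evaluation (repeats_cv, repeats_pred) reduces the dependence of the result on any single random fold split.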
- downstream_ans(flag_DEG: bool = True, alpha: float = 0.1, min_score: float = 3.0, group: str | None = None, rank_method: str = 'logreg', max_iter: int = 10000, flag_region: bool = True, region_col: str | None = None, batch_col: str | None = None, dataset: List | None = None, indvd_col: str | None = None) None
Perform downstream analyses based on single-cell enrichment scores.
- Parameters:
flag_DEG (bool) – Whether to perform differential expression analysis.
alpha (float) – Significance threshold for enrichment scores.
min_score (float) – Minimum enrichment score threshold.
group (str) – Column name of grouping variable for DEG analysis, must be present in self.adata.obs.
rank_method (str) – Ranking method for DEG analysis (default: “logreg”).
max_iter (int) – Maximum iterations for DEG ranking.
flag_region (bool) – Whether to perform region enrichment analysis.
region_col (str, optional) – Column specifying region identity, must be present in self.adata.obs.
batch_col (str, optional) – Column specifying batch identity, must be present in self.adata.obs.
dataset (list, optional) – Dataset identifiers for region enrichment, must be present in self.adata.obs[batch_col].
indvd_col (str, optional) – Column specifying individual identity, must be present in self.adata.obs.
- Returns:
Updates self.adata.
- Return type:
None
References
- [1] Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell
gene expression data analysis. Genome Biol. 19, 15 (2018).
- [2] Yang, L. et al. Projection-TAGs enable multiplex projection tracing and
multi-modal profiling of projection neurons. Nat. Commun. 16, 5557 (2025).
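Example

One way to read alpha and min_score together is as a joint filter on cells before any downstream summary. The sketch below is a guess at that idea, not the method's actual logic: it flags a cell when its adjusted p-value is below alpha and its enrichment score is at least min_score, then reports the flagged fraction per region. The field names (score, p_adj, region), the comparison operators, and the per-region fraction are all hypothetical.

```python
from collections import defaultdict


def flag_cells(cells, alpha=0.1, min_score=3.0):
    """Flag cells passing both thresholds (assumed rule, for illustration)."""
    return [c["p_adj"] < alpha and c["score"] >= min_score for c in cells]


def fraction_by_region(cells, flags):
    """Fraction of flagged cells within each region label."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for cell, flag in zip(cells, flags):
        total[cell["region"]] += 1
        hits[cell["region"]] += int(flag)
    return {region: hits[region] / total[region] for region in total}


# Toy cells with made-up scores and adjusted p-values.
cells = [
    {"region": "cortex", "score": 4.2, "p_adj": 0.01},
    {"region": "cortex", "score": 2.0, "p_adj": 0.02},
    {"region": "striatum", "score": 5.0, "p_adj": 0.30},
    {"region": "striatum", "score": 3.5, "p_adj": 0.05},
]
flags = flag_cells(cells)
fractions = fraction_by_region(cells, flags)
print(flags, fractions)
```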
- get_SBP_score(direction: bool, gene_size: int, ctrl_match_key: str = 'gss_max', weight_opt: str = 'DS', n_genebin: int = 20, return_ctrl_raw_score: bool = False, return_ctrl_norm_score: bool = False, fdr_method: str = 'fdr_bh', pval: str = 'pval', alpha: float = 0.1, group: str | None = None) None
Calculate single-cell SBP-relevant enrichment scores and infer statistical significance.
- Parameters:
direction (bool) – If True, use the gene set associated with the positive direction of the given SBP; otherwise, use the negative direction.
gene_size (int) – Number of genes used for scoring.
ctrl_match_key (str) – Gene-level statistic used for matching control and SBP-relevant genes, must be present in adata.uns[‘GENE_STATS’].
weight_opt (str) – Gene-level statistic used for re-weighting SBP-relevant genes, must be present in adata.uns[‘GENE_STATS’].
n_genebin (int) – Number of bins for dividing genes by ctrl_match_key.
return_ctrl_raw_score (bool) – If True, return raw scores for control gene sets.
return_ctrl_norm_score (bool) – If True, return normalized scores for control gene sets.
fdr_method (str) – Method for multiple testing correction.
pval (str) – Column name indicating the cell-level p-values, must be present in self.SBP_scores.
alpha (float) – Significance level for multiple testing correction.
group (str, optional) – Column name of the cell groups within which p-values are adjusted; must be present in self.adata.obs. Only used for group_bh.
- Returns:
Results stored in self.SBP_scores.
- Return type:
None
References
- [1] Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations
of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572-1580 (2022).
- [2] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive
gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).
- [3] Hu, J. X., Zhao, H. & Zhou, H. H. False discovery rate control with groups.
J. Am. Stat. Assoc. 105, 1215-1227 (2010).
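Example

The default fdr_method, 'fdr_bh', shares its name with the Benjamini-Hochberg option in statsmodels. Assuming it refers to that procedure, the adjustment applied to cell-level p-values can be sketched in plain Python as below. The helper bh_adjust and the toy p-values are illustrative and are not taken from the package.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.50]
adj = bh_adjust(pvals)
# Cells called significant at the default alpha of 0.1.
significant = [q <= 0.1 for q in adj]
print(adj, significant)
```

A group-wise variant such as the one cited in [3] would adjust p-values using the cell groups named by the group parameter; that variant is not shown here.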
- get_gene_contrib(metric: str = 'VIP', n_boot: int | None = 1000, one_sided: bool = 'True') None
Compute gene-level contribution to PLS-R prediction and infer gene-level statistical significance against spatial autocorrelation.
- Parameters:
metric (str) – The statistical metric to use. Must be one of {"PLS1", "RC", "VIP"}: PLS1 (PLS1 weights), RC (regression coefficient), or VIP (variable importance in projection).
n_boot (int or None, default=1000) – Number of bootstrap iterations to perform if the metric is PLS1.
one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.
- Returns:
Results of this function are stored on self.PLS_report, self.weight_perm and self.sign_perm.
- Return type:
None
References
- [1] Whitaker, K. J., Vértes, P. E., Romero-Garcia, R. & Bullmore, E. T.
Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc. Natl Acad. Sci. USA 113, 9105-9110 (2016).
- [2] Wang, Y. et al. Spatio-molecular profiles shape the human
cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).
- [3] Mahieu, B., Qannari, E. M. & Jaillais, B. Extension and
significance testing of Variable Importance in Projection (VIP) indices in Partial Least Squares regression and Principal Components Analysis. Chemom. Intell. Lab. Syst. 242, 104986 (2023).
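Example

The one_sided switch can be understood through a generic permutation p-value. In the sketch below, null stands in for a gene's statistic recomputed on spatially permuted maps (such as those supplied through SBP_perm). The helper perm_pvalue, the +1 correction in numerator and denominator, and the use of absolute values for the two-sided case are common conventions assumed here, not confirmed details of get_gene_contrib().

```python
def perm_pvalue(observed, null, one_sided=True):
    """Permutation p-value of `observed` against a list of null statistics."""
    if one_sided:
        # Count null statistics at least as large as the observed one.
        extreme = sum(1 for x in null if x >= observed)
    else:
        # Count null statistics at least as extreme in either direction.
        extreme = sum(1 for x in null if abs(x) >= abs(observed))
    # The +1 terms treat the observed statistic as one of the permutations.
    return (1 + extreme) / (1 + len(null))


# Toy null distribution from nine hypothetical permutations.
null = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 0.5, -0.5, 1.5]
p_one = perm_pvalue(2.0, null, one_sided=True)
p_two = perm_pvalue(2.0, null, one_sided=False)
print(p_one, p_two)
```

For a non-negative metric such as VIP, only large values indicate importance, which is one reason a one-sided test may be the natural choice.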