zoom.core.ZOOM
- class zoom.core.ZOOM(expression: PathLike | pandas.DataFrame, SBP: PathLike | pandas.DataFrame, SBP_perm: PathLike | pandas.DataFrame, best_comp: int | None = None, cv: int = 10, seed: int = 123, n_jobs: int = -1)
The ZOOM class serves as the foundation of this package. It implements a framework for partial least squares regression (PLS-R) applied to anatomically comprehensive transcriptomic data and spatial brain phenotype (SBP) data.
Specifically, this class provides:
Initialization of input data (expression, SBP, and permutation-based SBP).
Configuration of PLS-R parameters such as component number, cross-validation folds, random seed, and parallelization settings.
Methods for cross-validation of PLS-R models to determine optimal latent components, assess prediction accuracy, and evaluate statistical significance via spatial permutation tests.
Procedures for quantifying gene-level contributions to prediction performance using multiple metrics (PLS1 weights, regression coefficients, and variable importance in projection), with spatial permutation strategies to infer statistical significance.
Methods for relating SBP-relevant genes to pre-defined gene sets, such as GO terms, KEGG pathways, and gene co-expression modules.
- expression
Gene expression matrix (e.g., AHBA data).
- Type:
pd.DataFrame
- SBP
Spatial brain phenotype data.
- Type:
pd.DataFrame
- SBP_perm
Permuted SBP data for null distribution estimation.
- Type:
pd.DataFrame
- best_comp
Optimal number of PLS-R components. If None, determined via cross-validation.
- Type:
int or None (default is None)
- __cv
Number of cross-validation folds.
- Type:
int
- __seed
Random seed for reproducibility.
- Type:
int
- __n_jobs
Number of parallel jobs for computation.
- Type:
int
- PLS_performance
Stores prediction outputs and correlation scores.
- Type:
dict
- PLS_r
Median correlation between predicted and observed SBP.
- Type:
float
- PLS_Q2
Cross-validated predictive accuracy statistic.
- Type:
float
- PLS_p_perm
Permutation-based p-value for prediction performance.
- Type:
float
- PLS_report
Gene-level statistical report including weights, signs, and permutation p-values.
- Type:
pd.DataFrame
- weight_perm
Permuted gene weights for null distribution.
- Type:
pd.DataFrame
- sign_perm
Permuted gene signs for null distribution.
- Type:
pd.DataFrame
- gsea_res
Results of the spatial permutation test-based GSEA, with the following fields:
  - index: Gene terms in gene_sets.
  - ES: Raw GSEA enrichment scores.
  - NES: Normalized GSEA enrichment scores.
  - p_perm: P-values inferred from the spatial permutation test.
If multiple gene sets are provided, a dict of such data frames is stored instead.
- Type:
pd.DataFrame or dict
Notes
This class is designed for integrative neurogenomics, enabling rigorous evaluation of gene–SBP associations while accounting for spatial autocorrelation. Its functionality is inherited by the subclasses that follow. Users who only need imaging-transcriptomics analysis, without extending to single-cell transcriptomic datasets, are advised to use this class directly.
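The workflow described above can be sketched as follows. This is a minimal sketch assembled only from the signatures documented on this page; the wrapper name run_zoom is hypothetical, the call order (cv_PLSR, then get_gene_contrib, then GSEA) is an assumption based on which attributes each method fills, and the import is deferred so the sketch loads even when the package is not installed.

```python
def run_zoom(expression, sbp, sbp_perm, gene_sets=None):
    """Hypothetical helper: run the documented ZOOM steps and return the object.

    `expression`, `sbp` and `sbp_perm` may be file paths or pandas DataFrames,
    as stated in the constructor signature.
    """
    # Deferred import: only needed when the function is actually called.
    from zoom.core import ZOOM

    # Defaults are written out explicitly to mirror the documented signature.
    model = ZOOM(expression, sbp, sbp_perm,
                 best_comp=None, cv=10, seed=123, n_jobs=-1)

    # Assumed order: choose components and assess performance first ...
    model.cv_PLSR()
    # ... then compute gene-level contributions ...
    model.get_gene_contrib(metric="VIP")
    # ... and optionally run the enrichment step.
    if gene_sets is not None:
        model.GSEA(gene_sets, min_size=30, max_size=500)
    return model
```

A call would look like run_zoom("expression.csv", "sbp.csv", "sbp_perm.csv"), where the file names are placeholders. Results would then be read from the attributes listed above, such as PLS_r, PLS_p_perm, PLS_report and gsea_res.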
- GSEA(gene_sets: Dict[str, Dict[str, List[str]] | List[str]], min_size: int = 30, max_size: int = 500, one_sided: bool = True) → None
Implementation of spatial permutation test-based GSEA.
- Parameters:
gene_sets (dict) – Gene sets for enrichment analysis, organized as {‘Term1’: [Gene1, Gene2,…],…}, or a dict of such gene-set dicts.
min_size, max_size (int) – Minimum and maximum size of a gene set for it to be included in the GSEA.
one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.
- Returns:
Results of this function are stored on self.gsea_res.
- Return type:
None
References
- [1] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).
- [2] Martins, D. et al. Imaging transcriptomics: convergent cellular, transcriptomic, and molecular neuroimaging signatures in the healthy adult human brain. Cell Rep. 37, 110173 (2021).
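To illustrate the gene_sets layout and the min_size/max_size filter described for GSEA, here is a small sketch in plain Python. The helper filter_gene_sets is hypothetical, and counting set size on the overlap with the measured genes is an assumption; this page does not state whether ZOOM counts the raw or the overlapping genes.

```python
def filter_gene_sets(gene_sets, measured_genes, min_size=30, max_size=500):
    """Keep terms whose overlap with the measured genes has an allowed size."""
    measured = set(measured_genes)
    kept = {}
    for term, genes in gene_sets.items():
        # Assumption: only genes present in the expression matrix count.
        overlap = [g for g in genes if g in measured]
        if min_size <= len(overlap) <= max_size:
            kept[term] = overlap
    return kept


# Toy data: 100 measured genes named G0..G99.
measured = [f"G{i}" for i in range(100)]
gene_sets = {
    "TermA": [f"G{i}" for i in range(40)],       # 40 measured genes: kept
    "TermB": [f"G{i}" for i in range(10)],       # 10 measured genes: too small
    "TermC": [f"G{i}" for i in range(90, 130)],  # 40 genes, only 10 measured
}

kept = filter_gene_sets(gene_sets, measured)
print(sorted(kept))  # ['TermA']
```

Several collections can be passed together by nesting such dicts under collection names, for example {"GO": go_sets, "KEGG": kegg_sets}, which matches the second form allowed by the type hint.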
- cv_PLSR(ncomps: List[int] | numpy.ndarray = range(1, 16), repeats_cv: int = 30, repeats_pred: int = 101) → None
Cross-validated (CV) partial least squares regression (PLS-R). This method runs in three stages:
  - Evaluate the optimal component number if it was not provided.
  - Evaluate PLS-R prediction performance with the optimal component number.
  - Infer the statistical significance of prediction performance through a spatial permutation test.
- Parameters:
ncomps ({List[int], np.ndarray}, optional) – Candidate component numbers to evaluate.
repeats_cv (int) – Number of times the optimal-component evaluation is repeated.
repeats_pred (int) – Number of times the model performance evaluation is repeated.
- Returns:
Results of this function are stored on self.PLS_performance, self.PLS_r, self.PLS_Q2 and self.PLS_p_perm.
- Return type:
None
References
Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).
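The third stage, inferring significance from a spatial permutation test, comes down to comparing the observed statistic with a null distribution built from the permuted SBP maps. The sketch below shows one common way to turn such a comparison into an empirical p-value. The function perm_p_value and the toy numbers are hypothetical, and the +1 correction and the zero-centered two-sided rule are conventions assumed here, not confirmed details of ZOOM.

```python
def perm_p_value(observed, null_values, one_sided=True):
    """Empirical p-value of `observed` against permutation-derived nulls."""
    n = len(null_values)
    if one_sided:
        # Count null statistics at least as large as the observed one.
        extreme = sum(1 for v in null_values if v >= observed)
    else:
        # Two-sided: compare magnitudes (assumes a null centered at zero).
        extreme = sum(1 for v in null_values if abs(v) >= abs(observed))
    # +1 in numerator and denominator keeps the p-value strictly above zero.
    return (extreme + 1) / (n + 1)


# Toy null distribution of prediction correlations from 9 permuted maps.
null_r = [0.05, -0.10, 0.20, 0.31, -0.02, 0.12, 0.08, -0.15, 0.25]
p = perm_p_value(0.30, null_r)
print(p)  # 0.2
```

With only nine permutations the smallest attainable p-value is 0.1, which is why permutation tests are usually run with many more null maps.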
- get_gene_contrib(metric: str = 'VIP', n_boot: int | None = 1000, one_sided: bool = 'True') → None
Compute gene-level contributions to PLS-R prediction and infer gene-level statistical significance while accounting for spatial autocorrelation.
- Parameters:
metric (str) – The statistical metric to be used. Must be one of {“PLS1”, “RC”, “VIP”}:
  - PLS1: PLS1 weights.
  - RC: Regression coefficient.
  - VIP: Variable importance in projection.
n_boot (int or None, default=1000) – Number of bootstrap iterations to perform if the metric is PLS1.
one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.
- Returns:
Results of this function are stored on self.PLS_report, self.weight_perm and self.sign_perm.
- Return type:
None
References
- [1] Whitaker, K. J., Vértes, P. E., Romero-Garcia, R. & Bullmore, E. T. Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc. Natl Acad. Sci. USA 113, 9105-9110 (2016).
- [2] Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).
- [3] Mahieu, B., Qannari, E. M. & Jaillais, B. Extension and significance testing of Variable Importance in Projection (VIP) indices in Partial Least Squares regression and Principal Components Analysis. Chemom. Intell. Lab. Syst. 242, 104986 (2023).
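For readers unfamiliar with the VIP metric, the sketch below computes the standard variable importance in projection score for a single-response PLS model from its weights, scores and response loadings. It uses plain Python lists to stay dependency-free. The function vip_scores and the toy inputs are hypothetical, and this is the textbook definition rather than a confirmed copy of ZOOM's computation, which may follow the extension cited in reference [3].

```python
import math


def vip_scores(weights, scores, y_loadings):
    """Standard VIP for a single-response PLS model.

    weights[a][j]  : weight of gene j on component a
    scores[a][i]   : score of sample i on component a
    y_loadings[a]  : loading of the response on component a
    """
    n_genes = len(weights[0])
    # Response variance explained by component a: q_a^2 * (t_a . t_a)
    ss = [q * q * sum(t * t for t in t_a)
          for q, t_a in zip(y_loadings, scores)]
    total_ss = sum(ss)

    vip = []
    for j in range(n_genes):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss):
            norm_sq = sum(w * w for w in w_a)  # squared norm of the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        vip.append(math.sqrt(n_genes * acc / total_ss))
    return vip


# Toy model: 3 genes, 2 components, 4 samples.
weights = [[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]]
scores = [[1.0, -1.0, 2.0, -2.0], [0.5, 0.5, -0.5, -0.5]]
y_loadings = [1.0, 2.0]

vip = vip_scores(weights, scores, y_loadings)
print(vip)
```

By construction the squared VIP scores average to 1 across genes, so values above 1 are often read as above-average contributions. In ZOOM, significance is instead inferred from the spatial permutation test, as described above.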