zoom.core.ZOOM

class zoom.core.ZOOM(expression: PathLike | pandas.DataFrame, SBP: PathLike | pandas.DataFrame, SBP_perm: PathLike | pandas.DataFrame, best_comp: int | None = None, cv: int = 10, seed: int = 123, n_jobs: int = -1)

The ZOOM class serves as the foundation of this package. It implements a framework for partial least squares regression (PLS-R) applied to anatomically comprehensive transcriptomic data and spatial brain phenotype (SBP) data.

Specifically, this class provides:

  • Initialization of input data (expression, SBP, and permutation-based SBP).

  • Configuration of PLS-R parameters such as the number of components, cross-validation folds, random seed, and parallelization settings.

  • Methods for cross-validating PLS-R models to determine the optimal number of latent components, assess prediction accuracy, and evaluate statistical significance via spatial permutation tests.

  • Procedures for quantifying gene-level contributions to prediction performance using multiple metrics (PLS1 weights, regression coefficients, and variable importance in projection), with spatial permutation strategies to infer statistical significance.

  • Methods for relating SBP-relevant genes to predefined gene sets, such as GO terms, KEGG pathways, and gene co-expression modules.

expression

Gene expression matrix (e.g., AHBA data).

Type:

pd.DataFrame

SBP

Spatial brain phenotype data.

Type:

pd.DataFrame

SBP_perm

Permuted SBP data for null distribution estimation.

Type:

pd.DataFrame
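A minimal sketch of the input layout these attributes imply, written with NumPy arrays instead of DataFrames. It assumes (this page does not confirm it) that rows index the same brain regions in all three inputs, that SBP holds one phenotype value per region, and that each column of SBP_perm is one permuted copy of SBP. The helper check_inputs is hypothetical, and the random data are placeholders.

```python
import numpy as np


def check_inputs(expression: np.ndarray, sbp: np.ndarray, sbp_perm: np.ndarray) -> int:
    """Hypothetical shape check; returns the number of permuted SBP maps.

    Assumed layout: expression (n_regions, n_genes), sbp (n_regions,),
    sbp_perm (n_regions, n_perm).
    """
    n_regions = expression.shape[0]
    if sbp.shape != (n_regions,):
        raise ValueError(f"SBP must have shape ({n_regions},), got {sbp.shape}")
    if sbp_perm.ndim != 2 or sbp_perm.shape[0] != n_regions:
        raise ValueError(f"SBP_perm must have {n_regions} rows, got shape {sbp_perm.shape}")
    return sbp_perm.shape[1]


rng = np.random.default_rng(123)
expression = rng.standard_normal((100, 500))   # placeholder: 100 regions x 500 genes
sbp = rng.standard_normal(100)                 # placeholder phenotype map
# Placeholder only: plain shuffles ignore spatial autocorrelation; a real SBP_perm
# should come from a spatially constrained null model (e.g. spin tests).
sbp_perm = np.column_stack([rng.permutation(sbp) for _ in range(1000)])
print(check_inputs(expression, sbp, sbp_perm))  # 1000
```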

best_comp

Optimal number of PLS-R components. If None, determined via cross-validation.

Type:

int or None (default: None)

__cv

Number of cross-validation folds.

Type:

int

__seed

Random seed for reproducibility.

Type:

int

__n_jobs

Number of parallel jobs for computation.

Type:

int

PLS_performance

Stores prediction outputs and correlation scores.

Type:

dict

PLS_r

Median correlation between predicted and observed SBP.

Type:

float

PLS_Q2

Cross-validated predictive accuracy statistic.

Type:

float
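PLS_Q2 is described only as a cross-validated accuracy statistic. A common definition is Q2 = 1 - PRESS / TSS, where PRESS sums the squared out-of-fold prediction errors and TSS sums the squared deviations of the observed SBP from its mean. The sketch below assumes this definition; ZOOM's exact formula may differ, and q2_score is a hypothetical name.

```python
import numpy as np


def q2_score(y_obs, y_pred_cv) -> float:
    """Q2 = 1 - PRESS / TSS using out-of-fold predictions (assumed definition)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred_cv = np.asarray(y_pred_cv, dtype=float)
    press = np.sum((y_obs - y_pred_cv) ** 2)      # prediction error sum of squares
    tss = np.sum((y_obs - y_obs.mean()) ** 2)     # total sum of squares around the mean
    return float(1.0 - press / tss)


y = np.array([1.0, 2.0, 3.0, 4.0])
print(q2_score(y, y))                    # 1.0: perfect out-of-fold predictions
print(q2_score(y, np.full(4, 2.5)))      # 0.0: no better than predicting the mean
```

Under this definition Q2 can also be negative, when the out-of-fold predictions are worse than simply predicting the mean.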

PLS_p_perm

Permutation-based p-value for prediction performance.

Type:

float
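A sketch of how a permutation p-value such as PLS_p_perm can be computed from a null distribution, for example the correlations obtained by refitting the model to each column of SBP_perm. The +1 correction in numerator and denominator, the upper-tail test for one_sided=True, and the use of absolute values for the two-sided case are common conventions assumed here, not confirmed for ZOOM; perm_pvalue is a hypothetical helper.

```python
import numpy as np


def perm_pvalue(observed: float, null, one_sided: bool = True) -> float:
    """Permutation p-value with the +1 correction (assumed convention)."""
    null = np.asarray(null, dtype=float)
    if one_sided:
        exceed = np.sum(null >= observed)                 # null at least as large
    else:
        exceed = np.sum(np.abs(null) >= abs(observed))    # null at least as extreme
    return float((exceed + 1) / (null.size + 1))


null_r = np.linspace(-0.5, 0.5, 99)   # placeholder null correlations, one per permutation
print(perm_pvalue(0.6, null_r))       # 0.01: no null value reaches 0.6
```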

PLS_report

Gene-level statistical report including weights, signs, and permutation p-values.

Type:

pd.DataFrame

weight_perm

Permuted gene weights for null distribution.

Type:

pd.DataFrame

sign_perm

Permuted gene signs for null distribution.

Type:

pd.DataFrame

gsea_res

Results of the spatial permutation test-based GSEA, with the following index and columns:

  • index: Gene-set terms in gene_sets.

  • ES: Raw GSEA enrichment scores.

  • NES: Normalized GSEA enrichment scores.

  • p_perm: P-values inferred from the spatial permutation test.

If multiple gene sets are provided, a dict of such data frames is returned.

Type:

pd.DataFrame or dict

Notes

This class is designed for integrative neurogenomics, enabling rigorous evaluation of gene–SBP associations while accounting for spatial autocorrelation. Its functionality is also inherited by the following subclasses; however, users who only intend to perform imaging-transcriptomics analysis, without extending it to single-cell transcriptomic datasets, are advised to use this class directly.


GSEA(gene_sets: Dict[str, Dict[str, List[str]] | List[str]], min_size: int = 30, max_size: int = 500, one_sided: bool = True) -> None

Implementation of spatial permutation test-based GSEA.

Parameters:
  • gene_sets (dict) – Gene sets for enrichment analysis, organized as {'Term1': [Gene1, Gene2, …], …}, or a dict whose values are gene-set dicts of this form.

  • min_size, max_size (int) – Minimum and maximum sizes of a gene set for it to be included in the GSEA analysis (defaults: 30 and 500).

  • one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.
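The following is a sketch of the standard GSEA running-sum enrichment score (a weighted Kolmogorov-Smirnov-like statistic) and of NES computed as the observed ES divided by the mean magnitude of same-signed null ES values. How ZOOM ranks genes, weights hits, and builds the null ES values is not documented on this page; in a spatial permutation scheme one would presumably recompute the gene ranking for each column of SBP_perm rather than shuffle gene labels. enrichment_score and normalized_es are hypothetical names.

```python
import numpy as np


def enrichment_score(scores, genes, gene_set, weight: float = 1.0) -> float:
    """Standard GSEA running-sum ES for one gene set."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)                      # rank genes, largest score first
    ranked_genes = np.asarray(genes)[order]
    ranked_scores = scores[order]
    in_set = np.isin(ranked_genes, list(gene_set))
    n_hit = int(in_set.sum())
    if n_hit == 0 or n_hit == len(ranked_genes):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_w = np.abs(ranked_scores) ** weight * in_set
    hit_step = hit_w / hit_w.sum()                       # rises at gene-set members
    miss_step = (~in_set) / (len(ranked_genes) - n_hit)  # falls at all other genes
    running = np.cumsum(hit_step - miss_step)
    return float(running[np.argmax(np.abs(running))])    # signed maximum deviation


def normalized_es(es_obs: float, es_null) -> float:
    """NES: observed ES over the mean |ES| of null values with the same sign."""
    es_null = np.asarray(es_null, dtype=float)
    same_sign = es_null[es_null >= 0] if es_obs >= 0 else es_null[es_null < 0]
    return float(es_obs / np.mean(np.abs(same_sign)))


genes = ["A", "B", "C", "D", "E", "F"]
scores = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]               # placeholder gene-level statistics
print(round(enrichment_score(scores, genes, {"A", "B"}), 6))  # 1.0: set sits at the top
```

In this sketch, min_size and max_size would simply skip any gene set whose overlap with the ranked genes falls outside that range before the ES is computed.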

Returns:

Results of this function are stored on self.gsea_res.

Return type:

None

References

[1] Fulcher, B. D., Arnatkeviciute, A. & Fornito, A. Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data. Nat. Commun. 12, 2669 (2021).

[2] Martins, D. et al. Imaging transcriptomics: convergent cellular, transcriptomic, and molecular neuroimaging signatures in the healthy adult human brain. Cell Rep. 37, 110173 (2021).

cv_PLSR(ncomps: List[int] | numpy.ndarray = range(1, 16), repeats_cv: int = 30, repeats_pred: int = 101) -> None

Method implementing cross-validated (CV) partial least squares regression (PLS-R). It proceeds in three stages:

  • Determine the optimal number of components, if it was not provided at initialization.

  • Evaluate PLS-R prediction performance using the optimal number of components.

  • Infer the statistical significance of prediction performance through a spatial permutation test.

Parameters:
  • ncomps ({List[int], np.ndarray}, optional) – Candidate numbers of components to evaluate when selecting the optimal one. Defaults to range(1, 16), i.e., 1 to 15 components.

  • repeats_cv (int) – Number of times the optimal-component evaluation is repeated.

  • repeats_pred (int) – Number of times the model performance evaluation is repeated.
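A minimal sketch of the first stage (choosing the number of components by k-fold cross-validation), using a small NumPy NIPALS implementation of PLS1 for a single SBP. It assumes that the optimal number of components is the one maximizing cross-validated Q2 and runs a single CV split; ZOOM repeats the evaluation (repeats_cv) and may use a different criterion or PLS backend. All function names here are hypothetical.

```python
import numpy as np


def pls1_fit(X: np.ndarray, y: np.ndarray, n_comp: int):
    """Fit PLS1 (single response) with NIPALS; return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # unit-norm weight vector
        t = Xr @ w                       # component scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        q = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X
        yr = yr - q * t                  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(model, X: np.ndarray) -> np.ndarray:
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def cv_q2(X, y, n_comp, cv=10, seed=123) -> float:
    """Out-of-fold Q2 from a single k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), cv)
    y_hat = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        y_hat[test] = pls1_predict(pls1_fit(X[train], y[train], n_comp), X[test])
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def select_ncomp(X, y, ncomps=range(1, 16), cv=10, seed=123):
    """Return the candidate with the highest CV Q2 and all scores (assumed criterion)."""
    scores = {k: cv_q2(X, y, k, cv, seed) for k in ncomps}
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)
X = rng.standard_normal((80, 30))    # placeholder: 80 regions x 30 genes
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(80)
best, q2 = select_ncomp(X, y, ncomps=range(1, 6))
print(best, round(q2[best], 3))
```

With as many components as predictors (and more regions than genes), PLS1 reproduces ordinary least squares, which is a convenient sanity check on the implementation.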

Returns:

Results of this function are stored on self.PLS_performance, self.PLS_r, self.PLS_Q2 and self.PLS_p_perm.

Return type:

None

References

Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).

get_gene_contrib(metric: str = 'VIP', n_boot: int | None = 1000, one_sided: bool = True) -> None

Compute gene-level contributions to PLS-R prediction and infer gene-level statistical significance while accounting for spatial autocorrelation.
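For the VIP metric, a common definition weights each gene's normalized PLS weight by the y-variance explained by each component: VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a), with SS_a = q_a^2 * t_a^T t_a and p the number of genes. The sketch assumes this definition, which is not confirmed as ZOOM's implementation; vip_scores is a hypothetical helper taking the weights W, scores T and y-loadings q of an already fitted model.

```python
import numpy as np


def vip_scores(W: np.ndarray, T: np.ndarray, q: np.ndarray) -> np.ndarray:
    """VIP per gene from a fitted PLS model.

    W: (n_genes, n_comp) X-weights, T: (n_regions, n_comp) scores,
    q: (n_comp,) y-loadings.
    """
    W, T, q = (np.asarray(a, dtype=float) for a in (W, T, q))
    n_genes = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)          # y sum of squares explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2   # squared normalised weights
    return np.sqrt(n_genes * (w_sq @ ss) / ss.sum())


rng = np.random.default_rng(1)
W = rng.standard_normal((20, 3))                  # placeholder fitted quantities
T = rng.standard_normal((40, 3))
q = rng.standard_normal(3)
vip = vip_scores(W, T, q)
print(round(float(np.mean(vip ** 2)), 6))         # 1.0 by construction
```

Because the squared VIP scores average to 1 under this definition, VIP > 1 is often used as a rule-of-thumb threshold for influential genes.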

Parameters:
  • metric (str) – The statistical metric to be used. Must be one of {“PLS1”, “RC”, “VIP”}:
      - PLS1: PLS1 weights.
      - RC: Regression coefficients.
      - VIP: Variable importance in projection.

  • n_boot (int or None, default=1000) – Number of bootstrap iterations to perform if the metric is PLS1.

  • one_sided (bool) – If True, infer statistical significance via one-sided p-values. Else, use two-sided p-values.
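For metric="PLS1", n_boot sets the number of bootstrap iterations. The sketch below follows the general idea of bootstrapping PLS1 weights and dividing each weight by its bootstrap standard error to obtain a Z-score. Resampling regions with replacement, centering the data, and taking the first-component weight as proportional to X^T y are assumptions here rather than ZOOM's documented procedure, and the function names are hypothetical. Such Z-scores describe weight stability only and do not by themselves account for spatial autocorrelation.

```python
import numpy as np


def pls1_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Unit-norm first PLS component weights, proportional to Xc^T yc."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)


def bootstrap_pls1_z(X, y, n_boot=1000, seed=123) -> np.ndarray:
    """Z-score = original weight / bootstrap standard error of that weight."""
    rng = np.random.default_rng(seed)
    w0 = pls1_weights(X, y)
    boots = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))   # resample regions with replacement
        boots[b] = pls1_weights(X[idx], y[idx])
    se = boots.std(axis=0, ddof=1)
    return w0 / se


rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))                   # placeholder: 100 regions x 10 genes
y = X[:, 0] + 0.1 * rng.standard_normal(100)         # gene 0 drives the phenotype
z = bootstrap_pls1_z(X, y, n_boot=200)
print(int(np.argmax(np.abs(z))))                     # 0
```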

Returns:

Results of this function are stored on self.PLS_report, self.weight_perm and self.sign_perm.

Return type:

None

References

[1] Whitaker, K. J., Vértes, P. E., Romero-Garcia, R. & Bullmore, E. T. Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. Proc. Natl Acad. Sci. USA 113, 9105-9110 (2016).

[2] Wang, Y. et al. Spatio-molecular profiles shape the human cerebellar hierarchy along the sensorimotor-association axis. Cell Rep. 43, 113770 (2024).

[3] Mahieu, B., Qannari, E. M. & Jaillais, B. Extension and significance testing of Variable Importance in Projection (VIP) indices in Partial Least Squares regression and Principal Components Analysis. Chemom. Intell. Lab. Syst. 242, 104986 (2023).