COSIE.linkage_construction.preprocess_data_for_subgraphs

preprocess_data_for_subgraphs(data_dict, feature_dict, spatial_loc_dict, linkage_indicator, n_x, n_y, num_hvg=3000)

Split each section into spatial subgraphs and compute cross-section triplet linkages at the subgraph level.

This function partitions the full spatial omics data into n_x × n_y subregions per section, assigns feature and spatial data to each subgraph, and constructs triplet linkages across specified pairs of sections and modalities using the provided linkage_indicator.
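The partitioning step can be pictured with a small NumPy sketch. This is not the COSIE implementation: the helper name `assign_grid_cells`, the equal-width binning over each section's coordinate range, and the row-major cell numbering are all assumptions made for illustration.

```python
import numpy as np

def assign_grid_cells(coords, n_x, n_y):
    """Return a flat cell index in [0, n_x * n_y) for each spot (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    # Equal-width bin edges spanning the observed coordinate range (assumption).
    x_edges = np.linspace(x.min(), x.max(), n_x + 1)
    y_edges = np.linspace(y.min(), y.max(), n_y + 1)
    # Digitize against interior edges only, so indices run from 0 to n - 1.
    ix = np.clip(np.digitize(x, x_edges[1:-1]), 0, n_x - 1)
    iy = np.clip(np.digitize(y, y_edges[1:-1]), 0, n_y - 1)
    # Row-major flattening of the (ix, iy) grid position (assumption).
    return ix * n_y + iy

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
cells = assign_grid_cells(coords, n_x=2, n_y=2)

# Row indices of the spots that fall into each subregion.
subgraph_members = {k: np.flatnonzero(cells == k) for k in range(2 * 2)}
```

The same row indices could then be used to slice each modality's feature tensor and the coordinate array, which is the kind of per-subgraph assignment the description refers to.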

Parameters

data_dict : dict

A dictionary where each key is a modality (e.g., ‘RNA’, ‘Protein’) and each value is a list of AnnData objects (one per tissue section). Use None if a modality is missing from a section.

feature_dict : dict

A dictionary mapping each section name (e.g., ‘s1’, ‘s2’, …) to a sub-dictionary containing processed feature tensors for each modality as torch.FloatTensor. Format:

{
    's1': {'RNA': torch.Tensor, 'Protein': torch.Tensor, …},
    's2': {…}
}

spatial_loc_dict : dict

A dictionary mapping each section name to a 2D NumPy array of spatial coordinates.

linkage_indicator : dict

A dictionary specifying which tissue section pairs and modality pairs should be linked. Format:

{
    ("s1", "s2"): [("RNA", "RNA"), ("RNA", "Protein")],
    ("s2", "s3"): [("ATAC", "RNA")]
}

This example links sections s1 and s2 using both an RNA-RNA strong linkage and an RNA-Protein weak linkage, and links sections s2 and s3 using an ATAC-RNA linkage.

n_x : int

Number of spatial divisions along the x-axis per section.

n_y : int

Number of spatial divisions along the y-axis per section.

num_hvg : int, optional

Number of highly variable features to retain for linkage matching. Default is 3000.
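Because linkage_indicator refers to section names and modalities that must also exist in feature_dict, it may help to check the two for consistency before calling the function. The sketch below is a hypothetical helper, not part of COSIE. It assumes that the first modality of each pair belongs to the first section of the key and the second modality to the second section. Placeholder strings stand in for the torch.FloatTensor values.

```python
def check_linkage_indicator(linkage_indicator, feature_dict):
    """Return a list of problems found; empty if none (hypothetical helper)."""
    problems = []
    for (sec1, sec2), modality_pairs in linkage_indicator.items():
        for sec in (sec1, sec2):
            if sec not in feature_dict:
                problems.append(f"unknown section: {sec}")
        for mod1, mod2 in modality_pairs:
            # Assumption: mod1 is read from sec1 and mod2 from sec2.
            if sec1 in feature_dict and mod1 not in feature_dict[sec1]:
                problems.append(f"{sec1} has no {mod1} features")
            if sec2 in feature_dict and mod2 not in feature_dict[sec2]:
                problems.append(f"{sec2} has no {mod2} features")
    return problems

# Placeholder strings stand in for the per-modality feature tensors.
feature_dict = {
    "s1": {"RNA": "tensor", "Protein": "tensor"},
    "s2": {"RNA": "tensor", "Protein": "tensor"},
    "s3": {"RNA": "tensor"},
}
linkage_indicator = {
    ("s1", "s2"): [("RNA", "RNA"), ("RNA", "Protein")],
    ("s2", "s3"): [("ATAC", "RNA")],
}
problems = check_linkage_indicator(linkage_indicator, feature_dict)
print(problems)
```

In this toy input, section s2 carries no ATAC features, so the check reports that one entry.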

Returns

new_feature_dict : dict

Nested dictionary of subgraph-level feature tensors. Format: {section -> subregion index -> modality -> feature tensor}.

new_spatial_loc_dict : dict

Nested dictionary of subgraph spatial coordinates. Format: {section -> subregion index -> spatial coordinate array}.

new_linkage_results : dict

Dictionary storing cross-section linkage triplets at the subgraph level. Keys are 4-tuples (sec1, sub1_idx, sec2, sub2_idx), and values are NumPy arrays of triplets (anchor, positive, negative) in concatenated space.
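As a rough illustration of how the 4-tuple keys of new_linkage_results might be consumed, the sketch below totals the number of triplets per section pair. The data is mocked. Plain lists of tuples stand in for the NumPy triplet arrays, and integer subregion indices are an assumption. `triplets_per_section_pair` is a hypothetical helper, not a COSIE function.

```python
from collections import defaultdict

def triplets_per_section_pair(new_linkage_results):
    """Sum triplet counts over all subgraph pairs of each section pair."""
    totals = defaultdict(int)
    for (sec1, sub1_idx, sec2, sub2_idx), triplets in new_linkage_results.items():
        # len() works for both a list of triplets and an (n, 3) NumPy array.
        totals[(sec1, sec2)] += len(triplets)
    return dict(totals)

# Mock results keyed by (sec1, sub1_idx, sec2, sub2_idx); values are
# (anchor, positive, negative) triplets.
mock_results = {
    ("s1", 0, "s2", 0): [(0, 5, 9), (1, 6, 8)],
    ("s1", 0, "s2", 1): [(2, 7, 3)],
    ("s2", 1, "s3", 0): [(4, 4, 1)],
}
totals = triplets_per_section_pair(mock_results)
print(totals)
```

A summary like this can be a quick sanity check that every section pair named in linkage_indicator produced at least some triplets.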