COSIE.linkage_construction.preprocess_data_for_subgraphs
- preprocess_data_for_subgraphs(data_dict, feature_dict, spatial_loc_dict, linkage_indicator, n_x, n_y, num_hvg=3000)
Split each section into spatial subgraphs and compute cross-section triplet linkages at the subgraph level.
This function partitions the full spatial omics data into n_x × n_y subregions per section, assigns feature and spatial data to each subgraph, and constructs triplet linkages across specified pairs of sections and modalities using the provided linkage_indicator.
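The partitioning step can be pictured with a minimal sketch. It assumes equal-width bins over each section's bounding box and a row-major subregion index (ix * n_y + iy). Neither detail is confirmed by this page, and the helper names assign_grid_cells and split_by_cell are hypothetical. Plain Python lists stand in for the NumPy coordinate arrays so the sketch runs without dependencies.

```python
def assign_grid_cells(coords, n_x, n_y):
    """Map each (x, y) coordinate to a subregion index in [0, n_x * n_y).

    Assumption: equal-width bins over the bounding box, row-major indexing.
    The actual COSIE implementation may bin differently.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 when all points share a coordinate, to avoid dividing by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    cells = []
    for x, y in coords:
        # Clamp so that points on the upper boundary fall into the last bin.
        ix = min(int((x - x_min) / x_span * n_x), n_x - 1)
        iy = min(int((y - y_min) / y_span * n_y), n_y - 1)
        cells.append(ix * n_y + iy)
    return cells


def split_by_cell(cells):
    """Group spot indices by the subregion they were assigned to."""
    subregions = {}
    for spot_idx, cell in enumerate(cells):
        subregions.setdefault(cell, []).append(spot_idx)
    return subregions


# Five mock spots from one section, split into a 2 x 2 grid.
coords = [(0.0, 0.0), (1.0, 9.0), (9.0, 1.0), (10.0, 10.0), (4.0, 6.0)]
cells = assign_grid_cells(coords, n_x=2, n_y=2)
subregions = split_by_cell(cells)
print(cells)       # [0, 1, 2, 3, 1]
print(subregions)  # {0: [0], 1: [1, 4], 2: [2], 3: [3]}
```

The per-subregion index lists are what would then be used to slice each modality's feature tensor and the coordinate array for that section.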
Parameters
- data_dict : dict
A dictionary where each key is a modality (e.g., 'RNA', 'Protein') and each value is a list of AnnData objects, one per tissue section. Use None in place of the AnnData object for any section that lacks that modality.
- feature_dict : dict
A dictionary mapping each section name (e.g., 's1', 's2', …) to a sub-dictionary containing the processed feature tensor for each modality as a torch.FloatTensor. Format:
{
    's1': {'RNA': torch.Tensor, 'Protein': torch.Tensor, …},
    's2': {…}
}
- spatial_loc_dict : dict
A dictionary mapping each section name to a 2D NumPy array of spatial coordinates.
- linkage_indicator : dict
A dictionary specifying which pairs of tissue sections, and which modality pairs between them, should be linked. Format:
{
    ("s1", "s2"): [("RNA", "RNA"), ("RNA", "Protein")],
    ("s2", "s3"): [("ATAC", "RNA")]
}
This example constructs linkages between sections s1 and s2 using both an RNA-RNA strong linkage and an RNA-Protein weak linkage, and between sections s2 and s3 using an ATAC-RNA linkage.
- n_x : int
Number of spatial divisions along the x-axis per section.
- n_y : int
Number of spatial divisions along the y-axis per section.
- num_hvg : int, optional
Number of highly variable features to retain for linkage matching. Default is 3000.
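Before calling the function, it can help to check that every section and modality named in linkage_indicator actually has features available. The following is a minimal sketch, not part of COSIE: the helpers describe_linkages and find_missing_features are hypothetical, the placeholder strings stand in for torch.FloatTensor values, and the sketch assumes that the first modality in each pair belongs to the first section of the key and that same-modality pairs are the "strong" ones, as the RNA-RNA example suggests.

```python
# Same structure as the documented linkage_indicator format.
linkage_indicator = {
    ("s1", "s2"): [("RNA", "RNA"), ("RNA", "Protein")],
    ("s2", "s3"): [("ATAC", "RNA")],
}

# Mock feature_dict: placeholder strings stand in for torch.FloatTensor values.
feature_dict = {
    "s1": {"RNA": "tensor"},
    "s2": {"RNA": "tensor", "Protein": "tensor"},
    "s3": {"RNA": "tensor"},
}


def describe_linkages(indicator):
    """Flatten the indicator into (sec_a, mod_a, sec_b, mod_b, kind) rows."""
    rows = []
    for (sec_a, sec_b), modality_pairs in indicator.items():
        for mod_a, mod_b in modality_pairs:
            # Assumption: same modality on both sides means a "strong" linkage.
            kind = "strong" if mod_a == mod_b else "weak"
            rows.append((sec_a, mod_a, sec_b, mod_b, kind))
    return rows


def find_missing_features(indicator, features):
    """Return (section, modality) pairs the indicator needs but features lacks."""
    missing = []
    for sec_a, mod_a, sec_b, mod_b, _ in describe_linkages(indicator):
        for section, modality in ((sec_a, mod_a), (sec_b, mod_b)):
            absent = modality not in features.get(section, {})
            if absent and (section, modality) not in missing:
                missing.append((section, modality))
    return missing


rows = describe_linkages(linkage_indicator)
missing = find_missing_features(linkage_indicator, feature_dict)
print(rows[1])   # ('s1', 'RNA', 's2', 'Protein', 'weak')
print(missing)   # [('s2', 'ATAC')]
```

Here the check flags that section s2 has no ATAC features even though the ("s2", "s3") entry requests an ATAC-RNA linkage.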
Returns
- new_feature_dict : dict
Nested dictionary of subgraph-level feature tensors. Format: {section -> subregion index -> modality -> feature tensor}.
- new_spatial_loc_dict : dict
Nested dictionary of subgraph spatial coordinates. Format: {section -> subregion index -> spatial coordinate array}.
- new_linkage_results : dict
Dictionary storing cross-section linkage triplets at the subgraph level. Keys are 4-tuples (sec1, sub1_idx, sec2, sub2_idx), and values are NumPy arrays of triplets (anchor, positive, negative) in concatenated space.
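The returned structures can be traversed as in the sketch below. All values are mock data: nested lists stand in for the NumPy arrays, each triplet array is assumed to have one (anchor, positive, negative) row per triplet, and the helper count_triplets_per_section_pair is hypothetical.

```python
# Mock return values following the documented key layout.
new_spatial_loc_dict = {
    "s1": {0: [[0.0, 0.0], [1.0, 2.0]], 1: [[8.0, 9.0]]},
    "s2": {0: [[0.5, 0.5]], 3: [[9.0, 9.5], [9.5, 9.0]]},
}
new_linkage_results = {
    # (sec1, sub1_idx, sec2, sub2_idx) -> rows of (anchor, positive, negative)
    ("s1", 0, "s2", 0): [(0, 5, 7), (1, 6, 8)],
    ("s1", 1, "s2", 3): [(2, 4, 9)],
}


def count_triplets_per_section_pair(results):
    """Sum the number of triplets over all subgraph pairs of each section pair."""
    counts = {}
    for (sec1, _sub1, sec2, _sub2), triplets in results.items():
        # len() gives the number of rows for both lists and 2D NumPy arrays.
        counts[(sec1, sec2)] = counts.get((sec1, sec2), 0) + len(triplets)
    return counts


counts = count_triplets_per_section_pair(new_linkage_results)
print(counts)  # {('s1', 's2'): 3}

# Look up the coordinates of both subgraphs involved in one linkage entry.
sec1, sub1, sec2, sub2 = ("s1", 1, "s2", 3)
coords_a = new_spatial_loc_dict[sec1][sub1]
coords_b = new_spatial_loc_dict[sec2][sub2]
print(len(coords_a), len(coords_b))  # 1 2
```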