COSIE.linkage_construction.perform_weak_linkage_knn

perform_weak_linkage_knn(adata1, adata2, modality1, modality2, num_hvg=3000)

Construct weak-linkage triplets between two datasets of different but biologically related modalities.

This function performs a symmetric nearest-neighbor search in a shared feature space built from the features the two datasets have in common (e.g., shared genes, or mapped protein-gene pairs). Every cell in either dataset serves as an anchor: its positive is selected from its nearest neighbors in the other dataset (the opposite modality), and its negative is drawn at random from the anchor's own dataset (the same modality).

  • For RNA-protein matching, a curated mapping is loaded via load_protein_gene_mapping().

  • For the other epigenomic and transcriptomic modalities, shared features are the intersection of adata1.var_names and adata2.var_names.
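The feature-matching rule in the bullets above can be illustrated with plain Python. This is a minimal sketch, not COSIE's code: the helper shared_feature_pairs is hypothetical, and its optional mapping argument is an assumed dict from dataset-2 names to dataset-1 names standing in for a protein-gene table (the actual return format of load_protein_gene_mapping() is not described here).

```python
def shared_feature_pairs(var_names1, var_names2, mapping=None):
    """Pair up features measured in both datasets (hypothetical helper).

    Returns a list of (name_in_dataset1, name_in_dataset2) tuples, in the
    order the features appear in var_names2.
    mapping: optional dict from dataset-2 names to dataset-1 names, an
             assumed stand-in for a protein -> gene table.
    """
    # Set of dataset-1 feature names for O(1) membership checks.
    in_first = set(var_names1)
    pairs = []
    for name2 in var_names2:
        # Without a mapping, features are matched by identical name.
        name1 = mapping.get(name2) if mapping is not None else name2
        if name1 is not None and name1 in in_first:
            pairs.append((name1, name2))
    return pairs


# Toy RNA-protein example: "CD19" maps to a gene absent from the RNA panel.
genes = ["CD3E", "CD4", "MS4A1"]
proteins = ["CD3", "CD4-protein", "CD19"]
mapping = {"CD3": "CD3E", "CD4-protein": "CD4", "CD19": "CD19"}
print(shared_feature_pairs(genes, proteins, mapping))
# [('CD3E', 'CD3'), ('CD4', 'CD4-protein')]
```

When both datasets use the same naming (e.g. two RNA panels), no mapping is passed and features match by name. Returning pairs rather than a single list of names keeps the gene and protein columns aligned when the names differ.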

Parameters

adata1 : AnnData

AnnData object for dataset 1.

adata2 : AnnData

AnnData object for dataset 2.

modality1 : str

Modality type of adata1. Must be one of: {'RNA', 'RNA_panel2', 'Protein', 'H3K27me3', 'H3K27ac', 'ATAC', 'H3K4me3'}.

modality2 : str

Modality type of adata2. Must be one of: {'RNA', 'RNA_panel2', 'Protein', 'H3K27me3', 'H3K27ac', 'ATAC', 'H3K4me3'}.

num_hvg : int, optional

Number of highly variable features to retain for feature matching. Default is 3000.
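One way to apply num_hvg once the shared features are fixed is to rank them by variance and keep the top num_hvg. This is a sketch under assumptions: top_variable_features is a hypothetical helper, and per-dataset variance (rescaled so each dataset contributes equally) is a simple stand-in for whichever highly-variable-feature criterion COSIE actually uses.

```python
import numpy as np


def top_variable_features(X1, X2, num_hvg=3000):
    """Return sorted column indices of the num_hvg most variable shared features.

    X1, X2: dense (cells x shared features) arrays with identical column order.
    Per-dataset variance is a simple stand-in criterion, not COSIE's own method.
    """

    def normalized_variance(X):
        v = np.asarray(X, dtype=float).var(axis=0)
        total = v.sum()
        # Rescale to sum to 1 so modalities on different numeric scales
        # (e.g. protein counts vs. gene expression) contribute equally.
        return v / total if total > 0 else v

    score = normalized_variance(X1) + normalized_variance(X2)
    k = min(num_hvg, score.size)
    # Highest scores first (stable sort keeps ties in column order), keep k.
    top = np.argsort(-score, kind="stable")[:k]
    # Return indices in original column order for easy slicing.
    return np.sort(top)
```

The returned indices can slice both matrices, e.g. X1[:, idx] and X2[:, idx], so the two datasets end up in the same reduced feature space.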

Returns

triplets : np.ndarray

An array of shape (n1 + n2, 3), where n1 and n2 are the numbers of cells in adata1 and adata2, respectively. Each row contains:

  • anchor index

  • positive index (from the other dataset, selected via KNN)

  • negative index (randomly chosen from the same dataset as the anchor)

Indices refer to the concatenation of adata1 followed by adata2:

  • [0, …, n1 - 1] for adata1

  • [n1, …, n1 + n2 - 1] for adata2
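The triplet layout and concatenated indexing described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not COSIE's implementation: the name build_triplets is hypothetical, distances are Euclidean with a single nearest neighbor as the positive, and negatives are drawn uniformly from the anchor's own dataset excluding the anchor itself. The actual function may use a different metric, neighbor count, or sampling rule.

```python
import numpy as np


def build_triplets(Z1, Z2, seed=0):
    """Symmetric 1-NN triplets in a shared feature space (illustrative sketch).

    Z1: (n1, d) array for dataset 1; Z2: (n2, d) array for dataset 2,
    with the same d shared feature columns in the same order.
    Returns an (n1 + n2, 3) int array of [anchor, positive, negative] rows
    in concatenated indexing: dataset 1 -> 0..n1-1, dataset 2 -> n1..n1+n2-1.
    Assumptions (not taken from COSIE): Euclidean distance, a single nearest
    neighbor as the positive, and negatives drawn uniformly from the anchor's
    own dataset, excluding the anchor itself.
    """
    Z1 = np.asarray(Z1, dtype=float)
    Z2 = np.asarray(Z2, dtype=float)
    n1, n2 = len(Z1), len(Z2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each dataset needs at least 2 cells to sample negatives")
    rng = np.random.default_rng(seed)

    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, shape (n1, n2).
    sq1 = (Z1 ** 2).sum(axis=1)[:, None]
    sq2 = (Z2 ** 2).sum(axis=1)[None, :]
    dist = sq1 + sq2 - 2.0 * Z1 @ Z2.T

    # Symmetric search: nearest dataset-2 cell for each dataset-1 cell and vice versa.
    pos_for_1 = dist.argmin(axis=1) + n1  # shift into the dataset-2 index range
    pos_for_2 = dist.argmin(axis=0)       # dataset-1 indices need no shift

    def sample_negatives(n):
        # Draw from n - 1 slots, then skip over the anchor's own position.
        r = rng.integers(0, n - 1, size=n)
        return r + (r >= np.arange(n))

    anchors = np.arange(n1 + n2)
    positives = np.concatenate([pos_for_1, pos_for_2])
    negatives = np.concatenate([sample_negatives(n1), sample_negatives(n2) + n1])
    return np.stack([anchors, positives, negatives], axis=1)
```

Brute-force distances keep the sketch dependency-free but need O(n1 × n2) memory; for large datasets a KNN index (for example scikit-learn's NearestNeighbors) would be the usual choice.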