COSIE.linkage_construction.perform_weak_linkage_knn
- perform_weak_linkage_knn(adata1, adata2, modality1, modality2, num_hvg=3000)
Construct weak linkage triplet pairs between two datasets of different but biologically related modalities.
This function performs a symmetric nearest-neighbor search in a shared feature space derived from overlapping features (e.g., shared genes or mapped protein-gene pairs). Every cell in both datasets serves as an anchor: its positive cell is selected from the other dataset (the opposite modality), and its negative cell is randomly selected from the anchor's own dataset (the same modality).
For RNA-protein matching, a curated mapping is loaded via load_protein_gene_mapping().
For other epigenomic or transcriptomic modalities, shared features are determined by intersecting .var_names.
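The procedure described above can be illustrated with a minimal NumPy sketch. It is not the COSIE implementation: the function name sketch_weak_linkage_triplets, the plain feature-name lists, the squared Euclidean metric, and the use of a single nearest neighbor are all assumptions made for illustration, and the sketch skips highly variable feature selection and any normalization.

```python
import numpy as np

def sketch_weak_linkage_triplets(X1, X2, names1, names2, seed=0):
    """Hypothetical illustration of weak-linkage triplets (not COSIE code)."""
    rng = np.random.default_rng(seed)

    # Shared feature space: intersect feature names, keeping the order of names1.
    in_second = set(names2)
    shared = [f for f in names1 if f in in_second]
    if not shared:
        raise ValueError("the two datasets share no features")
    A = X1[:, [names1.index(f) for f in shared]]
    B = X2[:, [names2.index(f) for f in shared]]
    n1, n2 = A.shape[0], B.shape[0]

    # Pairwise squared Euclidean distances, shape (n1, n2).
    # Assumption: the real function may use another metric or k > 1.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

    # Symmetric search: every cell gets its nearest cell in the other dataset.
    pos_for_1 = d.argmin(axis=1) + n1  # dataset-2 indices are offset by n1
    pos_for_2 = d.argmin(axis=0)       # dataset-1 indices need no offset

    # Negatives: random cells from the anchor's own dataset.
    # In this sketch a negative may coincide with its anchor.
    neg_for_1 = rng.integers(0, n1, size=n1)
    neg_for_2 = rng.integers(n1, n1 + n2, size=n2)

    anchors = np.arange(n1 + n2)
    positives = np.concatenate([pos_for_1, pos_for_2])
    negatives = np.concatenate([neg_for_1, neg_for_2])
    return np.stack([anchors, positives, negatives], axis=1)

# Toy data: two cells with features g1, g2, g3 and three cells with x, g1, g2.
X1 = np.array([[0.0, 1.0, 5.0], [1.0, 0.0, 5.0]])
X2 = np.array([[9.0, 1.0, 0.1], [9.0, 0.0, 0.9], [9.0, 0.8, 0.0]])
triplets = sketch_weak_linkage_triplets(
    X1, X2, ["g1", "g2", "g3"], ["x", "g1", "g2"]
)
print(triplets.shape)  # (5, 3)
```

Only g1 and g2 are shared in the toy data, so the search runs in that two-dimensional space and the result has one row per cell across both datasets.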
Parameters
- adata1 : AnnData
AnnData object for dataset 1.
- adata2 : AnnData
AnnData object for dataset 2.
- modality1 : str
Modality type of adata1. Must be one of: {'RNA', 'RNA_panel2', 'Protein', 'H3K27me3', 'H3K27ac', 'ATAC', 'H3K4me3'}.
- modality2 : str
Modality type of adata2. Must be one of: {'RNA', 'RNA_panel2', 'Protein', 'H3K27me3', 'H3K27ac', 'ATAC', 'H3K4me3'}.
- num_hvg : int, optional
Number of highly variable features to retain for feature matching. Default is 3000.
Returns
- triplets : np.ndarray
An array of shape (n1 + n2, 3), where n1 and n2 are the numbers of cells in adata1 and adata2. Each row contains:
- anchor index
- positive index (from the other dataset, found via KNN)
- negative index (randomly chosen from the same dataset as the anchor)
Indices refer to the concatenation of the two datasets:
- [0, …, n1 - 1] for adata1
- [n1, …, n1 + n2 - 1] for adata2
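Because the returned indices are in concatenated form, downstream code usually has to map them back to a dataset and a row within that dataset. The snippet below is a small sketch of that conversion; the helper split_concatenated_index and the example array are hypothetical and not part of COSIE.

```python
import numpy as np

def split_concatenated_index(idx, n1):
    """Map concatenated indices to (dataset id, local row index).

    Hypothetical helper: dataset id 0 means the first dataset,
    dataset id 1 means the second dataset.
    """
    idx = np.asarray(idx)
    from_second = idx >= n1                       # True for second-dataset cells
    local = np.where(from_second, idx - n1, idx)  # remove the n1 offset
    return from_second.astype(int), local

# Hand-written triplets for n1 = 2 and n2 = 3 (illustrative values only).
n1 = 2
triplets = np.array([
    [0, 3, 1],
    [1, 2, 0],
    [2, 1, 4],
    [3, 0, 2],
    [4, 1, 3],
])
dataset_id, local_idx = split_concatenated_index(triplets, n1)
print(dataset_id[0], local_idx[0])  # [0 1 0] [0 1 1]
```

In every row the anchor and the negative share a dataset id, while the positive carries the other one, which matches the layout described above.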