Data enrichment

K Pro's AI toolkit enriches raw data across three stages: data generation (lab protocols and tissue sourcing), data processing (state-of-the-art cloud-based QC and ETL pipelines), and data augmentation, where AI transforms unstructured biological data into quantified, analysis-ready biology.

Four augmentation axes are currently available, each described below.


AI cell detection: Histomics

Histomics is Owkin's AI-based digital pathology tool for cell detection and segmentation, including tumour-infiltrating lymphocytes (TILs) and tertiary lymphoid structures (TLS).

Key capabilities:

  • Detects 13 cell types, including understudied immune populations such as neutrophils and eosinophils

  • Trained across 5 cancer types, leveraging transfer learning to maximise efficiency

  • Improves cell classification F1 by 24% and cell detection by 5% while using 5× fewer parameters

  • Built on 200,000 consensus annotations from 10 pathologists

Reference: Adjadj et al. arXiv 2025
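For readers less familiar with the F1 metric quoted above, the following is a minimal sketch of a per-class and macro-averaged F1 over cell-type labels. The toy labels and the choice of macro averaging are illustrative assumptions; the source does not state how the Histomics scores were averaged.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the true label was different
            fn[t] += 1  # the true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Hypothetical toy labels, not Histomics data.
y_true = ["lymphocyte", "neutrophil", "eosinophil", "lymphocyte", "neutrophil"]
y_pred = ["lymphocyte", "neutrophil", "neutrophil", "lymphocyte", "eosinophil"]
scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Macro averaging gives rare populations such as eosinophils the same weight as abundant ones, which is one reason it is often preferred when class frequencies are very unbalanced.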


AI spatial prediction

K Pro can predict gene expression at each spatial spot of a spatial transcriptomics cohort using the associated H&E tile, enabling near single-cell resolution through model distillation.

The model uses a spatial neighbourhood attention architecture (multi-head attention over tile embeddings from neighbouring spots), and was benchmarked on the HEST dataset:

Feature extractor    Training data    HEST Average (Pearson)
Baseline iBOT        FFCD             0.246
H0                   FFCD             0.286
H0-mini              FFCD             0.344
H0-mini              MOSAIC           0.381
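As a rough guide to reading the Pearson scores above, the sketch below computes a Pearson correlation per gene across spots and then averages over genes. The gene names and values are hypothetical, and the exact averaging protocol of the HEST benchmark (gene selection, averaging across datasets) is not described here, so treat this as an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no defined correlation; return 0.0 by convention here.
    return cov / (sx * sy) if sx and sy else 0.0

def mean_pearson_over_genes(measured, predicted):
    """Both arguments map gene name -> list of per-spot expression values."""
    scores = [pearson(measured[g], predicted[g]) for g in measured]
    return sum(scores) / len(scores)

# Hypothetical toy values: GENE_A is predicted perfectly, GENE_B is inverted.
measured = {"GENE_A": [1, 2, 3, 4], "GENE_B": [4, 3, 2, 1]}
predicted = {"GENE_A": [2, 4, 6, 8], "GENE_B": [1, 2, 3, 4]}
average = mean_pearson_over_genes(measured, predicted)
```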

Reference: Schmauch et al. arXiv 2024
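The spatial neighbourhood attention idea described above can be sketched with NumPy as follows. This is a generic multi-head attention step in which the central spot's tile embedding attends over the embeddings of neighbouring spots, followed by a linear head that outputs gene expression. All dimensions, weight matrices and function names are hypothetical; this is not K Pro's architecture or code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def neighbourhood_attention(centre, neighbours, w_q, w_k, w_v, n_heads):
    """centre: (d_embed,), neighbours: (n, d_embed).
    Returns a (d_model,) context vector and (n_heads, n) attention weights."""
    d_model = w_q.shape[1]
    d_head = d_model // n_heads
    q = (centre @ w_q).reshape(n_heads, d_head)          # one query per head
    k = (neighbours @ w_k).reshape(-1, n_heads, d_head)  # keys per neighbour and head
    v = (neighbours @ w_v).reshape(-1, n_heads, d_head)  # values per neighbour and head
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                   # each head sums to 1 over neighbours
    context = np.einsum("hn,nhd->hd", weights, v).reshape(d_model)
    return context, weights

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_embed, d_model, n_heads, n_neighbours, n_genes = 32, 16, 4, 7, 5
centre = rng.normal(size=d_embed)
neighbours = rng.normal(size=(n_neighbours, d_embed))
w_q, w_k, w_v = (rng.normal(size=(d_embed, d_model)) * 0.1 for _ in range(3))
w_out = rng.normal(size=(d_model, n_genes)) * 0.1

context, weights = neighbourhood_attention(centre, neighbours, w_q, w_k, w_v, n_heads)
predicted_expression = context @ w_out  # one value per gene for the central spot
```

In a trained model the weight matrices would be learned and the tile embeddings would come from a feature extractor such as those listed in the benchmark table.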


AI enhanced resolution: Deconvolution

K Pro applies deconvolution algorithms to increase the resolution of Visium spatial transcriptomics data down to single-cell level, leveraging paired modalities (H&E + scRNA-seq + spatial).

Two outputs are supported:

  • Spot-level cell type deconvolution: answers specific tumour microenvironment (TME) questions by identifying dominant cell types per spot

  • Spatialization of tumour transcriptomic clusters: maps distinct tumour areas by learning cell signatures from single-cell RNA-seq on paired samples within the same cohort
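To illustrate the first output, the sketch below picks the dominant cell type per spot from a table of estimated proportions. The data layout, the `min_fraction` threshold and the "mixed" label are assumptions made for this example, not part of K Pro's output format.

```python
def dominant_cell_types(proportions, min_fraction=0.0):
    """proportions maps spot id -> {cell type: estimated fraction}.
    Returns spot id -> dominant cell type, or "mixed" when the top
    fraction is below min_fraction."""
    result = {}
    for spot, fractions in proportions.items():
        cell_type, fraction = max(fractions.items(), key=lambda kv: kv[1])
        result[spot] = cell_type if fraction >= min_fraction else "mixed"
    return result

# Hypothetical deconvolution output for two spots.
proportions = {
    "spot_1": {"T cell": 0.60, "tumour": 0.30, "fibroblast": 0.10},
    "spot_2": {"T cell": 0.35, "tumour": 0.40, "fibroblast": 0.25},
}
dominant = dominant_cell_types(proportions)
dominant_strict = dominant_cell_types(proportions, min_fraction=0.5)
```

A threshold such as `min_fraction` can help avoid over-interpreting spots where no single cell type clearly dominates.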

For reference-free deconvolution, K Pro uses MixUpVI, a joint probabilistic model of pseudobulk and single-cell transcriptomics that estimates cell-type proportions without requiring a reference. MixUpVI was published at ICML 2025 (Grouard, Ouardini, Rodriguez, Vert, Espin-Perez).
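To make the notion of a pseudobulk concrete, here is a minimal sketch that sums the counts of randomly sampled single cells and records the resulting cell-type proportions. It only illustrates the general idea of pairing a mixed profile with known proportions; it is not the MixUpVI model, and the cell data and function name are hypothetical.

```python
import random
from collections import Counter

def make_pseudobulk(cells, n_cells, rng):
    """cells: list of (cell_type, counts) pairs, counts being a list per gene.
    Samples n_cells with replacement and returns (summed counts, proportions)."""
    sampled = [rng.choice(cells) for _ in range(n_cells)]
    n_genes = len(sampled[0][1])
    bulk = [sum(counts[g] for _, counts in sampled) for g in range(n_genes)]
    type_counts = Counter(cell_type for cell_type, _ in sampled)
    proportions = {t: k / n_cells for t, k in type_counts.items()}
    return bulk, proportions

# Hypothetical single-cell counts over three genes.
cells = [
    ("T cell", [5, 0, 1]),
    ("T cell", [4, 1, 0]),
    ("tumour", [0, 7, 2]),
    ("tumour", [1, 6, 3]),
]
rng = random.Random(0)  # fixed seed so the example is reproducible
bulk, proportions = make_pseudobulk(cells, n_cells=10, rng=rng)
```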


AI cell-cell communication

K Pro models local ligand-receptor (LR) interactions from spatial data, without relying on a reference dataset. Using prior-knowledge tables of ligand-receptor pairs, the pipeline computes ligand-receptor interaction (LRI) values under three diffusion modes: cell contact (no diffusion), secreted signalling (one-neighbour diffusion), and hormone signalling (two-neighbour diffusion).
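One way to picture the three diffusion modes is as a hop limit on the spot neighbourhood graph. The sketch below assumes a simple scoring rule, receptor expression at a spot multiplied by the mean ligand expression within k hops; both this formula and all names are assumptions made for illustration, since the actual LRI computation is not specified here.

```python
# Hypothetical mapping from diffusion mode to number of neighbour hops.
MODES = {"cell_contact": 0, "secreted": 1, "hormone": 2}

def k_hop_neighbourhood(adjacency, spot, hops):
    """Spots reachable from `spot` in at most `hops` steps, including itself."""
    seen = {spot}
    frontier = {spot}
    for _ in range(hops):
        frontier = {n for s in frontier for n in adjacency[s]} - seen
        seen |= frontier
    return seen

def lri_scores(ligand, receptor, adjacency, hops):
    """Assumed score: receptor at the spot times the mean ligand
    expression over its k-hop neighbourhood."""
    scores = {}
    for spot in receptor:
        hood = k_hop_neighbourhood(adjacency, spot, hops)
        diffused = sum(ligand[s] for s in hood) / len(hood)
        scores[spot] = receptor[spot] * diffused
    return scores

# Toy chain of four spots a-b-c-d: ligand only at "a", receptor only at "c".
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ligand = {"a": 4.0, "b": 0.0, "c": 0.0, "d": 0.0}
receptor = {"a": 0.0, "b": 0.0, "c": 3.0, "d": 0.0}
by_mode = {mode: lri_scores(ligand, receptor, adjacency, hops)
           for mode, hops in MODES.items()}
```

In this toy example the ligand sits two hops from the receptor, so only the two-neighbour mode yields a non-zero score at spot "c".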

Outputs include:

  • Ligand expression by cell type (dot plot per programme)

  • Cellular communication network (chord diagram of sender/receiver cell types)

  • Spatial map of LRI values overlaid on the tissue slide
