# Data enrichment

K Pro's AI toolkit enriches raw data across three stages: **data generation** (lab protocols and tissue sourcing), **data processing** (state-of-the-art cloud-based QC and ETL pipelines), and **data augmentation**, where AI transforms unstructured biological data into quantified, analysis-ready data.

Four augmentation axes are currently available; each is described below.

***

#### AI cell detection: Histomics

Histomics is Owkin's AI-based digital pathology tool for cell detection and segmentation, including tumour-infiltrating lymphocytes (TILs) and tertiary lymphoid structures (TLS).

**Key capabilities:**

* Detects **13 cell types**, including understudied immune populations such as neutrophils and eosinophils
* Trained across **5 cancer types**, leveraging transfer learning to maximise efficiency
* Achieves a **24% higher F1 score for cell classification** and 5% better cell detection, while using 5× fewer parameters
* Built on **200,000 consensus annotations** from 10 pathologists

> Reference: Adjadj et al. arXiv 2025
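
The page does not state how Histomics' F1 scores were computed. As an illustration only, the sketch below shows one common convention for scoring cell detection and classification: predicted and annotated cell centroids are matched greedily within a fixed pixel radius, and per-class F1 counts a true positive only when a matched pair also agrees on the cell type. The 6-pixel radius, the labels and all function names are hypothetical.

```python
# Hypothetical sketch of a common detection/classification F1 convention,
# not Histomics' documented evaluation protocol.
import math


def match_detections(pred, truth, radius):
    """Greedily pair predicted and ground-truth centroids lying within `radius`.

    Candidate pairs are visited from closest to farthest and each centroid is
    used at most once. Returns a list of (pred_index, truth_index) pairs.
    """
    candidates = []
    for i, (px, py) in enumerate(pred):
        for j, (tx, ty) in enumerate(truth):
            dist = math.hypot(px - tx, py - ty)
            if dist <= radius:
                candidates.append((dist, i, j))
    candidates.sort()
    used_pred, used_truth, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_pred and j not in used_truth:
            used_pred.add(i)
            used_truth.add(j)
            matches.append((i, j))
    return matches


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def detection_f1(pred, truth, radius):
    """Matched centroids are true positives; unmatched ones are FPs or FNs."""
    tp = len(match_detections(pred, truth, radius))
    return f1(tp, len(pred) - tp, len(truth) - tp)


def per_class_f1(pred, pred_labels, truth, truth_labels, radius):
    """A true positive needs a spatial match AND the same label on both sides."""
    matches = match_detections(pred, truth, radius)
    scores = {}
    for c in sorted(set(pred_labels) | set(truth_labels)):
        tp = sum(1 for i, j in matches if pred_labels[i] == c and truth_labels[j] == c)
        fp = sum(1 for lab in pred_labels if lab == c) - tp
        fn = sum(1 for lab in truth_labels if lab == c) - tp
        scores[c] = f1(tp, fp, fn)
    return scores


# Toy example: three predicted and three annotated cells (coordinates in pixels).
pred = [(10, 10), (50, 52), (100, 100)]
pred_labels = ["lymphocyte", "neutrophil", "tumour"]
truth = [(11, 9), (51, 50), (200, 200)]
truth_labels = ["lymphocyte", "eosinophil", "tumour"]

det = detection_f1(pred, truth, radius=6.0)
per_class = per_class_f1(pred, pred_labels, truth, truth_labels, radius=6.0)
macro = sum(per_class.values()) / len(per_class)
print(round(det, 3), macro)  # → 0.667 0.25
```

Two of the three cells are detected, but only the lymphocyte also receives the right label, which is why the macro-averaged classification F1 is much lower than the detection F1.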

***

#### AI spatial prediction

K Pro can predict gene expression at each spatial spot of a spatial transcriptomics cohort using the associated H\&E tile, enabling near single-cell resolution through model distillation.

The model uses a spatial neighbourhood attention architecture (multi-head attention over the tile embeddings of neighbouring spots). It was benchmarked on the HEST dataset with different feature extractors and training data:

| Feature extractor | Training data | HEST average (Pearson r) |
| ----------------- | ------------- | ---------------------- |
| Baseline iBOT     | FFCD          | 0.246                  |
| H0                | FFCD          | 0.286                  |
| H0-mini           | FFCD          | 0.344                  |
| **H0-mini**       | **MOSAIC**    | **0.381**              |

> Reference: Schmauch et al. arXiv 2024
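
The exact architecture is not detailed here. The following is a minimal NumPy sketch of the idea, assuming that a spot's neighbourhood is its k nearest spots in tissue coordinates, that tile embeddings (for example from H0-mini) are already computed, and that a linear layer maps the attended embedding to gene expression. It also assumes that the table's metric is the Pearson correlation between predicted and measured expression, computed per gene across spots and averaged over genes. All names are hypothetical and the weights are random, so the printed correlation says nothing about real performance.

```python
# Hypothetical sketch of spatial neighbourhood attention, not K Pro's model.
import numpy as np


def knn_indices(coords, k):
    """Indices of the k spots nearest to each spot (normally the spot itself first)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.argsort(dist, axis=1)[:, :k]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def neighbourhood_attention(emb, coords, w_q, w_k, w_v, w_o, n_heads, k):
    """Multi-head attention in which each spot attends only to its k nearest spots."""
    n = emb.shape[0]
    nbr = knn_indices(coords, k)                          # (n, k)
    d_model = w_q.shape[1]
    d_head = d_model // n_heads
    q = (emb @ w_q).reshape(n, n_heads, d_head)           # (n, h, d)
    keys = (emb @ w_k).reshape(n, n_heads, d_head)[nbr]   # (n, k, h, d)
    vals = (emb @ w_v).reshape(n, n_heads, d_head)[nbr]   # (n, k, h, d)
    scores = np.einsum("nhd,nkhd->nhk", q, keys) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)                       # weights over the k neighbours
    out = np.einsum("nhk,nkhd->nhd", attn, vals).reshape(n, d_model)
    return out @ w_o


def mean_gene_pearson(pred, true):
    """Pearson r per gene (across spots), averaged over genes."""
    pc = pred - pred.mean(axis=0)
    tc = true - true.mean(axis=0)
    r = (pc * tc).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (tc ** 2).sum(axis=0))
    return float(r.mean())


# Toy run with random weights: 30 spots, 16-dim tile embeddings, 5 genes.
rng = np.random.default_rng(0)
n_spots, d_emb, d_model, n_genes = 30, 16, 8, 5
emb = rng.normal(size=(n_spots, d_emb))
coords = rng.uniform(0, 10, size=(n_spots, 2))
w_q, w_k, w_v = (rng.normal(size=(d_emb, d_model)) for _ in range(3))
w_o = rng.normal(size=(d_model, d_model))
w_gene = rng.normal(size=(d_model, n_genes))  # linear read-out to gene expression

context = neighbourhood_attention(emb, coords, w_q, w_k, w_v, w_o, n_heads=2, k=6)
pred_expr = context @ w_gene                  # (n_spots, n_genes)
measured = pred_expr + rng.normal(scale=0.5, size=pred_expr.shape)
print(pred_expr.shape, round(mean_gene_pearson(pred_expr, measured), 3))
```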

***

#### AI-enhanced resolution: Deconvolution

K Pro applies deconvolution algorithms to Visium spatial transcriptomics data to increase its resolution down to the single-cell level, leveraging paired modalities (H\&E, scRNA-seq and spatial transcriptomics).

Two outputs are supported:

* **Spot-level cell type deconvolution:** answers specific tumour microenvironment (TME) questions by identifying dominant cell types per spot
* **Spatialization of tumour transcriptomic clusters:** maps distinct tumour areas by learning cell signatures from single-cell RNA-seq on paired samples within the same cohort
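
This page does not name the algorithm behind spot-level deconvolution. A simple reference-based baseline, shown here as a hypothetical sketch rather than K Pro's method, builds a signature matrix of mean expression per cell type from the paired scRNA-seq data and fits each spot as a non-negative combination of those signatures with non-negative least squares (`scipy.optimize.nnls`). The dominant cell type of a spot is then the one with the largest estimated proportion. Production methods typically also model count noise and technical effects, which this sketch ignores; the function names are illustrative.

```python
# Hypothetical NNLS deconvolution baseline, not K Pro's algorithm.
import numpy as np
from scipy.optimize import nnls


def signature_matrix(sc_expr, sc_labels):
    """Mean expression profile per cell type: returns (genes x types) and type names."""
    labels = np.asarray(sc_labels)
    types = sorted(set(labels.tolist()))
    sig = np.stack([sc_expr[labels == t].mean(axis=0) for t in types], axis=1)
    return sig, types


def deconvolve_spots(spot_expr, sig):
    """Non-negative least squares per spot, rescaled so proportions sum to 1."""
    props = np.zeros((spot_expr.shape[0], sig.shape[1]))
    for s, y in enumerate(spot_expr):
        weights, _ = nnls(sig, y)  # solves min ||sig @ w - y|| subject to w >= 0
        total = weights.sum()
        props[s] = weights / total if total > 0 else weights
    return props


# Toy single-cell reference (cells x genes) with two cell types.
sc_expr = np.array([[10.0, 0.0, 1.0], [10.0, 0.0, 1.0],
                    [0.0, 10.0, 1.0], [0.0, 10.0, 1.0]])
sc_labels = ["T cell", "T cell", "tumour", "tumour"]
sig, types = signature_matrix(sc_expr, sc_labels)

# Two spots (spots x genes): 30/70 and 80/20 mixtures of five cells each.
spot_expr = np.array([[15.0, 35.0, 5.0], [40.0, 10.0, 5.0]])
props = deconvolve_spots(spot_expr, sig)
dominant = [types[i] for i in props.argmax(axis=1)]
print(np.round(props, 2), dominant)
```

Because the toy spots are exact mixtures, the fit recovers proportions of 0.3/0.7 and 0.8/0.2, with "tumour" and "T cell" as the dominant types.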

For reference-free deconvolution, K Pro uses **MixUpVI**, a joint probabilistic model of pseudobulk and single-cell transcriptomics that estimates cell-type proportions. MixUpVI was published at ICML 2025 (Grouard, Ouardini, Rodriguez, Vert, Espin-Perez).
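
MixUpVI itself is not reproduced here. To illustrate what a pseudobulk is, the sketch below builds one by summing the count vectors of single cells drawn in known proportions, which yields ground-truth cell-type fractions for training or evaluating a deconvolution model. The Dirichlet sampling of proportions, the parameter values and the function name are assumptions made for the example.

```python
# Hypothetical pseudobulk construction; not MixUpVI's actual training procedure.
import numpy as np


def make_pseudobulk(counts, labels, n_cells, alpha, rng):
    """Build one pseudobulk sample with a known cell-type composition.

    Proportions are drawn from a symmetric Dirichlet(alpha), turned into integer
    cell counts per type, and the counts of the sampled cells are summed per gene.
    """
    labels = np.asarray(labels)
    types = sorted(set(labels.tolist()))
    proportions = rng.dirichlet(np.full(len(types), alpha))
    cells_per_type = rng.multinomial(n_cells, proportions)
    bulk = np.zeros(counts.shape[1])
    for cell_type, m in zip(types, cells_per_type):
        pool = np.flatnonzero(labels == cell_type)
        picked = rng.choice(pool, size=m, replace=True)  # sample cells with replacement
        bulk += counts[picked].sum(axis=0)
    true_props = dict(zip(types, cells_per_type / n_cells))
    return bulk, true_props


rng = np.random.default_rng(42)
counts = rng.poisson(2.0, size=(100, 50))  # toy cells x genes count matrix
labels = rng.choice(["B cell", "T cell", "tumour"], size=100)
bulk, true_props = make_pseudobulk(counts, labels, n_cells=200, alpha=1.0, rng=rng)
print(bulk.shape, true_props)
```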

***

#### AI cell-cell communication

K Pro models local ligand-receptor (LR) interactions from spatial data without relying on a reference dataset. Using prior-knowledge tables of ligand-receptor pairs, the pipeline computes ligand-receptor interaction (LRI) values under three diffusion modes: cell contact (no diffusion), secreted signalling (one-neighbour diffusion) and hormone signalling (two-neighbour diffusion).
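
The scoring formula is not spelled out on this page. Assuming that each mode diffuses the ligand signal over 0, 1 or 2 rings of neighbouring spots before multiplying it by the receptor expression at the receiving spot, the three modes can be sketched with a row-normalised spot adjacency matrix. The averaging scheme, the `DIFFUSION_HOPS` mapping and the function names are hypothetical.

```python
# Hypothetical sketch of LRI scoring under three diffusion modes.
import numpy as np

# Assumed mapping from signalling mode to number of diffusion hops.
DIFFUSION_HOPS = {"contact": 0, "secreted": 1, "hormone": 2}


def diffusion_operator(adj, hops):
    """Row-normalised (A + I) raised to `hops`: averages a signal over nearby spots."""
    a_hat = adj + np.eye(adj.shape[0])  # each spot also keeps its own signal
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(a_hat, hops)  # hops = 0 gives the identity


def lri_scores(ligand, receptor, adj, mode):
    """Per-spot LRI: ligand signal reaching the spot times its own receptor expression."""
    op = diffusion_operator(adj, DIFFUSION_HOPS[mode])
    return (op @ ligand) * receptor


# Four spots on a line (0-1-2-3); only spot 0 expresses the ligand.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
ligand = np.array([1.0, 0.0, 0.0, 0.0])
receptor = np.array([0.0, 1.0, 1.0, 1.0])
for mode in DIFFUSION_HOPS:
    print(mode, np.round(lri_scores(ligand, receptor, adj, mode), 3))
```

In this toy layout, contact signalling scores zero everywhere because no spot expresses both the ligand and the receptor; spot 1 picks up the ligand under secreted signalling, and spot 2 only under the two-hop hormone mode.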

Outputs include:

* **Ligand expression by cell type** (dot plot per programme)
* **Cellular communication network** (chord diagram of sender/receiver cell types)
* **Spatial map of LRI values** overlaid on the tissue slide
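
A chord diagram of sender and receiver cell types is drawn from a sender-by-receiver matrix. As a hypothetical sketch, assuming each spot has been assigned a dominant cell type, such a matrix can be built by summing ligand expression at the sender times receptor expression at the receiver over all pairs of adjacent spots. The aggregation rule and the function name are illustrative, not K Pro's documented method.

```python
# Hypothetical aggregation of LR signal into a sender x receiver cell-type matrix.
import numpy as np


def communication_matrix(ligand, receptor, adj, spot_types, type_names):
    """Sum ligand(sender) * receptor(receiver) over adjacent spots, grouped by cell type."""
    index = {name: i for i, name in enumerate(type_names)}
    mat = np.zeros((len(type_names), len(type_names)))
    senders, receivers = np.nonzero(adj)  # every directed neighbour pair
    for s, r in zip(senders, receivers):
        mat[index[spot_types[s]], index[spot_types[r]]] += ligand[s] * receptor[r]
    return mat


adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
spot_types = ["tumour", "T cell", "tumour"]  # assumed dominant cell type per spot
ligand = np.array([2.0, 0.0, 1.0])
receptor = np.array([0.0, 3.0, 0.0])
mat = communication_matrix(ligand, receptor, adj, spot_types, ["T cell", "tumour"])
print(mat)  # rows = senders, columns = receivers
```

Both tumour spots send to the central T-cell spot, so all of the signal (2 × 3 + 1 × 3 = 9) lands in the tumour → T cell entry.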
