# Data enrichment

K Pro's AI toolkit enriches raw data across three stages: **data generation** (lab protocols and tissue sourcing), **data processing** (state-of-the-art, cloud-based QC and ETL pipelines), and **data augmentation**, where AI transforms unstructured biological data into quantified, analysis-ready biology.

Four augmentation axes are currently available, each described below.

***

#### AI cell detection: Histomics

Histomics is Owkin's AI-based digital pathology tool for cell detection and segmentation, including tumour-infiltrating lymphocytes (TILs) and tertiary lymphoid structures (TLS).

**Key capabilities:**

* Detects **13 cell types**, including understudied immune populations such as neutrophils and eosinophils
* Trained across **5 cancer types**, leveraging transfer learning to maximise efficiency
* Achieves a **24% higher F1 score** for cell classification and 5% better detection, using 5× fewer parameters
* Built on **200,000 consensus annotations** from 10 pathologists

> Reference: Adjadj et al. arXiv 2025

***

#### AI spatial prediction

K Pro can predict gene expression at each spatial spot of a spatial transcriptomics cohort using the associated H\&E tile, enabling near single-cell resolution through model distillation.

The model uses a spatial neighbourhood attention architecture (multi-head attention over tile embeddings from neighbouring spots), and was benchmarked on the HEST dataset:

| Feature extractor | Training data | HEST Average (Pearson) |
| ----------------- | ------------- | ---------------------- |
| Baseline iBOT     | FFCD          | 0.246                  |
| H0                | FFCD          | 0.286                  |
| H0-mini           | FFCD          | 0.344                  |
| **H0-mini**       | **MOSAIC**    | **0.381**              |
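
The Pearson column can be read as the correlation between predicted and measured expression, averaged over genes. The sketch below assumes a per-gene Pearson correlation across spots followed by a plain mean; the benchmark's exact gene selection and averaging are not specified on this page, and the function name `mean_gene_pearson` is hypothetical.

```python
import numpy as np

def mean_gene_pearson(pred, truth):
    """Mean over genes of the Pearson correlation computed across spots.

    pred, truth: arrays of shape (n_spots, n_genes).
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Centre each gene (column) across spots.
    p = pred - pred.mean(axis=0)
    t = truth - truth.mean(axis=0)
    num = (p * t).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    # Correlation is undefined for zero-variance genes, so they are skipped
    # (an assumption of this sketch, not a documented benchmark rule).
    valid = den > 0
    return float((num[valid] / den[valid]).mean())

# Toy data: 3 spots x 2 genes (made-up values, for illustration only).
truth = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 1.0]])
perfect = 2.0 * truth + 1.0          # linearly related to the measurement
score = mean_gene_pearson(perfect, truth)
```

Because Pearson correlation is invariant to scale and offset, a prediction that is linearly related to the measurement scores as a perfect match under this metric.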

> Reference: Schmauch et al. arXiv 2024
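
To make the idea of attention over neighbouring tile embeddings concrete, here is a minimal NumPy sketch. It is not the published architecture: the projection matrices are random stand-ins for learned weights, and the function name, the choice to include the centre tile among the attended tokens, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def neighbourhood_attention(centre, neighbours, w_q, w_k, w_v, n_heads):
    """Multi-head attention of one spot over itself and its neighbours.

    centre: (d,) tile embedding of the target spot.
    neighbours: (n, d) tile embeddings of neighbouring spots.
    w_q, w_k, w_v: (d, d) projection matrices.
    Returns the (d,) context vector and the (n_heads, n + 1) attention weights.
    """
    d = centre.shape[0]
    if d % n_heads != 0:
        raise ValueError("embedding size must be divisible by n_heads")
    d_head = d // n_heads
    tokens = np.vstack([centre[None, :], neighbours])        # (n + 1, d)
    q = (centre @ w_q).reshape(n_heads, d_head)              # one query per head
    k = (tokens @ w_k).reshape(-1, n_heads, d_head)
    v = (tokens @ w_v).reshape(-1, n_heads, d_head)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                       # each head sums to 1
    context = np.einsum("hn,nhd->hd", weights, v)
    return context.reshape(d), weights

# Hypothetical sizes: 8-dim embeddings, 2 heads, 6 neighbours, 3 genes.
rng = np.random.default_rng(0)
d, n_heads, n_neighbours, n_genes = 8, 2, 6, 3
centre = rng.normal(size=d)
neighbours = rng.normal(size=(n_neighbours, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=(d, n_genes))  # linear head mapping context to genes

context, weights = neighbourhood_attention(centre, neighbours, w_q, w_k, w_v, n_heads)
predicted_expression = context @ w_out  # one value per gene for this spot
```

In a trained model the projections and the output head would be learned from paired H\&E tiles and spatial transcriptomics; the sketch only shows how information from neighbouring spots is pooled into a per-spot prediction.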

***

#### AI enhanced resolution: Deconvolution

K Pro applies deconvolution algorithms to increase the resolution of Visium spatial transcriptomics data down to single-cell level, leveraging paired modalities (H\&E + scRNA-seq + spatial).

Two outputs are supported:

* **Spot-level cell type deconvolution:** answers specific tumour microenvironment (TME) questions by identifying dominant cell types per spot
* **Spatialization of tumour transcriptomic clusters:** maps distinct tumour areas by learning cell signatures from single-cell RNA-seq on paired samples within the same cohort
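
As an illustration of spot-level deconvolution, the sketch below treats a spot's expression as a proportion-weighted mix of cell-type signatures and recovers the proportions by least squares with clipping. This is a deliberately simple stand-in, not K Pro's algorithm: the function name, the cell-type labels, and the signature values are all hypothetical.

```python
import numpy as np

def deconvolve_spot(spot_expr, signatures):
    """Estimate cell-type proportions for a single spot.

    spot_expr: (n_genes,) expression measured at the spot.
    signatures: (n_genes, n_cell_types) mean expression per cell type,
        e.g. learned from paired single-cell RNA-seq.
    """
    coef, *_ = np.linalg.lstsq(signatures, spot_expr, rcond=None)
    coef = np.clip(coef, 0.0, None)   # proportions cannot be negative
    total = coef.sum()
    if total == 0:
        return coef                   # no signature explains this spot
    return coef / total               # normalise so proportions sum to 1

# Made-up signatures: 4 genes x 3 cell types.
cell_types = ["t_cell", "tumour", "fibroblast"]
signatures = np.array([
    [10.0, 0.0, 1.0],
    [0.0, 8.0, 1.0],
    [2.0, 2.0, 9.0],
    [5.0, 1.0, 0.0],
])
true_proportions = np.array([0.2, 0.7, 0.1])
spot_expr = signatures @ true_proportions   # noise-free synthetic spot

proportions = deconvolve_spot(spot_expr, signatures)
dominant = cell_types[int(np.argmax(proportions))]
```

On real, noisy counts a constrained or probabilistic solver would be a better fit than clipped least squares; the point here is only the mixture model behind "dominant cell type per spot".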

For reference-free deconvolution, K Pro uses **MixUpVI**, a joint probabilistic model of pseudobulk and single-cell transcriptomics that estimates cell-type proportions. MixUpVI was published at ICML 2025 (Grouard, Ouardini, Rodriguez, Vert, Espin-Perez).

***

#### AI cell-cell communication

K Pro models local ligand-receptor (LR) interactions using spatial data, without relying on a reference dataset. Using prior-knowledge tables of ligand-receptor pairs, the pipeline computes ligand-receptor interaction (LRI) values across three diffusion modes: cell contact (no diffusion), secreted signalling (one-neighbour diffusion), and hormone signalling (two-neighbour diffusion).
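
The three diffusion modes can be pictured as letting a ligand reach spots zero, one, or two neighbour steps away before it meets its receptor. The sketch below is one possible reading, not the pipeline's actual formula: it assumes the ligand signal is averaged over the reachable spots and multiplied by the local receptor expression, and the names `DIFFUSION_HOPS` and `lri_scores` are hypothetical.

```python
import numpy as np

# Number of neighbour steps a ligand may travel in each mode.
DIFFUSION_HOPS = {"cell_contact": 0, "secreted": 1, "hormone": 2}

def lri_scores(ligand, receptor, adjacency, hops):
    """Per-spot interaction score for one ligand-receptor pair.

    ligand, receptor: (n_spots,) expression of the ligand and receptor genes.
    adjacency: (n_spots, n_spots) symmetric 0/1 matrix of neighbouring spots.
    hops: how many neighbour steps the ligand is allowed to diffuse.
    """
    n = adjacency.shape[0]
    step = (adjacency > 0).astype(int) + np.eye(n, dtype=int)
    # reach[i, j] is 1 when spot j lies within `hops` steps of spot i.
    reach = (np.linalg.matrix_power(step, hops) > 0).astype(float)
    # Assumed aggregation: mean ligand expression over reachable spots.
    available = (reach @ ligand) / reach.sum(axis=1)
    return available * receptor

# Toy tissue: four spots in a row (0-1-2-3); the ligand is expressed only in spot 0.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
ligand = np.array([4.0, 0.0, 0.0, 0.0])
receptor = np.array([0.0, 1.0, 1.0, 1.0])

scores = {mode: lri_scores(ligand, receptor, adjacency, hops)
          for mode, hops in DIFFUSION_HOPS.items()}
```

In this toy example the contact mode yields no interaction because no spot expresses both genes, while the one-step and two-step modes let the ligand from spot 0 reach receptors progressively further away.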

Outputs include:

* **Ligand expression by cell type** (dot plot per programme)
* **Cellular communication network** (chord diagram of sender/receiver cell types)
* **Spatial map of LRI values** overlaid on the tissue slide

