# MOSAIC dataset

## Overview

MOSAIC is a flagship Owkin data asset: a large spatially resolved dataset with **6 data modalities per sample across 11 cancer indications** in a centralized platform.

**11 cancer types covered:** NSCLC, Ovarian, Bladder, Mesothelioma, Glioblastoma, Breast, DLBCL, HNSCC, Pancreas, CRC, Gastric.

**6 data modalities per sample:**

* Clinical data: medical files and consent, clinically validated
* Spatial transcriptomics: consecutive slides from an FFPE block, pathology validated
* Single cell transcriptomics
* Bulk RNA-Seq
* Whole Exome Sequencing (WES)
* Digitized H\&E

**Sample breakdown**

**2,716 patients in the study.**

* \~15% of patients have multiple samples
* \~80% of samples are pre-treatment
* \~10% of samples are post-treatment
* \~10% of samples are relapse / recurrence
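As a rough illustration of what these shares mean in absolute terms, the sketch below converts the quoted percentages into approximate counts. It assumes every patient contributes at least one sample, and it applies only the patient-level share to the 2,716 total, because the timepoint shares are fractions of samples and the total sample count is not stated here. The variable and function names are illustrative and not part of any Owkin API.

```python
# Figures quoted in the sample breakdown; the percentages are approximate.
TOTAL_PATIENTS = 2716
MULTI_SAMPLE_PATIENT_PCT = 15

# Timepoint shares are fractions of samples, not of patients.
SAMPLE_TIMEPOINT_PCT = {
    "pre-treatment": 80,
    "post-treatment": 10,
    "relapse / recurrence": 10,
}


def approx_count(total: int, pct: int) -> int:
    """Return pct percent of total, rounded to the nearest integer."""
    return round(total * pct / 100)


multi_sample_patients = approx_count(TOTAL_PATIENTS, MULTI_SAMPLE_PATIENT_PCT)
# Assumption: each patient has at least one sample, so the rest have exactly one.
single_sample_patients = TOTAL_PATIENTS - multi_sample_patients
# Sanity check that the three timepoint categories cover all samples.
timepoint_pct_total = sum(SAMPLE_TIMEPOINT_PCT.values())

print(f"~{multi_sample_patients} patients with multiple samples")
print(f"~{single_sample_patients} patients with a single sample")
print(f"timepoint shares sum to {timepoint_pct_total}%")
```

Because the source percentages are rounded, the resulting counts are indicative only and should not be read as exact cohort sizes.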

## Clinical data collected

**Common forms** (patient-related information):

* Demographics (date of diagnosis, date of last follow-up or death, general demographic information, cancer indication)
* Consent & Eligibility
* Subject history
* Treatment form (oncological treatment types, dosage, routes, dates and response)
* Oncologic events before inclusion in MOSAIC (progression/recurrence, other cancer; includes OS and PFS calculations)
* 'End of study' form (cause of end of study or death, if applicable)
* Follow-up forms (yearly occurrence of novel oncologic events and/or death)

**Cancer-type-specific forms:**

* Clinical (height, weight, date of diagnosis, tumor and metastasis location, cTNM, stage; plus tumor-specific information when applicable)
* Pathology (histological type & subtype, (y)pTNM, prognostic histological features, IHC and FISH results when applicable)
* Mutations (all known genetic alterations)

## Data quality

**Consistent and rigorous data generation:** uniform sample processing, centralised NGS, rigorous QC steps at every stage.

MOSAIC uses a dynamic, multi-role QC approach across the full workflow:

* **Clinician review:** patient selection (validation of cohort inclusion criteria), clinical record review (eCRF completeness, coherence, accuracy), and workflow adaptation based on QC of the existing database.
* **Pathologist review:** block selection (tissue of origin, histological subtype, sample timepoint, tumor content), histology slide QC (cuts, tumor content, scanning and staining artefacts), and spatial transcriptomics QC.
* **Biologist review:** single-cell annotation (cohort-level cell type annotation, validation of automatic label transfer) and clinical record completeness review.

## Partner institutions

MOSAIC is generated through partnerships with leading academic medical centers at the forefront of spatial omics research:

* **Gustave Roussy.** PI: Fabrice André (ESMO President-elect, h-index 116). Ranked among the world's top 15 hospitals (Newsweek, 2023). Spatial transcriptomics pioneers via the Center for Experimental Therapies platform.
* **CHUV Lausanne.** PI: Raphaël Gottardo (h-index 60). Strong expertise in spatial biology via the PETRA platform.
* **University of Pittsburgh.** PI: Robert Ferris (h-index 107). Ranked 3rd in 2022 NIH funding (behind only Johns Hopkins and UCSF). Among the largest academic medical centers in the US.
* **Uniklinikum Erlangen.** PI: Arndt Hartmann (h-index 108). Ranked among the top 11 German hospitals (2023). Oncology cluster of excellence "NCT" with strong expertise in DLBCL, MM, Bladder, Breast, GBM.
* **Charité.** PI: Ulrich Keilholz (h-index 79). Ranked among the world's top 10 best hospitals and top 10 smart hospitals (Newsweek, 2023), and #1 in Germany. Spatial transcriptomics pioneers via the MDC center.

## Access

The broader MOSAIC dataset is part of the K Pro paid tiers. The K Pro Free subset is available as **MOSAIC Window** above.

