MOSAIC dataset

Overview

MOSAIC is a flagship Owkin data asset: a large, spatially resolved dataset that provides 6 data modalities per sample across 11 cancer indications on a centralized platform.

11 cancer types covered: NSCLC, Ovarian, Bladder, Mesothelioma, Glioblastoma, Breast, DLBCL, HNSCC, Pancreas, CRC, Gastric.

6 data modalities per sample:

  • Clinical data: medical files and consent, clinically validated

  • Spatial transcriptomics: consecutive slides from an FFPE block, pathology validated

  • Single cell transcriptomics

  • Bulk RNA-Seq

  • Whole Exome Sequencing (WES)

  • Digitized H&E
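
As an illustration of the "6 modalities per sample" structure, the sketch below lists the modalities as an enumeration and reports which ones a given sample is missing. The identifier names (`Modality`, `missing_modalities`) and the string values are hypothetical and are not taken from the MOSAIC platform or any Owkin API.

```python
from enum import Enum


class Modality(Enum):
    """The six MOSAIC data modalities (member names and values are illustrative)."""
    CLINICAL = "clinical"
    SPATIAL_TRANSCRIPTOMICS = "spatial_transcriptomics"
    SINGLE_CELL_TRANSCRIPTOMICS = "single_cell_transcriptomics"
    BULK_RNA_SEQ = "bulk_rna_seq"
    WES = "whole_exome_sequencing"
    DIGITIZED_HE = "digitized_h_and_e"


def missing_modalities(available: set[Modality]) -> set[Modality]:
    """Return the modalities that are not present for a sample."""
    return set(Modality) - available


# Hypothetical sample that only has clinical data and a digitized H&E slide.
sample = {Modality.CLINICAL, Modality.DIGITIZED_HE}
print(sorted(m.value for m in missing_modalities(sample)))
```

A check like this can help confirm that a sample is complete before running a multimodal analysis.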

Sample breakdown

2,716 patients in the study.

  • ~15% of patients have multiple samples

  • ~80% of samples are pre-treatment

  • ~10% of samples are post-treatment

  • ~10% of samples are relapse / recurrence
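
The percentages above can be turned into rough counts with simple arithmetic. The sketch below is only indicative, since the shares are approximate. The patient total comes from this page, but the total number of samples is not stated here, so `n_samples` is a hypothetical input and all names are illustrative.

```python
N_PATIENTS = 2716          # stated on this page
MULTI_SAMPLE_SHARE = 0.15  # approximate share of patients with multiple samples

# Approximate shares of samples by timepoint, as listed above.
TIMEPOINT_SHARES = {
    "pre_treatment": 0.80,
    "post_treatment": 0.10,
    "relapse_or_recurrence": 0.10,
}


def approx_count(total: int, share: float) -> int:
    """Round an approximate share of a total to the nearest whole number."""
    return round(total * share)


def timepoint_counts(n_samples: int) -> dict[str, int]:
    """Estimate sample counts per timepoint for a given (hypothetical) sample total."""
    return {name: approx_count(n_samples, share) for name, share in TIMEPOINT_SHARES.items()}


# Roughly 407 patients are expected to have more than one sample.
multi_sample_patients = approx_count(N_PATIENTS, MULTI_SAMPLE_SHARE)
print(multi_sample_patients)

# Example with an assumed total of 1,000 samples (not a MOSAIC figure).
print(timepoint_counts(1000))
```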

Clinical data collected

Common forms (patient-related information):

  • Demographics (date of diagnosis, date of last follow-up or death, general demographic information, cancer indication)

  • Consent & Eligibility

  • Subject history

  • Treatment form (oncological treatment types, dosage, routes, dates and response)

  • Oncologic events before inclusion in MOSAIC (progression/recurrence, other cancer; includes OS and PFS calculations)

  • 'End of study' form (cause of end of study or death, if applicable)

  • Follow-up forms (yearly occurrence of novel oncologic events and/or death)

Cancer-type-specific forms:

  • Clinical (height, weight, date of diagnosis, tumor and metastasis location, cTNM, stage; plus tumor-specific information when applicable)

  • Pathology (histological type & subtype, (y)pTNM, prognostic histological features, IHC and FISH results when applicable)

  • Mutations (all known genetic alterations)
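
The forms above record the dates needed for survival endpoints such as OS. This page does not specify how MOSAIC computes OS, so the sketch below uses one common convention as an assumption: time from date of diagnosis to death, censored at the last follow-up when no death is recorded. The function name, its arguments, and the month conversion are all illustrative.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # average month length, an assumed convention


def overall_survival(
    diagnosis: date,
    last_follow_up: date,
    death: Optional[date] = None,
) -> Tuple[float, bool]:
    """Return (OS in months, event flag).

    The event flag is True when a death date is recorded; otherwise the
    patient is censored at the last follow-up date.
    """
    end = death if death is not None else last_follow_up
    if end < diagnosis:
        raise ValueError("end date precedes date of diagnosis")
    months = (end - diagnosis).days / DAYS_PER_MONTH
    return months, death is not None


# Hypothetical patient who is still alive at the last follow-up (censored).
months, event = overall_survival(date(2020, 1, 1), date(2021, 1, 1))
print(f"{months:.1f} months, event={event}")
```

PFS could be derived the same way by using the date of first progression or recurrence as the event date, again depending on the definition the study actually uses.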

Data quality

Consistent and rigorous data generation: uniform sample processing, centralised NGS, rigorous QC steps at every stage.

MOSAIC uses a dynamic, multi-role QC approach across the full workflow:

  • Clinician review — patient selection (validation of cohort inclusion criteria), clinical record review (eCRF completeness, coherence, accuracy), workflow adaptation based on QC of existing database.

  • Pathologist review — block selection (tissue of origin, histological subtype, sample timepoint, tumor content), histology slides QC (cuts, tumor content, scanning and staining artefacts), spatial transcriptomics QC.

  • Biologist review — single-cell annotation (cohort-level cell type annotation, validation of automatic label transfer), clinical record completeness review.

Partner institutions

MOSAIC is generated through partnerships with leading academic medical centers at the forefront of spatial omics research:

  • Gustave Roussy: PI Fabrice André (ESMO President-elect, h-index 116). Ranked among the world's top 15 hospitals (Newsweek, 2023). Spatial transcriptomics pioneers via the Center for Experimental Therapies platform.

  • CHUV Lausanne: PI Raphaël Gottardo (h-index 60). Strong expertise in spatial biology via the PETRA platform.

  • University of Pittsburgh: PI Robert Ferris (h-index 107). Ranked 3rd in 2022 NIH funding (behind only Johns Hopkins and UCSF). Among the largest academic medical centers in the US.

  • Uniklinikum Erlangen: PI Arndt Hartmann (h-index 108). Ranked among the top 11 German hospitals (2023). Oncology cluster of excellence "NCT" with strong expertise in DLBCL, MM, Bladder, Breast, GBM.

  • Charité: PI Ulrich Keilholz (h-index 79). Ranked among the world's top 10 best hospitals and smart hospitals (Newsweek, 2023), and #1 in Germany. Spatial transcriptomics pioneers via the MDC center.

Access

The broader MOSAIC dataset is included in the K Pro paid tiers. The subset included in K Pro Free is available as MOSAIC Window.
