MOSAIC dataset

Owkin is collaborating with five world-leading cancer research institutions on the MOSAIC (Multi Omics Spatial Atlas In Cancer) initiative: University of Pittsburgh (USA), Gustave Roussy (FR), Lausanne University Hospital (CH), Erlangen University Hospital (DE), and Charité University Hospital (DE). MOSAIC is a landmark international project that aims to revolutionize cancer research through spatial and single-cell transcriptomics, offering unprecedented information on the biology of tumors.

Spatial omics allows researchers to examine tumors by revealing the location and molecular activity of tumor and immune cells. It provides a detailed map of molecular interactions, allowing scientists to decipher key relationships between a tumor and its environment. By generating and analyzing unprecedented amounts of spatial omics data in combination with multimodal patient data and artificial intelligence, MOSAIC aims to unlock the next wave of treatments for some of the most difficult-to-treat cancers.

As of April 2025, more than 1,900 samples have been included in the study.

Data Modalities

MOSAIC includes the following modalities:

  1. Clinical data — Detailed information including demographics, medical history, cancer and treatment-related information, date and nature of sampling, and oncologic events during follow-up. Data is collected via an electronic Case Report Form (eCRF) at inclusion, with annual updates for up to five years post-inclusion.

  2. H&E — Microscopic images of hematoxylin and eosin (H&E) stained slides, generated with a Leica Aperio CS2 scanner at 40X magnification in all participating centers. Standard site-specific staining protocols are used.

  3. Spatial Transcriptomics (ST) — Generated with the Visium CytAssist standard-definition protocol from 10x Genomics. Visium Spatial Gene Expression profiles over 18,000 genes in human samples. Each 6.5 x 6.5 mm capture area contains ~5,000 spots, each 55 µm in diameter and covering about 1-10 cells on average. Samples were sequenced at 25,000 reads per tissue-covered spot.

  4. Single-Nuclei RNA Transcriptomics (snRNAseq) — Adapted from the Chromium FLEX protocol from 10x Genomics for formalin-fixed paraffin-embedded (FFPE) tissues. Samples were processed in pools of 4 or 16, targeting the recovery of 10,000 or 8,000 nuclei per sample, respectively. Samples were sequenced at 10,000 reads per cell.

  5. Bulk Ribonucleic Acid Sequencing (bRNAseq) — Performed to assess the general transcriptome of the tumor, enabling comparison with earlier studies that employed only bRNAseq methodology.

  6. Whole Exome Sequencing (WES) — Targets the protein-coding regions (exons) of an individual's genome. The exome constitutes approximately 1-2% of the genome but harbors most disease-causing mutations.
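
The sequencing figures quoted for ST and snRNAseq can be turned into rough per-sample read budgets. The following is a minimal back-of-the-envelope sketch, not part of the MOSAIC pipeline: all function and constant names are hypothetical, and it assumes that every Visium spot is covered by tissue (an upper bound) and that the targeted number of nuclei is recovered exactly.

```python
import math

# Figures quoted in the modality descriptions above.
CAPTURE_SIDE_MM = 6.5          # side of the square Visium capture area
N_SPOTS = 5_000                # approximate number of spots per capture area
SPOT_DIAMETER_UM = 55          # spot diameter
ST_READS_PER_SPOT = 25_000     # reads per tissue-covered spot
SN_READS_PER_NUCLEUS = 10_000  # snRNAseq reads per cell (nucleus)
# Pool size -> targeted nuclei per sample (4 -> 10,000; 16 -> 8,000).
SN_TARGET_NUCLEI = {4: 10_000, 16: 8_000}


def st_reads_upper_bound(n_spots=N_SPOTS, reads_per_spot=ST_READS_PER_SPOT):
    """Reads per ST sample, assuming every spot is covered by tissue."""
    return n_spots * reads_per_spot


def spot_area_fraction(n_spots=N_SPOTS, diameter_um=SPOT_DIAMETER_UM,
                       side_mm=CAPTURE_SIDE_MM):
    """Fraction of the capture area that lies under a spot."""
    radius_mm = diameter_um / 2 / 1000
    return n_spots * math.pi * radius_mm ** 2 / side_mm ** 2


def sn_reads(pool_size):
    """(reads per sample, reads per pool), assuming targets are met exactly."""
    per_sample = SN_TARGET_NUCLEI[pool_size] * SN_READS_PER_NUCLEUS
    return per_sample, per_sample * pool_size


if __name__ == "__main__":
    print(f"ST reads per sample (upper bound): {st_reads_upper_bound():,}")
    print(f"Capture area under spots: {spot_area_fraction():.0%}")
    for pool in sorted(SN_TARGET_NUCLEI):
        per_sample, per_pool = sn_reads(pool)
        print(f"snRNAseq pool of {pool}: {per_sample:,} reads/sample, "
              f"{per_pool:,} reads/pool")
```

Under these assumptions, spots cover only a bit more than a quarter of the capture area, which is worth keeping in mind when comparing ST signal with the H&E image of the same section.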

Full MOSAIC Cohort Summary

As of April 2025, MOSAIC comprises samples from:

  • Bladder cancer (BLCA): 250+ samples

  • Ovarian cancer (OV): 350+ samples

  • Breast cancer (BRCA): 250+ samples

  • Non-small cell lung carcinoma (NSCLC): 350+ samples

  • Mesothelioma (MESO): 70+ samples

  • Head and neck squamous cell carcinoma (HNSCC): 100+ samples

  • Pancreatic cancer (PAAD): Inclusions ongoing

  • Glioblastoma (GBM): 100+ samples

  • Diffuse large B-cell lymphoma (DLBCL): 150+ samples

Samples were collected at various timepoints depending on the sub-cohorts: at diagnosis, after neoadjuvant chemotherapy (post-NACT) or neoadjuvant immunotherapy (post-NAIO), and at relapse.
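
For quick sanity checks, the cohort summary above can be kept as a small lookup table. This is a sketch only: the variable and function names are hypothetical, and each "N+" entry is stored as its lower bound N, so the resulting total is a floor rather than the actual sample count.

```python
# Lower bounds per indication as listed above ("250+" is stored as 250).
# PAAD has no published count yet (inclusions ongoing), so it is None.
COHORT_MIN_SAMPLES = {
    "BLCA": 250,
    "OV": 350,
    "BRCA": 250,
    "NSCLC": 350,
    "MESO": 70,
    "HNSCC": 100,
    "PAAD": None,
    "GBM": 100,
    "DLBCL": 150,
}


def minimum_total(cohort):
    """Sum the known lower bounds, skipping indications without a count."""
    return sum(n for n in cohort.values() if n is not None)


def pending(cohort):
    """Indications whose sample count has not been reported yet."""
    return [name for name, n in cohort.items() if n is None]


if __name__ == "__main__":
    print(f"At least {minimum_total(COHORT_MIN_SAMPLES):,} samples")
    print(f"Counts pending for: {', '.join(pending(COHORT_MIN_SAMPLES))}")
```

Because every entry is a lower bound and one indication is still open, the floor computed here is expected to sit below the overall figure of more than 1,900 samples.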
