Overview of MOSAIC data

MOSAIC Window

Owkin and world-leading cancer research institutions University of Pittsburgh (USA), Gustave Roussy (FR), Lausanne University Hospital (CH), Erlangen University Hospital (DE), and Charité University Hospital (DE) are collaborating on the MOSAIC (Multi Omics Spatial Atlas In Cancer) initiative. MOSAIC is a landmark international project to revolutionize cancer research through the use of spatial and single cell transcriptomics, offering unprecedented information on the biology of tumors.

Spatial omics allows researchers to examine tumors by revealing the location and molecular activity of tumor and immune cells. It provides a detailed map of molecular interactions, allowing scientists to decipher key relationships between a tumor and its environment. By generating and analyzing unprecedented amounts of spatial omics data in combination with multimodal patient data and artificial intelligence, MOSAIC aims to unlock the next wave of treatments for some of the most difficult-to-treat cancers.

As of April 2025, more than 1900 samples have been included in the study. A subset of this data, called MOSAIC Window and including 60 patients in 5 cancer indications, is directly searchable here.

Data Modalities

MOSAIC includes the following modalities:

I) Clinical data

II) Hematoxylin and Eosin (H&E) microscopic images

III) Spatial transcriptomics (ST) by 10X Visium SD

IV) Single-nucleus RNA sequencing (snRNAseq) by 10X Chromium Flex

V) Bulk Ribonucleic Acid Sequencing (bRNAseq)

VI) Bulk Whole Exome Sequencing (WES)

  1. Clinical data:

The clinical data include detailed information such as demographics, medical history, cancer and treatment-related information, date and nature of sampling, and oncologic events during follow-up.

Data are collected via an electronic Case Report Form (eCRF) at inclusion in MOSAIC, capturing all relevant history between the date of cancer diagnosis and the inclusion date. For patients who are alive and under follow-up at the time of inclusion, clinical data, particularly oncologic outcomes, are collected annually for up to five years after inclusion. Additionally, data are collected at the end of the study, which occurs six years post-inclusion at the latest, on the date of death, or when the patient is lost to follow-up, whichever comes first.
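The follow-up rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and placing each annual collection on the exact anniversary of inclusion is an assumption, since the text does not specify visit windows.

```python
from datetime import date
from typing import List, Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def end_of_study(inclusion: date,
                 death: Optional[date] = None,
                 lost_to_follow_up: Optional[date] = None) -> date:
    """Earliest of: six years post-inclusion, death, loss to follow-up."""
    candidates = [add_years(inclusion, 6)]
    candidates += [d for d in (death, lost_to_follow_up) if d is not None]
    return min(candidates)


def annual_follow_ups(inclusion: date, stop: date) -> List[date]:
    """Yearly collection points (years 1 to 5) falling on or before `stop`.

    Assumption: collections happen on exact inclusion anniversaries.
    """
    visits = [add_years(inclusion, k) for k in range(1, 6)]
    return [v for v in visits if v <= stop]
```

For a patient included on 15 March 2023 with no event, `end_of_study` returns 15 March 2029 and `annual_follow_ups` yields five dates; an earlier death or loss to follow-up truncates both.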

  2. H&E:

The imaging data are H&E microscopic images, generated with a Leica Aperio CS2 scanner at 40X magnification in all participating centers to minimize batch effects due to different scanners. Standard site-specific staining protocols are used to reflect the variation across pathology institutes and to allow models to adjust for out-of-domain effects. Complementary H&E images are captured with routine diagnostic H&E protocols on the Visium slides to aid data integration. Pathologist annotations of the tissue structures are also collected.

  3. ST:

The core modality of the MOSAIC study is ST through the Visium CytAssist standard-definition protocol from 10X Genomics. Visium Spatial Gene Expression is a next-generation molecular profiling solution for classifying tissue based on hybridization of gene-specific probes covering over 18,000 genes in human samples. Within a total capture area of 6.5 x 6.5 mm, Visium defines ~5000 spots, each 55 µm in diameter and corresponding to about 1-10 cells on average (Ref: LIT000128 - Rev C - Product Sheet - Spatial biology without limits: Spatially resolve gene expression in FFPE samples). The 6.5 x 6.5 mm area is selected by pathologists according to the following criteria, ordered by priority: good representation of the tumor, with a minimal tumor burden of 40% and covering at least 20% of the selected area; adjacent non-tumoral tissue if possible; invasive margin if possible; tertiary lymphoid structures (TLS) if possible. Samples were sequenced at 25,000 reads per spot covered by tissue.
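As a rough illustration of what these figures imply, the sketch below derives the sequencing target per sample and the share of the capture area that lies under spots. The constants come from the paragraph above ("~5000" is approximate); the function names and the `tissue_fraction` parameter are hypothetical.

```python
import math

CAPTURE_SIDE_MM = 6.5      # side of the square capture area
N_SPOTS = 5000             # approximate spot count ("~5000" in the text)
SPOT_DIAMETER_UM = 55      # spot diameter in micrometers
READS_PER_SPOT = 25_000    # sequencing target per tissue-covered spot


def target_reads(tissue_fraction: float) -> int:
    """Reads targeted for one sample, given the fraction of spots under tissue."""
    if not 0.0 <= tissue_fraction <= 1.0:
        raise ValueError("tissue_fraction must be between 0 and 1")
    return round(N_SPOTS * tissue_fraction * READS_PER_SPOT)


def spot_area_fraction() -> float:
    """Fraction of the capture area physically covered by spots."""
    radius_mm = SPOT_DIAMETER_UM / 1000 / 2
    spot_area_mm2 = math.pi * radius_mm ** 2
    return N_SPOTS * spot_area_mm2 / CAPTURE_SIDE_MM ** 2
```

Under these assumptions, a capture area fully covered by tissue corresponds to about 125 million reads, and the spots themselves cover a bit more than a quarter of the 6.5 x 6.5 mm area, so transcripts from tissue lying between spots are not captured.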

  4. snRNAseq:

Since the Visium platform used does not offer single-cell resolution, the spatial data are complemented with transcriptomes from single nuclei isolated from the FFPE blocks. This snRNAseq approach, adapted from the Chromium Flex protocol from 10X Genomics for FFPE tissues, uses probe-based technology similar to that of Visium (Vallejo, A. F. et al., 2022). Pools of 4 or 16 samples were processed in one well, aiming for the recovery of 10,000 or 8,000 nuclei per sample, respectively. Samples are sequenced at 10,000 reads per cell.
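The pooling arithmetic can be made explicit with a short sketch. The numbers are taken from the paragraph above; the dictionary layout and the `well_targets` helper are hypothetical, and the result is the stated target, not the achieved yield.

```python
# Samples per well -> target nuclei recovered per sample (from the text above)
POOL_DESIGNS = {4: 10_000, 16: 8_000}
READS_PER_NUCLEUS = 10_000  # stated as 10,000 reads per cell


def well_targets(pool_size: int) -> dict:
    """Target nuclei and reads for one well of a given pool size."""
    if pool_size not in POOL_DESIGNS:
        raise ValueError("pool_size must be 4 or 16")
    nuclei_per_sample = POOL_DESIGNS[pool_size]
    nuclei_per_well = pool_size * nuclei_per_sample
    return {
        "nuclei_per_sample": nuclei_per_sample,
        "nuclei_per_well": nuclei_per_well,
        "reads_per_sample": nuclei_per_sample * READS_PER_NUCLEUS,
        "reads_per_well": nuclei_per_well * READS_PER_NUCLEUS,
    }
```

A 4-sample pool therefore targets 40,000 nuclei and 400 million reads per well, while a 16-sample pool targets 128,000 nuclei and 1.28 billion reads per well, with a slightly lower target per sample.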

  5. bRNAseq:

Bulk RNA sequencing is performed to assess the general transcriptome of the tumor, and to be able to compare the cell-level data with earlier studies that employed only bRNAseq methodology.

  6. WES:

WES targets the protein-coding regions, known as exons, of an individual's genome. By only sequencing the exome, which constitutes approximately 1-2% of the genome but harbors most disease-causing mutations, WES efficiently identifies genetic variations linked to disease.

Full MOSAIC Cohort

Summary

As of April 2025, MOSAIC comprises samples from bladder cancer (BLCA), ovarian cancer (OV), breast cancer (BRCA), non-small cell lung carcinoma (NSCLC), mesothelioma (MESO), head and neck squamous cell carcinoma (HNSCC), pancreatic cancer (PAAD), glioblastoma (GBM), and diffuse large B-cell lymphoma (DLBCL). Samples were collected at various timepoints depending on the sub-cohorts within each cancer indication: at diagnosis, after neoadjuvant chemotherapy (post-NACT) or neoadjuvant immunotherapy (post-NAIO), and at relapse.

BLCA

BLCA samples are obtained from either primary muscle-invasive bladder urothelial carcinoma or from metastatic lesions in case of advanced disease. In case of localized or locally advanced disease, samples were obtained from a baseline Transurethral Resection of Bladder Tumor (TURBT) or cystectomy. In case of NACT, samples from post-NACT cystectomies with residual tumor were also included. NACT consisted of standard-of-care (SOC) platinum/Gemcitabine for the majority of patients, or MVAC for the others. Two other cohorts of locally advanced unresectable/metastatic patients were included, one treated with upfront chemotherapy and the other treated with a first- or second-line combination of Pembrolizumab and Enfortumab Vedotin (NECTIN4 ADC). As of April 2025, more than 250 BLCA samples were included in MOSAIC.

OV

Ovarian cancer samples are collected from primary tumor sites (ovary and fallopian tube) as well as peritoneal metastatic sites (peritoneum, omentum, and carcinomatosis nodules) at baseline, after neoadjuvant chemotherapy (post-NACT), and upon relapse. Patients were diagnosed with FIGO stage II to IV high-grade serous carcinoma, including cases with homologous recombination deficiency, homologous recombination proficiency, or unknown status, and exhibited either platinum-sensitive or platinum-resistant disease. Treatments comprised upfront surgery or post-NACT interval debulking surgery and chemotherapy according to the standard of care (SOC), with additional therapies—such as bevacizumab, PARP inhibitors, and immune checkpoint inhibitors (ICIs)—administered in first-line or subsequent treatment settings. As of April 2025, more than 350 OV samples were included in MOSAIC.

BRCA

BRCA patients were classified by molecular subtype—triple-negative breast cancer (TNBC), HER2‑positive (HER2+), and hormone receptor‑positive (HR+)—according to the specific cohort criteria. The source of the tissue samples varied and included specimens from the primary tumor (collected either at baseline or post-NACT), from relapse lesions, and from metastatic sites. As of April 2025, more than 250 BRCA samples were included in MOSAIC. For TNBC, samples were obtained from patients who received chemotherapy alone or in combination with ICIs either in the neoadjuvant setting or as first‑line systemic therapy for unresectable locally advanced or metastatic disease. HER2+ patients underwent neoadjuvant HER2‑targeted treatment combined with chemotherapy (with or without hormonotherapy). In addition, baseline samples (collected by biopsy or surgery) were included from a non‑metastatic cohort of HR+ patients who were treated with surgery and adjuvant therapy per SOC and subsequently experienced a relapse within 10 years; if available, the relapse samples were also analyzed. Finally, an additional cohort consisted of patients who were treated with an antibody‑drug conjugate (ADC) across any treatment line and irrespective of their underlying molecular subtype.

NSCLC

NSCLC patients were diagnosed with either lung adenocarcinoma (LUAD) or squamous cell carcinoma (LUSC), depending on the cohort. Samples were collected from either primary tumors or metastatic lesions of EGFR/ALK-negative patients. These patients received ICIs as monotherapy or in combination with chemotherapy—either as first‑line systemic therapy for unresectable, locally advanced or metastatic disease, or as neoadjuvant treatment with chemotherapy, in which case samples were obtained from the primary tumor (at baseline or post-NACT). Two additional cohorts comprised patients with driver mutations (e.g., EGFR, ALK, ROS1, KRAS G12C) who were treated with tyrosine kinase inhibitors (TKIs), with samples acquired after disease progression. A final cohort consisted of baseline surgical samples from stage I–II patients treated with upfront surgery, with relapse samples available when possible. As of April 2025, more than 350 NSCLC samples were included in MOSAIC.

MESO

MESO samples were obtained from baseline pleural biopsies or surgeries, after NACT or neoadjuvant immunotherapy, or at relapse. The tumors were classified as either epithelioid or biphasic. Some patients received NACT, sometimes in combination with ICIs. Other patients were treated at relapse with anti‑PD‑1 ICI, either as monotherapy or in combination with anti‑CTLA4 antibodies, or with chemotherapy alone. Additionally, a subset of patients who were initially unresectable underwent palliative systemic treatment using chemotherapy, ICI, or a combination thereof. As of April 2025, more than 70 MESO samples were included in MOSAIC.

HNSCC

HNSCC samples were collected from two patient cohorts. The first cohort consisted of individuals with locally advanced disease treated with upfront surgery followed by adjuvant SOC therapy. The second cohort included patients with recurrent and/or metastatic (R/M) disease, with samples obtained from either the recurrent primary tumor or metastatic lesions, prior to first-line systemic ICI given as monotherapy or in combination with chemotherapy. As of April 2025, more than 100 HNSCC samples were included in MOSAIC.

PAAD

For PAAD, all disease settings were represented. Tissue samples were collected from the primary tumor during upfront surgical resection, or via biopsy in cases of borderline resectable disease; for these cases, post-NACT samples were also accepted. In metastatic disease, samples were obtained from either the primary tumor or metastatic lesions. All metastatic patients received SOC first-line systemic chemotherapy, using regimens such as FOLFIRINOX or Gemcitabine with or without Nab-Paclitaxel. As of April 2025, PAAD inclusions are ongoing in MOSAIC.

GBM

GBM samples, defined as glioblastomas according to the WHO 2021 criteria, were all IDH wildtype and included both unmethylated and methylated MGMT profiles. The samples were obtained during baseline surgery, prior to any adjuvant therapy, and at relapse. Most patients underwent the Stupp protocol, and a subset received bevacizumab at relapse. As of April 2025, more than 100 GBM samples were included in MOSAIC.

DLBCL

DLBCL samples were obtained from Ann Arbor stage I-IV patients treated with R-CHOP or ‘R-CHOP-like’ regimens (e.g., Pola-RCHP, R-EPOCH, RGCVP, RCEPP). A subset of patients was treated with CAR-T cell therapy, with samples obtained before treatment. Both activated B-cell (ABC) and germinal center B-cell (GCB) subtypes were included. As of April 2025, more than 150 DLBCL samples were included in MOSAIC.

MOSAIC Window

Summary

MOSAIC Window is a subset of the entire MOSAIC dataset. It contains the data of 60 patients in the following cancer indications: n = 15 BLCA; n = 15 OV; n = 10 GBM; n = 10 DLBCL; and n = 10 MESO.

BLCA

BLCA cases are stage II and III bladder carcinomas, all urothelial except for one squamous cell carcinoma. Samples are derived from upfront cystectomies. Most patients received complete lymph node dissection, and some were treated with adjuvant chemotherapy, and with chemotherapy or immune checkpoint inhibitors (ICI) at relapse, with or without radiotherapy.

OV

OV samples are FIGO stage II to IV high-grade serous carcinomas. They include 7 baseline, 2 post-NACT, and 6 relapse lesions. Patients were treated with upfront or post-NACT interval debulking surgery and chemotherapy as per standard of care. Additional treatments included bevacizumab, PARP inhibitors, and ICI in first or later lines.

GBM

GBM samples are glioblastomas as per the WHO 2021 definition, all IDH wildtype, six with an unmethylated and four with a methylated MGMT promoter. They were obtained from baseline surgery, before adjuvant therapy as per standard of care. Patients had 1 to 3 brain tumor sites, and tumor sizes ranged from 27 to 80 mm in diameter. Five patients received bevacizumab at relapse.

DLBCL

DLBCL samples are from Ann Arbor stage III or IV disease, with three activated B-cell-like (ABC), six germinal center B-cell (GCB), and one of unknown subtype. They were obtained at baseline before the start of standard therapy (R-CHOP), with targeted interventions at relapse.

MESO

MESO samples are stage I to III pleural mesotheliomas, five epithelioid and five biphasic. They include nine samples from baseline biopsies and surgeries, and one post-NACT sample. Four patients were treated with neoadjuvant chemotherapy (NACT), with one partial response and three cases of progressive disease. Four patients received anti-PD-1 ICI at relapse, as monotherapy or in combination with anti-CTLA4 antibodies.
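The counts given for MOSAIC Window can be cross-checked with a small sketch. The figures are copied from the descriptions above; the data layout and helper name are hypothetical, and comparing sample-level breakdowns against patient counts assumes one sample per patient, which the text does not state explicitly.

```python
# Counts as stated in the MOSAIC Window descriptions above.
# Assumption: one sample per patient, so breakdowns should sum to n.
WINDOW = {
    "BLCA": {"n": 15, "breakdowns": {}},
    "OV": {"n": 15, "breakdowns": {
        "timepoint": {"baseline": 7, "post-NACT": 2, "relapse": 6}}},
    "GBM": {"n": 10, "breakdowns": {
        "MGMT promoter": {"unmethylated": 6, "methylated": 4}}},
    "DLBCL": {"n": 10, "breakdowns": {
        "subtype": {"ABC": 3, "GCB": 6, "unknown": 1}}},
    "MESO": {"n": 10, "breakdowns": {
        "histology": {"epithelioid": 5, "biphasic": 5},
        "timepoint": {"baseline": 9, "post-NACT": 1}}},
}


def inconsistent_breakdowns(window: dict) -> list:
    """Return (indication, breakdown) pairs whose counts do not sum to n."""
    problems = []
    for indication, info in window.items():
        for name, counts in info["breakdowns"].items():
            if sum(counts.values()) != info["n"]:
                problems.append((indication, name))
    return problems


total_patients = sum(info["n"] for info in WINDOW.values())
```

With the numbers as written, `total_patients` matches the 60 patients quoted in the summary and `inconsistent_breakdowns(WINDOW)` returns an empty list.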
