Browse the dataset catalog

K Pro comes with a curated set of datasets ready to use from day one, and gives you the ability to discover and request access to additional datasets from the Owkin catalog. This section describes what is available out of the box in K Pro Free, and how to explore further datasets that can be integrated into your project.

Owkin's data coverage is designed around depth and multimodality while maximizing breadth across all domains.

By default, K Pro comes with a foundation of public datasets that can support users' research: TCGA, GTEx, CellxGene, CPTAC, and Mosaic Window (the free version of our flagship MOSAIC dataset). Other public datasets can either be uploaded into the product or integrated at your request, depending on the volume and complexity of the dataset.

Beyond these public datasets, our readily available and sublicensable data products are currently concentrated in oncology (11 indications, including NSCLC, breast, ovarian, DLBCL, bladder, GBM, pancreatic, head & neck, and mesothelioma), where we offer one of the most comprehensive multimodal patient-level dataset catalogs available for licensing. This focus reflects deliberate curation rather than a gap, and ensures high data quality, rich annotation, and clinical-grade metadata across these indications.

Beyond our core catalog, our data sourcing offering, based on a network of 2.5M+ patient data points, extends coverage to additional oncology indications as well as key therapy areas, including Inflammation & Immunology (e.g. IBD, SLE, RA), Neurology (Alzheimer's disease), and CVRM.

Regarding data recency, the majority of our datasets include patients enrolled from 2012 onwards, reflecting the period of most significant advances in molecular profiling and digital pathology. Our network infrastructure can support active refresh cycles or any de novo access through our data sourcing offering, with access to data collected through the current year for select partners and indications.

Geographically, our proprietary data products draw from both US and EU cohorts, providing transatlantic representation that supports regulatory-relevant diversity. Our sourcing network extends to the APAC region for targeted data acquisition when geographic diversity is a project requirement.

Datasets available in K Pro Free

Browse for additional datasets of interest

The following pages further specify data assets available on K Pro:

TCGA dataset

Mosaic Window

MOSAIC dataset
