
# Understand your data (QC, methods)

For your data to be fully exploitable by K Pro's AI agents, it needs to meet a defined set of structural and quality requirements. This section explains how Owkin thinks about AI data readiness, which biological modalities and file formats K Pro currently supports, and the naming conventions and ontologies data must conform to. Whether you are preparing your own dataset for integration or evaluating a third-party dataset, this section is the technical reference you need.

***

K Pro uses a multi-layer quality and provenance framework for public data.

**Data quality.** Public datasets are transformed into a unified K Data Model through schema harmonization, ontology mapping, normalization, and multi-level structuring. Quality controls include relational mapping to avoid orphaned records, pre-computed metadata, and additional filtering steps that protect data integrity and keep exploration fast.
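To illustrate what an orphaned-record check can look like, here is a minimal Python sketch. It is not K Pro's implementation: the `find_orphans` helper, the `patients` and `samples` tables, and their field names are hypothetical and chosen only for illustration. A child record is treated as orphaned when its foreign key matches no parent record.

```python
def find_orphans(child_rows, parent_rows, foreign_key, parent_key):
    """Return child rows whose foreign key has no matching parent row."""
    # Collect every identifier present in the parent table.
    parent_ids = {row[parent_key] for row in parent_rows}
    # A child is orphaned if its foreign key is missing or unknown.
    return [row for row in child_rows if row.get(foreign_key) not in parent_ids]


# Hypothetical example tables (not taken from the K Data Model).
patients = [{"patient_id": "P1"}, {"patient_id": "P2"}]
samples = [
    {"sample_id": "S1", "patient_id": "P1"},
    {"sample_id": "S2", "patient_id": "P2"},
    {"sample_id": "S3", "patient_id": "P9"},  # refers to a patient that does not exist
]

orphans = find_orphans(samples, patients, "patient_id", "patient_id")
orphan_ids = [row["sample_id"] for row in orphans]
```

In this sketch, `orphan_ids` contains only `"S3"`, the sample whose `patient_id` has no counterpart in `patients`. A real pipeline would typically run the same kind of check across every parent-child relationship in the schema.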

**Provenance tracking.** Every analysis is anchored in authoritative sources and validated biological knowledge bases. The platform keeps complete provenance for outputs and logs data sources, model decisions, and reasoning steps. For literature-backed content, K Pro uses retrieval-augmented generation (RAG) to verify cited PubMed articles.

**Version control / traceability.** The AI-Readiness Maturity Model defines version history at Level 2 and full data lineage at Level 4. Level 5 adds reproducibility (code and environment) and detailed audits. Public dataset pages also expose dataset versions where available. For example, the TCGA entry lists a version in the dataset catalog.
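The level-to-capability relationship described above can be sketched as a small lookup. This is an illustrative sketch only: the `TRACEABILITY_BY_LEVEL` mapping and the `traceability_at` function are hypothetical names, only the levels named on this page are included, and the code assumes that capabilities are cumulative (each level keeps what lower levels provide), which this page implies but does not state outright.

```python
# Only the levels mentioned on this page; other levels are intentionally omitted.
TRACEABILITY_BY_LEVEL = {
    2: "version history",
    4: "full data lineage",
    5: "reproducibility (code and environment) with detailed audits",
}


def traceability_at(level: int) -> list[str]:
    """List the traceability capabilities reached at a given maturity level.

    Assumption: levels are cumulative, so a dataset at Level 4 also has
    everything defined at Level 2.
    """
    return [
        capability
        for lvl, capability in sorted(TRACEABILITY_BY_LEVEL.items())
        if lvl <= level
    ]


level_4_capabilities = traceability_at(4)
```

Under these assumptions, `level_4_capabilities` lists version history and full data lineage, while a Level 1 dataset returns an empty list.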

**Proprietary data (MOSAIC and similar).** Owkin computational biology teams have developed data processing pipelines that follow gold standards and, where necessary, have optimized those pipelines for the specific dataset. Biomedical experts are included in the development loop to conduct confirmatory analyses with the data, annotate single-cell clusters, and perform similar checks, which further ensures high-quality data for discovery and other typical uses.

{% content-ref url="/pages/RLzBO1KSapCNhQnq1RdP" %}
[The AI-maturity model](/explore-and-analyse-data/k-pro-data-model-and-technical-references/the-ai-maturity-model.md)
{% endcontent-ref %}

{% content-ref url="/pages/plui2xjJVbZAi6NSMbak" %}
[Supported modalities](/explore-and-analyse-data/k-pro-data-model-and-technical-references/supported-modalities.md)
{% endcontent-ref %}

{% content-ref url="/pages/px5r2MV8w8ih27rA0Xy0" %}
[Preferred ontologies and nomenclatures](/explore-and-analyse-data/k-pro-data-model-and-technical-references/preferred-ontologies-and-nomenclatures.md)
{% endcontent-ref %}

