# The AI-readiness maturity model

An AI-ready dataset is a collection of biomedical data prepared specifically so that both humans and K-Pro can readily use it for analytics, model training, and research. To assess the value of a dataset systematically, Owkin has introduced a six-level AI-Readiness Maturity Model (Level 0 to Level 5) for its own datasets:

* **Level 0 – Uncontrolled data:** Data lacks governance, compliance, and even the minimal metadata needed for cataloging.
* **Level 1 – Storage & legal compliance:** Raw data with minimal metadata; stored securely, ISO 27001 compliant, license & IRB approval in place.
* **Level 2 – Discoverability:** Data dictionary, schema, programmatic metadata access, and version history available.
* **Level 3 – Exploration:** Quality checks documented, summary tables provided, with manifest/ReadMe for dataset exploration.
* **Level 4 – Interoperability:** Standard formats, cross-modality links, automated QC, and full data lineage.
* **Level 5 – Full traceability & AI/ML optimization:** Optimized views, precomputed features, reproducibility (code + environment), and detailed audits.
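
One way to make the scale operational is to treat it as a cumulative checklist: a dataset sits at the highest level whose criteria, and those of every level below it, are all met. The sketch below assumes that cumulative reading, and the criterion names (`secure_storage`, `data_dictionary`, etc.) are hypothetical paraphrases of the list above, not an official Owkin or K-Pro schema.

```python
from enum import IntEnum


class AIReadinessLevel(IntEnum):
    """The six levels of the maturity model (0 to 5)."""
    UNCONTROLLED = 0
    STORAGE_AND_LEGAL_COMPLIANCE = 1
    DISCOVERABILITY = 2
    EXPLORATION = 3
    INTEROPERABILITY = 4
    TRACEABILITY_AND_AI_OPTIMIZED = 5


# Hypothetical criterion names, paraphrased from the level descriptions.
# Level 0 has no criteria: it is the default when nothing else is met.
CRITERIA: dict[AIReadinessLevel, set[str]] = {
    AIReadinessLevel.STORAGE_AND_LEGAL_COMPLIANCE: {
        "minimal_metadata", "secure_storage", "iso_27001", "license", "irb",
    },
    AIReadinessLevel.DISCOVERABILITY: {
        "data_dictionary", "schema", "programmatic_metadata", "version_history",
    },
    AIReadinessLevel.EXPLORATION: {
        "documented_qc", "summary_tables", "manifest_readme",
    },
    AIReadinessLevel.INTEROPERABILITY: {
        "standard_formats", "cross_modality_links", "automated_qc", "data_lineage",
    },
    AIReadinessLevel.TRACEABILITY_AND_AI_OPTIMIZED: {
        "optimized_views", "precomputed_features", "reproducibility", "audits",
    },
}


def assess_level(satisfied: set[str]) -> AIReadinessLevel:
    """Return the highest level whose criteria, and all lower levels' criteria, are met."""
    level = AIReadinessLevel.UNCONTROLLED
    for candidate in sorted(CRITERIA):
        if CRITERIA[candidate] <= satisfied:  # subset test: every criterion is met
            level = candidate
        else:
            break  # assumed cumulative: a missing criterion stops the climb
    return level


if __name__ == "__main__":
    checklist = {
        "minimal_metadata", "secure_storage", "iso_27001", "license", "irb",
        "data_dictionary", "schema", "programmatic_metadata", "version_history",
        "documented_qc",  # Level 3 is only partially met
    }
    print(assess_level(checklist).name)  # DISCOVERABILITY
```

In the usage example, the dataset meets every Level 1 and Level 2 criterion but only one of the three Level 3 criteria, so it is assessed at Level 2. Stopping at the first gap (rather than counting satisfied levels) reflects the assumption that each level builds on the ones below it.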

For a third-party dataset to be processed by K-Pro, it must meet a set of strict requirements (layout, schema, data dictionary), selected from the levels above and described in this document. Owkin is keen to support you in meeting them.
