# Glossary

### **Platform & Product Terms**

**Activate** An Agentic Space in K Pro designed for translational research tasks such as biomarker identification, patient selection, and cohort stratification. Available from the Standard tier onward, it contains the Deep Patient Explorer and Population Optimizer agents.

**Agentic AI** An AI system capable of autonomously planning and executing multi-step tasks by selecting and coordinating specialized tools and agents in response to a user's intent. K Pro uses an agentic AI architecture to orchestrate complex biomedical analyses.

**Agentic Space** A grouping of K Pro agents designed to address a specific phase of the research or drug development pipeline. The three agentic spaces are Analyze, Activate, and Amplify, each available from a specific service tier.

**Amplify** An Agentic Space in K Pro targeting advanced drug development workflows, including digital twin enrichment and clinical trial strategy. Available in the Premium tier, it contains the Multimodal Twin Enricher and Trial Strategy Explorer agents.

**Analyze** An Agentic Space in K Pro focused on foundational research capabilities, including literature review, multimodal patient data exploration, and gene-level biological knowledge retrieval. Available from the Free tier onward, it contains the Literature Navigator, Multimodal Explorer, and Knowledge Explorer agents.

**Data Enrichment** The process by which K Pro augments uploaded or existing datasets with AI-derived features, including spatial gene expression prediction from H\&E images, cell-type deconvolution, nuclear morphology analysis, and ligand-receptor interaction modeling.

**Deep Patient Explorer** A K Agent within the Activate space that supports biomarker identification and patient stratification by performing in-depth analyses across multimodal datasets.

**Histomics** A digital pathology capability within K Pro that performs AI-driven cell detection, segmentation, and classification on H\&E whole-slide images. It identifies cell types such as lymphocytes, neutrophils, eosinophils, plasmocytes, fibroblasts, and cancer cells, and produces quantitative features including cell counts, densities, and nuclear morphology metrics.
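To illustrate how per-cell detections become counts and densities, the sketch below aggregates a list of predicted cell-type labels over an analyzed tissue area. The input layout (one label per detected cell plus an area in mm²) and the function name `cell_type_densities` are hypothetical. They do not reflect the actual Histomics output schema.

```python
from collections import Counter


def cell_type_densities(cell_labels: list[str], tissue_area_mm2: float) -> dict[str, float]:
    """Return cells per mm^2 for each cell type (hypothetical helper).

    cell_labels: one predicted cell-type label per detected cell.
    tissue_area_mm2: analyzed tissue area in square millimetres.
    """
    if tissue_area_mm2 <= 0:
        raise ValueError("tissue area must be positive")
    counts = Counter(cell_labels)  # number of detected cells per type
    return {cell_type: n / tissue_area_mm2 for cell_type, n in counts.items()}


labels = ["lymphocyte", "cancer cell", "lymphocyte", "fibroblast", "lymphocyte"]
print(cell_type_densities(labels, tissue_area_mm2=0.5))
```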

**K Agents** Specialized AI modules within K Pro, each designed to perform a specific class of research task (e.g., literature review, multiomics analysis, clinical data exploration). Agents are orchestrated by K Pro's underlying system to collaborate in response to user queries.

**K Pro** Owkin's agentic AI platform for biomedical research. K Pro provides access to curated multimodal datasets, specialized AI agents, and visualization tools through a natural language interface. Available in four tiers: Free, Light, Standard, and Premium.

**K Pro Free** The no-cost entry tier of K Pro, providing individual access to the Analyze Agentic Space — including the Literature Navigator, Multimodal Explorer, and Knowledge Explorer agents — and public datasets including the MOSAIC Window and TCGA. Does not include data upload, collaboration, or enterprise features.

**K Pro Light** A paid K Pro tier providing individual access to core agentic capabilities, custom data upload (BYOD), and CCPA compliance, in addition to all features available in K Pro Free.

**K Pro Standard** A paid K Pro tier offering team collaboration, multi-user support, and access to the Activate Agentic Space (Deep Patient Explorer and Population Optimizer), in addition to all Light-tier features.

**K Pro Premium** The highest K Pro tier, providing access to the Amplify Agentic Space (Multimodal Twin Enricher and Trial Strategy Explorer), dedicated onboarding, priority support, and enterprise-grade capabilities, in addition to all Standard-tier features.

**Knowledge Explorer** A K Agent within the Analyze space that retrieves gene- and target-level biological knowledge from curated databases, covering protein families, oncogenicity, immune pathways, tractability, and expression profiles.

**Literature Navigator** A K Agent within the Analyze space that performs literature review and gap analysis by querying PubMed abstracts through semantic search and retrieval-augmented generation (RAG), returning citation-backed summaries.

**MCP (Model Context Protocol)** An open protocol that enables AI assistants such as Claude to connect to external tools and data sources through a standardized interface. Owkin's MCP integration exposes the Pathology Explorer toolset to Claude.ai and Claude Desktop users.

**MCP Connector** The configuration entry point used to link Claude.ai or Claude Desktop to Owkin's MCP server at `https://mcp.k.owkin.com/mcp`. Once connected, users can invoke Pathology Explorer tools directly from their Claude interface.

**Multimodal Explorer** A K Agent within the Analyze space that enables interactive exploration of multimodal patient datasets — including clinical, transcriptomic, spatial, and histological data — through natural-language queries and automated visualization.

**Multimodal Twin Enricher** A K Agent within the Amplify space that enriches datasets by generating AI-derived features — such as predicted spatial transcriptomics, cell-type deconvolution, and cell communication profiles — from existing modalities.

**Orchestration** The process by which K Pro's underlying system interprets a user's intent, selects the appropriate agents and tools, sequences their execution, and consolidates the results into a coherent response. Orchestration incorporates trust and verification mechanisms, including RAG-based citation validation, tool-based grounding against modality-specific AI models, and data-anchored analysis to ensure outputs are derived from actual patient data.

**Pathology Explorer** An AI-powered tool available through K Pro and as an MCP integration, designed to transform H\&E whole-slide images into granular, queryable insights. Trained on 200,000+ annotations, it detects and classifies 6 distinct cell types (lymphocytes, neutrophils, eosinophils, plasmocytes, fibroblasts, and cancer cells), supports spatial analysis of the tumor microenvironment across 27 TCGA tumor cohorts, and enables data export in Parquet format.

**Population Optimizer** A K Agent within the Activate space that assists with patient selection and cohort optimization for clinical development.

**Trial Strategy Explorer** A K Agent within the Amplify space that supports clinical trial de-risking by analyzing competitive clinical trial landscapes and real-world multimodal patient data.

### **Data & Datasets**

**Bring Your Own Data (BYOD)** A K Pro capability (available from Light tier) allowing users to upload their own proprietary datasets for analysis within the platform. Data must conform to K Pro's supported modalities and file format specifications.

**Bulk RNA-seq (bRNAseq)** Bulk RNA sequencing. A method to assess the overall transcriptome of a tissue sample by sequencing RNA from a mixture of cells. Supported formats in K Pro: `.txt`, `.tsv`, `.csv`, `.h5ad`.

**Deconvolution** A computational method for inferring the proportions of distinct cell types within a mixed sample — such as a bulk RNA-seq measurement or a spatial transcriptomics spot — by leveraging reference single-cell expression profiles. K Pro uses deconvolution extensively in spatial transcriptomics analyses and data enrichment.
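One common way to frame reference-based deconvolution is non-negative least squares: find cell-type weights `w ≥ 0` such that `reference @ w` best matches the observed mixture, then rescale the weights to proportions. The sketch below solves this with projected gradient descent. It illustrates the general technique under that assumption. It is not K Pro's actual deconvolution method, and the toy reference matrix is invented for illustration.

```python
import numpy as np


def deconvolve_nnls(reference: np.ndarray, mixture: np.ndarray,
                    n_iter: int = 5000) -> np.ndarray:
    """Estimate cell-type proportions from one mixed expression vector.

    reference: genes x cell_types matrix of reference expression profiles.
    mixture:   length-genes vector of observed (mixed) expression.
    Minimises ||reference @ w - mixture||^2 subject to w >= 0 via projected
    gradient descent, then rescales w so the proportions sum to 1.
    """
    # Step size 1/L, where L = ||reference||_2^2 is the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(reference, 2) ** 2
    w = np.zeros(reference.shape[1])
    for _ in range(n_iter):
        grad = reference.T @ (reference @ w - mixture)
        w = np.maximum(w - step * grad, 0.0)  # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


# Toy example: 5 genes x 3 cell types with known mixing proportions.
reference = np.array([
    [10.0, 0.0, 1.0],
    [0.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 3.0],
])
true_props = np.array([0.5, 0.3, 0.2])
mixture = reference @ true_props
print(deconvolve_nnls(reference, mixture).round(3))
```

On this noise-free toy input the estimate recovers the mixing proportions; with real data, the choice of reference profiles and marker genes dominates the quality of the result.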

**Gene Signature** A curated set of genes whose combined expression pattern represents a specific biological process, cell state, or phenotype. Gene signatures can be used in K Pro visualizations in place of individual genes for more robust biological characterization.
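A simple and widely used way to turn a signature into one value per sample is to z-score each signature gene across samples and average those z-scores. The sketch below assumes a samples-by-genes matrix. The gene names are placeholders, and this scoring scheme is an illustration rather than the method K Pro uses.

```python
import numpy as np


def signature_score(expr: np.ndarray, genes: list[str], signature: list[str]) -> np.ndarray:
    """Mean per-gene z-score across signature genes, one score per sample.

    expr: samples x genes expression matrix; genes: column names of expr.
    Signature genes missing from `genes` are skipped.
    """
    idx = [genes.index(g) for g in signature if g in genes]
    if not idx:
        raise ValueError("no signature genes found in the matrix")
    sub = np.asarray(expr, dtype=float)[:, idx]
    std = sub.std(axis=0)
    std[std == 0] = 1.0  # constant genes then contribute a z-score of 0
    z = (sub - sub.mean(axis=0)) / std
    return z.mean(axis=1)


expr = np.array([[1.0, 10.0, 5.0],
                 [3.0, 30.0, 5.0]])
genes = ["GENE_A", "GENE_B", "GENE_C"]  # placeholder gene names
print(signature_score(expr, genes, ["GENE_A", "GENE_B"]))
```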

**H\&E (Haematoxylin and Eosin)** A standard histological staining technique applied to whole-slide tissue images. H\&E staining highlights cell nuclei and cytoplasm and is the primary imaging modality used by Pathology Explorer.

**IHC (Immunohistochemistry)** A laboratory technique that uses antibodies to detect specific protein markers in tissue sections. K Pro supports IHC slide images and can extract marker-specific scores (e.g., HER2, ER/PR) through proprietary models. Supported formats: `.tif`, `.tiff`, `.svs`, `.dcm`, `.ndpi`, `.mrxs`.

**Jaccard Index** A similarity metric used in K Pro to quantify co-expression overlap between two genes or gene signatures. Calculated as the proportion of cells (or spots) co-expressing both inputs relative to cells expressing either, it ranges from 0 (no overlap) to 1 (complete overlap).
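The definition above corresponds to |A ∩ B| / |A ∪ B|, where A and B are the sets of cells (or spots) expressing each input. The sketch below treats a cell as "expressing" a gene when its value exceeds a threshold. That threshold, and the function name, are assumptions for illustration; K Pro's own expression cutoff is not specified here.

```python
import numpy as np


def jaccard_coexpression(expr_a, expr_b, threshold: float = 0.0) -> float:
    """Jaccard index of the sets of cells expressing gene A and gene B.

    A cell counts as expressing a gene when its value is > threshold
    (hypothetical cutoff chosen for illustration).
    """
    a = np.asarray(expr_a) > threshold
    b = np.asarray(expr_b) > threshold
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # neither gene expressed anywhere
    return float(np.logical_and(a, b).sum() / union)


# Per-cell expression for two genes across five cells.
print(jaccard_coexpression([0, 2, 3, 0, 1], [1, 2, 0, 0, 4]))
```

Here cells 2 and 5 express both genes and four cells express at least one, giving 2 / 4 = 0.5.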

**Moran's I** A spatial autocorrelation statistic that measures the degree to which a gene's expression is spatially clustered within a tissue section. A high Moran's I indicates that the gene tends to be expressed in localized regions rather than randomly across the tissue.
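Moran's I is defined as I = (N / W) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ = xᵢ − x̄, wᵢⱼ are spatial weights between spots i and j, and W = Σᵢⱼ wᵢⱼ. The sketch below implements this formula directly with a dense weight matrix. The toy adjacency (four spots in a row, neighbors weighted 1) is an assumption; real Visium analyses typically derive weights from spot coordinates.

```python
import numpy as np


def morans_i(values, weights) -> float:
    """Global Moran's I for one gene across spatial spots.

    values:  length-N expression vector.
    weights: N x N spatial weight matrix (w_ij > 0 for neighbouring spots).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    numerator = (w * np.outer(z, z)).sum()  # sum_ij w_ij * z_i * z_j
    denominator = (z ** 2).sum()
    return float((x.size / w.sum()) * numerator / denominator)


# Four spots in a line; each spot neighbours the adjacent ones.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
print(morans_i([1, 1, 0, 0], adjacency))  # clustered pattern
print(morans_i([1, 0, 1, 0], adjacency))  # alternating pattern
```

The clustered pattern yields a positive value (1/3) and the alternating pattern a negative one (−1), matching the interpretation above.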

**MOSAIC (Multi Omics Spatial Atlas In Cancer)** A landmark multi-institutional dataset developed by Owkin in collaboration with academic centers (University of Pittsburgh, Gustave Roussy, Lausanne University Hospital, Erlangen University Hospital, and Charité Berlin). It contains clinical data, bulk RNA-seq, single-nuclei RNA-seq, spatial transcriptomics, WES, and H\&E histology across 9 cancer types and 2,200+ patients.

**MOSAIC Window** A curated, publicly accessible subset of the MOSAIC dataset, included in all K Pro tiers. Contains multimodal data from 60 patients across five cancer types: BLCA, OV, GBM, DLBCL, and MESO.

**Multimodal Data** Data combining multiple biological measurement types (modalities) from the same patient or sample, such as clinical records, genomics, transcriptomics, histology, and spatial omics.

**OMOP (Observational Medical Outcomes Partnership)** A standardized data model for organizing and harmonizing observational health data. K Pro follows an OMOP-like schema to ensure interoperability across heterogeneous clinical datasets.

**Proteomics** The large-scale study of proteins, including their expression levels, modifications, and interactions. K Pro supports proteomics data as a modality, accepting normalized intensity matrices (`.txt`, `.tsv`, `.csv`) and AnnData (`.h5ad`) formats.

**RECIST (Response Evaluation Criteria in Solid Tumors)** A standardized set of rules for assessing tumor response to treatment, classifying outcomes as Complete Response (CR), Partial Response (PR), Stable Disease (SD), or Progressive Disease (PD).
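For target lesions, the classification can be sketched from sums of lesion diameters. The thresholds below follow RECIST 1.1 as commonly summarized: PR is a decrease of at least 30% from the baseline sum, and PD is an increase of at least 20% over the smallest sum recorded on study together with an absolute increase of at least 5 mm. The function is a hypothetical, simplified helper that ignores non-target lesions, new lesions, and the lymph-node rule for CR, so it is not a substitute for a formal assessment.

```python
def recist_target_response(baseline_mm: float, nadir_mm: float, current_mm: float) -> str:
    """Classify target-lesion response from sums of diameters in mm.

    Simplified RECIST 1.1 logic (hypothetical helper): ignores non-target
    lesions, new lesions, and the lymph-node short-axis rule for CR.
    """
    if current_mm == 0:
        return "CR"  # all target lesions have disappeared
    increase = current_mm - nadir_mm
    # PD: >= 20% above the smallest sum on study AND >= 5 mm absolute increase.
    if increase >= 0.20 * nadir_mm and increase >= 5:
        return "PD"
    # PR: >= 30% decrease relative to the baseline sum.
    if current_mm <= 0.70 * baseline_mm:
        return "PR"
    return "SD"


print(recist_target_response(baseline_mm=100, nadir_mm=50, current_mm=65))
```

PD is checked before PR because a lesion sum can be well below baseline yet still have grown substantially from its nadir, as in the example call.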

**Single-cell RNA-seq (scRNA-seq)** Single-cell RNA sequencing. A technique that measures gene expression at the level of individual cells, enabling identification of distinct cell populations and states within a tissue. Supported formats in K Pro: `.mtx`, `.h5`, `.h5ad`, `.rds`.

**Spatial Transcriptomics (ST)** A technique that measures gene expression while preserving the spatial location of cells or spots within a tissue section. K Pro supports the Visium CytAssist protocol from 10x Genomics. Supported formats: `.mtx`, `.h5`, `.h5ad`, `.rds`.

**TCGA (The Cancer Genome Atlas)** A publicly funded, comprehensive cancer genomics resource comprising data from 20,000+ primary cancer samples across 33 cancer types. Available in all K Pro tiers.

**TILs (Tumor-Infiltrating Lymphocytes)** Immune cells — primarily T cells and B cells — that have migrated from the bloodstream into a tumor. Their density and distribution, quantifiable through K Pro's Histomics and Pathology Explorer tools, are important prognostic and predictive biomarkers in oncology.

**TLS (Tertiary Lymphoid Structures)** Organized aggregates of immune cells that form within or near tumors, resembling lymph nodes. Their presence is generally associated with favorable anti-tumor immune responses and can be detected through K Pro's digital pathology capabilities.

**TME (Tumor Microenvironment)** The cellular and molecular environment surrounding a tumor, including immune cells, stromal cells, blood vessels, and signaling molecules. Characterizing the TME is a key use case for Pathology Explorer and the MOSAIC dataset.

**TNM Staging** A cancer staging system that classifies tumors based on three factors: the size and extent of the primary Tumor (T), involvement of regional lymph Nodes (N), and presence of distant Metastasis (M). Used alongside stage groupings (I–IV) in K Pro's clinical data model.

**WES (Whole Exome Sequencing)** A sequencing technique that targets the protein-coding regions (exons) of the genome. Exons make up approximately 1–2% of the total genome but harbor the majority of known disease-causing mutations. Supported format in K Pro: `.vcf`.

**WGS (Whole Genome Sequencing)** A sequencing technique that determines the complete DNA sequence of an organism's genome, including both coding and non-coding regions. Supported alongside WES in K Pro via the VCF file format (`.vcf`).

### **AI & Technical Terms**

**AI-Readiness Maturity Model** Owkin's 6-level framework (0–5) for assessing how well a dataset is prepared for use with K Pro. Levels range from uncontrolled data (Level 0) to fully traceable and AI/ML-optimized datasets with reproducibility standards (Level 5).

**Coding Agent** An AI capability that can create and execute custom analysis steps to answer a user's question. Unlike predefined workflows, a coding agent adapts its approach to the task at hand, which makes it more flexible for open-ended analysis.

**DESeq2** A widely used R/Bioconductor package for differential gene expression analysis of count-based RNA-seq data, based on negative binomial generalized linear models. K Pro employs DESeq2 as part of its bulk RNA-seq differential expression analysis (DEA) workflow.
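Before testing for differences, DESeq2 normalizes for sequencing depth with "median-of-ratios" size factors: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with no zero counts. The NumPy sketch below reproduces only that normalization step for illustration. It is not the DESeq2 package itself, which is an R library, and it omits dispersion estimation and the statistical test.

```python
import numpy as np


def median_of_ratios_size_factors(counts) -> np.ndarray:
    """Per-sample size factors from a genes x samples raw count matrix.

    Genes with a zero count in any sample are excluded, because their
    geometric mean is zero (log is -inf).
    """
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)  # log of per-gene geometric mean
    keep = np.isfinite(log_geo_means)
    log_ratios = log_counts[keep] - log_geo_means[keep, None]
    return np.exp(np.median(log_ratios, axis=0))


# Sample 2 was sequenced twice as deeply as sample 1; the last gene has a zero.
counts = np.array([
    [10, 20],
    [5, 10],
    [100, 200],
    [0, 3],
])
print(median_of_ratios_size_factors(counts))
```

The resulting size factors differ by a factor of 2, reflecting the depth difference; dividing each sample's counts by its size factor puts the samples on a comparable scale.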

**Embedding / Latent Representation** A compressed, numerical representation of high-dimensional biological data (e.g., gene expression profiles) produced by machine learning models. Used in K Pro for dimensionality reduction plots (PCA, UMAP, t-SNE).
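Of the methods listed, PCA is the simplest to write down: center the data and project it onto the leading right singular vectors. The sketch below uses NumPy's SVD to compute a low-dimensional embedding and the fraction of variance each component explains. It illustrates the technique only; K Pro's plotting pipeline may use different preprocessing (e.g., log-transform or scaling) first.

```python
import numpy as np


def pca(x, n_components: int = 2):
    """Project samples x features data onto its top principal components.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # coordinates in PC space
    explained = s ** 2 / (s ** 2).sum()
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 10))  # e.g. 50 cells x 10 genes (random toy data)
scores, explained = pca(data, n_components=2)
print(scores.shape, explained)
```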

**Foundation Model** A large-scale AI model trained on broad data that can be adapted to many downstream tasks. Owkin deploys foundation models (including iBOT, H0, and H0-mini) for self-supervised feature extraction from whole-slide images, which are used in data enrichment and spatial prediction workflows.

**GTEx (Genotype-Tissue Expression)** A public resource cataloging gene expression levels across human tissues from healthy donors. K Pro references GTEx data to provide tissue-specificity context for genes of interest.

**Hallucination** In the context of LLMs, the generation of plausible-sounding but factually incorrect or fabricated information. K Pro implements monitoring (Tool Call Accuracy) and technical guardrails (PubMed ID verification via RAG) to detect and reduce hallucinations.

**Harmonization** The process of standardizing and normalizing heterogeneous datasets — across modalities, formats, and sources — so they can be queried consistently by K Pro's agents and tools.

**HIPE Model** An Owkin-developed AI model for cell segmentation and annotation on H\&E whole-slide images. It detects and classifies individual cell nuclei into types (e.g., lymphocytes, cancer cells, fibroblasts), producing quantitative histomics features stored as CSV files.

**Ligand-Receptor (LR) Interaction** A cell communication analysis that models signaling between cells by identifying paired ligand and receptor molecules. K Pro's data enrichment pipeline maps these interactions across diffusion modes (cell contact, secreted, and hormone signaling) and visualizes them as dot plots, chord diagrams, and spatial overlays.
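A minimal way to score one ligand-receptor pair between two cell populations is to multiply the mean ligand expression in the sender cells by the mean receptor expression in the receiver cells. The sketch below assumes that product score, placeholder gene names, and a cells-by-genes matrix. Published tools use other aggregations and add permutation-based significance testing, and this example does not model the diffusion modes described above.

```python
import numpy as np


def lr_score(expr, genes: list[str], cell_labels, ligand: str, receptor: str,
             sender: str, receiver: str) -> float:
    """Hypothetical LR score: mean ligand in sender x mean receptor in receiver.

    expr: cells x genes matrix; genes: column names; cell_labels: one
    cell-type label per row of expr.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(cell_labels)
    lig = expr[labels == sender, genes.index(ligand)].mean()
    rec = expr[labels == receiver, genes.index(receptor)].mean()
    return float(lig * rec)


expr = np.array([
    [2.0, 0.0],  # sender cell
    [4.0, 0.0],  # sender cell
    [0.0, 3.0],  # receiver cell
    [0.0, 1.0],  # receiver cell
])
genes = ["LIGAND_X", "RECEPTOR_Y"]  # placeholder gene names
labels = ["T cell", "T cell", "macrophage", "macrophage"]
print(lr_score(expr, genes, labels, "LIGAND_X", "RECEPTOR_Y", "T cell", "macrophage"))
```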

**LLM (Large Language Model)** A deep learning model trained on large text corpora, capable of understanding and generating natural language. K Pro is built on top of LLMs, which interpret user queries and coordinate agent responses.

**OpenTargets** An open-access platform integrating public domain data to enable systematic identification and prioritization of drug targets. K Pro uses OpenTargets data for baseline expression profiling and tractability assessments in gene knowledge exploration.

**PubMed** A freely accessible database maintained by the National Library of Medicine containing over 36 million citations and abstracts of biomedical literature. K Pro's Literature Navigator queries PubMed through semantic search and RAG to provide citation-backed research summaries.

**RAG (Retrieval-Augmented Generation)** A technique that combines a generative LLM with a retrieval system to ground responses in factual, source-verified content. K Pro uses RAG over 22M+ PubMed abstracts to ensure literature citations are valid and relevant.
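The retrieval half of RAG, and the citation check it enables, can be sketched in a few lines: rank documents by cosine similarity between embedding vectors, then flag any cited identifier that was not among the retrieved documents. The embeddings and document IDs below are toy placeholders (not real PubMed IDs), and both function names are hypothetical; K Pro's embedding model and verification logic are not described here.

```python
import numpy as np


def top_k(query_vec, doc_vecs, doc_ids: list[str], k: int = 2) -> list[str]:
    """Return the IDs of the k documents most cosine-similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    order = np.argsort(-(d @ q))[:k]  # highest similarity first
    return [doc_ids[i] for i in order]


def unsupported_citations(cited_ids: list[str], retrieved_ids: list[str]) -> list[str]:
    """Citations in a generated answer that do not match any retrieved document."""
    retrieved = set(retrieved_ids)
    return [c for c in cited_ids if c not in retrieved]


doc_ids = ["doc-A", "doc-B", "doc-C"]  # placeholder identifiers
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy 2-D embeddings
retrieved = top_k([1.0, 0.0], doc_vecs, doc_ids, k=2)
print(retrieved, unsupported_citations(["doc-A", "doc-B"], retrieved))
```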

**Reactome** A curated, open-source database of biological pathways and reactions. K Pro references Reactome immune pathways when providing gene-level biological context through the Knowledge Explorer agent.

**Skill** A predefined product capability designed for a specific task, such as running an analysis, producing a visualization, or executing a complete scientific workflow. Skills use known inputs and standardized logic, which makes them consistent, repeatable, and easier to trust for routine work.

**TCA (Tool Call Accuracy)** An internal monitoring metric used by Owkin to measure the proportion of agent interactions in which the correct tool is identified and called with appropriate parameters. TCA is used to detect systematic agent errors.
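As a proportion, TCA reduces to correct calls divided by evaluated interactions. The sketch below assumes each interaction has a known expected tool call and treats "appropriate parameters" as an exact match on the parameter dictionary. That matching rule, the `ToolCall` type, and the tool names are hypothetical; Owkin's actual evaluation criteria are not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    tool: str
    params: dict = field(default_factory=dict)


def tool_call_accuracy(pairs: list[tuple[ToolCall, Optional[ToolCall]]]) -> float:
    """Fraction of interactions whose actual call matches the expected one.

    Each pair is (expected call, actual call or None if no tool was called).
    A call counts as correct only if tool name and parameters both match.
    """
    if not pairs:
        return 0.0
    correct = sum(
        1 for expected, actual in pairs
        if actual is not None
        and actual.tool == expected.tool
        and actual.params == expected.params
    )
    return correct / len(pairs)


km = ToolCall("kaplan_meier", {"group_by": "stage"})  # hypothetical tool names
pairs = [
    (km, ToolCall("kaplan_meier", {"group_by": "stage"})),  # correct
    (km, ToolCall("volcano_plot", {"group_by": "stage"})),  # wrong tool
    (km, None),                                             # no call made
    (km, ToolCall("kaplan_meier", {"group_by": "stage"})),  # correct
]
print(tool_call_accuracy(pairs))
```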

**Tool** A specialized product component that performs a defined operation inside K Pro, such as retrieving data, running an analysis, or generating a chart. Tools execute work in a structured and standardized way.

### **Visualization Terms**

**Kaplan-Meier Plot** A statistical visualization of time-to-event data (e.g., overall survival) that shows the estimated survival probability over time for one or more patient groups. Typically includes log-rank test p-values and hazard ratios.
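The curve in a Kaplan-Meier plot is the product-limit estimate S(t) = Π over event times tᵢ ≤ t of (1 − dᵢ / nᵢ), where dᵢ is the number of events at tᵢ and nᵢ the number of patients still at risk just before tᵢ. The pure-Python sketch below computes that estimate for one group; censored patients leave the risk set without causing a drop. Log-rank tests and hazard ratios are separate computations not shown here, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate for a single patient group.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= deaths + censored  # both leave the risk set after time t
    return curve


print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```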

**Oncoprint** A matrix visualization displaying the pattern of genetic alterations (mutations, copy number variants, etc.) across a set of patients and genes. Useful for identifying co-occurring or mutually exclusive alterations.

**UMAP (Uniform Manifold Approximation and Projection)** A dimensionality reduction algorithm used to project high-dimensional biological data (e.g., single-cell gene expression) into 2D or 3D for visual exploration. K Pro also offers PCA and t-SNE for the same purpose.

**Volcano Plot** A scatter plot used to visualize differential gene expression, with statistical significance (−log10 p-value) on the y-axis and effect size (log2 fold-change) on the x-axis. Highlights genes that are both statistically significant and biologically meaningful.
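Each point on a volcano plot is derived from a gene's log2 fold-change and p-value; genes beyond both a fold-change and a significance cutoff are typically highlighted. The sketch below computes those coordinates and flags. The cutoffs (|log2FC| ≥ 1, p < 0.05) are common conventions chosen for illustration, not K Pro defaults, and in practice adjusted p-values are usually used. Gene names are placeholders.

```python
import math


def volcano_points(results, lfc_cutoff: float = 1.0, p_cutoff: float = 0.05):
    """Convert DE results into volcano-plot coordinates.

    results: iterable of (gene, log2_fold_change, p_value).
    Returns (gene, x=log2FC, y=-log10 p, highlighted) tuples.
    """
    points = []
    for gene, lfc, p in results:
        y = -math.log10(p)
        highlighted = abs(lfc) >= lfc_cutoff and p < p_cutoff
        points.append((gene, lfc, y, highlighted))
    return points


de_results = [
    ("GENE_A", 2.0, 0.001),   # large effect, significant
    ("GENE_B", 0.2, 0.0001),  # significant but small effect
    ("GENE_C", -1.5, 0.2),    # large effect, not significant
]
for point in volcano_points(de_results):
    print(point)
```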

**WSI (Whole-Slide Image)** A high-resolution digital scan of a complete tissue section on a glass slide. Supported formats in K Pro: `.tif`, `.tiff`, `.svs`, `.dcm`, `.ndpi`, `.mrxs`.

### **Security & Compliance Terms**

**CCPA (California Consumer Privacy Act)** A California state privacy law granting consumers rights over their personal data, including the right to know, delete, and opt out of its sale. K Pro compliance with CCPA is available from the Light tier onward.

**GDPR (General Data Protection Regulation)** The European Union's primary data protection regulation, governing the collection, storage, processing, and transfer of personal data. K Pro is GDPR compliant for EU and UK users.

**HIPAA (Health Insurance Portability and Accountability Act)** A US federal law establishing standards for the protection of health information. K Pro's security architecture is designed to support HIPAA compliance requirements.

**IAM (Identity and Access Management)** The framework of policies and technologies used to control user access to systems and data. K Pro uses IAM to enforce customer-level data segregation and role-based permissions.

**ISO 27001:2022** The international standard for information security management systems (ISMS). Owkin is certified to ISO 27001:2022, demonstrating systematic controls for protecting data confidentiality, integrity, and availability.

**ISO 13485:2016** The international standard for quality management systems in medical device development. Owkin holds this certification, applicable to its AI model development practices.

**RBAC (Role-Based Access Control)** A security model in which system access is granted based on a user's role within an organization. Available in the Premium tier of K Pro.

**SSO (Single Sign-On)** An authentication scheme that allows users to log in once and access multiple applications without re-entering credentials. K Pro supports SSO integration in the Premium tier.

**Note:** This glossary is maintained as a living reference. If you encounter a term not defined here, or believe a definition requires updating, please contact <support@owkin.com>.
