Prompting guide & prompt library

Learn how to get the best results from K-Pro. This guide covers the principles behind effective prompts and provides a ready-to-use library organized by use case.

Start here

New to K-Pro? Try this prompt right now:

Provide clinical characteristics of the TCGA-BRCA cohort. Include the distribution of tumor stage and age at diagnosis.

You'll get summary tables and distributions showing the clinical breakdown of the cohort. From there, try a follow-up:

Compute PAM50-like subtype scores using key marker genes from bulk RNA-seq and stratify survival accordingly.

This is the core loop in K-Pro: ask a question, get a result, refine with a follow-up. Every prompt in this guide works this way.

Part 1: How to prompt K-Pro effectively

The A-S-R-T-C Framework

High-performing prompts follow a five-part structure. This framework was built from an analysis of more than 500 real K-Pro interactions: prompts that follow it succeed consistently, while prompts that skip components hit predictable failure modes.

[Action] + [Subject] + [Resolution] + [Tool] + [Context]

Component | What it is | Why it matters | Keywords & examples
A (Action) | The analytical intent | Prevents the AI from just "describing" and forces it to calculate | Analyze, Compare, Correlate, Perform DEG, Generate KM
S (Subject) | Gene symbols + aliases (HGNC) | Avoids "Column Not Found" errors | TNFRSF9 (CD137), CD274 (PD-L1), VSIR (VISTA)
R (Resolution) | Data granularity | Directs the AI to the correct single-cell or bulk layer | Level 3 (refined types), Level 4 (subsets), Bulk RNA
T (Tool) | Visualization type | Bypasses tool-specific failure modes (e.g., heatmap crashes on large gene sets) | dotplot, violin plot, Kaplan-Meier, volcano plot
C (Context) | Biological or cohort stratification | Filters out irrelevant data (e.g., GBM cells appearing in a bladder cancer query) | Epithelioid vs. Biphasic, TP53 Mut vs. WT, TCGA-BRCA

Not every prompt needs all five components. Literature searches usually need only A + S + C. Data exploration also needs at least A + S + C. Visualization and spatial queries benefit from the full A-S-R-T-C.
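If you draft prompts from a script or notebook, the framework maps naturally onto a small data structure. The following Python sketch is a hypothetical helper (the `AsrtcPrompt` class and its field names are illustrative, not part of K-Pro): it requires Action, Subject, and Context, adds Resolution and Tool only when they are given, and returns a plain-text prompt to paste into K-Pro.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsrtcPrompt:
    """Hypothetical helper: one field per A-S-R-T-C component."""
    action: str                       # A: e.g. "Analyze"
    subject: str                      # S: e.g. "TNFRSF9 (CD137)"
    context: str                      # C: e.g. "Epithelioid vs. Biphasic Mesothelioma"
    resolution: Optional[str] = None  # R: e.g. "Single-cell Level 3"
    tool: Optional[str] = None        # T: e.g. "a dotplot"

    def render(self) -> str:
        # A + S + C are the minimum every prompt needs.
        for name in ("action", "subject", "context"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing required component: {name}")
        parts = [self.action, self.subject]
        # R and T are optional and appended only when provided.
        if self.resolution:
            parts.append(f"at {self.resolution}")
        if self.tool:
            parts.append(f"using {self.tool}")
        parts.append(f"in {self.context}")
        return " ".join(parts) + "."


if __name__ == "__main__":
    full = AsrtcPrompt(
        action="Analyze",
        subject="TNFRSF9 (CD137)",
        resolution="Single-cell Level 3",
        tool="a dotplot",
        context="Epithelioid vs. Biphasic Mesothelioma",
    )
    print(full.render())
```

Keeping R and T optional mirrors the guidance above: a literature search can stop at A + S + C, while a visualization prompt fills in all five fields.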

From weak to effective — see the difference

Example: Gene expression in mesothelioma

Weak prompt:

"Show me CD137 in mesothelioma."

What goes wrong: The AI might pick the wrong data resolution, use a basic bar chart, or fail to find the gene name "CD137" in the database columns.

Better:

"Analyze TNFRSF9 at Level 2 in mesothelioma."

Improved: uses the official gene symbol and specifies the resolution, but still lacks the visualization tool and the comparison context.

A-S-R-T-C prompt:

"Analyze [A] TNFRSF9 (CD137) [S] at Single-cell Level 3 [R] using a dotplot [T] in Epithelioid vs. Biphasic Mesothelioma [C]."

Why it works: The agent knows exactly what to calculate, where to find the gene, what resolution to use, which chart to produce, and how to stratify the data.

Common failure modes and how to fix them

These are the four most frequent errors observed in real K-Pro sessions. Each one maps to a missing A-S-R-T-C component.

1. The "Complexity Crash" — too many variables

Failed prompt

"Plot expression for AREG, EREG, HBEGF, BTC, TGFA, EGFR, KRAS, BRAF, MET, and HER3 in a heatmap for all bladder cancer types."

What went wrong

Overloaded the heatmap tool with too many genes across too many categories → timeout error

Fix

"Analyze the EGFR-ligand and bypass pathway genes at Bulk RNA level using a dotplot stratified by Bladder Histological Subgroups."

Why it works

Switching the Tool (T) to a dotplot avoids the timeout, because a dotplot handles high-dimensional data more efficiently than a heatmap

2. The "Schema Error" — gene alias not found

Failed prompt

"Show me VISTA expression in mesothelioma."

What went wrong

The agent couldn't find "VISTA" in column headers → "Column Not Found" error

Fix

"Compare VSIR (VISTA) expression at Single-cell Level 3 using a Violin Plot across Epithelioid and Biphasic mesothelioma subtypes"

Why it works

Using the official HGNC symbol as Subject (S) with the alias in parentheses ensures correct mapping
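The same rule can be applied automatically when prompts are drafted in code. The Python sketch below uses a tiny, hand-written alias table covering only genes named in this guide; the table and the `normalize_prompt` helper are hypothetical, and a real workflow should draw on a maintained source such as the HGNC database rather than a hard-coded dictionary.

```python
import re

# Hypothetical alias table (alias -> official HGNC symbol), written by hand
# for genes named in this guide. It is not a complete mapping.
ALIAS_TO_HGNC = {
    "CD137": "TNFRSF9",
    "PD-L1": "CD274",
    "VISTA": "VSIR",
    "HER3": "ERBB3",
}

# Match an alias as a whole token, but skip it when it directly touches a
# parenthesis, as in "VSIR (VISTA)", so already-normalized text is left alone.
_ALIAS_PATTERN = re.compile(
    r"(?<![\w(-])(" + "|".join(map(re.escape, ALIAS_TO_HGNC)) + r")(?![\w)-])"
)


def format_gene(name: str) -> str:
    """Return 'SYMBOL (alias)' for a known alias, else the name unchanged."""
    symbol = ALIAS_TO_HGNC.get(name)
    return f"{symbol} ({name})" if symbol else name


def normalize_prompt(prompt: str) -> str:
    """Rewrite bare aliases in a prompt as 'SYMBOL (alias)'."""
    return _ALIAS_PATTERN.sub(lambda m: format_gene(m.group(1)), prompt)


if __name__ == "__main__":
    print(normalize_prompt("Show me VISTA expression in mesothelioma."))
    # -> Show me VSIR (VISTA) expression in mesothelioma.
```

Matching is case-sensitive and token-based on purpose: an unlisted or misspelled alias passes through unchanged rather than being guessed.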

3. The "Ambiguity Error" — wrong indication pulled in

Failed prompt

"What are the top cell types expressing CD137?"

What went wrong

The AI pulled Glioblastoma data (Astrocytes/Microglia) because no cancer type was specified

Fix

"Analyze TNFRSF9 (CD137) distribution at Single-cell Level 3 specifically within the Bladder Cancer MOSAIC cohort"

Why it works

Providing the Context (C) filters out non-relevant indications

4. The "Spatial Resolution" failure — too vague for spatial data

Failed prompt

"Where is FAP expressed in the slides?"

What went wrong

Too vague — produced a generic slide view with no quantitative insight

Fix

"Calculate the spatial abundance of FAP within the Invasive Front vs. Tumor Core across ovarian and bladder cancer patients"

Why it works

Forces a calculation (A) on specific spatial niches (C) rather than a simple visual search

The "Chain of Thought" sequence

Your most productive sessions will follow a logical drill-down path. Don't start with a complex differential expression query — start simple and build up.

Step | What to ask | Example
1. Survey | Broad overview of a gene across indications | "Show expression of NECTIN4 across all MOSAIC cancer types."
2. Focus | Zoom into one indication at higher resolution | "Zoom into MOSAIC-BLCA at Level 3 resolution."
3. Spatial | Check co-localization in the tumor microenvironment | "Do NECTIN4+ tumor cells co-localize with CD8+ T cells in the invasive front in MOSAIC bladder cancer patients?"
4. Clinical | Link findings to patient outcomes | "Does NECTIN4 overexpression correlate with overall survival in TCGA-BLCA?"

Each step builds on the context from the previous one — K-Pro maintains conversation history within a session.

Quick-reference prompt templates

Copy these templates and fill in the bracketed values.

Survival Analysis

Differential Expression

Spatial Analysis

Cell Type Profiling

Target Discovery


Part 2: Prompt library

Ready-to-use prompts organized by use case. Each prompt has been verified against K-Pro documentation.

Literature Review & Biomedical Knowledge

Summarize publications on a specific gene or pathway

Prompt:

Summarize the latest PubMed publications on the role of TP53 mutations in non-small cell lung cancer. Focus on findings from the last 3 years and highlight any consensus on prognostic significance.

Expected result: A structured summary of key publications with citations, organized by main findings and areas of consensus/controversy.

Follow-up:

Are there any conflicting findings across these studies regarding TP53's role as a predictive biomarker for immunotherapy response in NSCLC?

Tip: Be specific about the gene, indication, and timeframe. Vague prompts like "tell me about TP53" return overly broad results.

Search for publications about a drug target

Prompt:

Find publications investigating TROP2 as a therapeutic target in triple-negative breast cancer. Include any data on TROP2 expression levels and their correlation with clinical outcomes.

Expected result: A curated list of relevant publications with key findings, expression data, and clinical correlations.

Follow-up:

Based on these publications, what is the evidence for using TROP2 expression as a patient selection biomarker for ADC therapies?

Tip: Combine the target name with a specific indication and the type of evidence you need (expression, outcomes, mechanisms).

Data Exploration & Cohort Analysis

Explore TCGA cohort demographics

Prompt:

Provide clinical characteristics of the TCGA-BRCA cohort. Include the distribution of tumor stage and age at diagnosis.

Expected result: Summary tables and/or distributions showing the clinical breakdown of the TCGA-BRCA cohort.

Tip: Always specify the exact TCGA cohort code (e.g., TCGA-BRCA, TCGA-LUAD) rather than the disease name alone.

A-S-R-T-C breakdown: Show [A] clinical characteristics [S: cohort variables] at bulk level [R] as summary tables [T] in TCGA-BRCA [C].

Explore MOSAIC Window multiomics data

Prompt:

Provide the number of MOSAIC bladder cancer patients with data available for all modalities (RNA-seq, single-cell, spatial, H&E, and clinical).

Expected result: A summary of MOSAIC Window data availability for the requested indication, broken down by modality.

Follow-up:

For ovarian cancer in MOSAIC Window, show the distribution of HRD status.

Note: MOSAIC Window does not contain CRC samples.

Tip: MOSAIC Window is Owkin's proprietary multimodal dataset. Specify the indication to scope the data landscape before diving into analysis.

Characterize a patient cohort

Prompt:

Characterize the TCGA-LUAD cohort: show me the distribution of KRAS mutation status, smoking history, and tumor stage, plus the median overall survival for KRAS wild-type and KRAS-mutated patients.

Expected result: A multi-variable cohort characterization with summary statistics per subgroup.

Follow-up:

Which of these subgroups has the worst overall survival, and what are their distinguishing molecular features?

Tip: List the specific variables you want characterized upfront — K-Pro works best when it knows exactly what you're looking for.

Assess gene expression across tissues (target prioritization)

Prompt:

Compare the expression of NECTIN4 across all TCGA cancer types. Show me a pan-cancer overview ranked by median expression level.

Expected result: A ranked table or visualization of NECTIN4 expression across TCGA indications.

Follow-up:

For the top 3 indications with the highest NECTIN4 overexpression, show me the correlation between NECTIN4 expression and overall survival.

A-S-R-T-C breakdown: Compare [A] NECTIN4 [S] at Bulk RNA level [R] as a ranked table [T] across all TCGA cancer types vs. matched normal tissue [C].

Multiomics integration

Prompt:

In the TCGA-BRCA cohort, integrate RNA-seq gene expression and mutation data. Identify genes that are both differentially expressed and frequently mutated in basal-like versus luminal A patients.

Expected result: A multi-omics integration result showing genes that appear significant across both data modalities.

Follow-up:

Compare the expression of individual genes from the MSigDB DNA repair, oxidative stress, and EMT pathway signatures at Bulk RNA level between Basal-like and Luminal TCGA-BRCA patients, using violin plots with p-values.

Tip: Multiomics queries are computationally intensive. Start with a specific comparison (two subgroups) rather than "analyze everything."

Visualization

Agent: Data Explorer (with visualization capabilities)

Generate a Kaplan-Meier survival curve

Prompt:

Create a Kaplan-Meier survival curve for TCGA-BRCA patients stratified by TP53 mutation status (mutated vs. wild-type). Include the p-value from a log-rank test and the number of patients at risk.

Expected result: A Kaplan-Meier plot with two curves, log-rank p-value, and at-risk table.

Tip: K-Pro supports real-time plot iteration. Ask for specific formatting changes (colors, labels, font size) in follow-up prompts rather than trying to specify everything at once.

A-S-R-T-C breakdown: Create [A] TP53 mutation survival analysis [S] at bulk level [R] as a KM curve with log-rank test [T] in TCGA-BRCA, mutated vs. wild-type [C].
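To read a Kaplan-Meier plot and its at-risk table critically, it helps to know how they are computed. The Python sketch below implements the standard product-limit (Kaplan-Meier) estimator for a single group; it illustrates the textbook method, not K-Pro's implementation, and the input in the usage block is toy data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, int, int, float]]:
    """Product-limit (Kaplan-Meier) estimate for one group.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns (time, n_at_risk, n_events, survival) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Survival drops by the fraction of the risk set that had the event.
            survival *= 1.0 - deaths / n_at_risk
            table.append((t, n_at_risk, deaths, survival))
        # Patients with an event or censored at t leave the risk set afterward.
        n_at_risk -= leaving
    return table


if __name__ == "__main__":
    # Toy data: six patients, two of them censored.
    for t, n, d, s in kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1]):
        print(f"t={t}: at risk={n}, events={d}, S(t)={s:.3f}")
```

Censored patients count toward the risk set up to and including their censoring time, which is the usual convention and the reason the at-risk numbers shrink even at times where the curve does not drop.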

Create comparative visualizations

Prompt:

Create a box plot comparing the expression of CD274 (PD-L1) across the five molecular subtypes in TCGA-BRCA. Add individual data points and mark statistically significant pairwise comparisons.

Expected result: A box plot with overlaid data points and significance brackets between groups.

Follow-up:

Now create a heatmap of the top 50 differentially expressed genes between these molecular subtypes.

Tip: You can request multiple plot types in sequence. Each builds on the data context from the previous query.

Statistical Analysis

Compute survival statistics

Prompt:

In the TCGA-STAD cohort, test whether there is a statistically significant difference in overall survival between microsatellite-instable (MSI-H) and microsatellite-stable (MSS) patients. Report the hazard ratio, 95% confidence interval, and log-rank p-value.

Expected result: Statistical test results with HR, CI, and p-value, plus a supporting KM curve.

Follow-up:

Run a multivariable Cox regression including MSI status, age, and stage to test whether MSI status is an independent prognostic factor.

Tip: Always specify the statistical test you want (log-rank, Cox, t-test) for precise results. K-Pro backs every analysis with p-values and population-level data.
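As an illustration of what a log-rank p-value summarizes, the Python sketch below computes the standard two-group log-rank statistic: at each event time it compares the observed events in group A with the number expected if both groups shared one survival curve, then converts the chi-square statistic (1 degree of freedom) to a p-value with `math.erfc`. It is a textbook sketch with toy inputs, not K-Pro's internal code.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_*: 1 if the event was observed, 0 if censored.
    """
    # Tag each patient with its group: 0 = A, 1 = B.
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        # Risk set: everyone still under follow-up at time t.
        at_risk = [(e, g) for ti, e, g in pooled if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, g in at_risk if g == 0)
        d = sum(1 for ti, e, _ in pooled if ti == t and e == 1)
        d_a = sum(1 for ti, e, g in pooled if ti == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of group A's events at this time.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("zero variance: the groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Toy data: group A has events early, group B late.
    chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
    print(f"chi2={chi2:.2f}, p={p:.4f}")
```

The statistic is symmetric in the two groups, so swapping A and B changes the sign of the observed-minus-expected sum but not the p-value.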

Cross-Dataset Discovery

Link datasets to find novel biological mechanisms

Prompt:

In the TCGA-KIRC cohort, identify genes whose expression is significantly associated with both VHL mutation status and response to immune checkpoint inhibitors. Cross-reference findings with published literature on VHL-related immune evasion mechanisms.

Expected result: A list of candidate genes with statistical associations, linked to supporting literature evidence.

Follow-up:

For the top 3 candidate genes, show me their spatial expression patterns in the tumor microenvironment using MOSAIC data if available.

Tip: This is a multi-step, multi-agent use case. Be explicit about both the data analysis you want AND the literature cross-reference.

Do's and don'ts

Do

  • Use official gene symbols with common aliases in parentheses: VSIR (VISTA), not just VISTA

  • Specify the dataset using the cohort name and the indication.

  • Name the statistical test you want: log-rank, Cox regression, t-test

  • Iterate in follow-ups — refine plots, add filters, drill deeper

  • Start simple, then drill down — follow the Survey → Focus → Spatial → Clinical sequence

  • Specify the visualization type — dotplot, violin, KM curve — to avoid tool-selection errors

Don't

  • Don't write "search engine" prompts — "Tell me about TP53" is too vague

  • Don't overload a single prompt: stacking multiple plots or analyses into one request is more likely to fail than running a single complex analysis. Break the work into sequential steps, as you would during a scientific deep-dive.

  • Don't skip the indication/cohort — without context, the AI may pull irrelevant data from other cancer types

  • Don't try to specify everything in one prompt — ask for the analysis first, then adjust formatting in follow-ups
