Prompting guide & prompt library
Learn how to get the best results from K-Pro. This guide covers the principles behind effective prompts and provides a ready-to-use library organized by use case.
Start here
New to K-Pro? Try this prompt right now:
Provide clinical characteristics of the TCGA-BRCA cohort. Include the distribution of tumor stage and age at diagnosis.
You'll get summary tables and distributions showing the clinical breakdown of the cohort. From there, try a follow-up:
Compute PAM50-like subtype scores using key marker genes from bulk RNA-seq and stratify survival accordingly
This is the core loop in K-Pro: ask a question, get a result, refine with a follow-up. Every prompt in this guide works this way.
Part 1 — How to prompt K-Pro effectively
The A-S-R-T-C Framework
High-performing prompts follow a five-part structure. This framework was built from analysis of 500+ real K-Pro interactions: prompts that follow it succeed consistently, while prompts that skip components hit predictable failure modes.
[Action] + [Subject] + [Resolution] + [Tool] + [Context]
A — Action
What it defines: The analytical intent
Why it matters: Prevents the AI from just "describing" and forces it to calculate
Examples: Analyze, Compare, Correlate, Perform DEG, Generate KM
S — Subject
What it defines: Gene symbols + aliases (HGNC)
Why it matters: Avoids "Column Not Found" errors
Examples: TNFRSF9 (CD137), CD274 (PD-L1), VSIR (VISTA)
R — Resolution
What it defines: Data granularity
Why it matters: Directs the AI to the correct single-cell or bulk layer
Examples: Level 3 (refined types), Level 4 (subsets), Bulk RNA
T — Tool
What it defines: Visualization type
Why it matters: Bypasses tool-specific failure modes (e.g., heatmap crashes on large gene sets)
Examples: dotplot, Violin Plot, Kaplan-Meier, Volcano plot
C — Context
What it defines: Biological or cohort stratification
Why it matters: Filters out irrelevant data (e.g., GBM cells appearing in a bladder cancer query)
Examples: Epithelioid vs. Biphasic, TP53 Mut vs. WT, TCGA-BRCA
Not every prompt needs all five components. Literature searches typically need only A + S + C. Data exploration needs at least A + S + C. Visualization and spatial queries benefit from the full A-S-R-T-C.
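As a rough illustration of how the components combine, the sketch below assembles a prompt from whichever parts are supplied. The `AsrtcPrompt` class and its field names are hypothetical drafting helpers, not part of K-Pro, and the connecting words ("at", "using a", "in") simply mirror the example prompts in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsrtcPrompt:
    """Hypothetical helper holding the five A-S-R-T-C components."""
    action: str                       # A: analytical intent, e.g. "Analyze"
    subject: str                      # S: official gene symbol plus alias
    context: str                      # C: cohort or biological stratification
    resolution: Optional[str] = None  # R: optional data granularity
    tool: Optional[str] = None        # T: optional visualization type

    def render(self) -> str:
        """Join the supplied components into a single prompt sentence."""
        parts = [f"{self.action} {self.subject}"]
        if self.resolution:
            parts.append(f"at {self.resolution}")
        if self.tool:
            parts.append(f"using a {self.tool}")
        parts.append(f"in {self.context}")
        return " ".join(parts) + "."


full = AsrtcPrompt(
    action="Analyze",
    subject="TNFRSF9 (CD137)",
    resolution="Single-cell Level 3",
    tool="dotplot",
    context="Epithelioid vs. Biphasic Mesothelioma",
)
print(full.render())
```

Leaving `resolution` and `tool` unset yields an A + S + C prompt, which is the shape suggested above for literature searches.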
From weak to effective — see the difference
Example: Gene expression in mesothelioma
Weak prompt:
"Show me CD137 in mesothelioma."
What goes wrong: The AI might pick the wrong data resolution, use a basic bar chart, or fail to find the gene name "CD137" in the database columns.
Better:
"Analyze TNFRSF9 at Level 2 in mesothelioma."
Improved: Uses the official gene symbol and specifies resolution, but still lacks the visualization tool and comparison context.
A-S-R-T-C prompt:
"Analyze [A] TNFRSF9 (CD137) [S] at Single-cell Level 3 [R] using a dotplot [T] in Epithelioid vs. Biphasic Mesothelioma [C]."
Why it works: The agent knows exactly what to calculate, where to find the gene, what resolution to use, which chart to produce, and how to stratify the data.
Common failure modes and how to fix them
These are the four most frequent errors observed in real K-Pro sessions. Each one maps to a missing A-S-R-T-C component.
1. The "Complexity Crash" — too many variables
Failed prompt
"Plot expression for AREG, EREG, HBEGF, BTC, TGFA, EGFR, KRAS, BRAF, MET, and HER3 in a heatmap for all bladder cancer types."
What went wrong
Overloaded the heatmap tool with too many genes across too many categories → timeout error
Fix
"Analyze the EGFR-ligand and bypass pathway genes at Bulk RNA level using a dotplot stratified by Bladder Histological Subgroups."
Why it works
Switching the Tool (T) to a dotplot works because a dotplot handles high-dimensional data more efficiently than a heatmap
2. The "Schema Error" — gene alias not found
Failed prompt
"Show me VISTA expression in mesothelioma."
What went wrong
The agent couldn't find "VISTA" in column headers → "Column Not Found" error
Fix
"Compare VSIR (VISTA) expression at Single-cell Level 3 using a Violin Plot across Epithelioid and Biphasic mesothelioma subtypes."
Why it works
Using the official HGNC symbol as Subject (S) with the alias in parentheses ensures correct mapping
3. The "Ambiguity Error" — wrong indication pulled in
Failed prompt
"What are the top cell types expressing CD137?"
What went wrong
The AI pulled Glioblastoma data (Astrocytes/Microglia) because no cancer type was specified
Fix
"Analyze TNFRSF9 (CD137) distribution at Single-cell Level 3 specifically within the Bladder Cancer MOSAIC cohort."
Why it works
Providing the Context (C) filters out non-relevant indications
4. The "Spatial Resolution" failure — too vague for spatial data
Failed prompt
"Where is FAP expressed in the slides?"
What went wrong
Too vague — produced a generic slide view with no quantitative insight
Fix
"Calculate the spatial abundance of FAP within the Invasive Front vs. Tumor Core across ovarian and bladder cancer patients."
Why it works
Forces a calculation (A) on specific spatial niches (C) rather than a simple visual search
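The alias problem behind failure modes 2 and 3 can be caught before a prompt is sent. The sketch below is a minimal, hypothetical pre-check: `ALIAS_TO_SYMBOL` contains only the three aliases used as examples in this guide, and `add_official_symbols` is an illustrative name, not a K-Pro feature. A real lookup would need a full HGNC alias table.

```python
import re

# Aliases taken from the examples in this guide; extend as needed.
ALIAS_TO_SYMBOL = {"CD137": "TNFRSF9", "PD-L1": "CD274", "VISTA": "VSIR"}


def add_official_symbols(prompt: str) -> str:
    """Rewrite bare aliases as 'SYMBOL (alias)' when the symbol is absent."""
    for alias, symbol in ALIAS_TO_SYMBOL.items():
        if symbol in prompt:
            continue  # the prompt already uses the official symbol
        # Match the alias only as a whole token, not inside a longer name.
        pattern = rf"(?<![\w-]){re.escape(alias)}(?![\w-])"
        prompt = re.sub(pattern, f"{symbol} ({alias})", prompt)
    return prompt


print(add_official_symbols("Show me VISTA expression in mesothelioma."))
```

This only repairs the Subject (S). The rewritten prompt still needs a resolution, a tool, and a cohort to match the fixes shown above.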
The "Chain of Thought" sequence
Your most productive sessions will follow a logical drill-down path. Don't start with a complex differential expression query — start simple and build up.
1. Survey
Broad overview of a gene across indications
"Show expression of NECTIN4 across all MOSAIC cancer types."
2. Focus
Zoom into one indication at higher resolution
"Zoom into MOSAIC-BLCA at Level 3 resolution."
3. Spatial
Check co-localization in the tumor microenvironment
"Do NECTIN4+ tumor cells co-localize with CD8+ T cells in the invasive front in MOSAIC bladder cancer patients?"
4. Clinical
Link findings to patient outcomes
"Does NECTIN4 overexpression correlate with overall survival in TCGA-BLCA?"
Each step builds on the context from the previous one — K-Pro maintains conversation history within a session.
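The drill-down can be thought of as an ordered list of prompts sent within one session. The sketch below expresses that idea in code. This guide does not describe a programmatic interface to K-Pro, so `ask` is a placeholder for however prompts are submitted, and `fake_ask` is a stand-in that lets the example run offline.

```python
from typing import Callable, List, Tuple

# The four stages of the sequence, in the order they should be asked.
DRILL_DOWN: List[Tuple[str, str]] = [
    ("Survey", "Show expression of NECTIN4 across all MOSAIC cancer types."),
    ("Focus", "Zoom into MOSAIC-BLCA at Level 3 resolution."),
    ("Spatial", "Do NECTIN4+ tumor cells co-localize with CD8+ T cells in the "
                "invasive front in MOSAIC bladder cancer patients?"),
    ("Clinical", "Does NECTIN4 overexpression correlate with overall survival "
                 "in TCGA-BLCA?"),
]


def run_sequence(ask: Callable[[str], str],
                 steps: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Send each prompt in order and collect (stage, reply) pairs."""
    results = []
    for stage, prompt in steps:
        results.append((stage, ask(prompt)))
    return results


def fake_ask(prompt: str) -> str:
    """Offline stand-in for a real session: echoes the prompt."""
    return f"[reply to: {prompt}]"


for stage, reply in run_sequence(fake_ask, DRILL_DOWN):
    print(stage, "->", reply)
```

Keeping the stages as data makes it easy to swap the gene or cohort while preserving the Survey, Focus, Spatial, Clinical order.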
Quick-reference prompt templates
Copy these templates and fill in the bracketed values.
Survival Analysis
Differential Expression
Spatial Analysis
Cell Type Profiling
Target Discovery
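Filling in bracketed values can be done by hand or with a small helper. The sketch below assumes placeholders written as `[UPPER_CASE]` names. `SURVIVAL_TEMPLATE` is an illustrative template modeled on the Kaplan-Meier prompt in Part 2, not an official K-Pro template, and `fill_template` is a hypothetical helper.

```python
import re

# Illustrative template modeled on the Kaplan-Meier prompt in this guide;
# the bracketed names are assumptions, not official K-Pro template fields.
SURVIVAL_TEMPLATE = (
    "Create a Kaplan-Meier survival curve for [COHORT] patients stratified by "
    "[GENE] mutation status (mutated vs. wild-type). Include the p-value from "
    "a log-rank test and the number of patients at risk."
)


def fill_template(template: str, values: dict) -> str:
    """Replace every [NAME] placeholder; fail loudly if one is missing."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for [{key}]")
        return values[key]

    return re.sub(r"\[([A-Z_]+)\]", substitute, template)


print(fill_template(SURVIVAL_TEMPLATE, {"COHORT": "TCGA-BRCA", "GENE": "TP53"}))
```

Raising an error on an unfilled placeholder avoids sending a prompt that still contains a literal `[GENE]`, which would reintroduce the missing-Subject problem described in Part 1.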
Part 2 — Prompt Library
Ready-to-use prompts organized by use case. Each prompt has been verified against K-Pro documentation.
Literature Review & Biomedical Knowledge
Summarize publications on a specific gene or pathway
Prompt:
Summarize the latest PubMed publications on the role of TP53 mutations in non-small cell lung cancer. Focus on findings from the last 3 years and highlight any consensus on prognostic significance.
Expected result: A structured summary of key publications with citations, organized by main findings and areas of consensus/controversy.
Follow-up:
Are there any conflicting findings across these studies regarding TP53's role as a predictive biomarker for immunotherapy response in NSCLC?
Tip: Be specific about the gene, indication, and timeframe. Vague prompts like "tell me about TP53" return overly broad results.
Search for publications about a drug target
Prompt:
Find publications investigating TROP2 as a therapeutic target in triple-negative breast cancer. Include any data on TROP2 expression levels and their correlation with clinical outcomes.
Expected result: A curated list of relevant publications with key findings, expression data, and clinical correlations.
Follow-up:
Based on these publications, what is the evidence for using TROP2 expression as a patient selection biomarker for ADC therapies?
Tip: Combine the target name with a specific indication and the type of evidence you need (expression, outcomes, mechanisms).
Data Exploration & Cohort Analysis
Explore TCGA cohort demographics
Prompt:
Provide clinical characteristics of the TCGA-BRCA cohort. Include the distribution of tumor stage and age at diagnosis.
Expected result: Summary tables and/or distributions showing the clinical breakdown of the TCGA-BRCA cohort.
Tip: Always specify the exact TCGA cohort code (e.g., TCGA-BRCA, TCGA-LUAD) rather than the disease name alone.
A-S-R-T-C breakdown: Show [A] clinical characteristics [S: cohort variables] at bulk level [R] as summary tables [T] in TCGA-BRCA [C].
Explore MOSAIC Window multiomics data
Prompt:
Provide the number of patients that have access to all the available (RNA-seq, Single cell, spatial, H&E, clinical) modalities for MOSAIC bladder cancer.
Expected result: A summary of MOSAIC Window data availability for the requested indication, broken down by modality.
Follow-up:
For ovarian cancer in MOSAIC Window, show the distribution of HRD status.
Note: There are no CRC samples in MOSAIC Window.
Tip: MOSAIC Window is Owkin's proprietary multimodal dataset. Specify the indication to scope the data landscape before diving into analysis.
Characterize a patient cohort
Prompt:
Characterize the TCGA-LUAD cohort: show me the distribution of KRAS mutation status, smoking history, tumor stage, and median overall survival for KRAS wild-type and mutated patients.
Expected result: A multi-variable cohort characterization with summary statistics per subgroup.
Follow-up:
Which of these subgroups has the worst overall survival, and what are their distinguishing molecular features?
Tip: List the specific variables you want characterized upfront — K-Pro works best when it knows exactly what you're looking for.
Assess gene expression across tissues (target prioritization)
Prompt:
Compare the expression of NECTIN4 across all TCGA cancer types. Show me a pan-cancer overview ranked by median expression level.
Expected result: A ranked table or visualization of NECTIN4 expression across TCGA indications.
Follow-up:
For the top 3 indications with highest NECTIN4 overexpression, show me the correlation between NECTIN4 expression and overall survival.
A-S-R-T-C breakdown: Compare [A] NECTIN4 [S] at Bulk RNA level [R] as a ranked table [T] across all TCGA cancer types [C].
Multiomics integration
Prompt:
In the TCGA-BRCA cohort, integrate RNA-seq gene expression and mutation data. Identify genes that are both differentially expressed and frequently mutated in basal-like versus luminal A patients.
Expected result: A multi-omics integration result showing genes that appear significant across both data modalities.
Follow-up:
Perform an individual gene expression comparison of MSigDB DNA repair, oxidative stress, and EMT pathway signatures at Bulk RNA level using a violin plot with p-values comparing Basal-like and Luminal TCGA-BRCA patients.
Tip: Multiomics queries are computationally intensive. Start with a specific comparison (two subgroups) rather than "analyze everything."
Visualization
Agent: Data Explorer (with visualization capabilities)
Generate a Kaplan-Meier survival curve
Prompt:
Create a Kaplan-Meier survival curve for TCGA-BRCA patients stratified by TP53 mutation status (mutated vs. wild-type). Include the p-value from a log-rank test and the number of patients at risk.
Expected result: A Kaplan-Meier plot with two curves, log-rank p-value, and at-risk table.
Tip: K-Pro supports real-time plot iteration. Ask for specific formatting changes (colors, labels, font size) in follow-up prompts rather than trying to specify everything at once.
A-S-R-T-C breakdown: Create [A] TP53 mutation survival analysis [S] at bulk level [R] as a KM curve with log-rank test [T] in TCGA-BRCA, mutated vs. wild-type [C].
Create comparative visualizations
Prompt:
Create a box plot comparing the expression of CD274 (PD-L1) across the five molecular subtypes in TCGA-BRCA. Add individual data points and mark statistically significant pairwise comparisons.
Expected result: A box plot with overlaid data points and significance brackets between groups.
Follow-up:
Now create a heatmap of the top 50 differentially expressed genes between these molecular subtypes.
Tip: You can request multiple plot types in sequence. Each builds on the data context from the previous query.
Statistical Analysis
Compute survival statistics
Prompt:
In the TCGA-STAD cohort, test whether there is a statistically significant difference in overall survival between microsatellite-instable (MSI-H) and microsatellite-stable (MSS) patients. Report the hazard ratio, 95% confidence interval, and log-rank p-value.
Expected result: Statistical test results with HR, CI, and p-value, plus a supporting KM curve.
Follow-up:
Run a multivariate Cox regression adjusting for age, stage, and MSI status to confirm whether MSI status is an independent prognostic factor.
Tip: Always specify the statistical test you want (log-rank, Cox, t-test) for precise results. K-Pro backs every analysis with p-values and population-level data.
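To help interpret the log-rank p-value requested in the prompt above, here is a minimal pure-Python sketch of the two-group log-rank statistic. It is not how K-Pro computes its results (this guide does not say), `logrank_test` is an illustrative name, and the survival times are invented toy values. For real analyses, an established statistics package is the safer choice.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*  : follow-up time per patient
    events_* : 1 if the event (death) was observed, 0 if censored
    """
    # Pool both groups as (time, event, group) records.
    records = [(t, e, "a") for t, e in zip(times_a, events_a)]
    records += [(t, e, "b") for t, e in zip(times_b, events_b)]

    observed_minus_expected = 0.0
    variance = 0.0
    # Only time points with at least one observed event contribute.
    for t in sorted({time for time, event, _ in records if event}):
        at_risk_a = sum(1 for time, _, g in records if g == "a" and time >= t)
        at_risk = sum(1 for time, _, _ in records if time >= t)
        deaths_a = sum(1 for time, e, g in records
                       if g == "a" and time == t and e)
        deaths = sum(1 for time, e, _ in records if time == t and e)

        # Expected deaths in group A if both groups shared one hazard.
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:
            frac_a = at_risk_a / at_risk
            variance += (deaths * frac_a * (1 - frac_a)
                         * (at_risk - deaths) / (at_risk - 1))

    if variance == 0:
        raise ValueError("Not enough events to compute the statistic")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data (months, event flag); invented purely for illustration.
chi_square, p_value = logrank_test(
    times_a=[1, 2, 3, 4], events_a=[1, 1, 1, 1],
    times_b=[5, 6, 7, 8], events_b=[1, 1, 1, 1],
)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the two survival curves differ more than chance alone would explain. It says nothing about the size of the effect, which is why the prompt above also asks for the hazard ratio and confidence interval.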
Cross-Dataset Discovery
Link datasets to find novel biological mechanisms
Prompt:
In the TCGA-KIRC cohort, identify genes whose expression is significantly associated with both VHL mutation status and response to immune checkpoint inhibitors. Cross-reference findings with published literature on VHL-related immune evasion mechanisms.
Expected result: A list of candidate genes with statistical associations, linked to supporting literature evidence.
Follow-up:
For the top 3 candidate genes, show me their spatial expression patterns in the tumor microenvironment using MOSAIC data if available.
Tip: This is a multi-step, multi-agent use case. Be explicit about both the data analysis you want AND the literature cross-reference.
Do's and don'ts
Do
Use official gene symbols with common aliases in parentheses: VSIR (VISTA), not just VISTA
Specify the dataset using the cohort name and the indication
Name the statistical test you want: log-rank, Cox regression, t-test
Iterate in follow-ups — refine plots, add filters, drill deeper
Start simple, then drill down — follow the Survey → Focus → Spatial → Clinical sequence
Specify the visualization type — dotplot, violin, KM curve — to avoid tool-selection errors
Don't
Don't write "search engine" prompts — "Tell me about TP53" is too vague
Don't overload a single prompt — stacking multiple plots or analyses into one request is more likely to fail than running a single complex analysis. Break the work into sequential steps, the way you would during a scientific deep-dive.
Don't skip the indication/cohort — without context, the AI may pull irrelevant data from other cancer types
Don't try to specify everything in one prompt — ask for the analysis first, then adjust formatting in follow-ups