Explainability and traceability
Ensuring Citation and Reference Integrity
K Pro uses Retrieval-Augmented Generation (RAG) over PubMed abstracts to provide accurate, verifiable citations. A key component of this system is validating that every returned PubMed ID corresponds to an actual published article. A small risk remains that the language model generates a response that does not fully align with the content of the retrieved articles; the RAG architecture keeps this risk low.
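The ID validation step described above can be sketched as follows. This is a minimal illustration, not K Pro's actual implementation: the function name `partition_pmids` and the `exists` callback are hypothetical, and the in-memory set stands in for whatever authoritative PubMed lookup a production system would use.

```python
import re
from typing import Callable, Iterable, List, Tuple

# PubMed IDs are positive integers written without leading zeros.
PMID_PATTERN = re.compile(r"^[1-9]\d*$")


def partition_pmids(
    cited: Iterable[str],
    exists: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split cited IDs into (valid, rejected).

    `exists` is a hypothetical lookup that returns True when the ID
    corresponds to a published article in the reference index.
    """
    valid: List[str] = []
    rejected: List[str] = []
    for raw in cited:
        pmid = raw.strip()
        # Check the format first so malformed IDs never reach the lookup.
        if PMID_PATTERN.match(pmid) and exists(pmid):
            valid.append(pmid)
        else:
            rejected.append(pmid)
    return valid, rejected


# Illustrative stand-in for a real PubMed index (made-up IDs).
known_pmids = {"12345678", "23456789"}
valid, rejected = partition_pmids(
    ["12345678", "abc", "99999999"], known_pmids.__contains__
)
```

In this sketch, rejected IDs would be dropped from the response or flagged before it reaches the user; which of those K Pro does is not stated here.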
Our commitment to citation accuracy extends beyond basic validation. We conduct internal evaluations against established public benchmarks for literature review tasks, and we continuously refine our daily evaluation protocols. Future enhancements will include more sophisticated analysis to verify that generated answers appropriately incorporate and reflect the content of the articles behind the retrieved PubMed IDs.
Measuring and Preventing Hallucinations
K Pro implements comprehensive monitoring systems designed to detect and mitigate hallucinations at multiple stages of the response generation process.
Tool Call Accuracy Monitoring: Daily automated tracking uses metrics such as Tool Call Accuracy (TCA) to measure how frequently the system correctly identifies and invokes the appropriate tools. This monitoring enables early detection of systemic issues, including cases where the agent incorrectly requests a tool, fails to recognize when a tool is necessary, or selects a suboptimal tool for the task at hand.
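As a rough sketch of how a metric like TCA could be computed, the example below scores a batch of evaluation cases and counts the three failure modes named above. The exact definition K Pro uses is not given here, so this assumes TCA is the fraction of cases where the chosen tool matches the expected one, with `None` meaning "no tool needed". The tool names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Each case pairs the expected tool with the tool the agent actually chose.
Case = Tuple[Optional[str], Optional[str]]


def classify(expected: Optional[str], actual: Optional[str]) -> str:
    """Label a single case as correct or as one of three failure modes."""
    if expected == actual:
        return "correct"
    if expected is None:
        return "unnecessary_call"  # agent requested a tool it did not need
    if actual is None:
        return "missed_call"       # agent failed to recognize a tool was needed
    return "wrong_tool"            # agent picked a different tool than expected


def tool_call_accuracy(cases: Sequence[Case]) -> Tuple[float, Counter]:
    """Return (accuracy, per-label counts) for a batch of evaluation cases."""
    counts = Counter(classify(expected, actual) for expected, actual in cases)
    total = sum(counts.values())
    accuracy = counts["correct"] / total if total else 0.0
    return accuracy, counts


# Hypothetical tool names, for illustration only.
cases = [
    ("pubmed_search", "pubmed_search"),
    (None, "pubmed_search"),
    ("pubmed_search", None),
    ("drug_lookup", "pubmed_search"),
]
accuracy, counts = tool_call_accuracy(cases)
```

Tracking the per-label counts alongside the headline accuracy makes it easier to tell whether a drop comes from over-calling, under-calling, or tool confusion.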
Parameter Validation: Correct tool selection alone is insufficient; the parameters passed to those tools must also be accurate and complete. When incorrect parameters are supplied, the resulting actions can produce erroneous outputs that appear as hallucinations in the final response. To address this, we continuously monitor parameter accuracy and completeness through automated testing against a carefully curated set of evaluation questions, checking that tool invocations are not only appropriate but also properly configured.
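A parameter check of this kind might look like the sketch below, which compares the parameters an agent supplied against the expected parameters for one evaluation question. The function name, the result keys, and the example parameter names are all assumptions made for illustration; the source does not describe the actual comparison logic.

```python
from typing import Any, Dict


def check_parameters(
    expected: Dict[str, Any],
    actual: Dict[str, Any],
) -> Dict[str, Any]:
    """Compare supplied tool parameters with the expected ones.

    Completeness: every expected key is present.
    Accuracy: every present expected key carries the expected value,
    and no unexpected keys were supplied.
    """
    missing = sorted(k for k in expected if k not in actual)
    mismatched = sorted(
        k for k in expected if k in actual and actual[k] != expected[k]
    )
    unexpected = sorted(k for k in actual if k not in expected)
    return {
        "missing": missing,
        "mismatched": mismatched,
        "unexpected": unexpected,
        "passed": not (missing or mismatched or unexpected),
    }


# Hypothetical evaluation case: the agent used the wrong parameter name.
expected_params = {"query": "metformin renal dosing", "max_results": 5}
actual_params = {"query": "metformin renal dosing", "limit": 5}
report = check_parameters(expected_params, actual_params)
```

Strict equality is the simplest comparison; free-text parameters such as search queries would likely need a looser match in practice.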