Knowledge agents
Early access: Use the Quality center to test your knowledge agents before going live. Upload sample questions to verify agent responses, then use the results to identify issues and improve agent performance.
To access, go to AI Agents > Quality center > Knowledge agents.
For aggregated performance metrics and historical analytics across all your knowledge agents, see Knowledge agent analytics.
Here, you can search for tasks using the search bar, run a new task, and review existing tasks in a table that includes the following information:
- Task name
- Knowledge agent
- Created at
- Status
- Option to download the .csv file with test details
Process overview
- Select the knowledge agent you want to test.
- Create and run the task.
- Check the status of the task.
- When the task is complete, view the results.
- Update the knowledge agent as required.
Test
Test the knowledge agent with sample questions to verify it responds as expected. Upload a .csv file with questions, run the task, and review the results.
Prepare test questions
Create a .csv file with sample questions that represent real end user queries.
Example questions for a car dealership agent:
- What car models do you have available?
- Do you sell new and used cars?
- Can I trade in my old car?
- What financing options do you provide?
To verify answer accuracy, use ground truth evaluation by adding a reference_answer column alongside each question.
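A short Python sketch can generate a well-formed test file; the csv module handles quoting of commas inside questions or answers. The column names `question` and `reference_answer` follow the format described above, while the sample rows themselves are only illustrative:

```python
import csv

# Hypothetical sample rows for a car dealership knowledge agent.
# The reference_answer column is only needed for ground truth evaluation;
# a questions-only file can contain just the question column.
rows = [
    {"question": "What car models do you have available?",
     "reference_answer": "We stock sedans, SUVs, and pickup trucks."},
    {"question": "Can I trade in my old car?",
     "reference_answer": "Yes, trade-ins are accepted at all locations."},
]

with open("test_questions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "reference_answer"])
    writer.writeheader()
    writer.writerows(rows)
```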
Run the task
- On the Infobip web interface, go to AI Agents > Quality center > Knowledge agents.
- Select Run new task to test how your knowledge agent responds to a set of questions. You can review its answers and see which sources it used and why.
- In the pop-up:
- Enter a task name.
- Select the knowledge agent from the drop-down menu.
- If needed, download the .csv file with example questions.
- Upload your .csv file containing the questions.
- Select Run task.
Ground truth evaluation (optional)
Ground truth evaluation compares agent responses against expected answers you provide.
Add two columns to your .csv file:
- question: the sample question
- reference_answer: the expected answer to compare against
When the test runs, each generated answer receives one of these labels:
- GOOD - The answer matches the expected answer
- INCOMPLETE - Correct overall but missing some details from the expected answer
- BAD - Contains incorrect or fabricated information
The label and reasoning for each answer are included in the results file.
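If you want to summarize ground truth labels outside the web interface, a small script can tally them from the downloaded results file. Note this is a sketch under assumptions: the column name `label` is hypothetical, so check the header of your actual export and adjust.

```python
import csv
from collections import Counter

def label_summary(path):
    """Tally ground truth evaluation labels from a downloaded results file.

    Assumes a column named "label" holding GOOD / INCOMPLETE / BAD values;
    the actual column name in the export may differ.
    """
    with open(path, newline="", encoding="utf-8") as f:
        counts = Counter(row["label"] for row in csv.DictReader(f))
    total = sum(counts.values())
    # Return each label's share of all evaluated answers.
    return {lbl: counts.get(lbl, 0) / total
            for lbl in ("GOOD", "INCOMPLETE", "BAD")}
```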
View task status
You can view all tasks in the Tasks table section. This section shows up to the 20 most recent tasks of each type.
The following task statuses are available:
- Completed: The task is finished and the results are ready to download.
- Failed: The task did not complete successfully. Hover over the information icon next to the status to view details.
- In progress: The task is currently running.
View the results
When the task is complete, select the linked task name to view the results, or download the .csv file from the Tasks section.
Results analytics
Select the linked task name from the task table to open the results analytics page.
This page includes the following information:
- Task name
- Date of creation
- Tabs for Overview, Expected behavior, and Knowledge gaps
Overview
The Overview tab displays key information such as top problems, statistics, insights, and response latency.
Top problems
The Top problems section highlights the main issues to address.
Statistics
The Statistics section includes the following metrics:
| Metric | Description |
|---|---|
| Total interactions | Total number of interactions with the knowledge agent. |
| Average interactions per session | Average number of messages exchanged per session. |
| Average interactions per user | Average number of messages exchanged per unique user. |
| Expected behavior rate | Percentage of interactions where the knowledge agent behaved as expected, including successful answers, policy-restricted responses, and not relevant questions. |
| Knowledge gaps | Percentage of interactions where the knowledge agent could not provide a complete and correct answer due to missing or insufficient documentation. Includes hallucination risks, partially answered questions, and unanswered questions. |
| Average response time | Average time it takes for the knowledge agent to respond to a user question. |
| Documents retrieved | Number of distinct documents retrieved to answer user questions. |
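The Expected behavior rate and Knowledge gaps percentages follow directly from the category definitions in the table above. A minimal sketch of that arithmetic, assuming you have per-category interaction counts (the dictionary keys here are illustrative names, not an official schema):

```python
def behavior_rates(counts):
    """Compute the Expected behavior rate and Knowledge gaps percentage
    from per-category interaction counts, per the definitions above.

    Expected behavior = successful + policy restricted + not relevant.
    Knowledge gaps = hallucination risk + partially answered + not answered.
    """
    expected = ("successful", "policy_restricted", "not_relevant")
    gaps = ("hallucination_risk", "partially_answered", "not_answered")
    total = sum(counts.get(k, 0) for k in expected + gaps)
    return {
        "expected_behavior_rate":
            100 * sum(counts.get(k, 0) for k in expected) / total,
        "knowledge_gaps":
            100 * sum(counts.get(k, 0) for k in gaps) / total,
    }
```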
Insight distribution
The Insight distribution section provides a breakdown of insight categories:
- Not answered
- Partially answered
- Successful
- Policy restricted
- Not relevant
- Hallucination risk
Response latency
The Response latency section shows latency based on configurable settings:
- Time range:
- Hourly
- Daily
- Date
- Metrics:
- Minimum
- Average
- Maximum
- 25th percentile
- 50th percentile (median)
- 75th percentile
- 90th percentile
- Option to reset to default metrics (Minimum, Average, 90th percentile)
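The latency metrics listed above can also be reproduced offline from raw response times. This is a sketch only: it uses a nearest-rank percentile, and the dashboard's exact percentile method is not documented here and may differ.

```python
import statistics

def latency_metrics(latencies_ms):
    """Compute min/avg/max and percentile latencies from raw values.

    Uses a nearest-rank percentile; the dashboard's exact method
    is an assumption and may differ.
    """
    s = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank index into the sorted values.
        k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
        return s[k]

    return {
        "min": s[0],
        "avg": statistics.fmean(s),
        "max": s[-1],
        "p25": pct(25),
        "p50": pct(50),
        "p75": pct(75),
        "p90": pct(90),
    }
```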
Expected behavior
The Expected behavior tab includes the following metrics:
| Metric | Description |
|---|---|
| Successful | Interactions where the knowledge agent provided correct and useful answers. |
| Policy restricted | Interactions where the knowledge agent correctly restricted the response based on documentation or system prompt instructions. |
| Not relevant | Interactions where the user question was not relevant to the knowledge agent. Includes greetings, goodbyes, and queries outside the supported scope. |
| Partially answered | Interactions where the knowledge agent provided incomplete answers due to insufficient documentation coverage. |
Below these metrics, there are tabs with the same names, along with the number of topics for each.
You can select a question to view how the knowledge agent responded, and use the Show context resources option to see the sources used for the answer.
Knowledge gaps
The Knowledge gaps tab includes the following metrics:
| Metric | Description |
|---|---|
| Hallucination risk | Interactions where the knowledge agent introduced assumptions not grounded in documentation or system instructions, which may lead to incorrect or inaccurate answers. |
| Not answered | Interactions where the knowledge agent could not resolve the user’s question due to missing documentation. |
Below these metrics, there are tabs with the same names, along with the number of topics for each.
When you select a topic, you can view the Classification reasoning and the Knowledge agent reply, and use the Show context resources option to see the sources used.
Download task results
You can download task results as a .csv file for further analysis and processing. The downloaded file contains comprehensive data about each interaction, including:
- Questions and answers
- Context sources used for responses
- Performance metrics such as latency and response time
- Classification results when using ground truth evaluation
- Topic detection and question analysis
- Content filter results (when enabled)
The CSV format allows you to perform custom analysis, share results with your team, or integrate the data into external reporting tools.
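As one example of such custom analysis, the snippet below finds the slowest interactions in a downloaded results file. The column names `question` and `latency_ms` are assumptions for illustration; check the header of your export and adjust accordingly.

```python
import csv

def slowest_questions(path, n=5, latency_col="latency_ms"):
    """Return the n slowest interactions from a downloaded results file.

    Column names ("question", latency_col) are assumptions; verify them
    against the header of your actual export.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Sort by latency, highest first.
    rows.sort(key=lambda r: float(r[latency_col]), reverse=True)
    return [(r["question"], float(r[latency_col])) for r in rows[:n]]
```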
Update the agent based on results
Use the results to identify and fix issues. Investigate the root cause before making changes.
| Issue | Solution |
|---|---|
| Agent answered correctly but the answer is not in the context | The prompt likely contains the answer. If the answer is wrong, check the knowledge source content. |
| Agent uses an inappropriate tone | Review and update the prompt in agent settings. |
| Agent responds in the wrong language | Review and update the prompt in agent settings. |
| Agent responses are cut off | Increase the output tokens setting. |
| Agent hallucinates | The knowledge source is likely missing relevant content. Add it. |
| Agent answers out-of-scope questions | Tighten the prompt to define scope and limitations in agent settings. |
For an overview of testing approaches, see Test knowledge agent.