Knowledge agents

EARLY ACCESS
Use the Quality center to test your knowledge agents before going live. Upload sample questions to verify agent responses and use the results to identify issues and improve agent performance.

To access, go to AI Agents > Quality center > Knowledge agents.

For aggregated performance metrics and historical analytics across all your knowledge agents, see Knowledge agent analytics.

Here, you can search for tasks using the search bar, run a new task, and review existing tasks in a table that includes the following information:

  • Task name
  • Knowledge agent
  • Created at
  • Status
  • Option to download the .csv file with test details

Process overview

  1. Select the knowledge agent you want to test.
  2. Create and run the task.
  3. Check the status of the task.
  4. When the task is complete, view the results.
  5. Update the knowledge agent as required.

Test

Test the knowledge agent with sample questions to verify it responds as expected. Upload a .csv file with questions, run the task, and review the results.

Prepare test questions

Create a .csv file with sample questions that represent real end user queries.

Example questions for a car dealership agent:

  • What car models do you have available?
  • Do you sell new and used cars?
  • Can I trade in my old car?
  • What financing options do you provide?

To verify answer accuracy, use ground truth evaluation by adding a reference_answer column alongside each question.
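A file in this format can be generated with a short script. The sketch below uses Python's standard csv module; the questions and reference answers are hypothetical examples, and the reference_answer column is only needed if you use ground truth evaluation.

```python
import csv

# Hypothetical sample rows. The column names "question" and
# "reference_answer" follow the ground truth format described above.
rows = [
    ("What car models do you have available?",
     "We stock sedans, SUVs, and trucks from several manufacturers."),
    ("Can I trade in my old car?",
     "Yes, trade-ins are accepted and appraised on site."),
]

with open("test_questions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "reference_answer"])
    writer.writerows(rows)
```

Keeping the header row exactly as shown ensures the columns are recognized when the file is uploaded.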

Run the task

  1. On the Infobip web interface, go to AI Agents > Quality center > Knowledge agents.
  2. Select Run new task to test how your knowledge agent responds to a set of questions. You can review its answers and see which sources it used and why.
  3. In the pop-up:
    1. Enter a task name.
    2. Select the knowledge agent from the drop-down menu.
    3. If needed, download the .csv file with example questions.
    4. Upload your .csv file containing the questions.
  4. Select Run task.

Ground truth evaluation (optional)

Ground truth evaluation compares agent responses against expected answers you provide.

Add two columns to your .csv file:

  • question
  • reference_answer

When the test runs, each generated answer receives one of these labels:

  • GOOD - The answer matches the expected answer
  • INCOMPLETE - Correct overall but missing some details from the expected answer
  • BAD - Contains incorrect or fabricated information

The label and reasoning for each answer are included in the results file.
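After downloading the results, you can tally these labels for a quick accuracy snapshot. A minimal sketch, assuming the results have been parsed into dicts with a "label" key (the actual column name in your export may differ):

```python
from collections import Counter

def tally_labels(rows):
    """Count GOOD / INCOMPLETE / BAD labels from parsed result rows.

    Assumes each row is a dict with a "label" key -- check the header
    of your downloaded .csv for the real column name.
    """
    return Counter(row["label"] for row in rows)

# Example with hypothetical rows:
results = [
    {"question": "Do you sell used cars?", "label": "GOOD"},
    {"question": "What financing options do you provide?", "label": "INCOMPLETE"},
    {"question": "Can I trade in my old car?", "label": "GOOD"},
]
counts = tally_labels(results)
```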


View task status

You can view all tasks in the Tasks table section. This section shows up to 20 of the most recent tasks of each type.

The following task statuses are available:

  • Completed: The task is finished and the results are ready to download.
  • Failed: The task did not complete successfully. Hover over the information icon next to the status to view details.
  • In progress: The task is currently running.

View the results

When the task is complete, select the linked task name to view the results, or download the .csv file from the Tasks section.

Results analytics

Select the linked task name from the task table to open the results analytics page.

This page includes the following information:

  • Task name
  • Date of creation
  • Tabs for Overview, Expected behavior, and Knowledge gaps

Overview

The Overview tab displays key information such as top problems, statistics, insights, and response latency.

Top problems

The Top problems section highlights the main issues to address.

Statistics

The Statistics section includes the following metrics:

  • Total interactions: Total number of interactions with the knowledge agent.
  • Average interactions per session: Average number of messages exchanged per session.
  • Average interactions per user: Average number of messages exchanged per unique user.
  • Expected behavior rate: Percentage of interactions where the knowledge agent behaved as expected, including successful answers, policy-restricted responses, and not relevant questions.
  • Knowledge gaps: Percentage of interactions where the knowledge agent could not provide a complete and correct answer due to missing or insufficient documentation. Includes hallucination risks, partially answered questions, and unanswered questions.
  • Average response time: Average time the knowledge agent takes to respond to a user question.
  • Documents retrieved: Number of distinct documents retrieved to answer user questions.

Insights distribution

The Insights distribution section provides a breakdown of insight categories:

  • Not answered
  • Partially answered
  • Successful
  • Policy restricted
  • Not relevant
  • Hallucination risk

Response latency

The Response latency section shows latency based on configurable settings:

  • Time range:
    • Hourly
    • Daily
    • Date
  • Metrics:
    • Minimum
    • Average
    • Maximum
    • 25th percentile
    • 50th percentile (median)
    • 75th percentile
    • 90th percentile
    • Option to reset to default metrics (Minimum, Average, 90th percentile)
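As a rough illustration of how these latency metrics relate, here is how they could be computed from raw response times using Python's statistics module; the sample values are made up:

```python
import statistics

# Hypothetical response latencies (milliseconds) for one test run.
latencies_ms = [180, 210, 250, 320, 400, 410, 520, 610, 750, 980]

minimum = min(latencies_ms)
average = statistics.mean(latencies_ms)
maximum = max(latencies_ms)

# statistics.quantiles with n=100 yields the 1st..99th percentile cut points.
percentiles = statistics.quantiles(latencies_ms, n=100)
p25, p50, p75, p90 = (percentiles[24], percentiles[49],
                      percentiles[74], percentiles[89])
```

The 50th percentile (median) often differs noticeably from the average when a few slow responses skew the distribution, which is why both are shown.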

Expected behavior

The Expected behavior tab includes the following metrics:

  • Successful: Interactions where the knowledge agent provided correct and useful answers.
  • Policy restricted: Interactions where the knowledge agent correctly restricted the response based on documentation or system prompt instructions.
  • Not relevant: Interactions where the user question was not relevant to the knowledge agent. Includes greetings, goodbyes, and queries outside the supported scope.
  • Partially answered: Interactions where the knowledge agent provided incomplete answers due to insufficient documentation coverage.

Below these metrics, there are tabs with the same names, along with the number of topics for each.

You can select a question to view how the knowledge agent responded, and use the Show context resources option to see the sources used for the answer.

Knowledge gaps

The Knowledge gaps tab includes the following metrics:

  • Hallucination risk: Interactions where the knowledge agent introduced assumptions not grounded in documentation or system instructions, which may lead to incorrect or inaccurate answers.
  • Not answered: Interactions where the knowledge agent could not resolve the user's question due to missing documentation.

Below these metrics, there are tabs with the same names, along with the number of topics for each.

When you select a topic, you can view the Classification reasoning and the Knowledge agent reply, and use the Show context resources option to see the sources used.

Download task results

You can download task results as a .csv file for further analysis and processing. The downloaded file contains comprehensive data about each interaction, including:

  • Questions and answers
  • Context sources used for responses
  • Performance metrics such as latency and response time
  • Classification results when using ground truth evaluation
  • Topic detection and question analysis
  • Content filter results (when enabled)

The CSV format allows you to perform custom analysis, share results with your team, or integrate the data into external reporting tools.
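For example, a small script can load the export and compute summary figures. This sketch assumes column names such as question, answer, and latency_ms, which may not match your actual export header; adjust them after inspecting the file:

```python
import csv
import statistics

def summarize_results(path):
    """Summarize a downloaded task-results .csv.

    The column names used here ("question", "answer", "latency_ms")
    are assumptions for illustration -- check the header of your
    actual export and rename accordingly.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    latencies = [float(r["latency_ms"]) for r in rows if r.get("latency_ms")]
    return {
        "interactions": len(rows),
        "avg_latency_ms": statistics.mean(latencies) if latencies else None,
    }
```

The same DictReader approach extends naturally to grouping by classification label or topic before handing the data to a reporting tool.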


Update the agent based on results

Use the results to identify and fix issues. Investigate the root cause before making changes.

  • Agent answered correctly but the answer is not in the context: The prompt likely contains the answer. If the answer is wrong, check the knowledge source content.
  • Agent uses an inappropriate tone: Review and update the prompt in agent settings.
  • Agent responds in the wrong language: Review and update the prompt in agent settings.
  • Agent responses are cut off: Increase the output tokens setting.
  • Agent hallucinates: The knowledge source is likely missing relevant content. Add it.
  • Agent answers out-of-scope questions: Tighten the prompt to define scope and limitations in agent settings.

For an overview of testing approaches, see Test knowledge agent.






