Knowledge agents
Early access: Use the Quality center to evaluate and monitor your knowledge agents. Test agent responses with sample questions, analyze real end user traffic, and use the results to identify issues and improve agent performance.
To access, go to AI Agents > Quality center > Knowledge Agents.
Task types
- Evaluate - Test the knowledge agent with a set of sample questions and check whether it responds as expected. Use ground truth evaluation to compare responses against expected answers.
Process overview
- Select the knowledge agent you want to evaluate.
- Create and run the task.
- Check the status of the task.
- When the task is complete, download and view the results.
- Update the knowledge agent as required.
Evaluate
Test the knowledge agent with sample questions to verify it responds as expected. Upload a .csv file with questions, run the task, and review the results.
Prepare test questions
Create a .csv file with sample questions that represent real end user queries.
Example questions for a car dealership agent:
- What car models do you have available?
- Do you sell new and used cars?
- Can I trade in my old car?
- What financing options do you provide?
To verify answer accuracy, use ground truth evaluation by adding a reference_answer column alongside each question.
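For example, a test file for the car dealership agent might look like the following. The question column reuses the sample questions above; the reference_answer values are illustrative, not real dealership data:

```csv
question,reference_answer
What car models do you have available?,"We currently stock sedans, SUVs, and pickup trucks."
Do you sell new and used cars?,"Yes, we sell both new and certified pre-owned cars."
Can I trade in my old car?,"Yes, trade-ins are accepted and valued during your visit."
```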
Run the task
- On the Infobip web interface, go to AI Agents > Quality center > Knowledge Agents.
- In the Select knowledge agent field, select the agent you want to test.
- Upload the .csv file that contains the set of questions.
- In the Task name field, enter a name for the task.
- Select Run task.
Ground truth evaluation (optional)
Ground truth evaluation compares agent responses against expected answers you provide. Add two columns to your .csv file: question and reference_answer.
When the test runs, each generated answer receives one of these labels:
| Label | Meaning |
|---|---|
| GOOD | The answer matches the expected answer |
| INCOMPLETE | Correct overall but missing some details from the expected answer |
| BAD | Contains incorrect or fabricated information |
The label and reasoning for each answer are included in the results file.
View task status
You can view all tasks in the Tasks section, which shows up to the 20 most recent tasks of each type.
| Status | Description |
|---|---|
| Completed | The task is complete and the results are ready to download. |
| Failed | The task failed. Hover over the information icon next to the status to see why. |
| In progress | The task is running. |
View the results
When the task is complete, download the results .csv file from the Tasks section.
Results fields
| Field | Description |
|---|---|
| question | The question from the test file. |
| answer | The response generated by the knowledge agent. |
| original_contexts | Relevant chunks from the knowledge sources that were retrieved, including the source document name. |
| reranked_contexts | Contexts after reranking, if enabled. Not available for very short messages. |
| content_filter_results | Content filter result for each message. Included only if the content filter is enabled. Contains the status (safe, annotated, blocked), and violation details per category including severity, threshold, configured action, and whether the filter was triggered. |
| latency | Time taken by the knowledge agent to generate the response. This differs from the chatbot response time, which may include additional processing. |
| topic | The topic determined by the agent based on the question. Provides insight into end user interests. |
| is_question_answered | TRUE: agent answered the question. FALSE: agent could not answer (missing content or out of scope). N/A: message was not a question. |
| is_answer_in_context | Indicates whether the answer is based on the retrieved context. |
| answer_classification | GOOD, INCOMPLETE, or BAD. Included only for Evaluate tasks when reference_answer is provided. |
| answer_classification_reasoning | Explanation of why the answer received its classification. Included only for Evaluate tasks when reference_answer is provided. |
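Because the results file is a plain .csv, you can summarize it with a short script. The sketch below is a minimal example, not an official tool: it assumes only the column names from the Results fields table, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows mimicking a downloaded results file.
# Column names follow the Results fields table; only a subset is shown.
SAMPLE_RESULTS = """\
question,answer,is_question_answered,answer_classification
What car models do you have available?,We stock sedans and SUVs.,TRUE,GOOD
Can I trade in my old car?,Yes.,TRUE,INCOMPLETE
Do you ship overseas?,I don't have that information.,FALSE,BAD
"""

def summarize(results_csv: str) -> dict:
    """Count classification labels and the share of answered questions."""
    rows = list(csv.DictReader(io.StringIO(results_csv)))
    labels = Counter(row["answer_classification"] for row in rows)
    answered = sum(row["is_question_answered"] == "TRUE" for row in rows)
    return {"labels": dict(labels), "answered_rate": answered / len(rows)}

summary = summarize(SAMPLE_RESULTS)
print(summary)
```

For a real run, replace SAMPLE_RESULTS with the contents of the downloaded file; questions labeled BAD or FALSE are the first candidates to investigate.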
Update the agent based on results
Use the results to identify and fix issues. Investigate the root cause before making changes.
| Issue | Solution |
|---|---|
| Agent answered correctly but the answer is not in the context | The prompt likely contains the answer. If the answer is wrong, check the knowledge source content. |
| Agent uses an inappropriate tone | Review and update the prompt in agent settings. |
| Agent responds in the wrong language | Review and update the prompt in agent settings. |
| Agent responses are cut off | Increase the output tokens setting. |
| Agent hallucinates | The knowledge source is likely missing relevant content. Add it. |
| Agent answers out-of-scope questions | Tighten the prompt to define scope and limitations in agent settings. |
For an overview of testing approaches, see Test knowledge agent.