
Harsh Jain
Apr 15, 2025
Implementing agents in Salesforce can deliver exceptional value, but building them is just half the job. These agents must be carefully observed, debugged, and optimized.
In this post, I'll focus on the technical elements of building robust "Agent Reporting" capabilities — ensuring you have the logs, relationships, and report types required to understand exactly how your AI agent is operating under the hood.
Why engineering visibility into AI Agents matters
In the sections that follow, I introduce two key reporting layers: the Conversation Overview Report and the Detailed Technical Report. Together, they provide a full-stack view of how your agents are performing.
From an engineering perspective, this kind of visibility is non-negotiable. When you’re deploying AI agents into critical business workflows — say, lead routing, case resolution, or opportunity updates — it’s not enough to know whether the user got a response. You need to see exactly what the user said, how the agent interpreted it, what prompt it generated behind the scenes, what tools it called, whether those calls succeeded, how long they took, and what came back.
Without this chain of transparency, debugging becomes a guessing game. But with the right reporting setup, teams can trace failures, analyze edge cases, and improve prompt quality based on actual system behavior, not assumptions.
There’s also a cost and performance angle here. The detailed technical report surfaces token usage, model parameters, and latency metrics, which means you can spot inefficiencies early. Maybe a certain use case is silently racking up LLM costs. Maybe a prompt pattern is generating bloated completions that don’t help. With these insights, you can tune for both precision and cost-effectiveness.
From a business perspective, this kind of reporting builds confidence. It shows stakeholders what the agent is actually doing, not just what it’s supposed to do. You can see which actions are succeeding or failing, how well the system is classifying user intents, and whether the responses feel on-brand.
That’s critical for getting buy-in from non-technical teams, especially in enterprise environments where trust in AI systems is still being earned.
These reports also serve as a shared source of truth. Engineers can dive deep into raw prompts and reasoning chains, while business users can monitor high-level conversation flows and agent performance trends. Everyone gets a view into the system, tailored to their needs, but grounded in the same underlying data. That alignment becomes incredibly valuable when you’re scaling AI adoption across departments.
Ultimately, what I've attempted to lay out in this blog is a foundational piece of any serious AI agent deployment.
Because it's not enough to ship a working agent; you have to observe it, understand it, and evolve it. Reporting is what turns an agent from a demo into a durable part of your operating model. And if you want that agent to deliver consistent value, this is the minimum viable level of instrumentation you need.
Types of Agent Reporting we use at realfast
To set up reporting, Salesforce provides pre-built data model objects that you can use out of the box. At realfast, we have developed two types of reporting:
Conversation Overview Report - Shows the overall conversation between the agent and user, including topics and actions
Detailed Technical Report - Reveals the actual prompts called by the LLMs and their outputs, along with the intermediate steps taken by the reasoning engine, providing insight into the inner workings of the agent
Building the Conversation Overview Report
This report is built on top of two underlying Salesforce Data Model Objects (DMOs): CopilotUtteranceAnalysis and CopilotForSalesforceAppsEvents.
By joining these two DMOs, you create a powerful view of agent interactions that combines conversational content with execution details (a minimal join sketch appears below):
Copilot Utterance Analysis - Contains the conversation content:
What the user said (Request__c)
What the agent responded (Response__c)
What topic was detected (Topic__c)
Conversation metadata (BotName__c, CreatedDate__c, etc.)
Copilot For Salesforce Apps Events - Contains the execution details:
What actions were performed (ActionName__c)
Whether they succeeded (ActionSuccess__c)
Execution time metrics (ActionDuration__c, ExternalLatencyTime__c)
Execution context (PlanData__c, ConversationEventLogType__c)

As you can see in the image above, this report presents a clear timeline of user requests, agent responses, topic classifications, and action executions.
For example, you can see how the agent handles a lead creation request, classifies it under the right topic, and executes the appropriate actions.
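Inside Salesforce this join is typically built as a custom report type, but to make the shape of the data concrete, here is a minimal Python/pandas sketch of the same join over exported DMO data. The file names and the ConversationId__c linking field are assumptions for illustration; use whatever identifier actually relates the two objects in your org.

```python
import pandas as pd

# Hypothetical CSV exports of the two DMOs (e.g., pulled from Data Cloud).
utterances = pd.read_csv("copilot_utterance_analysis.csv")
events = pd.read_csv("copilot_for_salesforce_apps_events.csv")

# "ConversationId__c" is an assumed linking field -- substitute whatever identifier
# actually relates utterances to execution events in your org.
overview = (
    utterances.merge(events, on="ConversationId__c", how="left")
              .sort_values("CreatedDate__c")
)

# One row per utterance/action pair: request, detected topic, action, outcome, response.
print(overview[["CreatedDate__c", "BotName__c", "Request__c", "Topic__c",
                "ActionName__c", "ActionSuccess__c", "ActionDuration__c",
                "Response__c"]].head(20))
```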
Benefits of the Conversation Overview Report
Conversation Flow Analysis: Track the natural progression of user-agent dialogues
Topic Classification Accuracy: Evaluate how well your agent is understanding user intents
Action Success Monitoring: Identify which actions are failing and need attention (see the sketch after this list)
Response Appropriateness: Review agent responses to ensure they match user expectations
User Experience Evaluation: Assess the overall quality and effectiveness of conversations
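A few of these benefits reduce to simple aggregations over the joined data. The sketch below reuses the overview frame from the previous example; it assumes ActionSuccess__c arrives as a boolean and is only meant to illustrate the roll-ups, not replace the Salesforce report itself.

```python
# Continuing from the "overview" DataFrame in the previous sketch.

# Action success monitoring: call volume and success rate per action, worst first.
# (Assumes ActionSuccess__c is a boolean; coerce it first if it arrives as text.)
action_health = (
    overview.dropna(subset=["ActionName__c"])
            .groupby("ActionName__c")["ActionSuccess__c"]
            .agg(calls="count", success_rate="mean")
            .sort_values("success_rate")
)
print(action_health)

# Topic classification check: share of conversations landing in each detected topic,
# which quickly surfaces utterances falling into a fallback or catch-all topic.
print(overview["Topic__c"].value_counts(normalize=True))
```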
Building the Detailed Technical Report
For deeper technical insights, we build a more comprehensive report that reveals the inner workings of the agent. This requires joining several GenAI-related DMOs (a join sketch appears below):
GenAIGatewayRequest - Contains details about the LLM request:
The complete prompt text (prompt__c)
Model parameters (temperature__c, frequencyPenalty__c)
Token usage metrics (promptTokens__c, completionTokens__c)
GenAIGatewayResponse - Contains response metadata:
Response IDs and parameters
Timestamps for tracking
GenAIGeneration - Contains the actual LLM outputs:
Raw response text (responseText__c)
Masked versions for PII protection (maskedResponseText__c)
GenAIGatewayRequestTag - Contains metadata tags:
Context tags (tag__c, tagValue__c) - these identify metadata such as the agent associated with the request and the user utterance that triggered the model call

As you can see above, this detailed report provides extraordinary visibility into the agent's reasoning process. You can see:
The complete system instructions given to the LLM
The exact user request being processed
The token counts for measuring usage and costs
The step-by-step reasoning process of the agent
The final action execution and response
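As with the overview report, this join lives inside Salesforce as a custom report type. The sketch below is a hedged illustration of the same relationships over exported DMO data; the file names and the GatewayRequestId__c linking field are assumptions, not confirmed API names.

```python
import pandas as pd

# Hypothetical CSV exports of the GenAI DMOs.
requests = pd.read_csv("genai_gateway_request.csv")     # prompt__c, temperature__c, promptTokens__c, completionTokens__c, ...
generations = pd.read_csv("genai_generation.csv")       # responseText__c, maskedResponseText__c, ...
tags = pd.read_csv("genai_gateway_request_tag.csv")     # tag__c, tagValue__c

# "GatewayRequestId__c" is an assumed linking field between the objects.
# Pivot the tags so each request row carries its metadata (agent, user utterance, ...).
tag_matrix = (
    tags.pivot_table(index="GatewayRequestId__c", columns="tag__c",
                     values="tagValue__c", aggfunc="first")
        .reset_index()
)

technical = (
    requests.merge(generations, on="GatewayRequestId__c", how="left")
            .merge(tag_matrix, on="GatewayRequestId__c", how="left")
)

# One row per LLM call: full prompt, model parameters, token counts, raw output, tags.
# GenAIGatewayResponse metadata can be merged in the same way if you need its timestamps.
print(technical[["prompt__c", "temperature__c", "promptTokens__c",
                 "completionTokens__c", "responseText__c"]].head())
```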
Benefits of the Detailed Technical Report
Prompt Engineering Insights: See exactly how system instructions and user inputs are formatted for the LLM
Token Usage Tracking: Monitor token consumption for cost management and optimization (a cost roll-up sketch follows this list)
Reasoning Transparency: Understand the agent's step-by-step thought process
Model Performance Analysis: Evaluate how different model parameters affect response quality
Data Extraction Validation: Confirm if the agent is correctly parsing structured data from user inputs
Security and PII Handling: Verify that sensitive information is being properly masked
Error Root Cause Analysis: Pinpoint exactly where in the processing chain issues occur
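To make the token-usage point concrete, here is a small roll-up over the technical frame from the previous sketch. The per-1K-token prices are placeholders rather than actual model pricing, and the "Agent" column assumes an agent-identifying tag was pivoted in from GenAIGatewayRequestTag.

```python
# Token roll-up from the "technical" DataFrame in the previous sketch.
# Prices are illustrative placeholders, not real model pricing; "Agent" assumes an
# agent-identifying tag was pivoted in from GenAIGatewayRequestTag.
PROMPT_PRICE_PER_1K = 0.0025       # placeholder $ per 1K prompt tokens
COMPLETION_PRICE_PER_1K = 0.0100   # placeholder $ per 1K completion tokens

usage = technical.groupby("Agent", dropna=False)[
    ["promptTokens__c", "completionTokens__c"]
].sum()

usage["estimated_cost_usd"] = (
    usage["promptTokens__c"] / 1000 * PROMPT_PRICE_PER_1K
    + usage["completionTokens__c"] / 1000 * COMPLETION_PRICE_PER_1K
)

print(usage.sort_values("estimated_cost_usd", ascending=False))
```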
This kind of end-to-end transparency transforms how you evolve your AI agents. It turns opaque decision-making into something you can inspect, debug, and improve with confidence. When you can trace an agent’s behavior from user input all the way through to final action — with full visibility into how it reasoned, what it said, and why it did what it did — you're in a position to optimize performance, ensure alignment with business logic, and continuously raise the quality bar.
Observability like this helps you scale what works.
These reports provide a baseline to get you started with agent monitoring.
From here, you can build customized filters, alerts, and dashboards to meet your specific requirements (a simple alert sketch is shown below). The data structure gives you complete flexibility to create specialized views for different stakeholders, from engineering teams who need deep technical insights to business users who want to understand overall agent performance.
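As one example of that kind of customization, the sketch below flags failed or slow actions from the overview data. The latency threshold and the print-based delivery are placeholders for whatever alerting channel you actually use.

```python
# A simple alerting sketch over the "overview" DataFrame: flag failed actions and
# unusually slow executions. The threshold and delivery mechanism are placeholders;
# in practice you might route these to Slack, email, or a dashboard component.
SLOW_ACTION_MS = 5_000  # illustrative threshold (units depend on how ActionDuration__c is reported)

failed = overview[overview["ActionSuccess__c"] == False]  # noqa: E712
slow = overview[overview["ActionDuration__c"] > SLOW_ACTION_MS]

for label, frame in [("Failed actions", failed), ("Slow actions", slow)]:
    if not frame.empty:
        print(f"ALERT: {label} ({len(frame)})")
        print(frame[["CreatedDate__c", "ActionName__c", "Request__c"]]
              .to_string(index=False))
```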
By having visibility into both conversation flows and the underlying technical details, you can continuously refine your agents, optimize their performance, and ensure they deliver consistent value to your users.