Evaluate
| bedrockagentcore_evaluate | R Documentation |
Performs on-demand evaluation of agent traces using a specified evaluator
Description
Performs on-demand evaluation of agent traces using a specified evaluator. This synchronous API accepts traces in OpenTelemetry format and returns immediate scoring results with detailed explanations.
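A minimal sketch of a call. It assumes `svc` is a paws client created with `paws::bedrockagentcore()` and that `spans` is a list of OpenTelemetry span objects already parsed into named R lists. The helper name `evaluate_session` is hypothetical; only `svc$evaluate()` and its argument names come from this page.

```r
# Hypothetical helper: score one agent session with one evaluator.
# Assumes `svc` is a paws client (e.g. svc <- paws::bedrockagentcore())
# and `spans` is a list of OpenTelemetry spans as named R lists.
evaluate_session <- function(svc, evaluator_id, spans) {
  # evaluationInput wraps the spans under `sessionSpans`,
  # matching the request syntax documented on this page.
  svc$evaluate(
    evaluatorId = evaluator_id,
    evaluationInput = list(sessionSpans = spans)
  )
}
```

Because the API is synchronous, the returned list can be inspected as soon as the call returns, for example `resp$evaluationResults[[1]]$value`.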
Usage
bedrockagentcore_evaluate(evaluatorId, evaluationInput,
evaluationTarget, evaluationReferenceInputs)
Arguments
| evaluatorId | [required] The unique identifier of the evaluator to use for scoring. Can be a built-in evaluator (e.g., |
| evaluationInput | [required] The input data containing the agent session spans to evaluate. It includes a list of spans in OpenTelemetry format from supported frameworks, such as Strands (AgentCore Runtime) or LangGraph with OpenInference instrumentation. |
| evaluationTarget | The specific trace or span IDs to evaluate within the provided input. This lets you target evaluation at different levels: individual tool calls, single request-response interactions (traces), or entire conversation sessions. |
| evaluationReferenceInputs | Ground truth data to compare against agent responses during evaluation. Lets you provide expected responses, assertions, and expected tool trajectories at different evaluation levels. Session-level reference inputs apply to the entire conversation, while trace-level reference inputs target specific request-response interactions identified by trace ID. |
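The optional arguments are nested lists. The two hypothetical helpers below sketch how they might be assembled from ordinary R vectors, following the shapes shown under Request syntax. Leaving `trace_id` unset yields a session-level reference input. This page does not say whether trace and span IDs may be combined in one `evaluationTarget`, so `make_target()` simply includes whichever you supply.

```r
# Hypothetical helper: build evaluationTarget from trace and/or span IDs.
make_target <- function(trace_ids = NULL, span_ids = NULL) {
  if (length(trace_ids) == 0 && length(span_ids) == 0)
    stop("supply trace_ids or span_ids")
  target <- list()
  # Convert character vectors to lists of strings, matching the
  # list("string") shape in the request syntax.
  if (length(trace_ids) > 0) target$traceIds <- as.list(trace_ids)
  if (length(span_ids) > 0) target$spanIds <- as.list(span_ids)
  target
}

# Hypothetical helper: build one element of evaluationReferenceInputs.
# Without trace_id the context names only the session (session level);
# with trace_id it points at one request-response interaction.
make_reference_input <- function(session_id, trace_id = NULL,
                                 expected_response = NULL,
                                 assertions = NULL,
                                 tool_names = NULL) {
  span_context <- list(sessionId = session_id)
  if (!is.null(trace_id)) span_context$traceId <- trace_id
  ref <- list(context = list(spanContext = span_context))
  if (!is.null(expected_response))
    ref$expectedResponse <- list(text = expected_response)
  # Each assertion becomes its own list(text = ...) entry.
  if (length(assertions) > 0)
    ref$assertions <- lapply(assertions, function(a) list(text = a))
  if (length(tool_names) > 0)
    ref$expectedTrajectory <- list(toolNames = as.list(tool_names))
  ref
}
```

Since `evaluationReferenceInputs` is a list of these objects, several can be passed at once, e.g. `evaluationReferenceInputs = list(make_reference_input("session-1", expected_response = "..."))`; the session ID shown is illustrative.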
Value
A list with the following syntax:
list(
evaluationResults = list(
list(
evaluatorArn = "string",
evaluatorId = "string",
evaluatorName = "string",
explanation = "string",
context = list(
spanContext = list(
sessionId = "string",
traceId = "string",
spanId = "string"
)
),
value = 123.0,
label = "string",
tokenUsage = list(
inputTokens = 123,
outputTokens = 123,
totalTokens = 123
),
errorMessage = "string",
errorCode = "string",
ignoredReferenceInputFields = list(
"string"
)
)
)
)
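Each element of `evaluationResults` carries either a score (`value`, `label`) or an error (`errorCode`, `errorMessage`). The sketch below flattens a response into a data frame, one row per result. It assumes fields that are not set come back as `NULL` or as zero-length vectors; `summarise_results` is a hypothetical name.

```r
# Hypothetical helper: one row per evaluation result.
summarise_results <- function(resp) {
  # Replace a missing (NULL or zero-length) field with an NA of the right type.
  or_na <- function(x, na) if (length(x) == 0) na else x
  rows <- lapply(resp$evaluationResults, function(r) {
    data.frame(
      evaluatorId = or_na(r$evaluatorId, NA_character_),
      sessionId   = or_na(r$context$spanContext$sessionId, NA_character_),
      traceId     = or_na(r$context$spanContext$traceId, NA_character_),
      value       = or_na(r$value, NA_real_),
      label       = or_na(r$label, NA_character_),
      errorCode   = or_na(r$errorCode, NA_character_),
      stringsAsFactors = FALSE
    )
  })
  # Returns NULL when there are no results.
  do.call(rbind, rows)
}
```

Keeping error rows alongside scored rows makes it easy to filter with `is.na(df$errorCode)` before aggregating scores.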
Request syntax
svc$evaluate(
evaluatorId = "string",
evaluationInput = list(
sessionSpans = list(
list()
)
),
evaluationTarget = list(
spanIds = list(
"string"
),
traceIds = list(
"string"
)
),
evaluationReferenceInputs = list(
list(
context = list(
spanContext = list(
sessionId = "string",
traceId = "string",
spanId = "string"
)
),
expectedResponse = list(
text = "string"
),
assertions = list(
list(
text = "string"
)
),
expectedTrajectory = list(
toolNames = list(
"string"
)
)
)
)
)
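The request can also be assembled as one argument list and passed with `do.call()`. This sketch, using the hypothetical `build_evaluate_args()`, drops any optional argument left as `NULL` so that only the fields you actually set are included.

```r
# Hypothetical helper: assemble evaluate() arguments, pruning unset options.
build_evaluate_args <- function(evaluator_id, spans,
                                target = NULL, references = NULL) {
  args <- list(
    evaluatorId = evaluator_id,
    evaluationInput = list(sessionSpans = spans),
    evaluationTarget = target,               # optional
    evaluationReferenceInputs = references   # optional; a list of reference inputs
  )
  # list() keeps NULL entries, so remove them explicitly (names are preserved).
  Filter(Negate(is.null), args)
}
```

A call would then look like `resp <- do.call(svc$evaluate, build_evaluate_args("my-evaluator-id", spans))`, where the evaluator ID and `spans` are placeholders.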