
Evaluate

bedrockagentcore_evaluate R Documentation

Performs on-demand evaluation of agent traces using a specified evaluator

Description

Performs on-demand evaluation of agent traces using a specified evaluator. This synchronous API accepts traces in OpenTelemetry format and returns immediate scoring results with detailed explanations.

Usage

bedrockagentcore_evaluate(evaluatorId, evaluationInput,
  evaluationTarget, evaluationReferenceInputs)

Arguments

evaluatorId

[required] The unique identifier of the evaluator to use for scoring. This can be a built-in evaluator (e.g., Builtin.Helpfulness, Builtin.Correctness) or the ID of a custom evaluator created through the control plane API.

evaluationInput

[required] The input data containing the agent session spans to evaluate. It includes a list of spans in OpenTelemetry format from supported frameworks such as Strands (AgentCore Runtime) or LangGraph with OpenInference instrumentation.

evaluationTarget

The specific trace or span IDs to evaluate within the provided input. Use this to target evaluation at different levels: individual tool calls, single request-response interactions (traces), or entire conversation sessions.
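As a sketch, the helper below wraps spans you have already parsed into R lists in the `evaluationInput` and `evaluationTarget` shapes shown under Request syntax. The name `make_eval_payload` is hypothetical (not part of paws), and the span objects are passed through untouched because this page does not document their fields. Whether the service accepts `traceIds` and `spanIds` together is not stated here, so the helper simply includes whichever you supply.

```r
# Hypothetical helper (not part of paws): build evaluationInput and an
# optional evaluationTarget in the shapes shown under "Request syntax".
make_eval_payload <- function(spans, trace_ids = NULL, span_ids = NULL) {
  payload <- list(
    # Each element of `spans` is one OpenTelemetry span, passed through as-is.
    evaluationInput = list(sessionSpans = spans)
  )
  target <- list()
  if (length(trace_ids) > 0) target$traceIds <- as.list(trace_ids)
  if (length(span_ids) > 0) target$spanIds <- as.list(span_ids)
  # evaluationTarget is optional, so include it only when IDs were given.
  if (length(target) > 0) payload$evaluationTarget <- target
  payload
}
```

With no IDs, the result contains only `evaluationInput`. In that case the evaluation scope is whatever the service applies by default.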

evaluationReferenceInputs

Ground truth data to compare against agent responses during evaluation. Use it to provide expected responses, assertions, and expected tool trajectories at different evaluation levels. Session-level reference inputs apply to the entire conversation, while trace-level reference inputs target specific request-response interactions identified by trace ID.
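The helpers below sketch how session-level and trace-level reference inputs might be assembled from the fields in the request syntax (`expectedResponse`, `assertions`, `expectedTrajectory`). The helper names are hypothetical. They also rest on an assumption this page does not confirm: a session-level entry sets only `sessionId` in `spanContext`, while a trace-level entry also sets the `traceId` it targets.

```r
# Hypothetical helpers (not part of paws).
# ASSUMPTION: session-level entries carry only sessionId in spanContext;
# trace-level entries add the traceId of the interaction they target.
add_reference_fields <- function(ref, expected_text, assertions, tool_names) {
  if (!is.null(expected_text)) ref$expectedResponse <- list(text = expected_text)
  # Each assertion becomes its own list(text = ...) element.
  if (length(assertions) > 0) ref$assertions <- lapply(assertions, function(a) list(text = a))
  if (length(tool_names) > 0) ref$expectedTrajectory <- list(toolNames = as.list(tool_names))
  ref
}

session_reference <- function(session_id, expected_text = NULL,
                              assertions = character(0), tool_names = character(0)) {
  ref <- list(context = list(spanContext = list(sessionId = session_id)))
  add_reference_fields(ref, expected_text, assertions, tool_names)
}

trace_reference <- function(session_id, trace_id, expected_text = NULL,
                            assertions = character(0), tool_names = character(0)) {
  ref <- list(context = list(spanContext = list(sessionId = session_id,
                                                traceId = trace_id)))
  add_reference_fields(ref, expected_text, assertions, tool_names)
}
```

For example, `list(session_reference("s1", assertions = "Agent confirms the booking"), trace_reference("s1", "t1", tool_names = c("search_flights", "book_flight")))` could be passed as `evaluationReferenceInputs`. The IDs and tool names are illustrative only.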

Value

A list with the following syntax:

list(
  evaluationResults = list(
    list(
      evaluatorArn = "string",
      evaluatorId = "string",
      evaluatorName = "string",
      explanation = "string",
      context = list(
        spanContext = list(
          sessionId = "string",
          traceId = "string",
          spanId = "string"
        )
      ),
      value = 123.0,
      label = "string",
      tokenUsage = list(
        inputTokens = 123,
        outputTokens = 123,
        totalTokens = 123
      ),
      errorMessage = "string",
      errorCode = "string",
      ignoredReferenceInputFields = list(
        "string"
      )
    )
  )
)
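To inspect scores across many results, it can help to flatten `evaluationResults` into a data frame. The sketch below does that with base R only. The helper name `results_to_df` is hypothetical. Any field that is missing or zero-length, such as `errorCode` on a successful result, is treated as NA, so the helper does not depend on how the client represents absent fields.

```r
# Hypothetical helper: one row per entry in resp$evaluationResults.
# Missing or empty fields (NULL or length 0) become NA of the given type.
# Returns NULL when there are no results.
results_to_df <- function(resp) {
  field <- function(x, name, na) if (length(x[[name]]) == 0) na else x[[name]]
  rows <- lapply(resp$evaluationResults, function(r) {
    data.frame(
      evaluatorId = field(r, "evaluatorId", NA_character_),
      traceId     = field(r$context$spanContext, "traceId", NA_character_),
      value       = field(r, "value", NA_real_),
      label       = field(r, "label", NA_character_),
      errorCode   = field(r, "errorCode", NA_character_),
      explanation = field(r, "explanation", NA_character_),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

Rows with a non-NA `errorCode` can then be filtered out before you aggregate `value`.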

Request syntax

svc$evaluate(
  evaluatorId = "string",
  evaluationInput = list(
    sessionSpans = list(
      list()
    )
  ),
  evaluationTarget = list(
    spanIds = list(
      "string"
    ),
    traceIds = list(
      "string"
    )
  ),
  evaluationReferenceInputs = list(
    list(
      context = list(
        spanContext = list(
          sessionId = "string",
          traceId = "string",
          spanId = "string"
        )
      ),
      expectedResponse = list(
        text = "string"
      ),
      assertions = list(
        list(
          text = "string"
        )
      ),
      expectedTrajectory = list(
        toolNames = list(
          "string"
        )
      )
    )
  )
)
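For comparison with the skeleton above, here is a filled-in request that scores one trace with `Builtin.Correctness` against an expected response. It is a sketch: the function name, session and trace IDs, and expected text are hypothetical. `svc` is assumed to be a paws client (for example from `paws::bedrockagentcore()`, an assumed constructor name). The function takes the client as an argument, so any object with an `evaluate` function, such as a test double, can stand in for it.

```r
# Hypothetical example request. `svc` is assumed to be a paws client
# (e.g. paws::bedrockagentcore()); `spans` holds the parsed OpenTelemetry
# spans for the session. All IDs and text below are illustrative.
score_correctness <- function(svc, spans) {
  svc$evaluate(
    evaluatorId = "Builtin.Correctness",
    evaluationInput = list(sessionSpans = spans),
    # Score only the interaction identified by this trace ID.
    evaluationTarget = list(traceIds = list("trace-1")),
    evaluationReferenceInputs = list(
      list(
        context = list(spanContext = list(sessionId = "session-123",
                                          traceId = "trace-1")),
        expectedResponse = list(text = "The order ships within 2 business days.")
      )
    )
  )
}
```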