Mastra + Langfuse Eval - llm as a judge #15263

@daneatmastra

Description

This issue was created from Discord post 1492462086742081558:

I use Mastra for my agents and Langfuse for evaluations, including LLM-as-a-Judge on live observations. My stack exports traces through the Mastra Langfuse integration (OTel and LangfuseSpanProcessor).

I need evaluators to run only for specific agents, for example weather-agent versus agent-builder-2-supervisor. I configured an observation-level LLM-as-a-Judge evaluator and added a filter on trace name so it would only target the right calls.

The problem is that with the trace name filter enabled, nothing matches and no evaluations run. If I remove the trace name filter, the evaluator works again, but then it scores every invocation across all agents, which is too broad.

On model generation spans I see attributes like gen_ai.agent.id set to the real agent id, but the trace in the Langfuse UI is named generically, for example invoke or agent run. The trace name dropdown in filters only shows those generic names, not my agent id. I also do not see langfuse.trace.name on those observations in the way I expected. I suspect the evaluator compares trace name to that root-level name, not to gen_ai.agent.id, which is why filtering by agent id as trace name never works.
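To make the suspected mismatch concrete, here is a minimal sketch of what I think is happening. The `Observation` shape and both filter functions are hypothetical stand-ins I made up for illustration; they are not Langfuse's or Mastra's real types, and I do not know how the evaluator actually implements its filter.

```typescript
// Hypothetical, simplified shape; not a real Langfuse or Mastra type.
interface Observation {
  traceName: string;                  // root-level trace name as shown in the UI
  attributes: Record<string, string>; // span attributes such as gen_ai.agent.id
}

// Sample data mirroring what I see: generic trace names, real agent ids on spans.
const observations: Observation[] = [
  { traceName: "invoke", attributes: { "gen_ai.agent.id": "weather-agent" } },
  { traceName: "invoke", attributes: { "gen_ai.agent.id": "agent-builder-2-supervisor" } },
  { traceName: "agent run", attributes: { "gen_ai.agent.id": "weather-agent" } },
];

// What I configured: the agent id is compared against the trace name.
function filterByTraceName(obs: Observation[], name: string): Observation[] {
  return obs.filter((o) => o.traceName === name);
}

// What I would like: the agent id is compared against gen_ai.agent.id.
function filterByAgentId(obs: Observation[], agentId: string): Observation[] {
  return obs.filter((o) => o.attributes["gen_ai.agent.id"] === agentId);
}

const byTraceName = filterByTraceName(observations, "weather-agent");
const byAgentId = filterByAgentId(observations, "weather-agent");

console.log(byTraceName.length); // 0: no trace is literally named "weather-agent"
console.log(byAgentId.length);   // 2: two spans carry that agent id
```

If the evaluator behaves like `filterByTraceName`, that would explain why my filter never matches anything.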

I am asking how I should scope evaluators per agent in this setup, whether trace name filters are meant to match root span names only, and what the recommended workaround is, for example metadata, tags, or another filter type. I would also welcome clearer guidance or product alignment between Mastra trace naming and Langfuse evaluator filters.
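As one possible shape for such a workaround, the sketch below derives trace-level attributes from the agent id already present on a span. Everything here is an assumption on my side: `traceAttributesForAgent` is a hypothetical helper, the `langfuse.trace.metadata.agent_id` key is a name I picked for illustration, and I have not confirmed that Langfuse evaluator filters read either attribute or that the Mastra exporter gives me a hook to set them.

```typescript
type Attributes = Record<string, string>;

// Hypothetical helper: given a span's attributes, compute extra attributes
// that could make the agent id visible at the trace level.
function traceAttributesForAgent(spanAttributes: Attributes): Attributes {
  const agentId = spanAttributes["gen_ai.agent.id"];
  if (!agentId) {
    // No agent id on this span, so there is nothing to derive.
    return {};
  }
  return {
    // Attribute I expected to see on the observations (see above).
    "langfuse.trace.name": agentId,
    // Assumed metadata key, chosen only for this example.
    "langfuse.trace.metadata.agent_id": agentId,
  };
}

const derived = traceAttributesForAgent({ "gen_ai.agent.id": "weather-agent" });
console.log(derived["langfuse.trace.name"]);
```

Whether to set the trace name, metadata, or tags this way, and where in the Mastra export pipeline to do it, is exactly the guidance I am looking for.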

Thank you.

Metadata

    Labels

    Evals: Issues surrounding Mastra Evals
    Langfuse: Issues using Langfuse with Mastra
    Observability (AI Telemetry): Issues related to AI related Observability/Telemetry (Traces, Metrics, Logs)
    RAG: Issues with Mastra's RAG system
    discord: For issues created from Discord discussions.
    effort:medium
    impact:high
    trio-tnt
    trio-tracery
