Mastra + Langfuse Eval - llm as a judge

This issue was created from Discord post 1492462086742081558:

[![Open in Discord](https://img.shields.io/badge/Open_in_Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/channels/1309558646228779139/1492462086742081558)

I use Mastra for my agents and Langfuse for evaluations, including LLM-as-a-Judge on live observations. My stack exports traces with the Mastra Langfuse integration (OTel and LangfuseSpanProcessor).

I need evaluators to run only for specific agents, for example weather-agent versus agent-builder-2-supervisor. I configured an observation-level LLM-as-a-Judge evaluator and added a filter on trace name so it would only target the right calls.

The problem is that with the trace name filter enabled, nothing matches and evaluations do not run as I expect. If I remove the trace name filter, the evaluator works again, but then it scores every invocation across all agents, which is too broad.

On model generation spans I see attributes like gen_ai.agent.id set to the real agent id, but the trace in the Langfuse UI is named generically, for example invoke or agent run. The trace name dropdown in filters only shows those generic names, not my agent id. I also do not see langfuse.trace.name on those observations in the way I expected. I suspect the evaluator compares trace name to that root-level name, not to gen_ai.agent.id, which is why filtering by agent id as trace name never works.
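
To make the suspected mismatch concrete, here is a minimal sketch. It assumes the trace name filter is a plain string comparison against the root-level trace name, which I have not confirmed. The `Observation` shape and both functions are hypothetical illustrations, not Langfuse or Mastra APIs.

```typescript
// Hypothetical shape for illustration only; not a Langfuse or Mastra type.
interface Observation {
  traceName: string; // root-level trace name as shown in the Langfuse UI
  attributes: Record<string, string>; // span attributes such as gen_ai.agent.id
}

// Assumed behavior: the trace name filter only looks at the root trace name.
function matchesTraceNameFilter(obs: Observation, wanted: string): boolean {
  return obs.traceName === wanted;
}

// What I would like to express instead: match on the agent id attribute.
function matchesAgentId(obs: Observation, agentId: string): boolean {
  return obs.attributes["gen_ai.agent.id"] === agentId;
}

// A generation span as I see it: generic trace name, real agent id attribute.
const generation: Observation = {
  traceName: "invoke",
  attributes: { "gen_ai.agent.id": "weather-agent" },
};

const byTraceName = matchesTraceNameFilter(generation, "weather-agent");
const byAgentId = matchesAgentId(generation, "weather-agent");
console.log({ byTraceName, byAgentId });
```

Under that assumption, filtering on trace name with the agent id can never match, while a filter on the attribute itself would.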

I am asking how I should scope evaluators per agent in this setup, whether trace name filters are meant to match root span names only, and what the recommended workaround is, for example metadata, tags, or another filter type. I would also welcome clearer guidance or product alignment between Mastra trace naming and Langfuse evaluator filters.

Thank you.

Mastra + Langfuse Eval - llm as a judge #15263
