Three concrete, working ways to run **Mastra evals** against an agent that has **memory** turned on, including observational memory in thread scope. That is the configuration that triggers the error `ObservationalMemory (scope: 'thread') requires a threadId, but none was found in RequestContext or MessageList.`

Everything in this example uses Mastra evals primitives (`runEvals`, `createScorer`, `Dataset.startExperiment`), with no custom evaluation harness. The agent in every script uses `@mastra/memory` and `@mastra/libsql` for storage, with observational memory in thread scope. Each script writes to a fresh temp DB and cleans up after itself. A deterministic mock model is used, so no API key is required and runs are reproducible in CI.
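The "fresh temp DB, clean up afterwards, deterministic mock" pattern described above can be sketched with plain Node APIs. This is a minimal illustration, not the scripts' actual code: `withTempDb` and `mockReply` are hypothetical names, the `file:` URL is assumed to be the form handed to the storage layer, and no Mastra API is called here.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: creates a unique temp directory, passes a file: URL
// for a database inside it to `run`, and removes the directory afterwards,
// even if `run` throws.
async function withTempDb<T>(
  run: (dbUrl: string, dir: string) => Promise<T>,
): Promise<T> {
  const dir = mkdtempSync(join(tmpdir(), "mastra-evals-"));
  try {
    return await run(`file:${join(dir, "eval.db")}`, dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Hypothetical stand-in for a deterministic mock model: the same prompt
// always yields the same reply, so no API key or network call is involved.
function mockReply(prompt: string): string {
  return `echo:${prompt.trim().toLowerCase()}`;
}

// Usage: the directory exists while the callback runs and is removed after.
withTempDb(async (dbUrl, dir) => {
  console.log(dbUrl.startsWith("file:"), existsSync(dir), mockReply("  Hello "));
});
```

Wrapping the run in `try`/`finally` keeps a failed eval from leaving stale database files behind, which is what lets repeated CI runs start from the same empty state.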
- Autonomy: Semi-autonomous
- Sandbox-aware: No declared sandbox guidance
- Network access: Unspecified
- Filesystem access: Unspecified
- Permissions declared: Not declared
- Pattern: Single agent
- Models: gpt-4o, claude-3-5-sonnet, gpt-3.5-turbo