Evals course module fourteen: the eval improvement loop. Learn how to complete the full eval-driven improvement cycle in Braintrust. Find problems in production, sample them into a dataset, run a baseline, test a fix, and verify the results. More here → https://lnkd.in/gXXK6v6P
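The five steps of that loop can be sketched in plain Python. This is a generic illustration, not the Braintrust SDK: the logs, the `exact_match` scorer, and the "fix" below are all hypothetical stand-ins for what the platform handles against real traces.

```python
# 1. Find problems in production: filter logged interactions that failed.
production_logs = [
    {"input": "What is 2+2?", "output": "5", "expected": "4"},
    {"input": "Capital of France?", "output": "Paris", "expected": "Paris"},
    {"input": "What is 3*3?", "output": "6", "expected": "9"},
]

def exact_match(output, expected):
    return 1.0 if output.strip() == expected.strip() else 0.0

# 2. Sample the failures into an eval dataset.
dataset = [row for row in production_logs
           if exact_match(row["output"], row["expected"]) < 1.0]

# 3. Baseline: the current (buggy) task, a stand-in for a model + prompt.
def baseline_task(question):
    return {"What is 2+2?": "5", "What is 3*3?": "6"}[question]

# 4. The candidate fix, another stand-in.
def fixed_task(question):
    return {"What is 2+2?": "4", "What is 3*3?": "9"}[question]

def run_eval(task, dataset):
    scores = [exact_match(task(row["input"]), row["expected"]) for row in dataset]
    return sum(scores) / len(scores)

# 5. Verify: the fix should beat the baseline on the sampled failures.
baseline_score = run_eval(baseline_task, dataset)
fixed_score = run_eval(fixed_task, dataset)
assert fixed_score > baseline_score
```

The key design point the module teaches is step 2: evaluating against sampled failures, not the whole log stream, so the before/after comparison targets the problem you found.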
About us
Braintrust is the AI observability platform helping teams measure, evaluate, and improve AI in production. By connecting evals and observability in one workflow, teams at Notion, Stripe, Zapier, Vercel, and Ramp ship quality AI products at scale.
- Website: https://braintrust.dev/
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- San Francisco
- Type
- Privately Held
- Founded
- 2023
Products
Braintrust
Automated Testing Software
Braintrust is the AI observability platform. By connecting evals and observability in one workflow, Braintrust gives builders the visibility to understand how AI behaves in production and the tools to improve it. Teams at Notion, Stripe, Zapier, Vercel, and Ramp use Braintrust to compare models, test prompts, and catch regressions — turning production data into better AI with every release.
Locations
- Primary: San Francisco, US
Updates
Going from prototype to production is more challenging than ever. With AI products, teams need to manage multi-step agents, tool use, and the unpredictability of real users. Learn how to ship production AI applications in this workshop from AI Engineer Europe with Braintrust and Trainline.
Shipping complex AI applications — Braintrust & Trainline
https://www.youtube.com/
Evals course module thirteen: analyzing production logs. Use Topics in Braintrust to automatically cluster production logs into named categories. Find patterns across hundreds of conversations and build targeted eval datasets, without the need for manual review. More here → https://lnkd.in/gaYBKriH
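To make the idea concrete, here is a toy version of clustering logs into named categories. The topic names, keyword lists, and logs are invented, and a keyword matcher is only a stand-in; the point is the output shape: named clusters of logs you can turn into targeted eval datasets.

```python
from collections import defaultdict

# Hypothetical topic definitions; a real system would cluster semantically.
TOPIC_KEYWORDS = {
    "refunds": ["refund", "chargeback"],
    "login issues": ["password", "login", "sign in"],
}

def assign_topic(log_text):
    """Assign a log to the first topic whose keywords it mentions."""
    text = log_text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(k in text for k in keywords):
            return topic
    return "uncategorized"

def cluster_logs(logs):
    """Group raw log texts into named clusters."""
    clusters = defaultdict(list)
    for log in logs:
        clusters[assign_topic(log)].append(log)
    return dict(clusters)

logs = [
    "I want a refund for my last order",
    "Can't sign in after resetting my password",
    "How do I export my data?",
]
clusters = cluster_logs(logs)
# Each cluster ("refunds", "login issues", "uncategorized") is now a
# candidate eval dataset for that failure category.
```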
Teams that catch regressions fast run their evals in the same place they log their traces. When the two live in different tools, every regression takes a long, manual journey before anything gets fixed. When traces and evals live together, that whole sequence collapses into a problem you can solve quickly.
At Vercel, customers expect to build with the latest models immediately, so they ship support within hours of release. Braintrust gives their team a structured way to benchmark new models against existing ones, catch performance differences early, and deploy with confidence. Read more → https://lnkd.in/g5-HNTPD
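The benchmarking pattern described here (run the new model and the incumbent on the same dataset, then surface per-row regressions before deploying) can be sketched as follows. The two "models" are lookup tables standing in for real model calls, and the dataset is invented.

```python
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3*3", "expected": "9"},
]

# Stand-ins for calling two different models on each input.
incumbent = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}
candidate = {"2+2": "4", "capital of France": "paris", "3*3": "9"}

def exact_match(output, expected):
    return 1.0 if output == expected else 0.0

def find_regressions(dataset, old_model, new_model):
    """Return inputs where the new model scores below the old one."""
    regressions = []
    for row in dataset:
        old = exact_match(old_model[row["input"]], row["expected"])
        new = exact_match(new_model[row["input"]], row["expected"])
        if new < old:
            regressions.append(row["input"])
    return regressions

regressions = find_regressions(dataset, incumbent, candidate)
# regressions == ["capital of France"]: a casing regression caught
# before the new model ships.
```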
Evals course module twelve: online scoring. Learn how to run online scoring against production logs as they arrive, so you get continuous quality monitoring without manual intervention. More here → https://lnkd.in/gger7twf
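The core of online scoring is attaching a score to each log as it arrives instead of in a later batch job. A minimal sketch, with a hypothetical cheap heuristic scorer (real online scorers would typically be richer, e.g. LLM-as-judge):

```python
def length_scorer(output):
    """Flag suspiciously short responses: a cheap online heuristic."""
    return 1.0 if len(output.split()) >= 3 else 0.0

def handle_log(log, scored):
    """Score a log the moment it arrives and record it."""
    log = dict(log, score=length_scorer(log["output"]))
    scored.append(log)
    return log

scored = []
incoming_stream = [
    {"output": "Sure, here is the answer."},
    {"output": "No."},
]
for log in incoming_stream:
    handle_log(log, scored)
# Every log now carries a score the moment it lands, so quality
# monitoring is continuous rather than a periodic batch pass.
```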
What's new:
- Custom views for dataset rows with tailored annotation interfaces
- Translate message content in traces without leaving the UI
- Snapshot, roll back, and pin dataset versions to environments for evals
- Eval foundations course: learn to build evals like top teams https://lnkd.in/gznacJJk
Encyclopedia Evalica: a resource from Braintrust compiling the most important things to know about evals. The terms to learn, the principles to apply, and Braintrust's take on why evals matter. Read more → https://lnkd.in/etFZNrh5
Evals course module eleven: analyzing multi-turn traces. Per-turn scorers catch individual response issues. Trace-level scorers catch conversation-wide failures like lost context or unresolved issues. Learn how to run both together. More here → https://lnkd.in/guM3YJXh
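The distinction is easiest to see on a conversation where every turn looks fine in isolation but the trace as a whole fails. Both scorers below are hypothetical heuristics (real ones would typically be LLM-as-judge scorers), and the conversation is invented.

```python
conversation = [
    {"role": "user", "content": "My order 123 never arrived."},
    {"role": "assistant", "content": "Sorry to hear that! Which order?"},
    {"role": "user", "content": "Order 123, as I said."},
    {"role": "assistant", "content": "Thanks, I've filed a claim for order 123."},
]

def per_turn_scorer(turn):
    """Score an assistant turn on its own (here: just non-empty)."""
    return 1.0 if turn["content"].strip() else 0.0

def trace_scorer(conversation):
    """Catch a conversation-wide failure: asking for info already given."""
    seen = ""
    for turn in conversation:
        if (turn["role"] == "assistant"
                and "which order" in turn["content"].lower()
                and "123" in seen):
            return 0.0  # lost context: the order number was already provided
        seen += turn["content"].lower()
    return 1.0

turn_scores = [per_turn_scorer(t) for t in conversation
               if t["role"] == "assistant"]
trace_score = trace_scorer(conversation)
# Every turn scores 1.0, but the trace scores 0.0: the lost-context
# failure is only visible at the trace level.
```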
An eval platform is more than just a test runner. Evals require shared definitions of "good," reliable data pipelines, labelling workflows, versioning, and trust in results across many teams and model changes. Phillip Hetzel explains the design principles behind Braintrust's platform in this session from AI Engineer Europe. https://lnkd.in/e9bTXvsK
Why building eval platforms is hard — Phil Hetzel, Braintrust
https://www.youtube.com/
