Open models are not just a pricing story.
They are what happens when the AI stack becomes modular: models, APIs, harnesses, tools, and inference all improving independently.
Together AI is building the inference layer for that shift.
We gave a 2-hour deep dive at @aiDotEngineer on how to build inference engines that handle trillion-token agentic workloads.
We'll drop the slides and a detailed walkthrough!
As open models get stronger, more workloads move into the competitive inference market.
That pushes the real fight toward speed, cost, reliability, and control.
Together AI is where open models become production infrastructure.
To celebrate the start of @aiDotEngineer AI Engineer World's Fair, we're launching a bracket competition!
Use an agent or pick manually to choose your winners for the Round of 32 by 9:00 am PT Tuesday, June 30, for a chance to win:
1/ Lego trophy
2/ Mac Mini (for your 24/7
One more reason we're excited about GLM-5.2 on Together 👇
Strong enough for serious coding work, cheap enough to change routing decisions, and easy to access through the tools developers already use.
Use GLM 5.2 via a USA provider that doesn't retain prompts if you care about privacy. The model is just as good as Opus 4.8/GPT 5.5.
It's the same speed as Claude Code, and goes down less often.
On OpenRouter, it's never down, since you can swap to a different provider if one drops.
I'm
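The "swap to a different provider if one drops" idea above can be sketched as a simple failover loop. This is a minimal illustration under stated assumptions, not how OpenRouter or any specific router actually works: `complete_with_failover`, `flaky_provider`, and `healthy_provider` are hypothetical names, and each provider is modeled as a plain Python callable rather than a real API client.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list raised an error."""


def complete_with_failover(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only (assumptions, not real endpoints).
def flaky_provider(prompt: str) -> str:
    raise ConnectionError("provider is down")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_failover(
    "hello",
    [("provider_a", flaky_provider), ("provider_b", healthy_provider)],
)
print(used, reply)
```

In practice the provider order would likely reflect price or latency preferences, and retries or timeouts would sit inside each provider call.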
There's a big difference between a single model call and serving an agent at scale. @ZainHasan6 breaks down what actually changes.
Catch our team this Monday at 9 a.m. PST for their open-source inference workshop at @aiDotEngineer
Next week at @aiDotEngineer, we are joining @togethercompute for a conversation on what goes into running agents at scale.
@olive_jy_song, Research Lead, RL at MiniMax, and
@realDanFu, VP of Kernels at Together AI, will walk through both sides of M3: the training decisions
What happens when AI agents collaborate on open science?
At @aiDotEngineer World’s Fair, @james_y_zou will share work on EinsteinArena and DSGym, from multi-agent math discovery to better evaluation for data science agents.
Day 3, July 1. Expo Stage 3 SW.
As token usage explodes, model choice becomes product strategy.
Teams are already testing models like GLM-5.2 because they want frontier quality, better tokenomics, and more control over cost, data, and deployment.
Together AI is building the inference layer for that open-model
A huge moment for open source AI - as token usage skyrockets across orgs, so do concerns over cost & data/vendor lock-in & necessity of a multi-AI strategy
Keep an eye on @Lux_Capital portfolio cos @huggingface @togethercompute @SakanaAILabs & more! 🚀
I love using GLM 5.2 for web app iteration.
My workflow: generate 6 variations, then pick the best one and continue iterating on it.
I built Recast to make this even easier.
Give it a prompt, get 6 variations, download the code for your favorite version, then continue
LLMs are getting better at writing GPU kernels. Multi-GPU kernels are the harder test.
At @aiDotEngineer World's Fair, @simran_s_arora will share ParallelKernelBench, an open-source benchmark built from real CUDA communication problems where performance depends on moving data