Scientific discovery is entering a new AI era, and we're proud to help make it possible. SambaNova is honored to join the U.S. Department of Energy (DOE) Genesis Mission Consortium. We're excited to be part of what's next. 🦾
SambaNova
Computer Hardware Manufacturing
San Jose, California 95,662 followers
Transforming AI with efficiency, security, and sovereignty - driven by our relentless pursuit of intelligence.
About us
Welcome to SambaNova: Revolutionizing AI Capacity
At SambaNova, we're empowering developers, enterprises, governments, and data centers to unlock their full AI potential. Our full-stack infrastructure, from chips to models, enables lightning-fast performance, low power consumption, and high-efficiency computing.
Our Mission
To give every developer, enterprise, government, and data center absolute sovereignty over their own data, models, and AI infrastructure, and to future-proof the AI workloads that will power and scale tomorrow.
Our Technology
We give our customers the option to experience SambaNova through the cloud or on-premises.
SambaCloud delivers the fastest inference on the largest open-source models, such as Llama 4 and DeepSeek. Developers can get started building in minutes with our OpenAI-compatible APIs. All customers start on the developer tier and can scale into our enterprise tier when they need more capacity.
SambaStack is our on-premises offering, which includes the system, the platform, and foundation models. These components combine into a powerful technology stack that delivers unparalleled performance, ease of use, accuracy, data privacy, and the ability to power every use case across the world's largest organizations.
SambaManaged is a modular, ready-to-deploy AI cloud designed to deliver unmatched efficiency for data centers and cloud service providers. It allows organizations to deploy advanced AI inference services in as little as 90 days, without costly infrastructure upgrades or specialized expertise.
At the heart of SambaNova innovation is the Reconfigurable Dataflow Unit (RDU). Purpose-built for AI workloads, the RDU takes advantage of a dataflow architecture and a three-tiered memory design. The three tiers of memory enable the platform to run hundreds of models on a single node and to switch between them in microseconds. In 2023, SambaNova released its fourth-generation RDU chip, the SN40L.
- Website
- http://www.sambanova.ai
- Industry
- Computer Hardware Manufacturing
- Company size
- 201-500 employees
- Headquarters
- San Jose, California
- Type
- Privately Held
- Founded
- 2017
- Specialties
- High Performance Computing, Artificial Intelligence, Machine Learning, GPT3, Foundation Models, Deep Learning, Computer Vision, True Resolution, 3D Image Analysis, Recommendation, AI Platform, Large Language Models, AI for Science, Generative AI, AI Inference, and Premium Inference
Locations
- Primary
2460 N First Street
100
San Jose, California 95131, US
Updates
-
"As the cost of tokens continue to rise because the infrastructure is so expensive, people are looking for alternatives. So at SambaNova, we come in with a much lower-cost infrastructure, much higher performance. We drive it at 10 kilowatts, an order of magnitude lower power than the traditional GPUs." - Rodrigo Liang
During his chat with Matthew Miller on Bloomberg Television, Rodrigo explained how we're able to make AI more affordable: disaggregated inference. Instead of relying on a single architecture, it combines GPUs for prefill, RDUs for high-speed decode, and CPUs for orchestration, allowing each processor to do what it does best.
The result is faster inference, lower latency, and significantly lower-cost tokens for enterprises and service providers deploying AI at scale. This architecture powers the world's first commercially available disaggregated inference cloud, including VC2, helping organizations bring AI into production with better performance and better economics.
Watch their conversation below ⬇️
Making AI More Affordable with Disaggregated Inference
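As a rough illustration of the split described above, here is a minimal Python sketch that routes each stage of a request to the processor type named in the post. It is a toy simulation only: nothing runs on real hardware, and every name in it (`STAGE_TO_PROCESSOR`, `Request`, `prefill`, `decode`, `orchestrate`) is hypothetical and not part of any SambaNova API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stage-to-processor mapping, taken from the post's description.
STAGE_TO_PROCESSOR: Dict[str, str] = {
    "orchestrate": "CPU",
    "prefill": "GPU",
    "decode": "RDU",
}


@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    # Records which processor type handled which stage, in order.
    trace: List[str] = field(default_factory=list)


def prefill(req: Request) -> List[str]:
    """Process the whole prompt in one pass (stand-in for building a KV cache)."""
    req.trace.append(f"prefill@{STAGE_TO_PROCESSOR['prefill']}")
    return req.prompt.split()


def decode(req: Request, cache: List[str]) -> List[str]:
    """Emit placeholder tokens one position at a time, continuing after the prompt."""
    req.trace.append(f"decode@{STAGE_TO_PROCESSOR['decode']}")
    start = len(cache)  # decoding picks up where prefill left off
    return [f"<tok{start + i}>" for i in range(req.max_new_tokens)]


def orchestrate(req: Request) -> List[str]:
    """Control loop: hand the request to prefill, then pass its state to decode."""
    req.trace.append(f"orchestrate@{STAGE_TO_PROCESSOR['orchestrate']}")
    cache = prefill(req)
    return decode(req, cache)


req = Request(prompt="why is the sky blue", max_new_tokens=3)
tokens = orchestrate(req)
print(req.trace)   # ['orchestrate@CPU', 'prefill@GPU', 'decode@RDU']
print(tokens)      # ['<tok5>', '<tok6>', '<tok7>']
```

The point of the sketch is the handoff: prefill touches the whole prompt once, decode then produces tokens step by step from that state, and a separate control loop sequences the two.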
-
"Where dataflow comes in is... you can actually just flush these things like a river flushes through their own environment with a very fast pace." - Rodrigo Liang
One of the biggest challenges in AI today isn't the model, it's getting data to the right place at the right time. That's where dataflow architecture comes in, helping eliminate memory bottlenecks and unlock faster, more efficient inference at scale. 🦾
🎥: Forbes
The Dataflow Architecture That Could Dethrone the GPU
-
Say bonjour to our team at RAISE Summit 🇫🇷 We'll be talking about why one chip isn't enough anymore, and why the future of AI inference is all about using the right chip for the right workload.
AI agents are changing what's required from infrastructure. Long contexts, multi-step reasoning, and continuous tool use demand a different approach to inference than traditional GPU-only architectures.
That's why we're building premium inference with disaggregated infrastructure:
• GPUs for prefill
• RDUs for high-speed decode
• CPUs to orchestrate the agent loop
Faster inference, higher throughput, and better economics.
See you in Paris: https://lnkd.in/gv7b3jrq
SambaNova at RAISE Summit 2026
-
Open-weight models keep getting better, and Gemma 4 31B is a great example. It combines frontier-class reasoning with production-ready coding performance and native agentic capabilities, making it a strong choice for everything from coding assistants to multi-agent applications.
Some highlights:
• 89.2% on AIME 2026 (no tools)
• 80.0% on LiveCodeBench v6
• Native function calling, structured JSON output, and system prompt support for AI agents
When paired with SambaCloud, developers also get the lowest latency available, helping agents respond faster and making interactive AI experiences feel more natural.
We break down what makes Gemma 4 31B stand out, and why speed matters just as much as model quality.
Read the blog: https://lnkd.in/e-QMY8K5
Gemma 4 31B Running Fastest on SambaCloud
-
Inference is now the defining workload in AI, and that changes the economics of the whole stack.
At Deep Tech Week SF, SambaNova Chief Product and Strategy Officer Abhi I. put it plainly: inference is the defining workload for AI, now and for the future. He further emphasized that training is a cost center; inference is where you make money. As such, hardware has to be purpose-built for the task at hand. It’s not a one-size-fits-all story anymore.
Heterogeneity is the order of the day: a combination of chips and networking that produces the best results, and a combination of models (frontier and open source, large and small) that are fit to the task.
Disaggregated inferencing is the right architecture for the Agentic Era. GPUs handle the prefill stage, SambaNova's RDUs handle decode, and CPUs orchestrate and run the tools. That is what lets you serve premium tokens and run the largest models fast, with the power efficiency communities demand, at a TCO that works for the enterprise.
As Abhi says, “More intelligence per joule, that’s the strategy.” It’s about achieving the lowest possible power consumption and deploying into existing air-cooled data centers.
Abhi was joined by Rajiv K. of Velaura AI on the power layer and Barun Kar of Upscale AI on the networking layer, in a conversation moderated by Sriram Viswanathan of Celesta Capital. Three companies, three layers of the stack, one shared view: the GPU-centric, vertically integrated stack is the bottleneck, and purpose-built, heterogeneous infrastructure is the answer for the defining workload in AI, inferencing.
Thank you to Celesta Capital for hosting. Lip-Bu Tan's fireside chat with Michael E Marks was a conversation between two Silicon Valley veterans, full of wise advice for startup founders, such as “Stay flexible and adjust your trajectory if customers and markets change.” We at SambaNova appreciate the callout from both of them for the success we are having, and we take their advice to heart.
-
"You need premium because your models are bigger, your models are lower latency, your models are much higher throughput." - Rodrigo Liang
The bigger opportunity is better inference. Faster tokens. Better efficiency. Lower cost. That's what makes AI agents practical at scale.
Thanks to Akash Pasricha and The Information TITV for the great conversation on where AI infrastructure is headed. 🦾 Vista Equity Partners
-
June was all about faster agents, premium inference, and the next gen of AI infrastructure ⚡️
TL;DR:
  - The first disaggregated inference demo for AI agents is live
  - Gemma 4 31B is running fastest on SambaCloud
  - Customers are scaling AI faster with SambaNova
  - Meet us in Paris in July
🔗 https://lnkd.in/ebtBBCSD
-
Our CEO Rodrigo Liang and Vista Equity Partners' Monti Saroya joined The Information TITV today to discuss Vista and Cambium Networks' cloud initiative & why inference is becoming the next battleground in AI. The future isn't one chip doing everything...it's disaggregated inference, where GPUs, RDUs, and CPUs work together to deliver premium inference at scale. 🦾 Watch their segment below ⬇️ https://lnkd.in/ghEcapkE
The Information | TITV | June 24, 2026 The Information’s TITV is first in tech news and analysis from the people that break and shape the story. The rest is just commentary. Watch every weekday at 10 am PT/ 1 ET on The Information.com, App, YouTube, X—and on demand wherever you get your podcasts.
