PALO ALTO — DeepInfra, a cloud platform for high-throughput AI inference, has landed $107 million in Series B funding to scale its inference cloud and global capacity. Processing nearly five trillion tokens per week, DeepInfra enables enterprises and scaleups to run open-source and agent-driven AI workloads with improved cost efficiency, performance and security.
DeepInfra was developed by the team behind the popular messaging app imo, which has scaled to more than 200 million users globally. The company’s latest round is co-led by 500 Global and Georges Harik, one of Google’s earliest engineers, with participation from A.Capital Ventures, Crescent Cove, Felicis, NVIDIA, Peak6, Samsung Next, Supermicro and Upper90.
“When we launched nearly four years ago, we believed inference would become the dominant driver of enterprise AI workloads – and we are now at this inflection point,” said Nikola Borisov, co-founder and CEO, DeepInfra. “What’s happening now is incredibly exciting – open-source models are rapidly reaching parity with proprietary systems, unlocking a new wave of innovation at a fraction of the cost and enabling widespread adoption. At the same time, agent-based systems are driving continuous, high-volume demand. Inference is no longer a thin layer – it’s the system constraint that will define the majority of workloads. Most cloud platforms weren’t built for this always-on, distributed model, so we built DeepInfra from the ground up to deliver better economics, performance, and security.”
The investment reflects 500 Global’s portfolio thesis across the AI stack: the firm’s conviction that infrastructure will be as defining a category as the models themselves.
“Demand for AI is causing every layer of the AI stack to innovate, and inference is no exception. In the agentic age, new workflows are emerging at a rapid pace, as evidenced recently by OpenClaw and AutoResearch. Enterprises and developers building with open source and agent-driven AI need infrastructure that was designed to be flexible, fast and reliable. We backed DeepInfra because, in our assessment, this team has already proven they can build and operate distributed systems at global scale, and because we believe purpose-built inference infrastructure will be as fundamental to the next phase of AI as compute was to the last,” said Tony Wang, Managing Partner, 500 Global.
DeepInfra is an early infrastructure collaborator in NVIDIA’s open AI ecosystem, supporting Nemotron models, the NemoClaw agent framework, and the NVIDIA Dynamo inference software. As NVIDIA advances the Nemotron family of open-source models and agent-based systems like OpenClaw drive increased inference demand, DeepInfra is one of the vendors providing the infrastructure layer for these systems in production. The platform supports more than 190 open-source models through OpenAI-compatible APIs and offers a fully managed, enterprise-ready environment with built-in security, including zero data retention and SOC 2 and ISO 27001 certifications.
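For developers, the OpenAI-compatible API means existing client code can typically be pointed at DeepInfra with a configuration change rather than a rewrite. The sketch below illustrates that pattern using the official openai Python SDK; the base URL, model identifier, and API key shown are illustrative assumptions rather than details from this announcement, and should be replaced with values from the DeepInfra documentation and dashboard.

```python
# Minimal sketch: calling a DeepInfra-hosted open-source model through an
# OpenAI-compatible chat completions endpoint using the openai Python SDK.
# The base_url, model name, and key below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",   # illustrative open-source model id
    messages=[
        {"role": "user", "content": "Summarize why inference cost matters for agent workloads."}
    ],
    max_tokens=200,
)

print(response.choices[0].message.content)
```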
“DeepInfra gives us access to best-in-class models with the reliability and speed we need to ship. The performance speaks for itself. They help us keep up with the pace of innovation in this space,” said Jesse Proudman, president and CTO, Venice AI.