DFJ Growth

Sam Fort, Kevin Tu, Maxim Sirenko, Brian Akin

Modular: Unifying AI from Model to Metal

The AI boom is driving a global compute arms race, with companies pouring billions into GPUs, CPUs, and new accelerators to fuel the next wave of intelligent applications. Running AI workloads, such as serving ChatGPT responses or code completions, requires orchestrating dozens of complex software layers to reach peak performance.

Today, teams stitch together a patchwork of scripts and open-source tools that must be re-tuned for each model and each underlying hardware device. As a result, developers are constrained to one vendor – typically NVIDIA and its CUDA software – which can lead to suboptimal performance, extra engineering work, and higher costs. The problem is only getting worse as AI usage scales and customer demands become more diverse, requiring specialized chips and ongoing software optimizations to meet cost and performance goals.

This fragmentation and overhead create an opportunity for a more cohesive, hardware-agnostic approach. Every era of computing has been defined by a breakthrough that enabled more seamless usability: the operating system for PCs, the browser for the internet, and the hypervisor for cloud computing. For AI, we believe that breakthrough is Modular’s unified compute layer.

At DFJ Growth, we’ve been searching for a team with the vision and technical depth to make unified AI compute a reality – abstracting away complexity so developers can “write once, run anywhere.” That search led us to Chris Lattner (CEO), Tim Davis (President), and the Modular team. Having built seminal components of the software ecosystem, from programming languages to compilers, this team is uniquely prepared to take on one of the hardest problems in modern computing.

Today, we’re proud to back Modular’s $250 million Series C financing and its mission to build AI’s unified compute layer – think of it as a hypervisor for AI that abstracts away hardware-specific details so models can run fast and efficiently on any device.

Modular’s solution has three connected layers:

Mojo — the language.
Mojo provides a common programming model for AI hardware, allowing developers to write high-performance code that runs well across devices. Its syntax is similar to Python’s, but it lets developers speed up the parts of their code that matter most.
MAX — the serving framework.
MAX is the high-performance execution layer that serves models efficiently across different chips, enabling leading performance and portability without the need for hardware-specific tuning.
Mammoth — the fleet manager.
Mammoth is air-traffic control for AI workloads. It intelligently routes requests and manages model execution across heterogeneous clusters to maximize resource utilization and availability of production workloads.

Code in Mojo, run fast with MAX, and scale with Mammoth – all inside a single unified compute layer.

After three years of development, Modular’s solution runs across NVIDIA, AMD, and Apple GPUs as well as Intel, ARM, and AMD CPUs, with more edge and cloud processors on the way. The Modular stack already serves trillions of model outputs (“tokens”) per day, has a vibrant and growing community, and currently delivers 20–50% performance gains over leading open-source software on the latest processors.

In the early cloud era, VMware enabled one physical server to act like many, making computing far more flexible and portable. Modular is doing something similar for AI: abstracting away low-level details so developers can focus on products, not plumbing. In a world where AI will run on every chip, in every data center, and at the edge, hand-tuning each model for every device becomes untenable. Modular has built a system that scales to meet the growing demands of AI, enabling teams to move faster, reduce costs, and future-proof their businesses as the model and hardware landscape grows more diverse. This is no small feat.

Modular set out on an audacious mission and assembled a team singularly built for the challenge. The team is led by Chris Lattner, who created core computing technologies such as Swift, LLVM, and MLIR that modern developers use every day, and Tim Davis, who helped lead Google’s product efforts in AI frameworks and low-level AI software. Together they form a rare combination: leaders who have a vision for the future of computing and know how to build it.

We’re thrilled to partner with Chris, Tim, and their team as they bring the vision of a unified compute layer to life.