Matrices - Training Environments for LLM Agents

Since March, we've partnered with frontier AI labs to help train computer-use agents in the vein of ChatGPT Agent.

Along the way, we've realized that no further surprising research breakthroughs are needed to automate most of the work that humans do today. The bottleneck is now engineering, and there's a lot of it to do.

For most of LLM history, progress came primarily from scaling up supervised fine-tuning (SFT), in which a model learns to mimic humans directly from records of human behavior. Over the last couple of years, however, the most exciting gains in AI capabilities have come from reinforcement learning (RL), in which the AI learns through trial and error instead of mimicry. The analogy to human learning is straightforward: people can become competent by mimicking experts, but they only become great after experimenting on their own.

RL has taken AI to near-superhuman levels in competitive math and coding. But when it comes to the kind of work that people actually want automated, like doing taxes, AI is still underwhelming.

This is because to train a model with RL on a task, you need to place it in an environment where it can practice that task thousands of times, receiving feedback on its performance after each attempt. Such environments are easy to create for competitive math and coding, where answers can be checked automatically. But imagine the challenge of creating an environment where AI can try doing your taxes thousands of times. How many times will you let an AI sign in to your Robinhood account to get your 1099-B, email your boss because your W-2 is late, or actually file your taxes and get feedback from the government?

The AI needs to go through all of these motions during training, but you can't allow real-world side effects. That means the world the AI interacts with has to be simulated.

You can't fit the whole universe into one container, so in practice you need to set it up like an escape room. All the people and websites the AI might need to interact with seem to be operating as they should, but it's all smoke and mirrors, and nothing far beyond the rails actually exists. Ideally, the working scope of the simulation should be wide enough to cover anything the AI might plausibly experiment with, so that it never needs to know it's living in the Matrix.
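To make the practice-and-feedback loop concrete, here is a minimal sketch of a single escape room, assuming a Gym-style `reset`/`step` interface. Every name in it (`FakeInbox`, `LateW2Room`, `run_attempts`, the example.com addresses) is hypothetical and not part of any real platform. The simulated inbox records emails instead of sending them, and a grader turns the outcome of each attempt into a reward.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FakeInbox:
    """Hypothetical simulated email service: messages are recorded, never delivered."""
    sent: list = field(default_factory=list)

    def send(self, to: str, body: str) -> str:
        # No real mail server is contacted, so the agent's mistakes have no side effects.
        self.sent.append((to, body))
        return "sent"


class LateW2Room:
    """Hypothetical one-step 'escape room': ask the boss about a late W-2."""

    BOSS = "boss@example.com"

    def reset(self) -> str:
        # Every attempt starts from a fresh simulated world.
        self.inbox = FakeInbox()
        return f"Your W-2 is late. Your boss is {self.BOSS}."

    def step(self, action: dict) -> tuple:
        if action.get("tool") == "send_email":
            observation = self.inbox.send(action["to"], action["body"])
        else:
            observation = "error: unknown tool"
        reward = self.grade()
        done = True  # single-step episode, to keep the sketch small
        return observation, reward, done

    def grade(self) -> float:
        # Feedback signal: 1.0 if the boss received a message mentioning the W-2.
        hit = any(to == self.BOSS and "W-2" in body for to, body in self.inbox.sent)
        return 1.0 if hit else 0.0


def run_attempts(env, policy, n: int) -> list:
    """Let a policy practice the task n times and collect the reward for each try."""
    rewards = []
    for _ in range(n):
        obs = env.reset()
        _, reward, _ = env.step(policy(obs))
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [
        {"tool": "send_email", "to": "boss@example.com", "body": "Is my W-2 coming?"},
        {"tool": "send_email", "to": "hr@example.com", "body": "Is my W-2 coming?"},
        {"tool": "file_taxes"},
    ]
    # A stand-in "policy" that guesses randomly; RL would shift it toward rewarded actions.
    rewards = run_attempts(LateW2Room(), lambda obs: rng.choice(candidates), 1000)
    print(f"success rate: {sum(rewards) / len(rewards):.2f}")
```

A real room would run for many steps and expose many more tools, but the shape is the same: reset a disposable world, let the agent act, and score the result without anything leaving the simulation.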

Scaling up

We currently have a few hundred of these virtual escape rooms. To automate all work, we'll need to build millions. Doctors' offices with angry patients, court systems with ongoing disputes, startup landscapes with VCs to pitch to and competitors to scope out. We can't build all these simulations on our own; we'll need the help of the world.

Matrices is an ecosystem for building these simulations at scale. The core platform looks a lot like the level editor of a giant video game. The level editor is nothing without the people using it, who fall into two core personas:

Engineers who build configurable components (simulations of web apps like Gmail, Salesforce, etc.) that can be dropped into levels
Creative people who design and test levels for AI to solve, using the components built by those engineers
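The split between the two personas can be pictured as a component registry plus declarative level specs. The sketch below is purely illustrative and assumes its own names (`FakeMail`, `FakeCRM`, `REGISTRY`, `Level`, `build`); it does not describe the actual Matrices editor. Engineers register configurable simulated apps under a name, and a task creator writes a level that picks components, configures them, and supplies a success check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class FakeMail:
    """Hypothetical simulated mail app built by an engineer."""

    def __init__(self, owner: str, seed_messages: List[str]):
        self.owner = owner
        self.messages = list(seed_messages)  # copy so each attempt gets its own inbox


class FakeCRM:
    """Hypothetical simulated CRM built by an engineer."""

    def __init__(self, accounts: Dict[str, str]):
        self.accounts = dict(accounts)


# Engineers publish components under a name; level designers only reference the names.
REGISTRY: Dict[str, Callable[..., object]] = {"mail": FakeMail, "crm": FakeCRM}


@dataclass
class Level:
    """Hypothetical level spec written by a task creator."""
    task: str
    components: Dict[str, dict]  # component name -> its configuration
    check: Callable[[Dict[str, object]], bool]  # decides whether the attempt succeeded


def build(level: Level) -> Dict[str, object]:
    # Instantiate each configured component from the registry for a fresh attempt.
    return {name: REGISTRY[name](**cfg) for name, cfg in level.components.items()}


level = Level(
    task="Email Acme's account owner to confirm their renewal.",
    components={
        "mail": {"owner": "agent@example.com", "seed_messages": []},
        "crm": {"accounts": {"Acme": "dana@acme.example"}},
    },
    check=lambda world: any("dana@acme.example" in m for m in world["mail"].messages),
)
```

Calling `build(level)` yields a fresh world, and `level.check(world)` stays `False` until a message addressed to the account owner appears in the simulated mail app. Keeping levels declarative means a task creator never has to touch component code, which is the division of labor described above.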

We will soon open up our level editor to the world, so that anyone can get paid to adopt one of these personas and help train AI.

We'll need to scale up not only the number of simulations but also the complexity of each one. Today our escape rooms are short; each takes a human under an hour to solve. Before we can consider all work to be automated, we'll need rooms where the objective is to build and exit a billion-dollar SaaS startup. The AI will need to do market research on simulated landscapes, write and execute real code, ship it to simulated users, run ads on simulated marketplaces, raise money from simulated VCs, and more. It's a long road to get there, but none of the challenges are insurmountable.

It's hard to overstate the importance of this effort. We believe that several years from now, a large fraction of the world economy will be dedicated to designing these challenges for AI to solve. It will be among the last work that humans do.

If you'd like to be a pioneer in this space, we're hiring engineers and detail-oriented task creators. We're still a tiny team, yet we have 7-figure contracts with AI labs and we pay top-of-market compensation. We're backed by Index Ventures (which led our $5M seed round), AI Grant (the fund run by Nat Friedman and Daniel Gross), and Naval Ravikant. You can reach out to us here.