In my last blog post, I laid out an argument for why startup founders should not try to build a business around selling RL gyms to AI labs. One of the reasons basically amounted to: this is a bubble, the prices being paid for these things make no sense, and this apparent labor "arbitrage" cannot last. In this post, I want to consider the reverse: what if the hype and seemingly endless cash surrounding RL environments (which are, at bottom, sparkling data annotation) is completely rational?
Why RL Environments FEEL Mispriced
The impetus for writing my last post was a gut feeling more than anything. A feeling that the amount of cash sloshing around the AI ecosystem, especially for producing mediocre-quality training data, is... off. If you asked me to support this feeling with an argument, I would tell you to read my previous blog post, but if you didn't want to do that, one argument would go something like this:
- (1) RL environments are strictly easier to devise, build, and maintain than real software.
- (2) The going prices for these "toy" pieces of software are high, considering this simplicity.
- (3) This is a market inefficiency that will be corrected, which will drive the price down and eliminate the "arbitrage."
I think (1) is just obviously true. If you are building software for AIs to play with, it doesn't need to be secure, it doesn't need to do things in the real world (like send e-mails or make payments), it doesn't need to have 99.99% uptime, and many features can be simulated rather than built.
More importantly, fake software made for AIs doesn't need to have product-market fit or users. It doesn't need to solve a problem or provide value, it doesn't need to be elegant or fast, and it doesn't need to be better than existing solutions. In fact, many RL gyms are just pale imitations of existing software! In such cases, the less creativity, the better.
However, there's room for disagreement on (2) and (3), which I'll elaborate on.
Value is Unrelated to Labor
One reason for wanting to believe that RL gyms are mispriced is that it feels like a "glitch": people who know about this glitch can go make fake software for AIs and get a lot of money for it. Meanwhile, people who don't know about the glitch are working really hard trying to make real software. If they're startup founders, they might get $0 for that software if no one likes it. If they're engineers, they'll get a couple hundred bucks an hour. But if they'd spent those hours making fake software for AIs, maybe they could have made way more money! If this "glitch" is, in fact, a market inefficiency, then as soon as the poor folks who don't know what an RL gym is find out, it's over.
On the other hand, maybe it's not an inefficiency? It is important to remember, when you're annoyed that someone is making a lot of money for doing something that looks easy, that the value of work is only loosely correlated (if at all) with difficulty and effort. I can spend 80 hours a week making handmade beeswax candles, and I will get paid less than someone who does a podcast for 1 hour. Even if it's "easier" to make fake software, if the value produced is greater, then it's not mispriced. So maybe, just maybe, the market is telling us that making fake software is super valuable, thank you very much. And by the way, it's not as though fake Salesforce is selling for more than real Salesforce—it is discounted appropriately!
You Will Own Nothing, And You Will Draw Boxes
You might still find it hard to believe that all this fake, disposable software is accurately priced, especially at a time when AI makes it easier than ever to build. But maybe there's still a good reason for it.
Suppose the following were true: rather than the market mispricing RL gyms, the market has misallocated labor. We've got hundreds of thousands of people writing papers no one will ever read, going to law school because they were bored, learning to write code for a software job that might not exist in 5 years... maybe the eye-popping prices for data and gyms are the invisible hand, beckoning these folks to come make training data instead. Pssst. Come do Ranking and Rating. Come make a copy of DoorDash that no one will ever use. We promise it's not boring.

Glamorizing it isn't enough—if you want to reorganize the economy around labeling data, you'll have to pay for it. And this makes sense to do if you believe: (a) we are on track to build transformative AI that will do a bunch of crazy stuff like cure cancer, automate all jobs, and take us to Mars (that sounds pretty valuable!); and (b) the timeline to that transformative AI is a long grind that basically involves painstakingly teaching the AI to do each job that humans currently do. Without (a), there's no reason to dump so much money into this stuff. Without (b)—for instance, if you believe in AGI 2026—the economy won't have time to adapt a whole lot, although you could still view the current state as "surge pricing" (ha) to squeeze out the last bit of juice and get to transformative AI first.
Honestly, it's not insane to believe both, especially as certain pie-in-the-sky promises about scaling our way to superhuman intelligence by stacking more layers have (so far) failed to materialize. Maybe it will be a grind, and to get there, we'll all have to draw boxes for a living. Sounds kind of bleak. Hopefully I'm independently wealthy by then.
So Should I Build an RL Environment Startup?
If you believe in the future I just outlined, you should probably try to get your hands on some shares of Mercor as quickly as possible. Chop chop! Get your slice of the pie, or serfdom awaits!


