Tesla Dojo: Elon Musk’s big plan to build an AI supercomputer, explained
For years, Elon Musk has talked about Dojo — the AI supercomputer that will be the cornerstone of Tesla’s AI ambitions. It’s important enough to Musk that he recently said the company’s AI team is going to “double down” on Dojo as Tesla gears up to reveal its robotaxi in October.
But what exactly is Dojo? And why is it so critical to Tesla’s long-term strategy?
In short: Dojo is Tesla’s custom-built supercomputer that’s designed to train its “Full Self-Driving” neural networks. Beefing up Dojo goes hand-in-hand with Tesla’s goal to reach full self-driving and bring a robotaxi to market. FSD, which is on about 2 million Tesla vehicles today, can perform some automated driving tasks, but still requires a human to be attentive behind the wheel.
Tesla delayed the reveal of its robotaxi, which was slated for August, to October, but both Musk’s public rhetoric and information from sources inside Tesla tell us that the goal of autonomy isn’t going away.
And Tesla appears poised to spend big on AI and Dojo to reach that feat.
Tesla’s Dojo backstory
Musk doesn’t want Tesla to be just an automaker, or even a purveyor of solar panels and energy storage systems. Instead, he wants Tesla to be an AI company, one that has cracked the code to self-driving cars by mimicking human perception.
Most other companies building autonomous vehicle technology rely on a combination of sensors to perceive the world – like lidar, radar and cameras – as well as high-definition maps to localize the vehicle. Tesla believes it can achieve fully autonomous driving by relying on cameras alone to capture visual data and then using advanced neural networks to process that data and make quick decisions about how the car should behave.
As Tesla’s former head of AI, Andrej Karpathy, said at the automaker’s first AI Day in 2021, the company is basically trying to build “a synthetic animal from the ground up.” (Musk had been teasing Dojo since 2019, but Tesla officially announced it at AI Day.)
Companies like Alphabet’s Waymo have commercialized Level 4 autonomous vehicles (which the SAE defines as a system that can drive itself without the need for human intervention under certain conditions) through a more traditional sensor and machine learning approach. Tesla has yet to produce an autonomous system that doesn’t require a human behind the wheel.
About 1.8 million people have paid the hefty price for Tesla’s FSD, which currently costs $8,000 and has been priced as high as $15,000. The pitch is that Dojo-trained AI software will eventually be pushed out to Tesla customers via over-the-air updates. The scale of FSD also means Tesla has been able to rake in millions of miles’ worth of video footage that it uses to train FSD. The idea there is that the more data Tesla can collect, the closer the automaker can get to actually achieving full self-driving.
However, some industry experts say there might be a limit to the brute force approach of throwing more data at a model and expecting it to get smarter.
“First of all, there’s an economic constraint, and soon it will just get too expensive to do that,” Anand Raghunathan, Purdue University’s Silicon Valley professor of electrical and computer engineering, told TechCrunch. “Some people claim that we might actually run out of meaningful data to train the models on. More data doesn’t necessarily mean more information, so it depends on whether that data has information that is useful to create a better model, and if the training process is able to actually distill that information into a better model.”
Raghunathan says that despite these doubts, the trend of more data appears to be here for the short term at least. And more data means more compute power is needed to store and process it all to train Tesla’s AI models. That is where Dojo, the supercomputer, comes in.
What is a supercomputer?
Dojo is Tesla’s supercomputer system that’s designed to function as a training ground for AI, specifically FSD. The name is a nod to the space where martial arts are practiced.
A supercomputer is made up of thousands of smaller computers called nodes. Each of those nodes has its own CPU (central processing unit) and GPU (graphics processing unit). The former handles overall management of the node, and the latter does the complex stuff, like splitting tasks into multiple parts and working on them simultaneously. GPUs are essential for machine learning operations like those that power FSD training in simulation. They also power large language models, which is why the rise of generative AI has made Nvidia the most valuable company on the planet.
Even Tesla buys Nvidia GPUs to train its AI (more on that later).
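The idea of splitting a task into parts and working on them simultaneously can be sketched in a few lines of Python. This is a toy analogy only, not Tesla’s or Nvidia’s software: the function names are hypothetical, the "heavy work" is just a sum of squares, and Python threads stand in for the many parallel units of a real GPU. It shows the pattern, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_workers):
    """Split a list into num_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(items), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for the heavy per-chunk work."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, num_workers=4):
    """Split the work, hand each part to a worker, then combine the results."""
    chunks = split_into_chunks(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), num_workers=3))
```

In a real training cluster the same split, process and combine pattern is applied across thousands of nodes, which is why the bandwidth between them matters so much.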
Why does Tesla need a supercomputer?
Tesla’s vision-only approach is the main reason. The neural networks behind FSD are trained on vast amounts of driving data to recognize and classify objects around the vehicle and then make driving decisions. That means, when FSD is engaged, the neural nets have to collect and process visual data continuously at speeds that match the depth and velocity recognition capabilities of a human.
In other words, Tesla means to create a digital duplicate of the human visual cortex and brain function.
To get there, Tesla needs to store and process all the video data collected from its cars around the world and run millions of simulations to train its model on the data.
To give you a sense of scale, Tesla said that as of May 2024, Tesla vehicles with FSD version 12 have driven 300 million miles already.
Tesla appears to rely on Nvidia to power its current Dojo training computer, but it doesn’t want to have all its eggs in one basket, not least because Nvidia chips are expensive. Tesla also hopes to build something better that increases bandwidth and decreases latency. That’s why the automaker’s AI division decided to come up with its own custom hardware program that aims to train AI models more efficiently than traditional systems.
At that program’s core are Tesla’s proprietary D1 chips, which the company says are optimized for AI workloads.