Is Decart’s Oasis 3 the Future of Physical AI?

Simon Glairy is a preeminent voice at the intersection of insurance technology and artificial intelligence, specializing in how digital simulations influence real-world risk management. With a career dedicated to evaluating the reliability of “digital twins” and autonomous systems, he brings a skeptical yet optimistic eye to the latest breakthroughs in generative world models. As Decart unveils Oasis 3, a platform capable of simulating hours of photorealistic driving for a fraction of the usual cost, Glairy is uniquely positioned to dissect whether these “dream-like” environments are ready to handle the high-stakes liability of the automotive and robotics industries.

This dialogue explores the shift from static video generation to interactive physical AI, examining the technical wizardry of the Decart Optimization Stack and the economic implications of a $4 billion valuation fueled by giants like Toyota and Nvidia. We delve into the sensory experience of navigating a model that generates 8,000 tokens per frame and the current “memory” limitations that cause a crisp New York morning to dissolve into a generic urban blur. Finally, the conversation addresses the critical data gap between “good driving” and accidents, and how a developer ecosystem of 100,000 users might eventually bridge the divide between simulation and reality.

How do you evaluate the balance between the striking photorealism of Oasis 3 and the functional necessity of simulating rare, high-risk driving scenarios?

The allure of photorealism in a world model is undeniable, especially when you see a crisp morning in New York rendered with such fidelity that it feels indistinguishable from a dashcam feed. However, from a risk management perspective, the “beauty” of the simulation is secondary to its structural integrity, particularly when we are looking at the three-camera setup—one front-facing and two side-facing views—that Decart provides. To truly train an autonomous vehicle, we need the model to do more than just look right; it must behave right when things go wrong. Decart is offering this at a remarkable $0.02 per second, which democratizes the ability to run these “edge case” scenarios infinitely. The challenge I see is that while the visuals are breathtaking, the system currently struggles with physics, sometimes allowing cars to ghost through one another because the training data is heavily skewed toward “good driving” rather than the messy reality of collisions.
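To put the quoted price in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the $0.02 rate applies per second of generated simulation and that there are no volume discounts or minimum charges; neither detail is stated in the interview, and the function name is purely illustrative.

```python
# Quoted rate from the interview: $0.02 per second, held in cents to avoid
# floating-point drift. Assumption: billed per second of generated simulation.
PRICE_CENTS_PER_SECOND = 2


def simulation_cost_usd(seconds: int) -> float:
    """Return the cost in US dollars of `seconds` of simulation at the quoted rate."""
    return PRICE_CENTS_PER_SECOND * seconds / 100


print(simulation_cost_usd(60))    # one minute of simulated driving
print(simulation_cost_usd(3600))  # one hour of simulated driving
```

Under those assumptions, a minute of simulated driving comes to $1.20 and an hour to $72.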

Decart recently secured $300 million in funding and a $4 billion valuation with backing from Toyota and Nvidia; what does this strategic alignment reveal about the future of physical AI?

This level of investment from hardware and automotive titans suggests that the industry is moving away from purely research-based AI toward vertically integrated, “physical” applications. What is most impressive is that Decart has burned through far less than $100 million in its lifetime, showcasing a level of capital efficiency that is rare in the high-compute world of generative models. By building on their Decart Optimization Stack (DOS), they are making these models run efficiently across Nvidia, Amazon, and Google hardware, which is a major signal to investors like Adobe and eBay that the technology is scalable. For an insurance expert, this signal is clear: the infrastructure for testing autonomous systems is becoming significantly cheaper (more than an order of magnitude cheaper than rivals), which will inevitably lead to a massive influx of synthetic training data.

The phenomenon of the environment degrading from a specific city into a “disjointed stream of consciousness” is fascinating—how do you see this memory limitation affecting the reliability of long-term simulations?

The degradation issue is a significant hurdle because it reflects the inherent struggle of autoregressive models to maintain a coherent narrative over long periods. When you are generating roughly 8,000 tokens per frame at tens of frames per second, you are looking at hundreds of thousands of tokens every second, and that context window fills up with terrifying speed. It is a sensory experience akin to a dream where you turn a corner and the intersection you just left has vanished, replaced by a generic urban landscape. For developers, this means that while the model is “infinite,” its consistency is currently tethered to a very short-term memory. Until the team at Decart can successfully compress that memory or expand the window to store millions more tokens, these simulations will remain better suited for short, tactical maneuvers rather than long-range navigation testing.
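The token arithmetic in that answer can be checked with a short Python sketch. Only the 8,000 tokens per frame comes from the interview; the frame rates of 20 and 30 fps (standing in for “tens of frames per second”) and the one-million-token context window are hypothetical values chosen for illustration, not published Oasis 3 specifications.

```python
TOKENS_PER_FRAME = 8_000  # figure quoted in the interview


def tokens_per_second(fps: int) -> int:
    """Tokens generated each second at a given frame rate."""
    return TOKENS_PER_FRAME * fps


def seconds_until_full(context_tokens: int, fps: int) -> float:
    """Seconds of video that fit in a context window of `context_tokens`."""
    return context_tokens / tokens_per_second(fps)


# Hypothetical frame rates and window size, for illustration only.
for fps in (20, 30):
    rate = tokens_per_second(fps)
    horizon = seconds_until_full(1_000_000, fps)
    print(f"{fps} fps: {rate:,} tokens/s, window full after {horizon:.2f} s")
```

At these assumed rates the model produces 160,000 to 240,000 tokens every second, so even a window holding a million tokens would cover only a few seconds of driving, which is consistent with the short-term memory described above.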

Given that the model often fails to simulate basic physics, such as cars driving through one another, how can companies use this for safety training without reinforcing “impossible” behaviors?

This is the “major research problem” that the industry is currently cracking, and it stems from a fundamental lack of data on accidents compared to the abundance of smooth, incident-free driving footage. When a world model doesn’t understand that two solid objects cannot occupy the same space, it creates a “physics gap” that can be dangerous if used for end-to-end training. However, the move toward allowing users to start generations from a video of a real environment, rather than just a text prompt or image, should help ground the model in real-world constraints. We have to treat these early iterations like a flight simulator that occasionally ignores gravity; they are useful for practicing the controls and visual recognition, but we cannot yet trust them to teach the vehicle the visceral consequences of a high-speed impact.

Decart is actively fostering a community of over 100,000 developers who are already building on their foundation models; what kind of “surprise” applications do you expect to see in the coming months?

The shift toward a developer-first API is a page straight out of the OpenAI playbook, and it is where the most creative risk-reduction tools will likely emerge. We are already seeing developers leverage the “Lucy” model for e-commerce and livestreaming, but the leap into Oasis 3 opens the door for hyper-localized simulations where a user could potentially upload a video of their own neighborhood to test how a delivery robot would handle those specific curbs. I expect we will see a lot of “synthetic niche” environments—simulating heavy rain in a specific desert town or a sudden blackout in a crowded metro area. With the cost being so low, the barrier to entry for a small startup to build a specialized safety-testing suite is virtually gone, and that is where the real innovation in autonomous reliability will happen.

What is your forecast for the evolution of interactive world models in the next few years?

I believe we are on the cusp of a “consistency revolution” where world models will move past this dream-like state and into a phase of persistent digital twin reality. Within the next twenty-four to thirty-six months, the integration of longer context windows will allow these models to “remember” a city layout for hours, enabling a vehicle to drive 50 miles in a simulation and return to the exact same starting point without a single pixel being out of place. As Decart and their competitors solve the physics of collisions, these models will become the primary “driving range” for every autonomous system on the planet, eventually making real-world road testing a final exam rather than a daily classroom. We are moving toward a world where “driving” is first perfected in a trillion miles of perfect, persistent digital dreams before the rubber ever hits the actual asphalt.
