When generative video models like Sora and Runway first appeared, they hinted at something bigger than entertainment content creation: the possibility of implicit world modeling and near-limitless synthetic data generation for training physical AI. As a trusted partner for world modeling, Duality recognized this early potential and also highlighted some of the risks. Since then, interest and the pace of innovation in AI world modeling have accelerated. Generative world models such as Cosmos, V-JEPA 2, Genie, World Labs, and The Matrix have emerged as the next stage, all of them capable of producing dynamic, interactive scenes from natural language or image prompts.
Like LLMs and vision foundation models, these generative systems offer striking generalization and ease of use. And, given the scarcity of high-quality, labeled, real-world datasets, their ability to generate “on-demand” data can significantly accelerate the development and safe deployment of robots and agentic workflows across a range of industry verticals, from manufacturing and logistics to transportation and defense, where physical AI is making rapid inroads.
But we are not there yet: major challenges remain in ensuring that the synthetic data produced by generative world models has accurate predictive value in terms of how the real world behaves. Today, that data simply isn’t reflective of the physical context being modeled. Unlike traditional simulators, which construct a 3D scene and simulate what happens next using the laws of physics, generative world models predict tokens of sensor output based on statistical correlations in their training data, and that difference produces a set of recurring shortcomings.
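To make the distinction concrete, here is a minimal toy sketch of the two paradigms. Everything in it is an illustrative stand-in, not any particular simulator’s or model’s API: the falling-body step represents explicit physics, and a random token-transition table represents a learned implicit model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Explicit world model: the next state follows from the laws of physics.
def simulate_step(position: float, velocity: float,
                  dt: float = 0.01, gravity: float = -9.81) -> tuple[float, float]:
    """Advance a falling body one timestep using known dynamics."""
    velocity = velocity + gravity * dt
    return position + velocity * dt, velocity

# Implicit world model: the next "sensor token" is whatever the learned
# distribution says is likely; physics is captured only statistically.
# A toy stand-in for a trained video transformer: a fixed transition table.
VOCAB = 16
transition_probs = rng.dirichlet(np.ones(VOCAB), size=VOCAB)

def generate_step(token: int) -> int:
    """Sample the next token from learned correlations, no physics in the loop."""
    return int(rng.choice(VOCAB, p=transition_probs[token]))

# The explicit step is consistent with gravity by construction; the implicit
# step is only as physical as its training data happened to be.
pos, vel = simulate_step(position=10.0, velocity=0.0)
nxt = generate_step(3)
```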
Collectively, we call these challenges the Gen2Real Gap — analogous to the better-known Sim2Real Gap faced by simulation-derived synthetic data.
Duality’s work in physical AI and autonomous robotics has time and again shown the effectiveness of rigorous, quantitative approaches for closing the Sim2Real Gap [1, 2]. This deep experience, gained in partnership with our Falcon customers, has taught us valuable lessons in bridging the virtual and the real. We believe these approaches are directly applicable to closing the Gen2Real Gap and, in turn, to making generative world model data immediately usable and useful for training physical AI across a range of applications.
In navigating the potential of various approaches for closing the Gen2Real Gap, it is first important to understand the fundamental differences between simulated and generative world models. Both are forms of world modeling, and they can be mapped onto a spectrum from implicit to explicit, with each bringing its own intrinsic strengths and trade-offs.
The spectrum between them (Fig 1) should not be viewed as binary or zero-sum but, instead, as an opportunity:
Hybrid synthetic data pipelines and agentic workflows can combine the strengths of simulated and generative approaches to close the Gen2Real Gap and accelerate the deployment of safe and robust physical AI models.

Hybrid approaches can take several forms; two are discussed below: post-training generative world models with simulation-derived synthetic data, and grounding generated scenarios in physics-based digital twin simulation.
These are not one-size-fits-all solutions. Each domain and use case (threat detection, off-road driving, industrial QA, etc.) requires tailoring the pipeline to its data requirements; a minimal sketch of the grounding pattern follows. Finding the optimal path is limited solely by our imagination.
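As one illustration, here is a minimal sketch of that grounding pattern under assumed interfaces. The functions `propose_scenarios` and `ground_in_simulation` are hypothetical stand-ins for a generative world model and a physics-based digital twin pass, not Falcon APIs:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    prompt: str
    terrain: str
    weather: str

def propose_scenarios(prompt: str, n: int) -> list[Scenario]:
    """Stand-in for a generative world model producing broad scenario variations."""
    terrains = ["mud", "gravel", "sand", "snow"]
    weathers = ["clear", "rain", "fog", "dust"]
    return [Scenario(prompt, terrains[i % 4], weathers[(i // 4) % 4])
            for i in range(n)]

def ground_in_simulation(scenario: Scenario) -> dict:
    """Stand-in for a physics-based digital twin pass that validates dynamics
    and attaches the ground-truth labels a generative stage cannot provide."""
    drivable = not (scenario.terrain == "mud" and scenario.weather == "rain")
    return {"scenario": scenario, "drivable": drivable,
            "labels": {"terrain_class": scenario.terrain}}

# Generative breadth in; physically grounded, labeled training data out.
dataset = [ground_in_simulation(s)
           for s in propose_scenarios("off-road vehicle traverse", n=16)]
physically_valid = [d for d in dataset if d["drivable"]]
```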
For vision models, we’ve already seen this play out with phenomenal results. Recently, at the 17th Annual Ground Vehicle Systems Engineering & Technology Symposium (GVSETS 2025), Duality’s work was selected as the Best Overall Technical Paper for demonstrating how combining generalized vision foundation models (VFMs) with domain-specific post-training dramatically improved their precision and robustness in real-world settings. Our early experiments show that combining semantically rich, diverse, accurate, high-fidelity digital twin simulation-derived synthetic datasets with limited real-world datasets yields post-trained generative world models that gain both accuracy and grounding (Fig 3).
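For intuition, here is one common way such a data mix can be set up in PyTorch. The datasets, feature sizes, dummy labels, and the roughly equal per-batch sampling ratio are all assumptions for illustration, not the recipe from the paper:

```python
import torch
from torch.utils.data import (ConcatDataset, DataLoader,
                              TensorDataset, WeightedRandomSampler)

# Placeholder datasets: a large simulation-derived set and a small real set.
sim_data = TensorDataset(torch.randn(2_000, 128), torch.randint(0, 10, (2_000,)))
real_data = TensorDataset(torch.randn(100, 128), torch.randint(0, 10, (100,)))

mixed = ConcatDataset([sim_data, real_data])  # sim samples first, then real

# Weight samples so each source contributes roughly equally per batch,
# despite the 20:1 size imbalance that would otherwise drown out real data.
weights = torch.cat([torch.full((len(sim_data),), 1.0 / len(sim_data)),
                     torch.full((len(real_data),), 1.0 / len(real_data))])
sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)

loader = DataLoader(mixed, batch_size=32, sampler=sampler)
# `loader` can now feed a standard post-training / fine-tuning loop.
```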

One application is off-road autonomy in the defense domain. A generative model like Cosmos can quickly produce broad, photo-real scenario variations, while grounding those scenarios in Falcon’s digital twin simulation ensures accurate vehicle dynamics and tire-terrain interaction, leading to a realistic assessment of drivability and safety (see video below). This pipeline harnesses the strengths of each approach, producing results that neither could generate on its own, accelerating the field deployment of the downstream AI model while also boosting its operational accuracy and robustness.
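A toy version of that grounding step might look like the following. The traction limit and the finite-difference plausibility check are illustrative assumptions, not Falcon’s vehicle model:

```python
import numpy as np

MAX_ACCEL = 8.0  # m/s^2, a rough traction limit on loose terrain (assumed)

def accelerations(positions: np.ndarray, dt: float) -> np.ndarray:
    """Finite-difference acceleration along a sampled 2D trajectory."""
    return np.diff(positions, n=2, axis=0) / dt**2

def is_drivable(positions: np.ndarray, dt: float = 0.1) -> bool:
    """Flag trajectories that would demand more grip than the terrain allows."""
    accel_mags = np.linalg.norm(accelerations(positions, dt), axis=1)
    return bool(np.all(accel_mags <= MAX_ACCEL))

# A generated scenario may look photo-real yet imply impossible maneuvers;
# the dynamics check keeps only what a real vehicle could execute.
trajectory = np.cumsum(np.random.default_rng(1).normal(0, 0.5, size=(50, 2)), axis=0)
print(is_drivable(trajectory))
```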
Ultimately, progress requires rigor and objective measurement. Synthetic data must be evaluated not just for how “real” it looks, but for how predictive it is of real-world outcomes and how valuable it is for building models that perform well on real-world data.
At Duality, we developed the 3I Framework (a quantitative approach for measuring the quality of synthetic data based on its Indistinguishability, Information-richness, and Intentionality) as a systematic way to close the Sim2Real Gap. It has been at the heart of our synthetic data success stories, and we believe this framework applies equally well to the Gen2Real challenge.
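To show how such scoring might be organized in practice, here is a hypothetical sketch. The three axes come from the framework itself, but the specific metrics below (feature-distribution distance, normalized label entropy, coverage of targeted conditions) are illustrative stand-ins, not Duality’s actual implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ThreeIScore:
    indistinguishability: float  # how close synthetic features sit to real ones
    information_richness: float  # label diversity (normalized entropy)
    intentionality: float        # fraction of targeted conditions actually covered

def score(synth_feats: np.ndarray, real_feats: np.ndarray,
          labels: np.ndarray, targeted: set, covered: set) -> ThreeIScore:
    # Indistinguishability: shrinks toward 0 as mean feature statistics diverge.
    dist = float(np.linalg.norm(synth_feats.mean(axis=0) - real_feats.mean(axis=0)))
    indist = 1.0 / (1.0 + dist)
    # Information-richness: normalized entropy of the label distribution.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    richness = float(-(p * np.log(p)).sum() / np.log(len(p))) if len(p) > 1 else 0.0
    # Intentionality: did the dataset hit the conditions we set out to capture?
    intent = len(targeted & covered) / len(targeted)
    return ThreeIScore(indist, richness, intent)

# Example: a dataset targeted at rain and fog that only captured rain.
print(score(np.random.randn(200, 8), np.random.randn(200, 8),
            labels=np.array([0, 1, 2, 1, 0]),
            targeted={"rain", "fog"}, covered={"rain"}))
```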
Real-world testing of physical AI and autonomous systems under diverse conditions will always remain the final litmus test of any synthetic data approach. But a structured process of evaluation and iteration is the best way to ensure that generative synthetic data actually advances physical AI, rather than sending data curation and model building in circles.
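Schematically, that structured process is a closed loop. In this sketch, the `generate`, `train`, and `evaluate_on_real` callables and the acceptance target are placeholders, not a prescribed workflow:

```python
def curate_until_predictive(generate, train, evaluate_on_real,
                            target: float = 0.9, max_rounds: int = 5):
    """Regenerate synthetic data until the trained model clears the real-world bar."""
    data = generate(feedback=None)
    model, metric = None, 0.0
    for _ in range(max_rounds):
        model = train(data)
        metric = evaluate_on_real(model)   # real-world data remains the referee
        if metric >= target:
            break
        data = generate(feedback=metric)   # steer the next batch at the gaps
    return model, data, metric
```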
Synthetic data is essential for training robust AI models for the physical world. Given the profound shortage of high-quality, labeled real-world data, a future with resilient autonomous robots and agentic embodied AI is not viable without it. And its ability to close the data gaps that limit training efficacy, and to ensure that deployed systems are safe, predictable, and robust, is a strength that cannot be left untapped.
With Falcon, digital twin simulation already reduces data collection timelines from months or years to just a few weeks. Agentic workflows provide a solid framework for hybrid approaches that combine the intrinsic strengths of explicit and implicit modeling methods and ground generative world models. Without compromising synthetic data quality, we can remove one of the main bottlenecks in synthetic data creation: the need to manually build full 3D simulation contexts. In practice, this means data generation timelines could shrink from weeks to mere hours, significantly accelerating how quickly physical AI models can be created, updated, and deployed.
World modeling has always been in Duality’s DNA. By combining generative AI and digital twin simulation techniques, we extend that vision of using virtual worlds to solve real-world problems. The approaches outlined here for closing the Gen2Real Gap provide a clear path forward, allowing our customers to immediately begin leveraging generative world models safely, effectively, and cost-efficiently.
If you have any questions or comments about this blog, or simply want to learn more about our work — we want to hear from you! Drop us a line here.