When generative video models like Sora and Runway first appeared, they hinted at something bigger than entertainment content creation: the possibility of implicit world modeling and near-limitless synthetic data generation for training physical AI. As a trusted partner for world modeling, Duality recognized this early potential and also highlighted some of the risks. Since then, interest and the pace of innovation in AI world modeling have accelerated. Generative world models such as Cosmos, V-JEPA 2, Genie, World Labs, and The Matrix have emerged as the next stage, all of them capable of producing dynamic, interactive scenes from natural language or image prompts.
Like LLMs and vision foundation models, these generative systems offer striking generalization and ease of use. And, given the scarcity of high-quality, labeled, real-world datasets, their ability to generate “on-demand” data can significantly accelerate the development and safe deployment of robots and agentic workflows across a range of industry verticals, from manufacturing and logistics to transportation and defense, where physical AI is making rapid inroads.
But we are not there yet: major challenges remain in ensuring that the synthetic data produced by generative world models has accurate predictive value in terms of how the real world behaves. Today, that data simply isn’t reflective of the physical context being modeled. Unlike traditional simulators, which construct a 3D scene and simulate what happens next using the laws of physics, generative world models predict tokens of sensor output based on statistical correlations in their training data, and that difference produces a set of recurring shortcomings.
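To make the distinction concrete, here is a minimal toy sketch of the two paradigms. Everything in it is an illustrative stand-in, not any particular simulator’s or model’s API: the falling-body step represents explicit physics, and a random token-transition table represents a learned implicit model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Explicit world model: the next state follows from the laws of physics.
def simulate_step(position: float, velocity: float,
                  dt: float = 0.01, gravity: float = -9.81) -> tuple[float, float]:
    """Advance a falling body one timestep using known dynamics."""
    velocity = velocity + gravity * dt
    return position + velocity * dt, velocity

# Implicit world model: the next "sensor token" is whatever the learned
# distribution says is likely; physics is captured only statistically.
# A toy stand-in for a trained video transformer: a fixed transition table.
VOCAB = 16
transition_probs = rng.dirichlet(np.ones(VOCAB), size=VOCAB)

def generate_step(token: int) -> int:
    """Sample the next token from learned correlations, no physics in the loop."""
    return int(rng.choice(VOCAB, p=transition_probs[token]))

# The explicit step is consistent with gravity by construction; the implicit
# step is only as physical as its training data happened to be.
pos, vel = simulate_step(position=10.0, velocity=0.0)
nxt = generate_step(3)
```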
Collectively, we call these challenges the Gen2Real Gap — analogous to the better-known Sim2Real Gap faced by simulation-derived synthetic data.
Duality’s work in physical AI and autonomous robotics has time and again shown the effectiveness of rigorous, quantitative approaches for closing the Sim2Real Gap [1, 2]. This deep experience, gained in partnership with our Falcon customers, has taught us valuable lessons in bridging the virtual and the real. We believe these approaches are directly applicable to closing the Gen2Real Gap and, in turn, to making generative world model data immediately usable and useful for training physical AI across a range of applications.
In navigating the potential of various approaches for closing the Gen2Real Gap, it is first important to understand the fundamental differences between simulated and generative world models. Both are forms of world modeling, and they can be mapped onto a spectrum from implicit to explicit, with each bringing its own intrinsic strengths and trade-offs.
The spectrum between them (Fig 1) should not be viewed as binary or zero-sum but, instead, as an opportunity:
Hybrid synthetic data pipelines and agentic workflows can combine the strengths of simulated and generative approaches to close the Gen2Real Gap and accelerate the deployment of safe and robust physical AI models.

Hybrid approaches can take several forms; two are discussed below: post-training generative world models with simulation-derived synthetic data, and grounding generated scenarios in physics-based digital twin simulation.
These are not one-size-fits-all solutions. Each domain and use case (threat detection, off-road driving, industrial QA, etc.) requires tailoring the pipeline to its data requirements; a minimal sketch of the grounding pattern follows. Finding the optimal path is limited solely by our imagination.
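As one illustration, here is a minimal sketch of that grounding pattern under assumed interfaces. The functions `propose_scenarios` and `ground_in_simulation` are hypothetical stand-ins for a generative world model and a physics-based digital twin pass, not Falcon APIs:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    prompt: str
    terrain: str
    weather: str

def propose_scenarios(prompt: str, n: int) -> list[Scenario]:
    """Stand-in for a generative world model producing broad scenario variations."""
    terrains = ["mud", "gravel", "sand", "snow"]
    weathers = ["clear", "rain", "fog", "dust"]
    return [Scenario(prompt, terrains[i % 4], weathers[(i // 4) % 4])
            for i in range(n)]

def ground_in_simulation(scenario: Scenario) -> dict:
    """Stand-in for a physics-based digital twin pass that validates dynamics
    and attaches the ground-truth labels a generative stage cannot provide."""
    drivable = not (scenario.terrain == "mud" and scenario.weather == "rain")
    return {"scenario": scenario, "drivable": drivable,
            "labels": {"terrain_class": scenario.terrain}}

# Generative breadth in; physically grounded, labeled training data out.
dataset = [ground_in_simulation(s)
           for s in propose_scenarios("off-road vehicle traverse", n=16)]
physically_valid = [d for d in dataset if d["drivable"]]
```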
For vision models, we’ve already seen this play out with phenomenal results. Recently, at the 17th Annual Ground Vehicle Systems Engineering & Technology Symposium (GVSETS 2025), Duality’s work was selected as the Best Overall Technical Paper for demonstrating how combining generalized vision foundation models (VFMs) with domain-specific post-training dramatically improved their precision and robustness in real-world settings. Our early experiments show that combining semantically rich, diverse, accurate, high-fidelity digital twin simulation-derived synthetic datasets with limited real-world datasets yields post-trained generative world models that gain both accuracy and grounding (Fig 3).
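For intuition, here is one common way such a data mix can be set up in PyTorch. The datasets, feature sizes, dummy labels, and the roughly equal per-batch sampling ratio are all assumptions for illustration, not the recipe from the paper:

```python
import torch
from torch.utils.data import (ConcatDataset, DataLoader,
                              TensorDataset, WeightedRandomSampler)

# Placeholder datasets: a large simulation-derived set and a small real set.
sim_data = TensorDataset(torch.randn(2_000, 128), torch.randint(0, 10, (2_000,)))
real_data = TensorDataset(torch.randn(100, 128), torch.randint(0, 10, (100,)))

mixed = ConcatDataset([sim_data, real_data])  # sim samples first, then real

# Weight samples so each source contributes roughly equally per batch,
# despite the 20:1 size imbalance that would otherwise drown out real data.
weights = torch.cat([torch.full((len(sim_data),), 1.0 / len(sim_data)),
                     torch.full((len(real_data),), 1.0 / len(real_data))])
sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)

loader = DataLoader(mixed, batch_size=32, sampler=sampler)
# `loader` can now feed a standard post-training / fine-tuning loop.
```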

One application is off-road autonomy in the defense domain. A generative model like Cosmos can quickly produce broad, photo-real scenario variations, while grounding those scenarios in Falcon’s digital twin simulation ensures accurate vehicle dynamics and tire-terrain interaction, leading to a realistic assessment of drivability and safety (see video below). This pipeline harnesses the strengths of each approach, producing results that neither could generate on its own, accelerating the field deployment of the downstream AI model while also boosting its operational accuracy and robustness.
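A toy version of that grounding step might look like the following. The traction limit and the finite-difference plausibility check are illustrative assumptions, not Falcon’s vehicle model:

```python
import numpy as np

MAX_ACCEL = 8.0  # m/s^2, a rough traction limit on loose terrain (assumed)

def accelerations(positions: np.ndarray, dt: float) -> np.ndarray:
    """Finite-difference acceleration along a sampled 2D trajectory."""
    return np.diff(positions, n=2, axis=0) / dt**2

def is_drivable(positions: np.ndarray, dt: float = 0.1) -> bool:
    """Flag trajectories that would demand more grip than the terrain allows."""
    accel_mags = np.linalg.norm(accelerations(positions, dt), axis=1)
    return bool(np.all(accel_mags <= MAX_ACCEL))

# A generated scenario may look photo-real yet imply impossible maneuvers;
# the dynamics check keeps only what a real vehicle could execute.
trajectory = np.cumsum(np.random.default_rng(1).normal(0, 0.5, size=(50, 2)), axis=0)
print(is_drivable(trajectory))
```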
Ultimately, progress requires rigor and objective measurement. Synthetic data must be evaluated not just for how “real” it looks, but for how predictive it is of real-world outcomes and how valuable it is for building models that perform well on real-world data.
At Duality, we developed the 3I Framework (a quantitative approach for measuring the quality of synthetic data based on its Indistinguishability, Information-richness, and Intentionality) as a systematic way to close the Sim2Real Gap. It has been at the heart of our synthetic data success stories, and we believe this framework applies equally well to the Gen2Real challenge.
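To show how such scoring might be organized in practice, here is a hypothetical sketch. The three axes come from the framework itself, but the specific metrics below (feature-distribution distance, normalized label entropy, coverage of targeted conditions) are illustrative stand-ins, not Duality’s actual implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ThreeIScore:
    indistinguishability: float  # how close synthetic features sit to real ones
    information_richness: float  # label diversity (normalized entropy)
    intentionality: float        # fraction of targeted conditions actually covered

def score(synth_feats: np.ndarray, real_feats: np.ndarray,
          labels: np.ndarray, targeted: set, covered: set) -> ThreeIScore:
    # Indistinguishability: shrinks toward 0 as mean feature statistics diverge.
    dist = float(np.linalg.norm(synth_feats.mean(axis=0) - real_feats.mean(axis=0)))
    indist = 1.0 / (1.0 + dist)
    # Information-richness: normalized entropy of the label distribution.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    richness = float(-(p * np.log(p)).sum() / np.log(len(p))) if len(p) > 1 else 0.0
    # Intentionality: did the dataset hit the conditions we set out to capture?
    intent = len(targeted & covered) / len(targeted)
    return ThreeIScore(indist, richness, intent)

# Example: a dataset targeted at rain and fog that only captured rain.
print(score(np.random.randn(200, 8), np.random.randn(200, 8),
            labels=np.array([0, 1, 2, 1, 0]),
            targeted={"rain", "fog"}, covered={"rain"}))
```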
Real-world testing of physical AI and autonomous systems under diverse conditions will always remain the final litmus test of any synthetic data approach. But a structured process of evaluation and iteration is the best way to ensure that generative synthetic data actually advances physical AI, rather than sending data curation and model building in circles.
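Schematically, that structured process is a closed loop. In this sketch, the `generate`, `train`, and `evaluate_on_real` callables and the acceptance target are placeholders, not a prescribed workflow:

```python
def curate_until_predictive(generate, train, evaluate_on_real,
                            target: float = 0.9, max_rounds: int = 5):
    """Regenerate synthetic data until the trained model clears the real-world bar."""
    data = generate(feedback=None)
    model, metric = None, 0.0
    for _ in range(max_rounds):
        model = train(data)
        metric = evaluate_on_real(model)   # real-world data remains the referee
        if metric >= target:
            break
        data = generate(feedback=metric)   # steer the next batch at the gaps
    return model, data, metric
```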
Synthetic data is essential for training robust AI models for the physical world. Given the profound shortage of high-quality, labeled real-world data, a future with resilient autonomous robots and agentic embodied AI is not viable without it. And its ability to close the data gaps that limit training efficacy, and to ensure that deployed systems are safe, predictable, and robust, is a strength that cannot be left untapped.
With Falcon, digital twin simulation already reduces data collection timelines from months or years to just a few weeks. Agentic workflows provide a solid framework for hybrid approaches that combine the intrinsic strengths of explicit and implicit modeling methods and ground generative world models. Without compromising synthetic data quality, we can remove one of the main bottlenecks in synthetic data creation: the need to manually build full 3D simulation contexts. In practice, this means data generation timelines could shrink from weeks to mere hours, significantly accelerating how quickly physical AI models can be created, updated, and deployed.
World modeling has always been in Duality’s DNA. By combining generative AI and digital twin simulation techniques, we extend that vision of using virtual worlds to solve real-world problems. The approaches outlined here for closing the Gen2Real Gap provide a clear path forward, allowing our customers to immediately begin leveraging generative world models safely, effectively, and cost-efficiently.
If you have any questions or comments about this blog, or simply want to learn more about our work — we want to hear from you! Drop us a line here.