Simulation has always been our best tool for predicting the future. We simulate on grand scales, but also to carry out the most basic of human motor functions. We are constantly considering conditions and locations, and predicting behaviors and outcomes. Modern technology by no means invented simulation, but today’s tools keep advancing it to make increasingly better predictions about ever more complex events.
These predictions, however, are only valuable if the simulation data has the required fidelity and precision to successfully translate to the real world. Put another way: when exploring any question of interest, the data generated by a simulation is only useful if it can approximate what would happen in the real world with a high degree of accuracy.
Consider a city planning project that may require decommissioning an existing bridge. City planners need to know the ripple effects this would have on nearby streets and neighborhoods. Or in another example, for a city area that is susceptible to flooding from a nearby river, can we precisely learn the time needed for evacuation and identify the bottlenecks in exit routes? For projects like these, generalized scenarios are not enough — real locations and real-world data are called for.
While it is not possible to remove a real bridge or cause a flood simply to learn the outcomes, we can cost-effectively build and use Digital Site Twins (also referred to simply as Site Twins): virtual environments based on diverse sources of Geographic Information System (GIS) data of real locations. When semantic information from GIS data is joined with the infinitely modifiable nature of a digital twin, we open the door to any “what if” questions relevant to that environment (limited only by the availability and quality of the GIS data sources). Whether testing autonomous systems deployment in a populated area or evaluating disaster response protocols of a real city, the fidelity of the digital twin virtual environment and its accurate representation of the real-world location are necessary for bridging the Sim2Real gap.
What makes Site Twins different from other virtual environments? How are they built? And how can we build them quickly? Creating Site Twins, after all, can be quite daunting. This is why Duality researched and developed an AI-powered pipeline, one that enables our customers to build any desired Site Twins from already available data. The short video below offers an overview of how Digital Site Twins and our GIS pipeline work. The work shown is from Duality’s project with our partners at Amazon Web Services. Using various available and derived datasets, Duality built a simulation of a 200 sq km city based on Melbourne, Australia, with the goal of exploring several real-world scenarios.
Over the course of three blog posts we’re going to take a deep dive into the concepts explored in the above video and cover every stage of our AI-powered GIS pipeline. Where does the data come from? How do we extract semantic information? How does it all come together into a single digital twin? How are dynamic elements like traffic and pedestrian crowds introduced?
Let’s begin by diving deeper into Digital Site Twins!
Every digital twin is its own entity capable of accurately representing its real-world twin (physically and behaviorally) in any context. This enables digital twin-based simulation to take a divide-and-conquer approach to achieving both required fidelity and accuracy while managing complexity. We can simply focus on the fidelity of each separate digital twin and have confidence that when they are all combined to build the scenario, the digital twins will perform faithfully to the real-world counterparts and the emergent complex behaviors will accurately represent the real-world outcomes. Conventional simulation approaches do not permit this level of modularity.
Just as digital twins are not simply 3D models, a true, simulation-ready Digital Site Twin is much more than a model built from a 3D scan of a location. For example, a predictive digital twin of a robot contains everything needed to faithfully recreate that robot’s real-world functionality in simulation. This includes the 3D mesh, correct textures, accurate physics and mechanics, virtual sensors, autonomy software, etc. Similarly, a digital twin of an environment needs to carry all the information, characteristics, and variations needed to use that environment in simulation, which goes far beyond a 3D mesh and a photoreal texture.
Site Twins must incorporate and align all of the source data about real locations into editable layers that can be virtually recombined and recomposed. A Site Twin’s utility in answering questions rests on its ability to represent hypothetical changes. In other words, to use the virtual environment for predictive simulation, we have to be able to easily and realistically alter that environment. But, if the virtual environment consists of a single mesh, this is not possible. We need the ability to move and alter every feature, e.g., trees, buildings, infrastructure networks, road grids, soil types, vegetation varieties, water levels, and much more. All of these must be separate, alterable objects.
Furthermore, depending on the use case, the Site Twin may need to carry more abstract data, e.g., temperature variations, moisture levels, seasonal population changes, and daylight fluctuations: anything that may meaningfully affect the outcome of a scenario. For AI training data, it is critical that we can segment and annotate objects with flexibility in the choice of semantic labels and classes.
This is also why tools like Google Earth, while technically impressive and navigationally useful, cannot be used for simulation: they lack the physics, interactivity, configurability, and most of the real-world derived information needed to generate useful predictive data.
The real-world data at the core of the Site Twin enables a holistic understanding of how that environment functions and makes the Site Twin an infinitely reusable simulation resource. Every time we wish to ask a new predictive question, the semantic information at the heart of the twin enables us to introduce a meaningful edit to the Site Twin, with assurance that these changes will propagate to all aspects of the Site Twin (including dynamic elements like crowds and traffic) in a realistic manner.
The foundational real data requirement means that we need data sources (like satellite or aerial imagery) with a high enough resolution to preserve needed information. With higher resolution images, we can create higher fidelity Site Twins that more faithfully represent the real-world environment.
Our customers’ diverse use cases have provided us with numerous concrete examples of the Digital Site Twin creation process. As we walk through specific steps of the pipeline, we’ll illustrate them with images from different projects, including:
• Farmland Site Twin from midwestern U.S. built for drone-based detection training
• Wilderness environment Site Twin from southern U.S. built for off-road autonomy simulation
• City based on Melbourne, Australia, from our collaboration with the Amazon Web Services SimSpace Weaver team (introduced in the above video): a 200 sq km environment built from various available and derived datasets and aimed at exploring three real-world scenarios.
So, what is the approach for gathering the needed data for a Site Twin? After all, we can’t exactly drive a fleet of sensor-equipped vehicles into the middle of the Amazon rainforest. In most cases, we rely on sources of aerial and satellite imagery. But images are simply that: two-dimensional bitmaps of color variations.
The main challenge in creating Site Twins is how to extract accurate semantic information from these images. This information is then combined with other GIS data sources (digital elevation and surface models) to procedurally reconstruct the location as an interactive 3D environment. However, manually extracting semantic information from the available imagery and datasets is a very labor- and time-intensive task. More importantly, it is highly susceptible to error.
This is the challenge we are solving with our end-to-end, AI/ML-driven GIS pipeline. We use this pipeline and multiple data sources to extract semantic information about a geolocation, and use that information to recreate the environment to the fidelity required by the use case.
It is important to note that GIS data is notoriously unstructured, and most prominent data sources use their own approaches. Since we generally rely on multiple sources for the comprehensive data needed for the Site Twin, the data input to the pipeline is often quite heterogeneous. As a result, our pipeline has to be flexible enough to incorporate all kinds of data and then produce a consistent output that can be easily interpreted in Falcon (our digital twin simulator).
In the next section, we examine the data needs of Site Twins.
Both types of data can be open source or available for purchase, but they have some significant differences.
Satellite data is routinely produced by various agencies.
Aerial data, from drones and other aircraft, is produced closer to the ground and, therefore, provides better resolution than satellite images.
Most satellite and aerial imagery is recorded with standard camera sensors that yield images in the spectrum visible to the human eye. These images are essential for building any Site Twin, for reasons that require little explanation. However, some satellite and aerial sources are equipped with sensors that record additional bands outside of the visible spectrum (e.g., Near Infrared, Shortwave Infrared, etc.). This multispectral data can be a very useful component for mining deeper information about the environment. One such use is illustrated below, where color infrared allows us to distinguish vegetation from water, and observe relative vegetation health.
The data described above does not need to be limited to a single point in time. If a Site Twin will be used to study changes over time, or needs to accurately reflect seasonal variations, temporally spaced data becomes necessary. An example is a location that is surveyed regularly over several years to detect changes in land coverage, effects of climate change, erosion, etc.
Periodic lidar mapping has been commonplace for many years. For example, the USGS has been repeatedly mapping the entirety of the United States and makes this data available for free. But this type of data can also be purchased from satellite and aerial mapping companies.
Lidar point clouds from these surveys are processed to generate two models: a digital elevation model and a digital surface model.
Techniques and methods for extracting or deriving semantic information from source geospatial data are numerous and diverse. Our systematic experimentation with a large variety of these approaches has incorporated extensive training of various machine learning models. The specific approach is determined by two factors: (1) the intended use of the Site Twin, and (2) the variety and quality of the available imagery and datasets.
Here, it is important to note that some of this semantic information can also be purchased from GIS data providers. At Duality, we chose to build our own pipeline for the following reasons:
We primarily carry out three types of deep learning analysis: Classification, Object Detection, and Semantic Segmentation.
In the first step, we apply machine learning (ML) models to break down RGB images into identifiable areas, each defined by a continuous feature. This step makes it possible to delineate areas like forests, lakes, rivers, fields, etc. Often these areas can be identified more precisely, with specific soil types, waterbody classifications, vegetation varieties, and so on.
Some of the statistical analysis and ML models we use include ISO Clustering, KNN, Maximum Likelihood, and SVM, among others. Choosing a specific one depends on the desired precision of the Site Twin and the imagery we have access to. Factors like resolution, amount of cloud cover in the image, shadows, and more determine which model will yield superior results.
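As a rough illustration of pixel-based classification, here is a minimal k-nearest-neighbors (KNN) sketch in plain Python. The training pixels, class names, and the function names `classify_pixel` and `classify_image` are hypothetical, made up for this example. A real pipeline would likely use many more labeled samples, additional bands, and a tuned library implementation instead of this toy.

```python
from collections import Counter
from math import dist

# Hypothetical labeled training pixels: (R, G, B) -> land-cover class.
TRAINING_PIXELS = [
    ((34, 85, 40), "forest"),
    ((40, 95, 45), "forest"),
    ((28, 70, 35), "forest"),
    ((20, 50, 120), "water"),
    ((25, 60, 140), "water"),
    ((15, 45, 110), "water"),
    ((150, 130, 80), "field"),
    ((160, 140, 90), "field"),
    ((170, 150, 95), "field"),
]


def classify_pixel(rgb, training=TRAINING_PIXELS, k=3):
    """Return the majority class among the k training pixels nearest to rgb."""
    # Euclidean distance in RGB space decides which samples are "nearest".
    nearest = sorted(training, key=lambda sample: dist(rgb, sample[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_image(image, k=3):
    """Classify every pixel of a row-major image (list of rows of RGB tuples)."""
    return [[classify_pixel(pixel, k=k) for pixel in row] for row in image]


if __name__ == "__main__":
    tile = [
        [(30, 80, 38), (22, 55, 130)],
        [(155, 135, 85), (36, 90, 42)],
    ]
    for row in classify_image(tile):
        print(row)
```

The output is a class map with the same shape as the input image, which is the kind of per-area labeling that later steps can turn into separate, editable layers.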
When we have access to higher resolution imagery, we can use object detection and semantic segmentation to extract much more precise data about specific objects and features. This includes buildings, trees, infrastructure towers, vehicles, ships, birds, etc. This in turn enables us to run statistical analysis on the image to learn some useful characteristics of the terrain. A few examples:
Object Detection can identify specific, distinct objects, entities, and features in the raw RGB image data. This includes building footprints, trees, roads, trails, etc. We leverage a variety of models, including SSD, YOLOv3, and RetinaNet, among others. As before, the intended use of the Site Twin, along with image source and quality, will dictate which models will yield more precise results.
Semantic Segmentation can identify all of the instances of any object in the image. Even more significantly, instead of bounding boxes, Semantic Segmentation yields precise shapes, dimensions, and coordinates of all the instances of various objects in the image. We primarily rely on MaskRCNN for Semantic Segmentation.
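To make the "shapes, dimensions, and coordinates" point concrete, the sketch below shows one possible post-processing step on a segmentation result. It is not Mask R-CNN itself: it assumes the model has already produced a binary mask (1 = building, 0 = background), and it labels connected regions with a simple flood fill. The function name `extract_instances`, the mask layout, and the `gsd_m` ground-sample-distance parameter are assumptions for illustration only.

```python
from collections import deque


def extract_instances(mask, gsd_m=0.5):
    """Label 4-connected regions of a binary mask and measure each one.

    mask:  list of rows containing 0/1 values (hypothetical model output).
    gsd_m: assumed ground sample distance, in meters per pixel.
    Returns one dict per region with pixel count, area, and bounding box
    given as (min_row, min_col, max_row, max_col).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    instances = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            instances.append({
                "pixel_count": len(pixels),
                "area_m2": len(pixels) * gsd_m ** 2,
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
            })
    return instances


if __name__ == "__main__":
    building_mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    for instance in extract_instances(building_mask):
        print(instance)
```

Each region becomes a separate record, which fits the requirement that features in a Site Twin be individual, alterable objects instead of parts of a single mesh.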
With access to multispectral data such as near-infrared or short-wave infrared, we can calculate various band indices to run analysis that yields finer details about the chosen location. For example, we can determine vegetation health, infer atmospheric and soil moisture levels, and even estimate soil iron content.
Let's take a look at some examples.
The first example shows the Normalized Difference Vegetation Index (NDVI): analysis of the levels of chlorophyll content to infer vegetation health, which informs the vegetation modeling in the Site Twin. NDVI relies on red and near-infrared (NIR) bands.
In the second example, we look at the Normalized Difference Moisture Index (NDMI): analysis of humidity and moisture in the area, which similarly informs moisture inclusion in the Site Twin. NDMI relies on near-infrared (NIR) and short-wave infrared (SWIR) bands.
Lastly, we show the Agriculture Index: analysis of the presence of active agricultural activity, which informs the state of the agricultural lands to include in the Site Twin. The Agriculture Index relies on NIR, SWIR, and Blue bands.
These are just three examples, and the more multispectral data we have, the more information can be learned about the location via various band indices.
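NDVI and NDMI share the same normalized-difference form: NDVI = (NIR - Red) / (NIR + Red) and NDMI = (NIR - SWIR) / (NIR + SWIR). The sketch below computes both per pixel in plain Python. The band values are invented reflectances, the function names are hypothetical, and returning 0.0 where both bands are zero is a convention chosen for this example, not a standard.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b) for two equally sized bands.

    Bands are lists of rows of numbers. Pixels where a + b == 0 are set
    to 0.0 (an assumption made here to avoid division by zero).
    """
    return [
        [(a - b) / (a + b) if (a + b) != 0 else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red bands."""
    return normalized_difference(nir, red)


def ndmi(nir, swir):
    """Normalized Difference Moisture Index from NIR and SWIR bands."""
    return normalized_difference(nir, swir)


if __name__ == "__main__":
    # Invented reflectance values for a 2x2 tile.
    nir = [[0.50, 0.45], [0.10, 0.30]]
    red = [[0.10, 0.08], [0.10, 0.25]]
    swir = [[0.30, 0.20], [0.15, 0.30]]
    print("NDVI:", ndvi(nir, red))
    print("NDMI:", ndmi(nir, swir))
```

For non-negative reflectances both indices fall between -1 and 1, so they can be thresholded or binned into classes before being attached to the Site Twin as a data layer.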
Change detection is applied to the time-series data discussed earlier, and is carried out by a deep learning model. We primarily use STANet on imagery taken at different time periods, and take the delta of what has changed.
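The idea of "taking the delta" can be illustrated without a neural network. The sketch below is not STANet: it simply compares two co-registered class maps from different dates, cell by cell, and reports which cells changed and which class transitions occurred. The function name `change_delta` and the class labels are hypothetical.

```python
from collections import Counter


def change_delta(before, after):
    """Compare two co-registered class maps of identical size.

    Returns (changed, transitions): a boolean mask of changed cells and a
    Counter of (old_class, new_class) pairs for those cells.
    """
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("rasters must have identical dimensions")

    changed = [
        [old != new for old, new in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]
    transitions = Counter(
        (old, new)
        for row_b, row_a in zip(before, after)
        for old, new in zip(row_b, row_a)
        if old != new
    )
    return changed, transitions


if __name__ == "__main__":
    survey_year_1 = [["forest", "forest"], ["water", "field"]]
    survey_year_2 = [["forest", "field"], ["water", "built"]]
    mask, counts = change_delta(survey_year_1, survey_year_2)
    print(mask)
    print(dict(counts))
```

A learned model is typically preferred on raw imagery because lighting, season, and sensor differences make naive pixel comparison unreliable; the comparison above only makes sense once both dates have been reduced to comparable labels.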
The above steps all work towards a single goal: taking a large variety of (1) unstructured data from various sources, and (2) salient features derived via various mechanisms, and using them to produce a rich, informative, and unified dataset. In Part II of this blog series, we will show how procedural workflows are used to build out any Site Twin from this comprehensive dataset. We will cover how all the necessary assets are brought together in Falcon and Unreal Engine, and the data structure that makes the final result simulation-ready. In Part III, we will address how we implement dynamic features like AI pedestrian crowds, vehicular traffic, and more, and show how the output of our GIS pipeline makes it easy to implement these dynamic systems in a seamless and modular way.
Below we wrap up Part I with a few examples of Site Twins built from different data sources and to different fidelity as dictated by each use case.
Unlike the first two examples, the Melbourne environment required high fidelity and large scale for investigating scenarios with massive numbers of concurrently simulated agents. However, as it was not used for visual sensor data generation, photorealism was not required. Site Twins are always built with consideration for detail needs and resource availability.
For the final results of this Site Twin, please revisit the video in the introduction of this blog. And stay tuned for Part II of the GIS pipeline series!
To get hands-on experience with digital twin simulation in these kinds of Site Twin environments, make sure to sign up to test drive Falcon for free with our full-featured browser-based sandbox scenarios (coming soon!).
If you wish to learn more about leveraging Duality's GIS pipeline, or any other Falcon capabilities, for your projects, simply reach out to email@example.com .