Geospatial to Simulation: How To Build Digital Twins of Real-World Places

*Duality’s AI-powered Geographic Information System (GIS) pipeline allows us to extract semantic information from a satellite image (left) and use it to build a digital twin of any environment (middle) with as much detail as needed for any use case (right).*

Simulation has always been our best tool for predicting the future. We run simulations on grand scales, but we also rely on them to carry out the most basic human motor functions. We are constantly considering conditions and locations, and predicting behaviors and outcomes. Modern technology by no means invented simulation, but today’s tools constantly evolve it to make increasingly better predictions about ever more complex events.

These predictions, however, are only valuable if the simulation data has the required fidelity and precision to successfully translate to the real world. Put another way: when exploring any question of interest, the data generated by a simulation is only useful if it can approximate what would happen in the real world with a high degree of accuracy.

Consider a city planning project that may require decommissioning an existing bridge. City planners need to know the ripple effects this would have on nearby streets and neighborhoods. Or in another example, for a city area that is susceptible to flooding from a nearby river, can we precisely learn the time needed for evacuation and identify the bottlenecks in exit routes? For projects like these, generalized scenarios are not enough — real locations and real-world data are called for.

While it is not possible to remove a real bridge or cause a flood simply to learn the outcomes, we can cost-effectively build and use Digital Site Twins (also referred to simply as Site Twins): virtual environments based on diverse sources of Geographic Information System (GIS) data of real locations. When semantic information from GIS data is joined with the infinitely modifiable nature of a digital twin, we open the door to any “what if” question relevant to that environment (limited only by the availability and quality of the GIS data sources). Whether testing autonomous systems deployment in a populated area or evaluating disaster response protocols of a real city, the fidelity of the digital twin virtual environment and its accurate representation of the real-world location are necessary for bridging the Sim2Real gap.

What makes Site Twins different from other virtual environments? How are they built? And how can we build them quickly? Creating Site Twins, after all, can be quite daunting. This is why Duality researched and developed an AI-powered pipeline, one that enables our customers to build any desired Site Twins from already available data. The short video below offers an overview of how Digital Site Twins and our GIS pipeline work. The work shown is from Duality’s project with our partners at Amazon Web Services. Using various available and derived datasets, Duality built a simulation of a 200 sq km city based on Melbourne, Australia, with the goal of exploring several real-world scenarios.

Over the course of three blog posts we’re going to take a deep dive into the concepts explored in the above video and cover every stage of our AI-powered GIS pipeline. Where does the data come from? How do we extract semantic information? How does it all come together into a single digital twin? How are dynamic elements like traffic and pedestrian crowds introduced?

Part I examines the data needs for a Site Twin, explores AI methods for extracting semantic information from diverse data sources, and showcases examples of several Site Twins built from this data.
In Part II, we will dive deeper into procedural workflows for building Site Twins and ensuring that they are simulation-ready.
In Part III, we will explore incorporating additional dynamic layers to make the simulation complete, including AI crowds and vehicles, time-of-day changes, weather patterns, and more.

Let’s begin by diving deeper into Digital Site Twins!

Site Twins and Their Applications

Every digital twin is its own entity capable of accurately representing its real-world twin (physically and behaviorally) in any context. This enables digital twin-based simulation to take a divide-and-conquer approach to achieving both required fidelity and accuracy while managing complexity. We can simply focus on the fidelity of each separate digital twin and have confidence that when they are all combined to build the scenario, the digital twins will perform faithfully to the real-world counterparts and the emergent complex behaviors will accurately represent the real-world outcomes. Conventional simulation approaches do not permit this level of modularity.

Just as digital twins are not simply 3D models, a true, simulation-ready Digital Site Twin is much more than a model built from a 3D scan of a location. For example, a predictive digital twin of a robot contains everything needed to faithfully recreate that robot’s real-world functionality in simulation. This includes the 3D mesh, correct textures, accurate physics and mechanics, virtual sensors, autonomy software, etc. Similarly, a digital twin of an environment needs to carry all the information, characteristics, and variations needed to use that environment in simulation, which goes far beyond a 3D mesh and a photoreal texture.

Site Twins must incorporate and align all of the source data about real locations into editable layers that can be virtually recombined and recomposed. A Site Twin’s utility in answering questions rests on its ability to represent hypothetical changes. In other words, to use the virtual environment for predictive simulation, we have to be able to easily and realistically alter that environment. But, if the virtual environment consists of a single mesh, this is not possible. We need the ability to move and alter every feature, e.g., trees, buildings, infrastructure networks, road grids, soil types, vegetation varieties, water levels, and much more. All of these must be separate, alterable objects.
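To make the idea of separate, alterable objects concrete, here is a minimal sketch of a layered container. Every name in it (`Feature`, `Layer`, `SiteTwinSketch`, the `enabled` flag) is hypothetical and chosen for illustration only; it is not Falcon's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One separately editable object, e.g. a single bridge or tree."""
    feature_id: str
    semantic_class: str          # e.g. "bridge", "tree", "building"
    position: tuple              # (x, y) in map units; illustrative only
    enabled: bool = True


@dataclass
class Layer:
    """A named group of features that can be edited independently."""
    name: str
    features: dict = field(default_factory=dict)

    def add(self, feature: Feature) -> None:
        self.features[feature.feature_id] = feature


@dataclass
class SiteTwinSketch:
    """Hypothetical container: a stack of layers rather than one fused mesh."""
    layers: dict = field(default_factory=dict)

    def layer(self, name: str) -> Layer:
        # Create the layer on first use, then return the stored instance.
        return self.layers.setdefault(name, Layer(name))

    def active(self, semantic_class: str) -> list:
        return [f for layer in self.layers.values()
                for f in layer.features.values()
                if f.semantic_class == semantic_class and f.enabled]


twin = SiteTwinSketch()
twin.layer("infrastructure").add(Feature("bridge-1", "bridge", (10.0, 4.0)))
twin.layer("infrastructure").add(Feature("bridge-2", "bridge", (42.0, 7.5)))
twin.layer("vegetation").add(Feature("tree-1", "tree", (11.0, 5.0)))

# "What if" edit: decommission one bridge without touching anything else.
twin.layers["infrastructure"].features["bridge-1"].enabled = False
remaining_bridges = [f.feature_id for f in twin.active("bridge")]
```

Because each feature lives in its own record, the edit changes a single flag. With a single fused mesh there would be no such record to change.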

Furthermore, depending on the use case, the Site Twin may need to carry more abstract data (e.g., temperature variations, moisture levels, seasonal population changes, daylight fluctuations): anything that may meaningfully affect the outcome of a scenario. For AI training data, it is critical that we can segment and annotate objects with flexible semantic labels and classes.

This is also why tools like Google Earth, while technically impressive and navigationally useful, cannot be used for simulation: they lack the physics, interactivity, configurability, and most of the real-world derived information needed to generate useful predictive data.

The real-world data at the core of the Site Twin enables a holistic understanding of how that environment functions and makes the Site Twin an infinitely reusable simulation resource. Every time we wish to ask a new predictive question, the semantic information at the heart of the twin enables us to introduce a meaningful edit to the Site Twin, with assurance that these changes will propagate to all aspects of the Site Twin (including dynamic elements like crowds and traffic) in a realistic manner.

The foundational real data requirement means that we need data sources (like satellite or aerial imagery) with a high enough resolution to preserve needed information. With higher resolution images, we can create higher fidelity Site Twins that more faithfully represent the real-world environment.

Our customers’ diverse use cases have provided us with numerous concrete examples of the Digital Site Twin creation process. As we walk through specific steps of the pipeline, we’ll illustrate them with images from different projects, including:

• Farmland Site Twin from midwestern U.S. built for drone-based detection training
• Wilderness environment Site Twin from southern U.S. built for off-road autonomy simulation
• City based on Melbourne, Australia from our collaboration with the Amazon Web Services SimSpace Weaver team (introduced in the above video). A 200 sq km environment built via various available and derived datasets and aimed at exploring the following three real-world scenarios:

Understanding evacuation timelines for residents fleeing a flooding neighborhood
Evaluating the ripple effects of a bridge closure on nearby traffic patterns
Analyzing the crowd flow from and around a stadium after a sold-out event

Data Builds (Simulation-Ready) Worlds

So, what is the approach for gathering the needed data for a Site Twin? After all, we can’t exactly drive a fleet of sensor equipped vehicles into the middle of the Amazon rainforest. In most cases, we rely on sources of aerial and satellite imagery. But images are simply that: two-dimensional bitmaps of color variations.

The main challenge in creating Site Twins is how to extract accurate semantic information from these images. This information is then combined with other GIS data sources (digital elevation and surface models) to procedurally reconstruct the location as an interactive 3D environment. However, manually extracting semantic information from the available imagery and datasets is a very labor and time-intensive task. More importantly, it's highly susceptible to error.

This is the challenge we are solving with our end-to-end, AI/ML-driven GIS pipeline. We use this pipeline and multiple data sources to extract semantic information about a geolocation, and use that information to recreate the environment to the fidelity required by the use case.

It is important to note that GIS data is notoriously unstructured, and most prominent data sources use their own approaches. Since we generally rely on multiple sources for the comprehensive data needed for the Site Twin, the data input to the pipeline is often quite heterogeneous. As a result, our pipeline has to be flexible enough to incorporate all kinds of data and then produce a consistent output that can be easily interpreted in Falcon (our digital twin simulator).
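As a rough illustration of what producing a consistent output from heterogeneous inputs can involve, the sketch below maps per-source class names onto one shared label scheme. The source names, raw labels, and record layout are all invented for this example and do not describe the actual pipeline or the format Falcon consumes.

```python
# Hypothetical per-source class names mapped onto one shared label scheme.
LABEL_MAP = {
    "source_a": {"bldg": "building", "veg_high": "tree", "h2o": "water"},
    "source_b": {"Building": "building", "Woodland": "tree", "River": "water"},
}


def harmonize(records):
    """Convert (source, raw_label, geometry) tuples to a unified schema.

    Records whose labels have no mapping are collected separately rather
    than silently dropped, so the mapping can be extended later.
    """
    unified, unmapped = [], []
    for source, raw_label, geometry in records:
        label = LABEL_MAP.get(source, {}).get(raw_label)
        if label is None:
            unmapped.append((source, raw_label))
        else:
            unified.append({"label": label, "geometry": geometry, "source": source})
    return unified, unmapped


records = [
    ("source_a", "bldg", [(0, 0), (0, 10), (10, 10), (10, 0)]),
    ("source_b", "River", [(5, 5), (50, 60)]),
    ("source_b", "Quarry", [(1, 1)]),
]
unified, unmapped = harmonize(records)
```

Keeping the unmapped labels visible is a deliberate choice in this sketch: with many sources, unknown classes are expected, and reviewing them is safer than guessing.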

In the next section, we examine the data needs of Site Twins.

Data Sources and Types

Satellite and Aerial Imagery

Both types of data can be open source or available for purchase, but they have some significant differences.

Satellite data is routinely produced by various agencies.

Resolution between 30 m and 1 m
Readily available, up-to-date, and often free

Aerial data, from drones and other aircraft, is produced closer to the ground and, therefore, provides better resolution than satellite images.

Enables higher-fidelity Site Twins due to resolution as fine as 4 cm
Data is available for purchase with price correlated to map resolution
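A quick back-of-the-envelope calculation shows why these resolution figures matter. The sketch below uses the resolutions listed above; the 10 m object width is an arbitrary example value.

```python
def pixels_across(object_size_m: float, resolution_m: float) -> float:
    """Number of pixels spanning an object of the given width."""
    return object_size_m / resolution_m


def pixel_count(area_sq_km: float, resolution_m: float) -> float:
    """Number of pixels needed to cover an area at a given ground resolution."""
    area_sq_m = area_sq_km * 1_000_000
    return area_sq_m / (resolution_m ** 2)


# A 10 m wide object (arbitrary example size) at three resolutions.
for res in (30.0, 1.0, 0.04):
    print(f"{res} m/pixel: about {pixels_across(10.0, res):.1f} pixels across")

# Pixels needed to cover 1 sq km at 1 m versus 4 cm resolution.
coarse = pixel_count(1.0, 1.0)
fine = pixel_count(1.0, 0.04)
```

At 30 m per pixel the object is smaller than a single pixel, while at 4 cm it spans roughly 250 pixels. The cost is data volume: one square kilometer needs about a million pixels at 1 m and roughly 625 million at 4 cm.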

*Left: 10 m resolution Sentinel-2 Satellite Imagery of Melbourne, Australia. Right: 5 cm resolution Nearmap Aerial Imagery of the same area. Source: Sentinel-2, ESRI and* *Nearmap*

Multispectral Data

Most satellite and aerial imagery is recorded with standard camera sensors that yield images in the spectrum visible to the human eye. These images are essential for building any Site Twin, for reasons that require little explanation. However, some satellite and aerial sources are equipped with sensors that record additional bands outside of the visible spectrum (e.g., Near Infrared, Shortwave Infrared). This multispectral data can be very useful for mining deeper information about the environment. One such use is illustrated below, where color infrared allows us to distinguish vegetation from water and observe relative vegetation health.

Left: USGS NAIP Natural Color imagery of New York City, New Jersey, and Hudson River. It is difficult to distinguish water from vegetation. Right: Color Infrared imagery where water appears Blue-Green while vegetation appears Red. Here, brighter Red implies healthier vegetation. Color Infrared is created by replacing Red band with Near Infrared band, Green with Red, and Blue with Green. Source: USGS NAIP
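The band substitution described in the caption can be sketched with NumPy as follows. The reflectance values are made up for illustration, and the function name `color_infrared` is ours.

```python
import numpy as np


def color_infrared(nir, red, green):
    """Stack bands in display order: NIR as red, Red as green, Green as blue."""
    return np.stack([nir, red, green], axis=-1)


# Tiny 2x2 example with made-up reflectance values in [0, 1].
# The top-left pixel mimics healthy vegetation (high NIR, low red);
# the bottom-right pixel mimics water (very low NIR).
nir = np.array([[0.80, 0.50], [0.30, 0.05]])
red = np.array([[0.10, 0.20], [0.25, 0.08]])
green = np.array([[0.15, 0.20], [0.25, 0.10]])

cir = color_infrared(nir, red, green)
print(cir.shape)  # (2, 2, 3)
```

In the resulting array the vegetation pixel is dominated by its first (displayed red) channel, while the water pixel's largest value is in its last (displayed blue) channel. This matches the red vegetation and blue-green water described above.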

Time-Series Data

The above-mentioned data does not need to be limited to a single point in time. If a Site Twin will be used to study changes over time, or needs to accurately reflect seasonal variations, temporally spaced data becomes necessary. One example is a location that is surveyed regularly over several years to detect changes such as shifts in land cover, effects of climate change, erosion, etc.

*Satellite (Sentinel-2) imagery of the same Iowa location taken in October 2021 (left) and July 2022 (right).*

Digital Elevation and Surface Models

Periodic lidar mapping has been commonplace for many years. For example, the USGS has been repeatedly mapping the entirety of the United States and makes this data available for free. But this type of data can also be purchased from satellite and aerial mapping companies.

Lidar Point Clouds from these surveys are processed to generate two models:

Digital Elevation Model (DEM), which represents elevation changes to the bare earth and does not include features such as buildings or trees.
Digital Surface Model (DSM), which captures heights of many features like buildings, trees, and much more.
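One common way to use the two models together is to subtract the DEM from the DSM, which leaves the height of whatever stands on the ground. The sketch below assumes both rasters are already aligned on the same grid. The elevations and the 2.5 m threshold are invented for illustration and are not values from our pipeline.

```python
import numpy as np


def height_above_ground(dsm, dem):
    """Per-cell feature height: surface elevation minus bare-earth elevation."""
    heights = np.asarray(dsm, dtype=float) - np.asarray(dem, dtype=float)
    # Small negative values are typically noise between the two models.
    return np.clip(heights, 0.0, None)


# Made-up elevations in meters on a 2x3 grid.
dem = np.array([[12.0, 12.5, 13.0],
                [12.0, 12.4, 13.1]])
dsm = np.array([[12.0, 30.5, 13.0],
                [11.9, 20.4, 13.1]])

heights = height_above_ground(dsm, dem)
tall = heights > 2.5   # hypothetical threshold for "building or tree"
```

Here the two cells in the middle column stand out as raised features, while the remaining cells sit at ground level.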

*Example of a Digital Elevation Model (DEM) of 260 sq km of Melbourne, Australia. Source: Geoscience Australia*


*Left: Digital Elevation Model (DEM) of Melbourne, Australia. Right: Digital Surface Model (DSM) of the same area. Source:* *Nearmap*

Extracting Semantic Information from Satellite and Aerial Imagery

Techniques and methods for extracting or deriving semantic information from source geospatial data are numerous and diverse. Our systematic experimentation with a large variety of these approaches has incorporated extensive training of various machine learning models. The specific approach is determined by two factors: (1) the intended use of the Site Twin, and (2) the variety and quality of the available imagery and datasets.

Here, it is important to note that some of this semantic information can also be purchased from GIS data providers. At Duality, we chose to build our own pipeline for the following reasons:

First, it allows us the flexibility to use the highest resolution of data available in any given location and even fuse data from different sources into a unified semantic dataset.
Second, it gives us complete control over the semantic labeling of objects. This control is especially vital for AI teams that require the ability to quickly experiment with different semantic schemes. With our approach, the labeling can be customized, and their work is supported by instant and perfect ground truth training data.

Deep Learning Analysis

We primarily carry out three types of deep learning analysis: Classification, Object Detection, and Semantic Segmentation.

Classification

In the first step, we apply machine learning (ML) models to break down RGB images into identifiable areas of their defining, continuous features. This step makes it possible to delineate areas like forests, lakes, rivers, fields, etc. Often they can be more precisely identified with specific soil types, waterbody classification, vegetation varieties, and so on.

Some of the statistical analysis and ML models we use include ISO Clustering, KNN, Maximum Likelihood, and SVM, among others. Choosing a specific one depends on the desired precision of the Site Twin and the imagery we have access to. Factors like resolution, amount of cloud cover in the image, shadows, and more determine which model will yield superior results.
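To give a feel for pixel-level classification, here is a toy k-nearest-neighbors (KNN) classifier written directly in NumPy. The training colors, class names, and `k=3` are made-up illustration values, and a production workflow would rely on GIS or ML tooling rather than this hand-rolled version.

```python
import numpy as np


def knn_classify(pixels, train_pixels, train_labels, k=3):
    """Label each pixel by majority vote among its k nearest training pixels."""
    pixels = np.asarray(pixels, dtype=float)
    train_pixels = np.asarray(train_pixels, dtype=float)
    train_labels = np.asarray(train_labels)
    # Pairwise Euclidean distances in color space: shape (n_pixels, n_train).
    dists = np.linalg.norm(pixels[:, None, :] - train_pixels[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_labels[nearest]
    # Majority vote per pixel.
    return np.array([np.bincount(row).argmax() for row in votes])


CLASSES = ["water", "forest", "field"]   # illustrative classes
train_pixels = [
    [20, 40, 120], [25, 50, 130], [15, 45, 110],      # water (bluish)
    [30, 90, 40], [35, 100, 45], [25, 80, 35],        # forest (dark green)
    [180, 170, 90], [190, 175, 100], [170, 160, 85],  # field (tan)
]
train_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# A 2x2 RGB "image" to classify.
image = np.array([[[22, 42, 125], [32, 95, 42]],
                  [[185, 172, 95], [18, 44, 115]]])
flat = image.reshape(-1, 3)
label_map = knn_classify(flat, train_pixels, train_labels).reshape(image.shape[:2])
```

The output `label_map` has one class index per pixel, which is the kind of raster a later step could turn into delineated areas such as lakes or fields.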

Object Detection and Semantic Segmentation

When we have access to higher resolution imagery, we can use object detection and semantic segmentation to extract much more precise data about specific objects and features. This includes buildings, trees, infrastructure towers, vehicles, ships, birds, etc. This in turn enables us to run statistical analysis on the image to learn some useful characteristics of the terrain. A few examples:

Concentrations of various features (e.g., small bushes are more likely to be in one type of location, tall trees in another)
Distribution of any feature throughout the terrain (e.g., a birch tree tends to grow "x" feet away from another birch tree)
Variation in soil and vegetation as a function of their distance from water sources.
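As a small example of the second kind of statistic, the sketch below computes how far each detected tree is from its nearest neighbor. The tree coordinates are invented, and a brute-force search is used for clarity; large datasets would call for a spatial index.

```python
import math
from statistics import mean


def nearest_neighbor_distances(points):
    """For each point, the distance to its closest other point."""
    distances = []
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points) if i != j
        )
        distances.append(nearest)
    return distances


# Hypothetical tree centroids in meters, as object detection might produce.
trees = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (20.0, 0.0)]
spacing = nearest_neighbor_distances(trees)
mean_spacing = mean(spacing)
```

A distribution of such spacings, gathered per species or size class, is one plausible input for procedurally placing vegetation where no individual detections exist.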

Object Detection can identify specific, distinct objects, entities, and features in the raw RGB image data. This includes building footprints, trees, roads, trails, etc. We leverage a variety of models, including SSD, YOLOv3, and RetinaNet, among others. As before, the intended use of the Site Twin, along with image source and quality, will dictate which models yield more precise results.
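Detections come back in pixel coordinates, so placing them in the world requires the raster's georeferencing. The sketch below applies a six-term affine geotransform in the ordering used by GDAL. The origin, pixel size, and bounding box are made-up values.

```python
def pixel_to_map(col, row, geotransform):
    """Apply a six-term affine geotransform (GDAL ordering) to a pixel position."""
    x0, px_w, row_rot, y0, col_rot, px_h = geotransform
    x = x0 + col * px_w + row * row_rot
    y = y0 + col * col_rot + row * px_h
    return x, y


def bbox_center_on_map(bbox, geotransform):
    """Map coordinates of the center of a (col_min, row_min, col_max, row_max) box."""
    col_min, row_min, col_max, row_max = bbox
    return pixel_to_map((col_min + col_max) / 2, (row_min + row_max) / 2, geotransform)


# Hypothetical north-up raster: origin at (500000, 4200000), 1 m pixels.
# The pixel height is negative because rows increase southward.
gt = (500000.0, 1.0, 0.0, 4200000.0, 0.0, -1.0)
detection = (100, 200, 120, 230)   # made-up detector output in pixels
center = bbox_center_on_map(detection, gt)
```

With 1 m pixels, the center of this box lands 110 m east and 215 m south of the raster origin.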

Results of Object Detection for vegetation of medium size from USGS NAIP 1 m resolution imagery. The goal of this analysis was to infer the statistical and geographical distributions of small, medium, and large vegetation and their relationships.

Semantic Segmentation can identify all of the instances of any object in the image. Even more significantly, instead of bounding boxes, Semantic Segmentation yields precise shapes, dimensions, and coordinates of all the instances of various objects in the image. We primarily rely on Mask R-CNN for Semantic Segmentation.
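The practical difference between a mask and a bounding box is easy to see on a small example. The sketch below measures an invented L-shaped footprint both ways; the 0.5 m pixel size is an assumption for illustration.

```python
import numpy as np


def mask_stats(mask, resolution_m):
    """Area (sq m) and pixel bounding box of a boolean segmentation mask."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0.0, None
    area = float(mask.sum()) * resolution_m ** 2
    bbox = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
    return area, bbox


# Made-up L-shaped footprint on a 5x5 grid.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1] = True     # vertical arm
mask[3, 1:4] = True     # horizontal arm

area, bbox = mask_stats(mask, resolution_m=0.5)
# Area the enclosing box would report for the same object.
bbox_area = (bbox[2] - bbox[0] + 1) * (bbox[3] - bbox[1] + 1) * 0.5 ** 2
```

The mask covers 5 pixels while its bounding box covers 9, so a box-only result would overstate this footprint by almost a factor of two.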

*Semantic Segmentation applied to identify all building footprints (left) and vegetation (right) in* *Nearmap* *images of Melbourne, Australia.*


*Semantic Segmentation applied to identify street networks (left) and bodies of water (right) in* *Nearmap* *images of Melbourne, Australia.*

Band Indices

With access to multispectral data such as near-infrared or short-wave infrared, we can calculate various band indices to run analysis that yields finer details about the chosen location. For example, we can determine vegetation health, infer atmospheric and soil moisture levels, and even estimate soil iron content.

Let's take a look at some examples.

The first example shows the Normalized Difference Vegetation Index (NDVI): Analysis of the levels of chlorophyll content to infer vegetation health, which informs the vegetation modeling in the site twin. NDVI relies on red and near-infrared (NIR) bands.
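NDVI is computed per pixel as (NIR - Red) / (NIR + Red). A minimal NumPy sketch follows, using made-up reflectance values. Setting cells with no signal to 0 is our choice for this example, not a standard.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.zeros_like(total)
    # Leave cells with no signal in either band at 0 instead of dividing by 0.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# Made-up reflectances: dense vegetation, sparse vegetation, water, no data.
nir = np.array([0.60, 0.30, 0.02, 0.0])
red = np.array([0.10, 0.20, 0.06, 0.0])
values = ndvi(nir, red)
```

Healthy vegetation reflects strongly in NIR and absorbs red light, so it scores close to 1, while water typically comes out negative.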

NDVI analysis carried out on the satellite imagery using red and near-infrared (NIR) bands. Left: The source image (top) and NDVI result (bottom) from October 2021. Right: The source image (top) and NDVI result (bottom) from July 2022. Here July 2022 is showing a higher level of verdant vegetation.

In the second example, we look at the Normalized Difference Moisture Index (NDMI): Analysis of humidity and moisture in the area, which similarly informs moisture inclusion in the Site Twin. NDMI relies on near-infrared (NIR) and short-wave infrared (SWIR) bands.
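For reference, NDMI in its standard form is a normalized difference of the two bands named above. Which specific sensor bands serve as NIR and SWIR depends on the data source and is not specified here. The stated range assumes non-negative reflectance values.

```latex
\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}},
\qquad -1 \le \mathrm{NDMI} \le 1
```

Higher values indicate more moisture, because water absorbs short-wave infrared and so lowers the SWIR term.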

NDMI analysis carried out on the satellite imagery using near-infrared (NIR) and short-wave infrared (SWIR) bands. Left column shows the source image and NDMI result from October 2021. Right shows the same for July 2022. July 2022 represents much higher moisture levels.

Lastly, we show the Agriculture Index: Analysis of the presence of active agricultural activity, which informs the state of the agricultural lands to include in the Site Twin. The Agriculture Index relies on NIR, SWIR, and Blue bands.

Agriculture analysis carried out on the satellite imagery using NIR, SWIR, and Blue bands. Left column shows the source image and result from October 2021. Right shows the same for July 2022. July 2022 shows higher levels of agricultural vegetation.

These are just three examples, and the more multispectral data we have, the more information can be learned about the location via various band indices.

Change Detection

Change detection is employed with time-series data we discussed earlier, and is carried out by a deep learning model. We primarily use STANet on the imagery taken at different time periods, and take the delta of what has changed.
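STANet is a trained deep learning model and cannot be reproduced in a few lines. To illustrate only the underlying idea of taking a delta between two dates, the sketch below swaps in plain image differencing with a threshold, a much simpler classical technique. The values and the 0.2 threshold are invented.

```python
import numpy as np


def change_mask(before, after, threshold):
    """Flag cells whose absolute change between two dates exceeds a threshold."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("rasters must be co-registered and the same shape")
    delta = after - before
    return delta, np.abs(delta) > threshold


# Made-up single-band values for the same 2x2 area at two dates.
before = np.array([[0.20, 0.25], [0.60, 0.10]])
after = np.array([[0.22, 0.70], [0.15, 0.10]])
delta, changed = change_mask(before, after, threshold=0.2)  # illustrative threshold
```

Simple differencing is sensitive to lighting, season, and misalignment between the two images, which is one reason learned models are attractive for this task.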

Assembling the Site Twin - Examples

The above steps all work towards a single goal: taking a large variety of (1) unstructured data from various sources, and (2) salient features derived via various mechanisms, and using them to produce a rich, informative, and unified dataset. In Part II of this blog series, we will show how procedural workflows are used to build out any Site Twin from this comprehensive dataset. We will cover how all the necessary assets are brought together in Falcon and Unreal Engine, and the data structure that makes the final result simulation-ready. In Part III, we will address how we implement dynamic features like AI pedestrian crowds, vehicular traffic, and more, and show how the output of our GIS pipeline makes it easy to implement these dynamic systems in a seamless and modular way.

Below we wrap up Part I with a few examples of Site Twins built from different data sources and to different fidelity as dictated by each use case.

Example 1: Rural Environment in Iowa

*Sample image from the Sentinel-2 satellite data (left) used to create the Iowa farmland site twin (right), presented here from an overhead angle.*

Details of the Iowa site twin. Right image showcases realistic weather, complete with water-pooling in geographically correct locations. These kinds of details play a crucial role when it comes to site-specific testing and behavior data gathering.

Example 2: Wilderness Environment in Texas

Progression of a site twin based on a location near Kerrville, Texas. Satellite image (left) exhibits a 1 m resolution. The classification step is shown in the middle image. Right image shows an overhead view of the completed site twin. Satellite imagery source: United States Geological Survey (USGS) National Agriculture Imagery Program (NAIP).

Site twins incorporate and align all of the source data about real locations into editable layers that can be virtually recombined and recomposed. Here we can see literal landscape layers being applied to build the site twin. Bare ground is shown in the image on the left, while soil layers and vegetation have been applied in the image on the right.

On the ground we can see the rich details of the site twin. Every detail is derived from real GIS data. With higher resolution of the source data, higher-fidelity site twins are possible. For example, every virtual tree can be procedurally placed where a real tree stands in the real world.

Water bodies in the Texas environment. As with all site twins in Falcon, realistic weather and lighting conditions can be introduced and modified at runtime, a useful feature for collecting relevant sensor data.

Example 3: City site twin based on Melbourne, Australia

Unlike the first two examples, the Melbourne environment required high fidelity and large scale for investigating scenarios with massive numbers of concurrently simulated agents. However, as it was not used for visual sensor data generation, photorealism was not required. Site twins are always built with consideration for detail needs and resource availability.

Examples of high-resolution aerial imagery (5 cm) of Melbourne, Australia provided by Nearmap. The high resolution of the source data, especially once paired with surface and elevation data, allowed for very high fidelity in modeling the streets, water bodies, and path-specific nuances of the city.

Nearmap aerial image (left) and the corresponding Digital Surface (center) and Elevation (right) Models of the same location.

Extracting detailed street network and building footprints information from Nearmap aerial imagery.


For the final results of this site twin, please revisit the video in the introduction of this blog. And stay tuned for Part II of the GIS-pipeline series!

__________________________________________

To get hands-on experience with digital twin simulation in these kinds of site twin environments, make sure to sign up to test drive Falcon for free with our full-featured browser-based sandbox scenarios (coming soon!).

If you wish to learn more about leveraging Duality's GIS pipeline, or any other Falcon capabilities, for your projects, simply reach out to solutions@duality.ai .