Digital Twins
Sim2Real for Embodied AI: Testing Simulation-Predicted Behavior of a Real-World LLM-Controlled Drone
May 16, 2024
· Written by
Francesco Leacche
Felipe Mejia
Apurva Shah
Mish Sukharev

A major focus of Duality’s research is enabling embodied AI developers to test and tune their models in simulations that accurately and reliably predict real-world behavior. In a recent blog we detailed the Task Coding framework: an effective way to make robotic systems less brittle and broaden their domains of operation while keeping their behavior predictable. The framework relies on the language reasoning capabilities of a Large Language Model (LLM), such as GPT or Code Llama, to generate a sequence of pre-defined executable tasks from natural language mission prompts.

But how does this framework measure up in practice? While we have shown successful examples in simulation, testing how they translate to real-world applications is the critical next step. (Note: you can run the Task Coding simulation scenarios mentioned here, Infrastructure Inspection and Visual Reasoning, from your browser by simply creating a free FalconCloud account!)

Today we’re sharing the results of early tests carried out with our partners at Ingeniarius, a cognitive field robotics company with a mission of designing disruptive mobile robotics solutions for challenging applications, such as agriculture, forestry and construction. Ingeniarius's focus on R&D&I services for their customers made them a natural partner for testing the Task Coding framework in the real world.

So how did it go? Let’s see in the video below:

This video showcases the outcomes of the same natural language prompt executed by a digital twin of a UAV/drone in simulation alongside its physical counterpart in the real world.

What Is Happening in This Test?

In this project, we prompt an LLM (in this case GPT-4, but any desired LLM can be integrated) with the following mission stated in natural language:

“Go to the tower, then move around it while descending by 2 meters after each loop. Land when a height of 3 meters from the bottom of the structure is reached”

The LLM generates Python code built from specific drone actions: TakeOff, LandIn, GoTo, and Explore. TakeOff and LandIn do exactly what their names suggest. GoTo requires a 3D point as its destination input, while Explore expects four 3D points describing a rectangular area to be covered in a search grid pattern. Complex behaviors emerge from sequences of these actions, as dictated by the code the LLM generates.
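The post doesn’t publish the task API itself, so the sketch below is only an illustration of what this vocabulary could look like in Python. The function names come from the list above; the signatures, the Point3D alias, and the send_ros_goal placeholder are assumptions.

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def send_ros_goal(task: str, params: Dict) -> None:
    """Placeholder for dispatching a task to the drone's ROS action interface."""
    print(f"sending goal: {task} {params}")


def TakeOff() -> None:
    """Arm the drone and climb to a safe hover altitude."""
    send_ros_goal("takeoff", {})


def LandIn() -> None:
    """Descend and land at the current position."""
    send_ros_goal("land", {})


def GoTo(destination: Point3D) -> None:
    """Fly to a 3D waypoint expressed in the site's coordinate frame."""
    send_ros_goal("goto", {"target": destination})


def Explore(c1: Point3D, c2: Point3D, c3: Point3D, c4: Point3D) -> None:
    """Sweep the rectangular area defined by four 3D corners in a grid pattern."""
    send_ros_goal("explore", {"area": (c1, c2, c3, c4)})
```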

We first executed the mission generated by the LLM in Falcon, with a digital twin of the drone operating in a 3D environment that Ingeniarius modeled by scanning their headquarters near Porto, Portugal. Once the prompt was refined to ensure correct execution in simulation, the generated code was handed over to the Ingeniarius team for execution on the physical drone.

As we can see from the side-by-side videos, the drone’s flight pattern and performance closely match what was observed in simulation.

How Did We Build It?

To adapt the Task Coding framework from the earlier simulation demo to the real-world drone, we needed to:

  1. Create the digital twin of the drone.
  2. Create the digital twin of the environment.
  3. Adapt the Task Coding vocabulary to the drone’s ROS Action Services.

Creating the digital twin of the drone

Ingeniarius’s Scout v3 drone used in the real-world experiment (left) and the digital twin created by Duality (right)

The real drone: Designed by Ingeniarius, the Scout v3 drone is the third iteration specifically developed for navigation in complex environments. It integrates a ROS-based UAV system that controls everything from path planning and collision avoidance to motor control. Scout v3 utilizes a single 3D LiDAR, maintaining a factor graph that combines LiDAR, IMU, and GNSS data for improved real-time odometry estimation and mapping.

The previous version of this drone, the Scout v2, utilized dual stereo cameras and was recently tested for deployment in forestry surveying (FEROX EU Project). This configuration yielded high-quality data but suffered from odometry drift due to the limitations of visual SLAM approaches in more homogeneous forestry environments (as observed in the OPENSWARM EU Project), which drove the switch to a 3D LiDAR in the current Scout v3.

The digital twin: The model of the drone originated from CAD files for the physical drone. Using FalconEditor’s patented workflow, the CAD model was imported as a 3D mesh and its components were bound to a quadcopter base system twin. System properties, such as the drone’s weight, center of gravity, and rotor RPM, were set to roughly match their real-world values, resulting in a tuned digital twin that specifically matches the drone used by Ingeniarius.

Since this is still an early test, we made some simplifications to the drone’s behaviors. As mentioned above, the real drone receives missions via the ROS Action protocol. To mirror this process in the digital twin, we re-implemented the ROS action client (used to send missions) and server (used to receive them) in Falcon. The action client is responsible for sending tasks and receiving feedback, while the action server is responsible for receiving missions, executing them, and sending feedback back to the client.

Task Coding and ROS Actions architecture
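The actual ROS interfaces used by the Scout v3 aren’t detailed in the post. As a minimal ROS 2 (rclpy) sketch of the client side of this pattern, assume a hypothetical ExecuteTask action in an assumed drone_msgs package (goal: task name plus numeric parameters; result: a success flag):

```python
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

# Assumed custom action definition (not from the post):
#   ExecuteTask.action
#     string task          # "takeoff", "goto", "explore", "land"
#     float64[] params     # e.g. target coordinates
#     ---
#     bool success
#     ---
#     string status
from drone_msgs.action import ExecuteTask


class MissionClient(Node):
    """Sends one task at a time to the drone's action server and waits for the result."""

    def __init__(self):
        super().__init__("mission_client")
        self._client = ActionClient(self, ExecuteTask, "execute_task")

    def run_task(self, task: str, params=()) -> bool:
        goal = ExecuteTask.Goal(task=task, params=[float(p) for p in params])
        self._client.wait_for_server()
        goal_future = self._client.send_goal_async(goal)
        rclpy.spin_until_future_complete(self, goal_future)
        goal_handle = goal_future.result()
        if not goal_handle.accepted:
            return False
        result_future = goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self, result_future)
        return result_future.result().result.success


def main():
    rclpy.init()
    client = MissionClient()
    # A short sequence in the spirit of the demo missions.
    client.run_task("takeoff")
    client.run_task("goto", (10.0, 0.0, 6.0))
    client.run_task("land")
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```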

The result: The digital twin closely mirrors the real drone's ROS actions, while the dynamics are approximated and simplified for real-time simulation.

Creating the digital twin of the environment

Ingeniarius headquarters was used as the test site for the UAV/drone Sim2Real experiment. Real-world site in Alfena, Portugal (left) and the digital twin of the site used in Falcon (right).

The real location: The test site is the Ingeniarius headquarters in Alfena, in the Porto district of Portugal. This historic site started as an 18th-century farmhouse and was transformed by Ingeniarius into a comprehensive environment for field robot experimentation. The roughly 2,000 square meter outdoor space provides vital real-world challenges and constraints such as terrain variation, obstacles and debris, GNSS-denied or -impaired areas, and more. The facility is also set up for optimal monitoring and seamless inter-agent communication to facilitate field tests. The environment includes the FORTIS tower used in our test (6 m tall, 2 m wide, 3 m deep), which was built (along with other structures) as part of the FORTIS EU Project and is strategically engineered to emulate the complexities of real-world construction challenges.

The digital twin of the location: The 3D environment was modeled from a point cloud captured by Ingeniarius’s 3D SLAM mapping framework, a LiDAR-inertial-GNSS factor graph approach. More specifically, the point cloud was captured using the Scout v3 drone, which runs an enhanced version of the LIO-SAM framework on board. 3D transformations were then applied so that the coordinate systems of the Falcon simulation and the real world align, enabling consistent navigation. This served as the initial environment digital twin for the test, though further iterations are planned to enhance its fidelity.
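The alignment procedure itself isn’t described in detail in the post; conceptually it boils down to applying a rigid transform (rotation plus translation) to every point in the scanned cloud so that it lines up with the simulation’s reference frame. A minimal NumPy sketch, with an assumed yaw offset and origin shift standing in for the registered values:

```python
import numpy as np


def align_point_cloud(points: np.ndarray, yaw_deg: float, translation) -> np.ndarray:
    """Apply a rigid transform (yaw rotation + translation) to an Nx3 point cloud.

    The yaw angle and translation are placeholders; in practice they come from
    registering the SLAM map against the simulation's reference frame.
    """
    yaw = np.radians(yaw_deg)
    rotation = np.array([
        [np.cos(yaw), -np.sin(yaw), 0.0],
        [np.sin(yaw),  np.cos(yaw), 0.0],
        [0.0,          0.0,         1.0],
    ])
    return points @ rotation.T + np.asarray(translation)


# Example: rotate a stand-in cloud by an assumed 12-degree yaw offset and shift its origin.
cloud = np.random.rand(1000, 3) * 50.0  # placeholder for the LIO-SAM point cloud
aligned = align_point_cloud(cloud, yaw_deg=12.0, translation=(4.2, -1.5, 0.0))
```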

This video depicts the 3D SLAM framework running on the Scout v3 drone in order to generate the point cloud of the Ingeniarius headquarters used to model the digital twin of the environment.

Adapting the Task Coding vocabulary to the drone’s ROS action interface

With this new context, the LLM-piloted drone can interpret commands like "takeoff", "land", "go to", and "explore". The LLM can then take natural language mission prompts and translate them into Python code corresponding to the ROS messages to be sent to the drone.
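The actual system prompt used by the Task Coding framework isn’t reproduced in the post; the snippet below is a hypothetical illustration of how the task vocabulary and scene context might be packaged for the LLM (the wording and the build_prompt helper are assumptions):

```python
# Hypothetical context given to the LLM; illustrative only.
TASK_CONTEXT = """You control a quadcopter by writing Python code using only these functions:
  TakeOff()               - take off to a safe hover altitude
  LandIn()                - land at the current position
  GoTo(point)             - fly to a 3D point (x, y, z) in meters
  Explore(p1, p2, p3, p4) - grid-search the rectangle defined by four 3D points
Known structures (position, orientation, size) are listed under Scene.
Return only executable Python code."""


def build_prompt(scene_info: str, mission: str) -> str:
    """Combine the task vocabulary, injected scene context, and the mission text."""
    return f"{TASK_CONTEXT}\n\nScene:\n{scene_info}\n\nMission:\n{mission}"
```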

Recall the mission prompt above; the LLM turns it into a short sequence of the commands described earlier.
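The code generated in the actual test isn’t reproduced here; the sketch below only illustrates the kind of sequence the prompt implies, using the task vocabulary sketched earlier. The tower position and standoff radius are placeholders for the scene context injected into the prompt (only the 6 m tower height comes from the post):

```python
# Illustrative only: a plausible mission script for the prompt above,
# not the LLM's actual output.
TOWER_CENTER = (15.0, 8.0, 0.0)   # x, y, z of the tower base (placeholder)
TOWER_TOP = 6.0                   # tower height in meters (from the post)
LOOP_RADIUS = 3.0                 # standoff distance from the tower (placeholder)
DESCENT_PER_LOOP = 2.0            # "descending by 2 meters after each loop"
LAND_HEIGHT = 3.0                 # "land when a height of 3 meters ... is reached"

TakeOff()

height = TOWER_TOP
while height > LAND_HEIGHT:
    # One loop around the tower: visit four waypoints around its center.
    for dx, dy in [(LOOP_RADIUS, 0), (0, LOOP_RADIUS), (-LOOP_RADIUS, 0), (0, -LOOP_RADIUS)]:
        GoTo((TOWER_CENTER[0] + dx, TOWER_CENTER[1] + dy, height))
    height = max(height - DESCENT_PER_LOOP, LAND_HEIGHT)

LandIn()
```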

What Are the Next Steps?

This project is very much a work in progress, and much remains to be done to increase the fidelity of the simulation. For example, in this iteration, referencing objects in the scene involves injecting information about significant structures (e.g., the tower) into the LLM prompt; the LLM then uses this data (position, orientation, size) to generate the mission code. In a future iteration we plan to integrate a visual QA system like the one we previously showcased in our ViperGPT scenario.

As we iterate on the current results to improve the environment's fidelity and the accuracy of the missions generated by the LLM, here are a few steps we’re planning to explore next:

  • Enhancing the environment's fidelity with more detailed modeling.
  • Adapting more of the drone’s software stack.
  • Improving the system dynamics of the drone’s digital twin.
  • Increasing the range and flexibility of missions generated by the LLM.