AI Literacy Lesson 3 – When artificial intelligence enters the physical world

Lesson 3. When Artificial Intelligence Enters the Physical World: Embodied AI

Dr. Son Pham

If the 2012 turning point helped computers see the world through images, and 2017 helped them understand human language, the next phase of artificial intelligence is even more ambitious: taking that intelligence off the computer screen and putting it in a body that works in the physical world. This field is known as Embodied AI.

In previous AI systems, most of the activity took place in digital space. A language model can write text and a computer vision system can recognize images, but both operate entirely within a data environment. Once AI enters the physical world, things become much more complicated. At that point, artificial intelligence needs to understand not only data, but also gravity, movement, time, and real risk.

This transformation is taking place in many fields, including self-driving cars, industrial robots, and humanoid robots. These systems must not only observe and analyze, but also make decisions and act. A mistake in a piece of text can be corrected, but a mistake in the physical world can have serious consequences. For that reason, Embodied AI is often considered the most difficult chapter in the development of artificial intelligence.

The difference between digital space and the real world

In digital space, AI systems operate in a relatively safe environment. If a language model mispredicts a word or generates a sentence that doesn't make sense, the user simply deletes it and tries again. Mistakes have almost no physical consequences. This allows researchers to test many new ideas without worrying too much about risk.

However, when artificial intelligence enters the real world, every decision can have direct consequences. Imagine a self-driving car on the highway. If the system misreads a sign or brakes at the wrong time, it can endanger many people. Unlike a software environment, the physical world does not have an "undo" button.

In addition, autonomous AI has to process data in real time and cope with an uncertain environment. The weather can change, people can act unexpectedly, and rare situations can always occur. This makes building an AI system that works safely in the real world one of the biggest technical challenges of our time.

The philosophy of "Pure Vision" in autonomous vehicles

In the field of autonomous vehicles, many technology companies use sophisticated sensor suites to help vehicles understand their surroundings. These suites often combine several technologies, such as cameras, radar, and laser sensors (LiDAR). Among them, LiDAR is considered one of the most important tools because it can create very accurate three-dimensional maps of the surrounding space.

Tesla, however, chose a different direction. Under Elon Musk's leadership, Tesla pursues a philosophy of "pure vision", that is, building autonomous driving systems that rely mainly on cameras and neural networks. The argument is quite simple: the world's road system is designed for humans, and humans drive relying mostly on their eyes. So, if it is possible to build a neural network strong enough to understand images the way a human does, the car can also drive safely using only sight.

This philosophy led to an approach called end-to-end deep learning. Instead of dividing the system into many hand-engineered processing steps, the image data from the cameras is fed directly into the neural network. There, the system learns to understand the scene, predict movements, and make driving decisions. This is a bold approach because it places much of the cognitive responsibility on the learning capabilities of the neural network.
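To make the end-to-end idea concrete, here is a deliberately tiny sketch, not Tesla's actual architecture. It assumes a made-up three-pixel "image" whose bright pixel marks where the lane is, and it uses a single linear layer trained by gradient descent as a stand-in for a deep neural network. The names `DATA`, `predict`, and `train` are hypothetical. The point is only that pixels go in and a steering value comes out, with no hand-written rule in between.

```python
# Toy training set (made up): each "image" is three pixel intensities,
# paired with the steering value a human driver would have chosen
# (-1.0 = steer left, 0.0 = straight, 1.0 = steer right).
DATA = [
    ([1.0, 0.0, 0.0], -1.0),
    ([0.0, 1.0, 0.0], 0.0),
    ([0.0, 0.0, 1.0], 1.0),
]


def predict(weights, pixels):
    """Map raw pixels directly to a steering value (one linear layer)."""
    return sum(w * p for w, p in zip(weights, pixels))


def train(data, epochs=200, lr=0.5):
    """Fit the weights with stochastic gradient descent on squared error."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for pixels, target in data:
            error = predict(weights, pixels) - target
            # Gradient of 0.5 * error**2 with respect to each weight
            # is error * pixel.
            weights = [w - lr * error * p for w, p in zip(weights, pixels)]
    return weights


weights = train(DATA)
steering = predict(weights, [0.0, 0.0, 1.0])  # lane appears on the right
```

Nothing in the code says "if the lane is on the right, steer right"; that behavior is learned from the examples, which is the essence of the end-to-end approach.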

How an autonomous car "thinks"

To understand how AI works in autonomous vehicles, we can break the process down into three main steps: perception, planning, and action. Each step is supported by neural networks and complex machine learning algorithms.

The first step is perception. The cameras around the car constantly send images to the car's central computer. The system uses computer vision models to analyze each frame and identify important elements such as road surfaces, lane markings, other vehicles, pedestrians, and signs. An important technique in this step is semantic segmentation, which assigns every pixel in the image to a class and so helps the AI distinguish the different types of objects in a scene.
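As a rough illustration of what semantic segmentation produces, the sketch below assumes a vision model has already output a score per class for every pixel, and simply picks the highest-scoring class at each position. The class list, the `segment` function, and the 2x2 score grid are all invented for the example and are not real model output.

```python
# Hypothetical class names; a real system would use many more.
CLASSES = ["road", "lane_marking", "vehicle", "pedestrian"]


def segment(scores):
    """Return a label map: for each pixel, the class with the highest score.

    scores[row][col] is a list holding one score per entry in CLASSES.
    """
    return [
        [CLASSES[max(range(len(CLASSES)), key=lambda k: pixel[k])]
         for pixel in row]
        for row in scores
    ]


# A made-up 2x2 "image" of per-class scores.
scores = [
    [[0.90, 0.05, 0.03, 0.02], [0.10, 0.80, 0.05, 0.05]],
    [[0.20, 0.10, 0.60, 0.10], [0.10, 0.10, 0.10, 0.70]],
]
label_map = segment(scores)
# label_map == [["road", "lane_marking"], ["vehicle", "pedestrian"]]
```

The hard part in practice is producing good scores from camera images; turning the scores into labels is the simple final step shown here.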

After understanding the surroundings, the system moves on to the planning step. Here, the AI evaluates various motion scenarios in a very short period of time. For example, if the car ahead slows down or a pedestrian is about to cross the street, the system quickly weighs options such as slowing down, changing lanes, or staying on course. Finally, the action step sends control commands to the vehicle's mechanical systems, such as the steering, brakes, and throttle, to execute that decision.
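The planning and action steps can be sketched as a cost comparison followed by a lookup. This is a hand-written toy under stated assumptions, not how any production vehicle plans: the `Scene` fields, the cost numbers, and the command values are all made up, and a real planner would reason over trajectories rather than three named maneuvers.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Simplified output of the perception step (hypothetical fields)."""
    lead_gap_m: float        # distance to the vehicle ahead, in meters
    pedestrian_ahead: bool   # True if a pedestrian is about to cross
    left_lane_free: bool     # True if the adjacent lane is clear


def plan(scene):
    """Score each candidate maneuver and return the cheapest one."""
    costs = {"keep_lane": 0.0, "slow_down": 1.0, "change_lane": 2.0}
    if scene.pedestrian_ahead:
        # Anything other than slowing down is heavily penalized.
        costs["keep_lane"] += 100.0
        costs["change_lane"] += 100.0
    if scene.lead_gap_m < 20.0:
        costs["keep_lane"] += 10.0
    if not scene.left_lane_free:
        costs["change_lane"] += 100.0
    return min(costs, key=costs.get)


def act(maneuver):
    """Translate a maneuver into (steering, brake, throttle) commands."""
    commands = {
        "keep_lane": (0.0, 0.0, 0.3),
        "slow_down": (0.0, 0.5, 0.0),
        "change_lane": (-0.2, 0.0, 0.3),
    }
    return commands[maneuver]


scene = Scene(lead_gap_m=10.0, pedestrian_ahead=False, left_lane_free=True)
maneuver = plan(scene)     # "slow_down": the car ahead is too close
controls = act(maneuver)   # (0.0, 0.5, 0.0)
```

Expressing preferences as costs makes the trade-off explicit: safety-critical conditions add large penalties, so they dominate comfort or speed whenever they apply.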

Training with real data and simulations

One of the biggest challenges for autonomous AI is handling rare situations, often referred to as edge cases. These situations don't happen often, but they matter a great deal for safety. For example, a car might encounter a person wearing a strange costume, a bicycle traveling against traffic, or a construction site that suddenly appears on the road.

To address this problem, Tesla takes advantage of its huge fleet of cars operating worldwide. Every time a driver has to intervene and take over from the self-driving system, the data from that situation is sent back to the data center. These examples become "real-world lessons" that help the model learn how to handle similar situations in the future.
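The idea of harvesting training examples from driver interventions can be sketched as a filter over a drive log. Everything here is an assumption made for illustration: the log format, the `driver_override` flag, and the `extract_intervention_clips` function are hypothetical and do not describe Tesla's actual data pipeline.

```python
def extract_intervention_clips(frames, window=2):
    """Return one clip per moment the driver takes over.

    Each clip holds up to `window` frames leading up to the takeover,
    plus the takeover frame itself.
    """
    clips = []
    for i, frame in enumerate(frames):
        previously_overriding = i > 0 and frames[i - 1]["driver_override"]
        if frame["driver_override"] and not previously_overriding:
            clips.append(frames[max(0, i - window): i + 1])
    return clips


# A made-up log of six frames; the driver takes over at t=3 and t=4.
log = [{"t": t, "driver_override": t in (3, 4)} for t in range(6)]
clips = extract_intervention_clips(log)
clip_times = [[f["t"] for f in clip] for clip in clips]
# clip_times == [[1, 2, 3]]
```

Keeping the frames just before the takeover matters because they show the situation the system misjudged, which is what the model needs to learn from.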

Besides real data, companies also use simulated environments to train AI. In these virtual worlds, the system can experience millions of different scenarios without putting anyone at risk. Generative AI models are also used to create difficult conditions such as heavy rain, heavy snow, or chaotic traffic. As a result, neural networks can "test drive" billions of miles in simulation before being deployed in real life.
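A minimal sketch of scenario generation for simulation follows. It substitutes plain random sampling for the generative AI models mentioned above, purely to show the shape of the idea; the parameter names and value ranges are invented.

```python
import random

# Hypothetical scenario parameters; real simulators expose far more.
WEATHER = ["clear", "heavy_rain", "heavy_snow"]
TRAFFIC = ["light", "moderate", "chaotic"]


def sample_scenarios(n, seed=0):
    """Draw n random driving scenarios, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [
        {
            "weather": rng.choice(WEATHER),
            "traffic": rng.choice(TRAFFIC),
            "pedestrians": rng.randint(0, 20),
        }
        for _ in range(n)
    ]


scenarios = sample_scenarios(1000)
snowy_and_chaotic = [
    s for s in scenarios
    if s["weather"] == "heavy_snow" and s["traffic"] == "chaotic"
]
```

Because scenarios are cheap to generate, rare combinations such as heavy snow with chaotic traffic can be produced in whatever quantity training requires, instead of waiting for them to occur on real roads.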

From self-driving cars to humanoid robots

Once AI systems have learned how to navigate safely in complex environments, the next step is to extend that ability to other physical tasks. This led to the development of robots capable of interacting directly with the world around them. A notable example is the Tesla Optimus humanoid robot.

Unlike self-driving cars, which only need to control steering and speed, humanoid robots have to perform much more sophisticated operations. They need to learn to grasp objects, keep their balance while moving, and interact with the environment much as humans do. This requires a combination of computer vision, motion control, and machine learning.

An important technique in this process is reinforcement learning. In this learning method, the robot tries a variety of actions and receives a reward when an action succeeds. Over millions of trials, the system gradually learns how to adjust grip force, hand angle, and body movement to complete the task. For example, the robot can learn that holding an egg requires very light force, while holding a wrench requires more force.
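The egg-and-wrench example can be written as a very small reinforcement learning problem. The sketch below uses an epsilon-greedy contextual bandit, one of the simplest forms of reinforcement learning, rather than the methods a real robot would need. The three force levels, the reward rule, and all function names are assumptions made for illustration.

```python
import random

FORCES = ["light", "medium", "firm"]          # hypothetical grip-force levels
CORRECT = {"egg": "light", "wrench": "firm"}  # made-up ground truth for the toy task


def reward(obj, force):
    """+1 if the grip suits the object, -1 if it would crush or drop it."""
    return 1.0 if CORRECT[obj] == force else -1.0


def train(episodes=500, epsilon=0.2, alpha=0.1, seed=0):
    """Learn a value estimate for every (object, force) pair by trial and error."""
    rng = random.Random(seed)
    q = {obj: {f: 0.0 for f in FORCES} for obj in CORRECT}
    for _ in range(episodes):
        obj = rng.choice(list(CORRECT))
        if rng.random() < epsilon:
            force = rng.choice(FORCES)                    # explore
        else:
            force = max(FORCES, key=lambda f: q[obj][f])  # exploit
        # Move the estimate a small step toward the observed reward.
        q[obj][force] += alpha * (reward(obj, force) - q[obj][force])
    return q


def best_force(q, obj):
    """Return the force level with the highest learned value for obj."""
    return max(FORCES, key=lambda f: q[obj][f])


q = train()
egg_grip = best_force(q, "egg")        # "light"
wrench_grip = best_force(q, "wrench")  # "firm"
```

The robot is never told which force is right. It only sees rewards after acting, and the occasional random exploration keeps it from settling on the first action it happens to try.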

The future of artificial intelligence

Embodied AI is considered the next step in the development journey of artificial intelligence. If language models help AI understand human knowledge, then embodied systems will allow that intelligence to interact directly with the physical world. This could open up many applications in manufacturing, healthcare, logistics, and even in family life.

However, the road ahead is still challenging. AI systems must achieve a very high level of safety before they can operate widely in environments shared with humans. In addition, combining perception, planning, and action in a single unified system remains a complex problem in computer science and robotics.

Even so, the direction of development is clear. Artificial intelligence is gradually shifting from algorithms that run inside computers to entities that can observe, think, and act in the real world. When that happens, AI will no longer be just lines of code running on servers, but a tangible part of the environment around us.

Nha Viet Institute
