AI Literacy Lesson 1 – When machines decipher worldview

Lesson 1. When Machines Decipher Worldview: Artificial Neural Networks (2012)

Dr. Son Pham

Before 2012, information technology was caught in a curious paradox. Computers could perform billions of calculations per second, simulate the global climate, and even pilot space probes, yet they struggled with a seemingly simple task: looking at a photo and saying what it shows. A child can instantly recognize a cat, a car, or a loved one's face, while a computer was stumped by such obvious things.

The reason lay not in computing power but in the way humans taught machines to understand the world. For decades, computer systems were built on a rule-based programming model: humans observe phenomena, write down rules, and then ask machines to apply those rules. This approach is very effective for well-defined logical problems such as calculations or data management, but it proves weak in the face of the highly variable world of images.

The turning point came when scientists began to shift from the mindset of "teaching the machine each rule" to "letting the machine learn from the data on its own". In 2012, the results of the ImageNet Large Scale Visual Recognition Challenge showed that deep neural networks could far outperform the previous methods. From that moment, artificial intelligence entered a new era.

The impasse of rule-based programming thinking

In the early stages of artificial intelligence, software engineers tried to teach computers to recognize the world by listing characteristics. For example, to identify a cat, they would observe it and list the typical signs: triangular ears, whiskers, and a long tail. These characteristics were then translated into logical conditions in the computer program. When an image satisfied those conditions, the system concluded that it showed a cat.

At first glance, this approach seems reasonable because it resembles how humans describe things. However, problems arise when it is applied to the real world. A cat may lie curled up so that the tail does not appear in the frame, may turn its back so that the whiskers disappear, or may be partly hidden by objects around it. A single small change in detail can lead the entire rule system to the wrong conclusion.
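The brittleness described above can be seen in a minimal sketch. The function name, the feature names, and the two example inputs are all hypothetical, chosen only to mirror the cat example in the text; this is not how any historical system was actually written.

```python
# Hypothetical hand-written rule: every name here is illustrative only.
def looks_like_cat(features: dict) -> bool:
    """Return True only if every hand-picked sign is visible in the image."""
    return (
        features.get("triangular_ears", False)
        and features.get("whiskers", False)
        and features.get("long_tail", False)
    )

# Two made-up descriptions of what an earlier processing step "saw".
sitting_cat = {"triangular_ears": True, "whiskers": True, "long_tail": True}
curled_up_cat = {"triangular_ears": True, "whiskers": True, "long_tail": False}

print(looks_like_cat(sitting_cat))    # True
print(looks_like_cat(curled_up_cat))  # False, although it is still a cat
```

Patching the rule for the curled-up cat would only move the problem to the next unseen pose, which is why adding more rules never closed the gap.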

Researchers call this phenomenon the infinite variability of the data. In the real world, no two images are exactly alike. Lighting changes, viewing angles change, colors change, and every little change can break rigid rules. So despite the engineers' efforts to add thousands of rules, such systems still could not achieve the flexibility of human perception.

AlexNet 2012: Breaking the Limits with Deep Neural Networks

The turning point came when Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto introduced a system called AlexNet. Instead of relying on a list of rules written by humans, the system was built on an artificial neural network. The core idea is to create a structure of multiple layers of data processing, loosely modeled on how neurons in the brain link together.

In this model, the image is fed in as raw pixels. The neural network does not try to understand the entire image right away. Instead, it analyzes the image layer by layer. The first layers recognize simple elements such as edges, lines, or corners. The next layers begin to piece those features together into more complex shapes such as curves or geometric structures.

As the data passes through more layers, the neural network gradually recognizes higher-level features such as eyes, ears, or fur texture. Finally, all the information is combined to draw a conclusion about the object in the photo. Importantly, these features are not specified by the programmer but are discovered by the system itself during the learning process.
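To make the idea of an early "edge-detecting" layer concrete, here is a minimal sketch in plain Python. The helper slide_filter, the tiny image, and the filter values are all assumptions made for illustration. In particular, the filter here is set by hand, whereas a trained network would discover its filter values during learning.

```python
def slide_filter(image, kernel):
    """Slide a one-row filter across every row of the image (no padding)."""
    k = len(kernel)
    return [
        [sum(row[i + j] * kernel[j] for j in range(k))
         for i in range(len(row) - k + 1)]
        for row in image
    ]

# Tiny grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
edge_filter = [-1, 1]  # gives a large value where brightness jumps upward

response = slide_filter(image, edge_filter)
print(response[0])  # [0, 0, 9, 0, 0]
```

The single large value marks the position of the vertical edge. Later layers would take maps like this one as input and combine them into larger shapes.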

Backpropagation Mechanism: The Heart of Evolution

A neural network can be very complex, but without a learning mechanism it is just a static mathematical structure. The secret to helping neural networks improve over time lies in the backpropagation algorithm, short for "backward propagation of errors". This algorithm allows the system to adjust the connections inside the network based on the error between its prediction and the correct result.

The learning process begins when the neural network is fed a labeled dataset, such as millions of images of cats and dogs. The system makes a prediction for each image, then compares the prediction with the correct answer. A loss function turns the gap between the two into a single number, the loss, which measures how far off the prediction is.

This error signal is then transmitted backward through the layers of the network. In the process, the connection weights between the neurons are adjusted little by little, each in the direction that reduces the loss. The process is repeated millions of times, so the neural network gradually reduces its errors and learns the general patterns of the data. This is how a computer "learns" through experience, much as a student does.
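The loop described above (predict, measure the loss, send the error backward, nudge the weights) can be sketched on a toy network with just two chained weights. Everything in this example is an assumption made for illustration: the three data points, the starting weights, the learning rate, and the squared-error loss. Real networks apply the same chain-rule idea to millions of weights.

```python
# Toy network with two chained weights: h = w1 * x, then y = w2 * h.
# Data, starting weights and learning rate are illustrative assumptions.
data = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]  # (input, correct answer); here answer = 2 * input
w1, w2 = 0.5, 0.5
learning_rate = 0.05

for epoch in range(100):
    for x, target in data:
        # Forward pass: make a prediction.
        h = w1 * x
        y = w2 * h
        # The loss is (y - target) ** 2; its derivative with respect to y
        # is the error signal that starts the backward pass.
        grad_y = 2 * (y - target)
        # Backward pass: push the error back layer by layer (chain rule).
        grad_w2 = grad_y * h
        grad_h = grad_y * w2
        grad_w1 = grad_h * x
        # Adjust each weight slightly in the direction that reduces the loss.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2

print(round(w1 * w2, 4))  # 2.0
```

After training, the product of the two weights is close to 2, so the network has learned the pattern "answer = 2 * input" without that rule ever being written into the program.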

GPU and Big Data: The two "lungs" that provide power

Although the idea of neural networks has been around for decades, it wasn't until the early 2010s that it really took off. One of the important reasons was the use of GPUs, graphics processors originally designed to serve the gaming industry, for general-purpose computation. GPUs are capable of performing thousands of calculations in parallel, which suits the matrix operations at the core of neural networks very well.
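Why matrix operations parallelize so well can be seen in a small sketch of a single network layer. The function dense_layer and the numbers are hypothetical, and this plain Python code runs sequentially on a CPU. The point is only that each output value depends on its own row of weights and on nothing computed for the other outputs.

```python
def dense_layer(weights, inputs):
    """Compute one layer: each output is the dot product of one weight row with the inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

weights = [
    [1, 0, -1],  # weights feeding output neuron 0
    [2, 1, 0],   # weights feeding output neuron 1
]
inputs = [3, 4, 5]

print(dense_layer(weights, inputs))  # [-2, 10]
```

Because no row needs the result of any other row, hardware with thousands of cores can in principle compute all rows at the same time, which is the kind of workload GPUs were already built for in graphics.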

Thanks to GPUs, training Deep Learning models became feasible in terms of time. Computations that used to take weeks or months on CPUs could be reduced to a few days. This improvement opened up the possibility of building deeper and more complex neural networks.

Alongside computing power came the explosion of Internet data. Billions of images shared online became a huge source of data for machine learning systems. When there is enough diverse data, neural networks can learn features that generalize to new images instead of just memorizing specific examples. Thanks to the combination of GPUs and Big Data, Deep Learning stepped out of the lab and became the foundation of modern artificial intelligence.

When computers start "seeing" the world

After AlexNet's resounding success at the ImageNet Large Scale Visual Recognition Challenge, the scientific community quickly realized that it was at a major turning point. For the first time, a computer system could not only process data but also learn to extract complex features from images. The image-recognition error rate plummeted: AlexNet reached a top-5 error of about 15.3%, against roughly 26.2% for the runner-up, far outperforming all previous computer vision methods.

This success created a domino effect among artificial intelligence researchers. Labs around the world began applying deep neural networks to a variety of problems, from facial recognition to video analysis. Big tech companies were quick to invest in deep learning, because they realized that the ability to understand images and sounds would open up countless commercial applications.

In just a few years, technologies that once existed only in the lab began to appear in everyday life. Smartphones can recognize faces to unlock, apps can automatically classify photos, and surveillance systems can detect unusual behavior. It's all based on the same principle: neural networks learn to identify patterns from data.

From computer vision to understanding language and behavior

Once scientists had demonstrated that deep neural networks can understand images, the next question quickly arose: can this method be applied to other forms of data? The answer is yes. The ideas developed in Deep Learning quickly spread to the field of natural language processing, where computers learn to understand human text and speech.

In these systems, instead of pixels, the input data is sequences of words. Neural networks analyze how words appear together in billions of sentences on the Internet to learn the structure of a language. Over time, the system begins to capture more abstract properties such as context, meaning, and even the emotional nuances of a sentence.
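The raw signal behind "how words appear together" can be illustrated with a simple count of adjacent word pairs. The three sentences below are invented for the example, and counting pairs is only a stand-in: actual language models learn from such patterns with neural networks rather than with a lookup table of counts.

```python
from collections import Counter

# Three made-up sentences standing in for "billions of sentences".
sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

pair_counts = Counter()
for sentence in sentences:
    words = sentence.split()
    # Count every pair of neighbouring words.
    for left, right in zip(words, words[1:]):
        pair_counts[(left, right)] += 1

print(pair_counts[("the", "cat")])     # 2
print(pair_counts[("cat", "drinks")])  # 1
print(pair_counts[("milk", "dog")])    # 0, this pair never occurs
```

Even these crude counts already hint at structure, for instance that "the" tends to come before a noun. With far more text and a learned model instead of a table, the same kind of signal supports the more abstract properties mentioned above.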

Thanks to those advancements, many technologies that are familiar today became possible. Voice assistants can understand users' questions, machine translation systems can switch between multiple languages, and search engines can understand the intent behind a query. Importantly, all of these capabilities come from the same foundation: deep learning based on neural networks.

A philosophical shift in the way artificial intelligence is built

From the perspective of the philosophy of science, the rise of Deep Learning is not only a technical advancement but also a change in the way people approach intelligence. For decades, scientists had tried to describe intelligence using clear logical rules. They believed that with enough rules, computers could simulate human thinking.

But Deep Learning shows that intelligence can emerge in a different way. Instead of building knowledge from rigid rules, systems can learn directly from experience, much as humans learn by observing the world. This brings artificial intelligence closer to biology, because the learning process of artificial neural networks loosely resembles the way networks of neurons in the brain change as people learn.

This philosophical shift also explains why ideas that were once seen as overly simple in the past have become so powerful in the era of big data. When there is enough data and computing power, machine learning models can discover laws that are difficult for humans to describe in words.

Widespread impact in science and industry

After 2012, Deep Learning quickly became one of the fastest-growing fields of study in computer science. Universities opened more training programs in artificial intelligence, while tech companies poured billions of dollars into research and applications. From medicine to transportation, from finance to education, nearly every industry began to explore the possibilities of this technology.

In medicine, neural networks are used to analyze X-ray and MRI images to assist doctors in detecting diseases early. In transportation, computer vision systems help autonomous vehicles recognize pedestrians and road signs. In e-commerce, deep learning algorithms analyze user behavior to recommend suitable products.

It's worth noting that many of these applications were previously considered too complex for computers. But as systems learn from millions or billions of examples, they begin to match or even surpass human accuracy on certain specific tasks.

Limits and new questions

While Deep Learning offers impressive achievements, it also raises many new questions. One of the biggest challenges is transparency. Deep neural networks often act as a "black box" where decisions are made based on millions of parameters that are difficult for humans to explain clearly.

In addition, machine learning systems rely heavily on data. If the training data is biased or lacks diversity, the model can learn skewed or incorrect patterns. This is especially important in sensitive fields such as healthcare, law, or finance, where algorithmic decisions can directly affect people's lives.

These challenges have led the scientific community to pursue new research directions such as explainable artificial intelligence, learning from less data, and combining logical knowledge with deep learning. The goal is to build systems that are not only powerful but also reliable.

The beginning of the modern artificial intelligence era

Looking back at history, 2012 can be seen as an important transition point in the development of artificial intelligence. Before that time, many people were still skeptical about whether computers could truly understand the world. After the turning point of Deep Learning, the question is no longer "is it possible", but "how far will it go".

From image recognition to language understanding, from data analysis to content creation, neural network-based systems are gradually becoming the foundation of many modern technologies. These advancements also create the foundation for a new generation of artificial intelligence capable of interacting with humans in a more natural way.

So the story of artificial neural networks is more than just a chapter in the history of computer science. It marks the moment when machines began to move from merely executing commands to learning from the world. That is when the machine began to build itself a "worldview" based on data and experience.

Nha Viet Institute
