
Lesson 2. The Era of Contextual Insights: Content Generation AI (2017)

Dr. Son Pham


If 2012 marked the moment when computers began to "see" the world through neural networks that recognize images, 2017 marked another important step forward: computers began to understand and generate human language. This advance laid the foundation for systems that can write text, answer questions, summarize documents, and even compose poetry or write programming code.

Before this point, computers could process text only at a basic level. Automated translation systems and chatbots had been around for years, but their quality often disappointed users. A computer could translate word by word, but it did not understand the context of the whole sentence, so the translations were often mechanical, sometimes comically so. Many people encountered these strange translations and realized that computers seemed to understand only small fragments of the text.

The turning point came in 2017, when researchers at Google published a landmark paper called Attention Is All You Need. The paper introduced an entirely new architecture called the Transformer, ushering in the era of modern language models and laying the foundation for today's artificial intelligence systems.

The Weakness of Short-Term Memory: RNNs and LSTMs

Before the advent of the Transformer, most language processing systems were based on the recurrent neural network (RNN) architecture. The idea behind an RNN is quite intuitive: read text word by word, much as a human reads a sentence from left to right. On reading each new word, the model updates its memory state and then moves on to the next word. Thanks to this mechanism, the model can remember part of the information from the words it has already read.
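This reading loop can be sketched in a few lines of Python. Everything here is a toy illustration with made-up sizes and random weights, not a trained system, but it shows the key point: the entire sentence must be funneled through one small memory state, one word at a time.

```python
import numpy as np

def rnn_step(h_prev, x, W_h, W_x, b):
    """One recurrent step: fold the current word vector x into the memory state."""
    return np.tanh(W_h @ h_prev + W_x @ x + b)

rng = np.random.default_rng(0)
hidden_size, embed_size = 4, 3
W_h = rng.normal(size=(hidden_size, hidden_size))
W_x = rng.normal(size=(hidden_size, embed_size))
b = np.zeros(hidden_size)

# "Read" a five-word sentence one word vector at a time, left to right.
h = np.zeros(hidden_size)
sentence = rng.normal(size=(5, embed_size))
for x in sentence:
    h = rnn_step(h, x, W_h, W_x, b)

print(h.shape)  # a single 4-number state must summarize the whole sentence
```

Notice that after the loop finishes, only `h` remains: whatever the model "remembers" about the first word has to survive being squeezed through every later update.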

This method works quite well for short sentences. However, as sentences grow longer, the system has trouble retaining important information. During training, the learning signal weakens as it passes back through many computational steps, a phenomenon known as the vanishing gradient. This makes it difficult for the model to learn connections between words that are far apart in a sentence.
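The vanishing gradient is easy to see with simple arithmetic. Backpropagating through a recurrent network multiplies the learning signal by one factor per time step, and each factor's magnitude is typically below 1; the value 0.9 below is an illustrative assumption, not a measured quantity.

```python
# Each backward step through time scales the gradient by a factor whose
# magnitude is typically below 1 (e.g. tanh'(z) <= 1 times a weight term).
factor = 0.9   # illustrative per-step scaling, an assumed value
grad = 1.0
for step in range(50):  # a 50-word sentence
    grad *= factor

print(round(grad, 5))  # about 0.00515: the signal from the sentence start is nearly gone
```

Even with a generous factor of 0.9, fifty steps shrink the signal to about half a percent of its original strength, which is why the model struggles to connect the end of a long sentence to its beginning.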

To overcome this problem, researchers developed variants such as the LSTM and GRU to help the model remember for longer. But however much these systems improved, they still had to process text step by step in sequence, so by the time it reached the end of a very long sentence, the model may have forgotten the subject at the beginning. As a result, AI-generated translations or paragraphs were often incoherent, like a person trying to retell a long story while retaining only a few fragments.

The Rise of the Transformer: The Miracle Named "Attention"

The Transformer architecture completely changed the way computers process language. Instead of reading words one by one in sequence, the Transformer lets the model see the entire sentence at once. This allows the system to exploit the parallel processing power of modern processors and speeds up training many times over.

The heart of the Transformer is the self-attention mechanism. When humans read a sentence, we naturally focus on the words that matter most for its meaning. For example, in the sentence "The cat is on the red chair", the words "cat" and "chair" carry more information than the function words. Self-attention lets the model compute the degree of association between every pair of words in a sentence, thereby determining which words deserve more attention.
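The pairwise-association idea can be sketched directly. The code below is a minimal, single-head version of scaled dot-product attention with random stand-in word vectors and weights; real models add multiple heads, layers, and trained parameters.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sentence of word vectors X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # association of every word pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights per word
    return weights @ V, weights

rng = np.random.default_rng(0)
n_words, d = 6, 8
X = rng.normal(size=(n_words, d))                    # one random vector per "word"
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)

print(out.shape)             # (6, 8): one context-aware vector per word
print(weights.sum(axis=-1))  # each word's attention weights sum to 1
```

The key contrast with the RNN is that `scores` compares every word with every other word in one matrix operation, so the first and last words of a sentence are connected directly rather than through a long chain of memory updates.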

Thanks to this mechanism, AI can understand complex relationships between words even when they are far apart. For example, in the sentence "My grandfather held the wooden stick because it was very heavy", the word "it" is ambiguous. But by analyzing the whole sentence, the system recognizes that "heavy" is closely related to "stick" rather than to "grandfather". This ability to grasp overall context is what makes the Transformer such a major breakthrough in language processing.

From prediction to content creation

The Transformer's ability to understand context quickly became the foundation for a new generation of models known as large language models. These systems are trained on huge volumes of text from the Internet, books, newspapers, and scientific literature. Through this training, they develop the ability to predict the next word in a sentence with great accuracy.

In essence, models like GPT act as extremely sophisticated probability prediction machines. When a user writes a prompt such as "Learning is...", the model draws on the patterns it has learned from millions of similar examples to calculate which word is most likely to come next. After selecting that word, the process repeats, gradually building up a complete paragraph.
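The predict-and-repeat loop can be illustrated with a toy bigram model. The tiny corpus and its word counts are invented for illustration: they stand in for the vastly richer statistics a real language model learns from huge datasets.

```python
import random

# Toy "probability machine": bigram counts stand in for learned statistics.
corpus = "learning is fun learning is hard learning is fun and fun is good".split()
counts = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def next_word(prev, rng):
    """Sample the next word in proportion to how often it followed `prev`."""
    options = counts[prev]
    return rng.choices(list(options), weights=list(options.values()))[0]

rng = random.Random(0)
text = ["learning"]
for _ in range(6):                 # predict a word, append it, repeat
    if text[-1] not in counts:     # "good" never has a successor, so stop
        break
    text.append(next_word(text[-1], rng))

print(" ".join(text))
```

Each generated word is chosen by weighted chance, which is also why the same prompt can produce different continuations on different runs: the model outputs probabilities, not certainties.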

Interestingly, once the models become large enough and are trained on rich enough data, they begin to exhibit new capabilities that researchers did not anticipate, a phenomenon called emergence. As a result, AI systems can not only complete simple tasks but also write essays, compose poems, explain scientific concepts, and write programming code.

Applications and impacts: changing the way we work

The development of large language models has led to artificial intelligence systems capable of interacting with humans in natural language. One prominent example is ChatGPT, developed by OpenAI. These systems can answer questions, write text, summarize documents, and assist with many other intellectual tasks.

In programming, AI can analyze the structure of source code to find logical errors or suggest improvements. Thanks to its ability to identify connections between lines of code, it can help programmers write software faster and more efficiently, making AI a powerful enabler in the software industry.

In education and research, AI can condense hundreds of pages of documents into easy-to-understand explanations. However, these systems also have an important weakness: hallucination. Because AI works on probability, it sometimes generates answers that sound very plausible but are inaccurate. Therefore, even as AI grows more powerful, humans still need to verify and direct the information it produces.

When Computers Begin to Understand Language

If the turning point of 2012 helped computers see the world through images, 2017 marked the moment when computers began to understand and create human language. Together, these two advances formed the foundation for the modern wave of artificial intelligence we are witnessing today.

Modern AI systems not only analyze data, but can also communicate, write content, and assist humans in a wide range of knowledge areas. From scientific research to education, from business to art, artificial intelligence is gradually becoming a new creative tool.

And it all started with a seemingly simple idea in a 2017 paper: for a computer to understand language, sometimes all it needs is to learn to pay attention in the right places.