A Brief History of AI from The Economist

Dartmouth Conference 1956 – Founding Fathers of AI

In the summer of 1956, a small but distinguished group gathered at Dartmouth College in New Hampshire. It included Claude Shannon, the father of information theory, and Herbert Simon, the only person in history to win both the Nobel Prize in economics, awarded by the Royal Swedish Academy of Sciences, and computing’s most prestigious honor, the Turing Award, given by the Association for Computing Machinery. They had been convened by a young researcher, John McCarthy, who wanted to discuss “how to get machines to use language, to form abstractions and concepts,” and to “solve kinds of problems now reserved for humans.” It was the first meeting of scientists devoted to what McCarthy called “artificial intelligence.” And it set a pattern for the next 60-plus years, during which the field’s progress has repeatedly fallen short of its stated ambitions.

The Economist's series on artificial intelligence also includes the articles “AI companies will soon exhaust most of the data on the internet” and “The Race to Control the Global AI Chip Supply Chain Continues.”

The Dartmouth meeting did not begin the scientific quest to find algorithms and machines that could think like humans. Alan Turing, for whom the Turing Award is named, had thought about it; so had John von Neumann, who inspired McCarthy. By 1956, there were already several approaches to the problem. Historians believe that one reason McCarthy coined the term “artificial intelligence,” later shortened to “AI,” for his project was that it was broad enough to encompass all the known approaches, while leaving open the question of which might be best. Some researchers favored systems that combined facts about the world with the axioms of geometry and symbolic logic to derive appropriate answers; others preferred systems in which the probability of one event depended on the constantly updated probabilities of many others.

Charts: global corporate investment in AI ($bn); well-known machine-learning models by sector (business, science, other); attendance at AI conferences (thousands of people).

There was much intellectual ferment and debate in the field in the decades that followed, but by the 1980s a broad consensus had formed on the way forward: “expert systems,” which used symbolic logic to capture and apply the best human know-how. The Japanese government, in particular, backed the idea of such systems and the hardware that might be needed to run them. But most such systems proved too inflexible to cope with the messiness of the real world. By the late 1980s, “AI” had fallen into disrepute, a byword for overpromising and underdelivering. Researchers still working in the field began to avoid the term.

It was from one of these pockets of persistence that today’s boom was born. In the 1940s, as the first clues emerged about how brain cells – neurons – actually work, computer scientists began to wonder whether machines could be wired up the same way. In a biological brain, connections between neurons allow the activity of one neuron to trigger or inhibit the activity of another; what one neuron does depends on what the neurons connected to it are doing. The first attempt to model this in the lab (by Marvin Minsky, a Dartmouth attendee) used hardware to simulate networks of neurons. Since then, layers of interconnected neurons have been simulated in software.

Artificial neural networks are not programmed with explicit rules; instead, they “learn” by being shown many examples. During training, the strength of the connections between neurons (their “weights”) is repeatedly adjusted so that a given input eventually produces an appropriate output. Minsky himself abandoned the idea, but others took it up. By the early 1990s, neural networks were being trained to do things like sort mail by recognizing handwritten digits. Researchers believed that adding more layers of neurons would allow more sophisticated feats. But deeper networks were also far slower to train.
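To make the idea of adjusting weights concrete, here is a minimal toy sketch (my own illustration, not anything from the article): a single layer of artificial neurons, written in Python with numpy, whose connection strengths are nudged over and over until its outputs match the desired labels.

```python
# A toy sketch of "learning by example": repeatedly adjust connection
# weights so that given inputs produce the desired outputs.
import numpy as np

rng = np.random.default_rng(0)

# Toy training examples: inputs and the outputs we want (here, logical OR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)

W = rng.normal(size=(2, 1))   # connection weights, initially random
b = np.zeros((1,))            # bias term

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    out = sigmoid(X @ W + b)          # the network's current answers
    error = out - y                   # how far off it is
    grad = out * (1 - out) * error    # gradient of the squared error
    W -= 0.5 * X.T @ grad             # nudge weights to shrink the error
    b -= 0.5 * grad.sum(axis=0)

print(np.round(sigmoid(X @ W + b), 2))  # close to [0, 1, 1, 1] after training
```

Real networks differ mainly in scale: many layers, millions or billions of weights, and vastly more training examples.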

A new kind of computer hardware made it possible to get around this problem. Its potential was demonstrated in 2009, when researchers at Stanford University increased the speed at which a neural network could run 70-fold, using a gaming PC in a dorm room. This was possible because, in addition to the “central processing unit” (CPU) that every computer has, the machine also had a “graphics processing unit” (GPU) for generating game worlds on screen – and that processor had been programmed to run neural-network code.
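As a rough illustration of why GPUs help, here is a minimal sketch, assuming a machine with PyTorch installed and, optionally, a CUDA-capable GPU (none of this setup comes from the article): it times the kind of large matrix multiplication that dominates neural-network workloads, first on the CPU and then on the GPU.

```python
# A rough sketch: time the same large matrix multiplication on the CPU
# and, if one is present, on the GPU.
import time
import torch

def time_matmul(device: str, size: int = 2048, repeats: int = 10) -> float:
    # Neural-network training is dominated by multiplications of large matrices;
    # a GPU spreads that arithmetic across thousands of simple cores at once.
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    start = time.time()
    for _ in range(repeats):
        _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # GPU work is asynchronous; wait before timing
    return time.time() - start

print(f"CPU: {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f}s")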

Combining this hardware acceleration with more efficient training algorithms allowed networks with millions of connections to be trained in a reasonable amount of time; neural networks could handle larger inputs and, crucially, have more internal layers. These “deeper” networks were much more capable.

The power of this new approach, called deep learning, became apparent in the 2012 ImageNet Challenge. Image recognition systems competing in the challenge were given a database of more than a million labeled image files. For each word, such as “dog” or “cat,” the database contained several hundred photographs. The image recognition systems were trained from these examples to “map” the input images to output one-word descriptions. The systems were then asked to create these descriptions when presented with previously unseen test images. In 2012, a team led by Geoff Hinton, then at the University of Toronto, used deep learning to achieve 85% accuracy. It was immediately hailed as a breakthrough.
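As an illustration of that input-to-label mapping, here is a hedged sketch that uses a modern ImageNet-trained network from the torchvision library (an assumed setup, not the 2012 entry; the image file name is a placeholder).

```python
# A sketch of mapping an image to a one-word(ish) label with a network
# pre-trained on ImageNet, via torchvision.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT          # weights trained on ImageNet
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()                  # resize/crop/normalize as in training

img = Image.open("example.jpg")                    # placeholder: any test photo
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))   # scores for 1,000 categories
label = weights.meta["categories"][logits.argmax().item()]
print(label)                                       # e.g. "golden retriever"
```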

By 2015, nearly all image-recognition experts were using deep learning, and ImageNet Challenge winners had reached 96% accuracy – better than the average human. Deep learning has also been applied to a variety of other problems “now reserved for humans” that boil down to mapping one kind of thing to another: speech recognition (mapping sound to text), face recognition (mapping faces to names), and translation.

In all of these applications, the vast amounts of data available over the Internet were vital to success; moreover, the sheer number of people using the Internet meant that large markets could emerge. The larger (i.e., deeper) the networks became, and the more training data they were given, the better their performance. Deep learning was soon being built into all sorts of new products and services. Voice-controlled devices such as Amazon’s Alexa appeared. Online transcription services became genuinely useful. Web browsers offered automatic translation. Saying that such things were powered by AI began to sound cool rather than embarrassing, although it is also a little redundant: nearly every technology called AI, then and now, actually has deep learning under the hood.

ChatGPT and its competitors really do seem to be “using language and forming abstractions.”

In 2017, the quantitative benefits of more computing power and more data were joined by a qualitative change: a new way of organizing the connections between neurons, called a “transformer.” Transformers let neural networks keep track of patterns in their input even when the elements of the pattern are far apart, allowing them to “pay attention” to particular features of the data. That gives them a much better grasp of context, which suits a technique called “self-supervised learning”: in essence, some words are randomly hidden during training, and the model teaches itself to fill in the most likely candidate. Because the training data does not need to be labeled in advance, such models can be trained on billions of words of raw text taken from the Internet.
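For a concrete picture of what “paying attention” means, here is a minimal numpy sketch of scaled dot-product self-attention, the core operation inside a transformer; the random vectors here merely stand in for word representations.

```python
# A minimal sketch of self-attention: each token's representation is rebuilt
# as a weighted mix of every other token's, so distant but relevant words
# can still influence one another.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how relevant each token is to each other
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, dim = 6, 8                           # 6 toy tokens, 8-dimensional vectors
x = rng.normal(size=(seq_len, dim))           # stand-ins for word representations
out = attention(x, x, x)                      # self-attention: tokens attend to each other
print(out.shape)                              # (6, 8): one updated vector per token
```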

Think about your language model

Transformer-based large language models (LLMs) began to attract wider attention in 2019, when the startup OpenAI released a model called GPT-2 (GPT stands for generative pre-trained transformer). These LLMs turned out to be capable of “emergent” behaviors for which they had not been explicitly trained. Soaking up vast amounts of language made them surprisingly adept not only at linguistic tasks like summarization and translation, but also at things implicit in the training data, such as simple arithmetic and writing software. Less happily, it also meant they reproduced the biases in the data they were fed, so that many of the prejudices prevalent in human society showed up in their output.
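To show what “generative pre-trained transformer” means in practice, here is a minimal sketch that runs the openly available GPT-2 weights via the Hugging Face transformers library (an assumed setup, not part of the article); the model simply keeps predicting the next most likely word.

```python
# A minimal sketch of text generation with the open-weight GPT-2 model,
# using the Hugging Face "transformers" library (assumed to be installed).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the next most likely token, so the
# continuation reflects patterns absorbed from its training text.
result = generator(
    "In the summer of 1956, a small group gathered at Dartmouth College to",
    max_new_tokens=40,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```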

In November 2022, a larger OpenAI model, GPT-3.5, was released to the public in the form of a chatbot. Anyone with a web browser could type a query and get a response. No consumer product has ever caught on so quickly. Within weeks, ChatGPT was generating everything from college essays to computer code. AI had taken another great leap forward.

If the first cohort of AI products was built around recognition, the second is built around generation. Deep-learning models such as Stable Diffusion and DALL-E, which also debuted around this time, use a technique called diffusion to turn text prompts into images. Other models can produce surprisingly realistic video, speech, or music.

The leap is not just technological; how the models behave matters, too. ChatGPT and competitors such as Google’s Gemini and Claude, made by Anthropic (a firm founded by researchers formerly at OpenAI), perform computations in the same way as other deep-learning systems. But the fact that they respond to queries by drawing on the patterns they have learned makes them feel very different from programs that recognize faces or translate text. They really do seem to “use language” and “form abstractions,” just as McCarthy had hoped.

In this series, we'll look at how these models work, how much further they can be scaled up, what new uses they will be put to, and what they cannot – or should not – be used for.
