Perspective | Open Access
Michael Wooldridge, "What Is Missing from Contemporary AI? The World", Intelligent Computing, vol. 2022, Article ID 9847630, 3 pages, 2022. https://doi.org/10.34133/2022/9847630
What Is Missing from Contemporary AI? The World
In the past three years, we have witnessed the emergence of a new class of artificial intelligence systems: so-called foundation models, which are characterised by very large machine learning models (with tens or hundreds of billions of parameters) trained using extremely large and broad data sets. Foundation models, it is argued, have competence in a broad range of tasks, which can be specialised for specific applications. Large language models, of which GPT-3 is perhaps the best known, are the most prominent example of current foundation models. While foundation models have demonstrated impressive capabilities in certain tasks (natural language generation being the most obvious example), I argue that because they are inherently disembodied, they are limited with respect to what they have learned and what they can do. Foundation models are likely to be very useful in many applications, but they are not the end of the road in artificial intelligence.
1. Data, Compute Power, and Competence in AI
Over the past 15 years, the speed of progress in artificial intelligence (AI) in general, and machine learning (ML) in particular, has repeatedly taken seasoned AI commentators like myself by surprise: we have had to continually recalibrate our expectations as to what is going to be possible and when. For example, at the start of the new millennium, it felt like practical automated translation tools were still a remote prospect, and that much research was going to be required before this problem was really solved. But at some point over the past 15 years, this technology became routine, and this miracle of the modern age is now taken for granted. Time and time again this century, we have seen AI leap past benchmark problems: in 2011, IBM’s Watson program beat expert human players in the notoriously difficult Jeopardy game show; in 2013, DeepMind unveiled Atari-playing programs that learned to play a range of 8-bit video games just by repeatedly playing them, using no more than access to the game’s video feed and current score; in 2016, DeepMind effectively solved Go, in a series of matches against a human grand master which made world headlines; they followed this up with progressively more startling achievements in computer game playing, with world-beating chess programs training themselves to play in a matter of hours; and finally, in 2021, the AlphaFold system demonstrated an unprecedented level of performance in the fundamental protein folding problem. It is generally accepted that these breakthroughs, and the many others that I have not mentioned here, are in part a consequence of scientific developments in machine learning (new machine learning architectures and refined training algorithms), but just as importantly, of the use of large training data sets coupled with extensive compute resources. While the fundamental machine learning algorithms underpinning the current resurgence of AI have been around for decades (gradient descent in particular), data and compute on this scale have only been available in practice this century.
It has become clear throughout the current wave of AI innovation that data and compute power really are fundamental to the success of AI techniques: the competence of AI models scales directly with their size, the resources used to train them, and the scale of training data. Richard S. Sutton, one of the world’s leading reinforcement learning researchers, was so struck by this that he coined a term to describe it: the “bitter lesson” of AI is that advances in AI are dominated by the use of increasingly large data sets and increasingly greater compute resources. When it comes to building successful machine learning models, it seems, might really is right.
Given this lesson, it is then perhaps no surprise that we have seen a rush for scale in machine learning. AI researchers, seeking to steal an advantage on their competitors, have built ever-larger ML models, using ever more compute resources for training. And it seems to be working.
2. Foundation Models
Over the past three years, the rush to scale has led to the emergence of a new class of AI system: foundation models. Foundation models are very large machine learning models, trained on extremely large and very broad data sets, using substantial compute resources. They mark a shift in emphasis away from AI systems that have very narrowly focused expertise, suitable only for one tiny problem or class of problems. The bet with foundation models is that their extensive and broad training leads to them learning useful competences across a range of areas, which can then be specialised for specific applications. While symbolic AI was predicated on the assumption that intelligence is primarily a problem of knowledge, foundation models are predicated on the assumption that intelligence is primarily a problem of data. Throw enough training data at big models, and hopefully, competence will arise. I am simplifying, of course, but not much.
The most prominent foundation model developed to date is GPT-3. Released in 2020, GPT-3 is the canonical example of a large language model (LLM). Its operation is best described by analogy. When you use your smartphone to write a text message or email, it has a feature that suggests completions of your sentences. Thus, when you type “I will be…” it uses your history of previous messages to predict that the next word will be “late” and suggests this to you as a possible completion of your message. Every time you type a message on your phone, you are training your phone so that it can make better suggestions. Large language models do something similar but on a vastly larger scale. GPT-3, for example, was trained on essentially the entirety of the English text available on the World Wide Web.
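The smartphone analogy can be made concrete with a deliberately tiny sketch. The code below is an illustration only, not how GPT-3 works: it simply counts which word most often followed each word in a message history, whereas large language models use neural networks conditioned on much longer contexts. The function names and the sample history are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(messages):
    """Count, for each word, which words followed it in past messages."""
    followers = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        # Pair every word with the word that comes directly after it.
        for current_word, next_word in zip(words, words[1:]):
            followers[current_word][next_word] += 1
    return followers


def suggest_next_word(followers, prefix):
    """Suggest the most frequent follower of the last word typed, or None."""
    words = prefix.lower().split()
    if not words or words[-1] not in followers:
        return None
    return followers[words[-1]].most_common(1)[0][0]


# Hypothetical message history standing in for "your previous messages".
history = [
    "I will be late",
    "Sorry I will be late again",
    "I will be there soon",
]
model = train_bigram_model(history)
suggestion = suggest_next_word(model, "I will be")
print(suggestion)
```

With this history, “late” follows “be” twice and “there” once, so the suggestion is “late”. Each new message added to the history changes the counts, which is the sense in which typing “trains” the phone.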
GPT-3 has been shown to have unprecedented capabilities in natural language generation, being capable of generating extended pieces of very natural-sounding text. Of perhaps even more interest is that it also seems to have acquired some competence in common-sense reasoning, one of the holy grails of AI research over the past 60 years. Many other large language models have appeared since GPT-3: one system attracting considerable attention at present is Google’s LaMDA system, the state-of-the-art in serious chatbot technology. In June 2022, LaMDA attracted a lot of (unwelcome) publicity when a Google engineer by the name of Blake Lemoine was suspended from the company after he claimed the system was sentient. Whatever the validity of Lemoine’s conclusions (I personally do not think they are reasonable, and I think we are still a long way from sentient AI), it is clear that he was deeply impressed by LaMDA’s ability to converse, and with good reason.
Foundation models are impressive and an important current development in the AI landscape. We will see many creative uses of them in the years ahead—and no doubt there are fortunes to be made (and possibly business empires) on the back of them. But what do they really mean for progress in artificial intelligence? Is this it, and all that remains for AI now is scale, as one DeepMind researcher put it?
3. What Is Missing? The World
For all that their achievements are to be lauded, I think there is one crucial respect in which most large ML models are greatly restricted, and which limits their capabilities with respect to AI in general. And that is the world: in particular, the fact that foundation models simply have no experience of it. They are, ultimately, disembodied AI systems. This has implications both for their status as AI systems and, more pragmatically, for their uses in the world.
To better understand the first point, let us pause for a moment to think about what a large language model has learned.
A large language model is trained by presenting to it a very large corpus of existing text written in some language—say English. This corpus embodies knowledge about the words that we use to describe things in the world. Thus, for example, a large language model may well learn that “rain is wet,” and if asked whether rain is wet or dry, will likely respond that rain is wet. But, unlike us, the model has never experienced wetness. The word “wet” to the language model is nothing more than a symbol, which is often used in connection with words like “rain” and so on. Given this, does the model in any sense understand the concept of “wetness”? Well, in a linguistic sense, yes. GPT-3 can likely write you a plausible essay about wetness—the pleasure of diving into the Spanish Mediterranean sea, the misery of drizzle on a dank English January afternoon, and so on. But that is a very impoverished conception of understanding. I do not think you can understand a concept like wet unless you have actually experienced it. The concept of “wet,” for me, means all the experiences I have ever had of being wet. That is how I understand the concept. But large language models like GPT-3 have not experienced anything in the real world. All they have experienced is a very large collection of symbols (their training data), which stand in certain relations to one another (the word “wet” used in connection with “rain”, for example). At no point is there any grounding for these symbols—no sense in which they are given meaning with respect to concepts that have actually been experienced in the world.
There is a counterargument to this position, to the effect that at some point, if there is enough data (enough descriptions of experiences, even without the actual experiences themselves), then the lack of grounding ceases to matter. In this case, so the argument goes, a system might be able to reliably convince us that it really does understand wetness, at which point is there any sense in arguing about it anymore? From a purely practical perspective, I would tend to accept that point. After all, the fact that a system has not had any actual experiences in the world does not stop it from being useful; nor does it in fact stop it from being an expert (in a restricted sense) on the subject of those experiences. But from an AI perspective, it does rather throw into doubt the possibility that such a system has the same status as us with respect to issues such as understanding. To put this into conventional AI terms, I am sceptical that AI systems generated with this methodology can exhibit strong AI, although I emphasise again that this does not mean they cannot be useful in their own right.
A less philosophical concern about this approach to AI is that models with no experience of the world have greatly restricted AI capabilities. Many of the most important, and most challenging, problems in AI are problems in the physical world. Many tasks in the physical world that we regard as not requiring intelligence in any meaningful sense (riding a bicycle, catching a ball, and cooking a meal) are highly nontrivial for robotics AI. Robots that have the full range of physical capabilities that a human has are a long, long way off, arguably even more remote than AI systems that have the full range of intellectual capabilities.
It is disappointing, I think, that so many of the AI systems about which we have become so excited in the past decade are not embodied in the world in any meaningful way. Of course, it is not hard to see why this is the case. The real world is harder—much harder—than simulated/virtual worlds like computer games. The problem is, as ML researcher Michael Littman put it, the real world just does not come in tidy data structures. And we cannot let a robot learn how to cook a meal by letting it experiment in our kitchen, as we can with computer games. Similarly for driverless cars: letting them loose on the roads to learn for themselves is a nonstarter. So, for all these reasons and more, researchers choose to build their models either in virtual worlds (such as computer games), or without any pretence of a world (large language models). And in this way, we are getting excited about a generation of AI systems that simply have no ability to operate in the single most important environment of all: our world.
To be fair, there are some signs that this is changing. In May 2022, DeepMind announced the Gato system, a general purpose foundation model whose training data included large language corpus data, as with GPT-3, but which was also trained on robotic data: it was capable of operating in an (admittedly very simple) physical environment. Gato is an impressive achievement, and it is wonderful to see the first baby steps taken into the physical world by foundation models. But they are just baby steps: the challenges to overcome in making AI work in our world are at least as large as, and probably larger than, those faced in making AI work in simulated environments. To paraphrase Winston Churchill, we are not looking at the end of the road in AI, but we may have reached the end of the beginning of the road.
Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this article.
Copyright © 2022 Michael Wooldridge. Exclusive Licensee Zhejiang Lab, China. Distributed under a Creative Commons Attribution License (CC BY 4.0).