In 1972, the American philosopher Hubert Dreyfus published a book called What Computers Can’t Do. It was a spicy critique of the concept of artificial intelligence at that time. In retrospect, Dreyfus was mostly right. The AI community back then, however, was not particularly welcoming – although supposedly one person was still willing to have lunch with him.
Why should we care about what happened back then?
We’ve heard a lot about Generative AI recently; many claim it will even take our jobs. I believe the late Dreyfus would have a lot to say, at least on who’s likely to be the last one to switch off the lights. Even if you don’t believe in the jobs apocalypse, understanding what AI can’t do helps us figure out how to best use it – or sometimes dismiss it altogether.
In 1972, the world of AI revolved around what we now call symbolic AI. Computers were a relatively new invention, and the idea of processing symbols as opposed to numbers was all the rage. To academics specialized in symbolic thought, the idea of human intelligence being ultimately a symbol-processing capability felt very natural. With computers starting to take steps toward something similar, getting them to human level sounded relatively easy – just put the right logic in place and scale it, right?
It turned out not to be the case, mainly because what seemed natural from the outside proved to be a false hope. Specifically:
Functional humans – in their everyday lives, at work, or even in science – mostly don’t think symbolically. Symbols are more a medium of communication and reasoning, and reasoning plays a relatively small part in cognition.
Knowledge in general, and everyday knowledge in particular, is very hard to represent in a formalized way. Everyone can tell a mug, a glass, and a cup apart from one another, but try defining them! An explicit computer program that recognizes such items would need the definitions, represented as code.
Also, it turns out that even seemingly innocent problems, supposedly isolated in a narrow domain of knowledge, tend to leak into a wider context in unpredictable ways. As a result, the context required is potentially very wide. For example, imagine you are a self-driving car and want to anticipate the behavior of pedestrians – maybe by slowing down when appropriate, taking into account traffic around your velocity vector, and observing pedestrians and their positions. It gets more complex when you realize it would be helpful to also recognize older people moving relatively slowly, kids playing with a ball dangerously close by, the local drunk, or even the clothing and behavior of certain groups of people, let alone different kinds of balls. That’s a lot of information for one simple car to handle.
Back to the future
It’s now 2024, and two things that were unimaginable 50 years ago have happened. And then a third.
First, the internet was born. For better or worse, most cultural content nowadays is in principle reachable globally in milliseconds. Second, we were able to build computation at scale. Combining these two elements in a somewhat astronomical form (10¹⁰ words of text, a neural network with ≈10¹⁰ connections, more than 10²⁵ floating-point operations), we get GPT-4. (1)
If we look at these developments from the perspective of the 1970s, modern AI models arise almost out of thin air. Nothing of their content has been explicitly programmed in, and if you look at what actually is programmed, it’s a prompt. We have system prompts, telling the model its identity in plain English: “you are ChatGPT… trained by OpenAI”. From the perspective of the ’70s, that’s pure sci-fi.
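To make that concrete, here is a minimal sketch of what such prompt-level “programming” looks like, assuming an OpenAI-style chat API; the prompt texts are my own illustrative stand-ins, not the actual ChatGPT system prompt.

```python
# A sketch of prompt-level "programming", assuming an OpenAI-style chat API
# (openai>=1.0). The prompt texts are illustrative, not the real system prompt.
from openai import OpenAI

client = OpenAI()  # expects an OPENAI_API_KEY in the environment

messages = [
    # The "program": the model's identity and behavior, stated in plain English.
    {"role": "system",
     "content": "You are a helpful assistant trained by OpenAI. Answer concisely."},
    # The "input": an ordinary request, also in plain English.
    {"role": "user",
     "content": "Explain in two sentences why Dreyfus criticized symbolic AI."},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```

The entire “program” here is two snippets of plain English; everything else the model brings with it.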
The best models are still limited in so many ways, but they perform at close-to-human levels in tests. For example, GPT-4 mostly passes Finnish matriculation examinations, grasping math questions from photos of paper tests.
With the birth of these new systems, the question of what computers can’t do has risen again – perhaps with a more serious and emotional sentiment this time. It’s fascinating to go back and revisit the assumptions of the symbolic AI era, along with their criticism, as a way of shedding light on the latest developments.
First, are the new systems about symbol processing?
Definitely not. We had “symbolic” computers for decades, but have now managed to create ones whose inner workings are devoid of symbols. As a side effect, we don’t know how they work either. They are more like a brain than a conventional computer. In fact, it seems we may even have gone too far into non-symbolism. Perhaps giving these models a bit of internal symbol-like state would help with reasoning and self-reflection.
Second, formalizing practical knowledge turned out to be really hard. What happened there?
In short, we gave up – at least mostly – and turned to training the models directly from a blank slate. (Not unlike what human babies do, except that they are not nearly as blank at the start.) To avoid human work, the models have been trained to predict content in a fairly self-referential way (hence the term generative AI: they are trained to predict, and in practice generate, text, pictures, and so forth). But to do that well, the models have to understand the undertones: styles, hunches, intentions, relations, consequences, and human concepts in general. So understanding arises from pure prediction.
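As an illustration of the idea – a deliberately toy sketch of my own, not anything from the systems discussed here – consider a model that learns to predict the next word simply by counting. Real generative models replace the counting table with an enormous neural network and vastly more context and data, but the training signal is the same: predict what comes next.

```python
# A toy illustration of "learning by predicting": count which word tends to
# follow which, then generate by repeatedly predicting the next word.
import random
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# "Training": tally next-word counts for each word in the corpus.
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by sampling the predicted next word, step by step."""
    words = [start]
    for _ in range(length):
        counts = next_counts[words[-1]]
        if not counts:
            break
        choices, weights = zip(*counts.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
```

Everything this toy model “knows” about its little world, it has learned purely by predicting.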
Third, what about the vast amounts of everyday knowledge needed in most cognitive tasks?
Thanks to what is available on the internet, the models have a pretty extensive view into the human world, and within their internals, that information is flexibly available. I’m able to address an LLM simply by writing “30yo is gen X Y Z millennial what?” with typos, then refer back to its reply just by saying “before that?”, and it will understand what I mean. Or, I can ask “sun 18 feb + 180 days roughly?”, and in contrast to traditional computers, it is again able to interpret my intended meaning and apply what it knows. In fact, in the latter case the model invokes a Python interpreter and writes some code to get an answer, likely because the system prompt tells it not to trust its arithmetic skills – for good reason.
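We don’t get to see the exact code the model writes, but presumably it is something along these lines (my assumption, taking the year to be 2024):

```python
# Roughly the kind of code the model writes for "sun 18 feb + 180 days":
# interpret the date (assumed here to be in 2024) and add 180 days.
from datetime import date, timedelta

start = date(2024, 2, 18)              # Sunday, 18 February 2024
target = start + timedelta(days=180)
print(target.strftime("%A %d %B %Y"))  # Friday 16 August 2024
```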
Dreyfus’ limits
GPT-4 does not drive a car, however. Its input modalities are text and images, and it lacks the skills to use any actuators beyond “virtual typing”. With these limitations in input and output goes a lot of everyday knowledge about the physical world, too. This shows that the capability to perform well across every aspect of our society and physical environment is just not there. It’s unclear how far the current predictive approach gets us, but research will definitely have to be directed towards obtaining even more everyday knowledge. One way to do this is to widen the palette of input modalities to sounds, video, and robotic and virtual action.
Even in this scenario, we’ll still have blind spots. For many kinds of work, the proper context is a web of meanings and relations in the everyday human world or a subset of it – like a scientific community or a daycare facility. That is not written down anywhere and may be very hard to learn, as one would only get a good grasp of it by actually behaving like a human-like creature in that particular environment. (This restates points two and three of the Dreyfus critique.)
Such potentially well-hidden knowledge may be a bottleneck for machines in seemingly more innocuous settings, too. Choosing abstractions suitable for a large code base looks like an engineering problem. But in part, those abstractions are formalizations of the needs and constraints of the organization. Often the code itself would be the first actionable formalization. How could a computer be the best candidate for this job, unless it is (almost) physically embodied in the organization?
A tool, competitor, or collaborator?
Dreyfus largely hit the nail on the head. Yet, the future has unfolded in unique, and as always, unpredictable ways – allowing computers to make unforeseen cognitive leaps. Now it seems that humans and computers are indeed gradually morphing into a symbiotic couple, with their division of labor perpetually up for negotiation.
Fine-tuned with hindsight, are we able to generate any practical tokens of wisdom? This is already a well-discussed topic, so just a couple of thoughts:
Concretely, when using the new models it’s important to remember that they can’t read your mind. They lack the context you have, and you may not realize by just how much. Sometimes passing enough context to the models is just too hard, and in that case it’s best to skip using them altogether.
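As a hypothetical illustration of what “passing context” means in practice: the same question, asked bare and asked with the background the model has no way of knowing.

```python
# A hypothetical example of giving the model the context it cannot guess.
question = "Should we cache the results of this endpoint?"

bare_prompt = question  # the model has to guess everything that matters

contextual_prompt = "\n".join([
    "Context:",
    "- The endpoint serves pricing data that changes every few minutes.",
    "- Stale prices have caused customer complaints before.",
    "- We already run Redis for session storage.",
    "",
    "Question: " + question,
])

print(contextual_prompt)
```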
How about your job? From a purely cognitive point of view, there is still a lot of implicit, non-verbalized, or only locally verbalized context that AIs can’t catch – not even the multimodal models. It is hard to even imagine how an AI could get that knowledge, except by living a relatively human life at the office.
If you are an AI pessimist, the latter should be reassuring. If you are an AI optimist, good for you: you can enjoy the new tools and gradually tune your skills in a direction where you collaborate with AIs instead of competing with them.
------------------
(1) Apologies for the simplification. I want to emphasize the astonishing emergence of capabilities from large data and minimal architectural specification, while pointing to GPT-4 just as an example. Things such as data curation and the transformer architecture have certainly played a major role in recent developments.