Noam Chomsky on the future of deep learning

For the past few weeks, I have been in email correspondence with my favorite anarcho-syndicalist, Noam Chomsky. I first reached out to him to ask whether recent developments in ANNs (artificial neural networks) had forced him to rethink his famous linguistic theory of universal grammar. Our conversation touched upon the possible limitations of deep learning, how well ANNs actually model a biological brain, and a number of philosophical topics. I will not quote Professor Chomsky directly here, since our discussion was informal, but I will try to summarize the key takeaways.

And, by the way, yesterday, December 7, Noam Chomsky turned 92 years old!

Still from the film “Captain Fantastic”

A little about Noam Chomsky

Noam Chomsky is primarily a professor of linguistics (many call him “the father of modern linguistics”), but he is probably better known outside of academia as an activist, philosopher and historian. Chomsky is the author of over 100 books and was voted the world’s leading public intellectual in a 2005 poll by the magazines Foreign Policy and Prospect.

I admire Chomsky’s work, especially his criticism of American imperialism, neoliberalism and the media. Where our views diverge somewhat is his rejection of the continental philosophers (especially the French post-structuralists). I may have been tainted by drinking too deeply from the wells of Foucault, Lacan, and Derrida in my early adulthood, but I have always found Chomsky’s analytical approach to philosophy morally appealing yet a little too “refined” to explain our world satisfactorily. While his disdain for these post-structural luminaries is striking, Chomsky’s philosophical views are subtler than his detractors believe.

Universal grammar

I must say right away that I am not a linguist, but in this part of the article I will try to give an overview of the theory of universal grammar. Before Chomsky, the predominant hypothesis in linguistics was that humans are born with a tabula rasa mind and learn language through reinforcement. That is, children hear what their parents say, imitate what they have heard, and are praised when they use a word or build a sentence correctly. Chomsky showed that reinforcement is only part of the process and that the human brain must have innate universal structures that facilitate language acquisition. His main arguments were:

Children learn language too quickly, and from too little data, for reinforcement learning to explain it (an argument known as the “poverty of the stimulus”).
Animals do not acquire language even when presented with the same data as humans. In the 1970s, a famous experiment was conducted in which researchers tried to teach sign language to a chimpanzee named Nim Chimpsky, but after years of training the chimpanzee was still unable to communicate beyond performing a few basic tasks.
There are common features in all human languages. This shows that even when languages develop independently, universal traits emerge because of structures common to all human brains.
Children are not programmed to learn a particular language. If you take a child born in Kenya and raise that child in Germany, he or she will learn German as easily as a German child.

This theory of a genetically hard-coded language ability gained wide acceptance in the scientific community, but the obvious question was: “What does this universal grammar look like?” Intrepid researchers soon began to catalogue the properties common to all human languages, but there is still no consensus about what form our innate language abilities take. It is safe to assume that a universal grammar does not consist of specific syntactic rules, but is most likely a fundamental cognitive function.

Chomsky postulated that at some point in history, humans developed the ability to perform a simple recursive operation called “Merge”, which is responsible for the properties and limitations of syntactic structures in human languages. It’s a bit abstract (and too complicated to do justice to here), but essentially Merge is the operation of taking two objects and combining them to form a new object. Although this sounds prosaic, the ability to mentally combine concepts and to do so recursively is deceptively powerful and allows us to create “an infinite variety of hierarchically structured expressions.” This small but decisive genetic leap may not only explain our ability to communicate verbally, but may also be responsible (at least in part) for our mathematical talents and human creativity more broadly. This Merge mutation, which occurred in one of our ancestors about 100,000 years ago, may be one of the key things separating humans from other animals.
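
For readers who think in code, the combining operation described above can be sketched in a few lines of Python. This is only a toy illustration, not a linguistic model: the names merge, depth and words are my own, and I use ordered pairs for readability even though linguists usually describe the result as an unordered set.

```python
from typing import List, Tuple, Union

# A syntactic object is either a single word or the result of an earlier merge.
SyntacticObject = Union[str, Tuple["SyntacticObject", "SyntacticObject"]]


def merge(a: SyntacticObject, b: SyntacticObject) -> SyntacticObject:
    """Take two objects and combine them into one new object (toy version)."""
    return (a, b)


def depth(obj: SyntacticObject) -> int:
    """Count the levels of hierarchy inside an object."""
    if isinstance(obj, str):
        return 0
    left, right = obj
    return 1 + max(depth(left), depth(right))


def words(obj: SyntacticObject) -> List[str]:
    """Flatten an object back into its sequence of words."""
    if isinstance(obj, str):
        return [obj]
    left, right = obj
    return words(left) + words(right)


# Because the output of merge can be fed back into merge, structure nests freely.
noun_phrase = merge("the", "dog")
verb_phrase = merge("chased", merge("the", "cat"))
sentence = merge(noun_phrase, verb_phrase)
```

The point of the sketch is that a single two-argument operation, applied to its own output, is already enough to build hierarchical structures of any depth.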

Artificial neural networks

The main reason I contacted Professor Chomsky was that I wanted to hear his views on artificial neural networks (I know much more about them than about linguistics). ANNs are a subset of machine learning models that are loosely modeled on the human brain and learn in a similar way: by looking at many examples. Such models require very little code and can perform a fairly wide range of complex tasks (e.g. image tagging, voice recognition, text generation) with a relatively simple architecture. An instructive example of this approach is AlphaGo (developed by Google’s DeepMind), which learned to play Go (a complex and challenging board game) and ultimately defeated human world champions. The most impressive thing is that its later version, AlphaGo Zero, was trained through self-play without hard-coded strategies or human game data, that is, the model was “tabula rasa”. While an ANN is certainly not a perfect analogy to the human brain, I asked the professor whether ANNs suggest that we don’t really need hard-coded cognitive structures to learn from sparse data.
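
To make “learning by looking at examples” concrete, here is a minimal sketch of the simplest possible artificial neuron, a perceptron, learning the logical AND function. It is an illustration under my own assumptions (the function names, the learning rate and the number of epochs are arbitrary choices) and is far simpler than the deep networks discussed in this article.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, int], int]  # ((input_1, input_2), expected_output)


def predict(weights: Sequence[float], bias: float, inputs: Sequence[int]) -> int:
    """Fire (return 1) if the weighted sum of the inputs is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(examples: List[Example], epochs: int = 20, lr: float = 1.0):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# No rule for AND is written anywhere; the model only sees input/output pairs.
AND_EXAMPLES: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES]
```

After training, the predictions match the expected outputs for all four examples, even though the behaviour was never programmed explicitly. Note how narrow the task is, which is relevant to the professor’s answer.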

Chomsky correctly pointed out that ANNs are useful for highly specialized tasks, but these tasks must be sharply bounded (although their scope may seem huge given the memory and speed of modern computers). He compared an ANN to a massive crane working on a high-rise building; although such work is certainly impressive, both the building and the crane exist in systems with fixed boundaries. This line of reasoning is consistent with my observation that all the deep learning breakthroughs I have seen have occurred in very specific domains, and we do not seem to be close to anything like generalized artificial intelligence (whatever that means). Chomsky also pointed to the growing evidence that ANNs cannot accurately model human cognitive abilities, which are so rich by comparison that the computational systems involved may extend even to the cellular level.

If Chomsky is right (and I think he is), what are the implications for advancing deep learning research? After all, there is nothing magical about the human brain. It’s just a physical structure made up of atoms, and therefore it is quite rational to believe that at some point in the future we will be able to create an artificial version of the brain capable of generalized intelligence. That said, modern ANNs offer only a simulacrum of this kind of cognition, and by Chomsky’s logic, we won’t reach this next frontier without first deepening our understanding of how organic neural networks work.

Moral relativism

The ethical application of AI is a major concern for today’s data scientists, but at times it can seem vague and subjective in an otherwise concrete field. Chomsky’s work not only provides a unique technical perspective on the future of deep learning; universal grammar also has profound moral implications, since language is how we talk about and interpret the world. For example, Chomsky believes that the aforementioned innate neural structures rule out moral relativism and that there must be universal moral constraints. There are many different varieties of moral relativism, but the basic principle is that there can be no objective basis for ethical judgments. Moral relativists argue that while we may deeply believe in statements such as “slavery is immoral,” we have no empirical way to prove it to those who disagree with us, since any proof will necessarily be based on value judgments, and our values are ultimately exogenous and determined by culture and experience.

Chomsky argues that morality manifests itself in the brain and is therefore, by definition, a biological system. All biological systems have variations (natural and due to different stimuli), but these variations also have limits. Consider the human visual system: experiments have shown that it has some flexibility and is shaped by experience (especially in early childhood). By varying the data entering the human visual system, one can literally change the distribution of receptors and thereby change the way a person perceives horizontal and vertical lines. What you cannot do is turn the human eye into an insect eye, or let someone see X-rays. According to Chomsky, biological systems (including morality) can vary widely, but not infinitely. He goes on to say that even if you believe that our morality comes entirely from culture, you still need to acquire that culture in the same way that you acquire any system (as a result of innate cognitive structures that are universal).

My first objection to this was as follows: if we assume that morality is simply a consequence of “Merge” (or something equally primitive), then while this may impose theoretical limits, my intuition is that our morality can vary so wildly that it is almost impossible to make universal judgments. Chomsky has discussed in the past how moral progress appears to follow certain trends (e.g., accepting differences, abandoning oppression, etc.), but I struggle to understand how these broad trends would consistently emerge from such simple atomic cognitive structures. When I put this to the professor, he argued that this view is illusory and that when we do not understand things, they seem more varied and complex than they really are. He gave the example of the variation observed in animal skeletons since the Cambrian explosion. Just 60 years ago, the prevailing view in biology was that organisms vary so much that each of them must be studied individually, but now we know that this is completely wrong and that the genetic differences between species are quite small. Variations in complex acquired systems must be minimal, otherwise we would not be able to acquire them.