The Empirical Triumph of Theory

July 4, 2023 Mona Baker Articles

Ted Underwood

29 June 2023

A graduate student who fell asleep in 1982 and woke up in 2022 might see large language models as a triumph for cultural theory. It is hard to imagine a clearer vindication of a thesis that linguists, critics, and anthropologists spent much of the twentieth century advancing — the thesis that language is not an inert medium used by individuals to express their thoughts but a system that actively determines the contours of the thinkable.

Ferdinand de Saussure’s distinction between parole and langue is concretely dramatized every time a user sends a prompt to a model. If we are troubled that the model’s responses are not organized by conscious intent, we can remind ourselves that “the unifying function of a subject” was for Michel Foucault an illusion. “Discourse is not the majestically unfolding manifestation of a thinking, speaking, knowing subject,” but a system of “enunciative modalities” that determine what can be said by what kind of persona. We could even conclude, with Roland Barthes, that human writing resembles a language model in being a palimpsest, “a multi-dimensional space in which a variety of writings, none of them original, blend and clash.” The social theory of meaning we need to understand this technology took shape long before the technology itself.

In a just world, every article about GPT-4 would nod toward Barthes and Foucault. But if our world is just at all, its justice is of the ironic kind that only declares a winner after the race has been forgotten. The models that dramatize twentieth-century cultural theory are aligned now with institutions that twenty-first-century theorists distrust. This makes it hard for anyone to take a victory lap. People who once nodded along with the last words of The Order of Things (1966) (“man would be erased, like a face drawn in sand at the edge of the sea”) may now feel that Foucault’s early critique of humanism needs nuance.

By the end of this essay, I will provide some nuance. But first we have to give twentieth-century theorists the victory lap they deserve. It needs to be said that mid-century structuralists understood how information technology could dovetail with a social theory of meaning and grasped the implications of that convergence more accurately than the scientists who were their contemporaries.

Bernard Geoghegan’s work on twentieth-century intellectual history has clarified at least the first part of this statement. We have learned that Roman Jakobson visited Bell Labs. We know that Claude Lévi-Strauss saw the emerging physical science of information — linking written glyphs to electrical signals and food — as a vindication of primitive man’s refusal to separate mind and world, revealing instead “a universe made up of meanings.”

But now that historians understand the close relationship between structuralism and information technology, there is some danger that we will interpret structuralist theory as a mere echo of computer science. To ward off that mistake, it is worth remembering that computer science has not generally been characterized by skepticism about the autonomous subject. On the contrary, the term “artificial intelligence” was coined in the 1950s to organize inquiry around a concept conceived as an attribute of individual minds. By the 1990s, theories of embodied cognition were shifting attention from logic to action, but the new emphasis on embodiment only reaffirmed an underlying assumption that intelligence was an attribute of individual agents.[1]

Large language models diverged from this tradition in ways that computational disciplines initially found disappointing. Instead of imitating individual minds, the new models imitated large corpora of writing. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” criticizes this approach for many reasons, but most fundamentally because the authors believe language lacks meaning without grounding in individual intention. “Human language use takes place between individuals . . . who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate.” An attempt to model language “not grounded in communicative intent” could only be “misdirected research effort.”

The success of this “misdirected” effort has tended to support theories of meaning that explain it instead as a collective phenomenon—like Lévi-Strauss’s “universe made up of meanings” or Foucault’s Archaeology of Knowledge (1969). The beautiful irony of this situation, of course, is that a generation of humanists trained on Foucault have now rallied around “On the Dangers of Stochastic Parrots” to oppose a theory of language that their own disciplines invented, just at the moment when computer scientists are reluctantly beginning to accept it.

Were late-twentieth-century humanists really as committed to a social theory of meaning as I have implied? Don’t Steven Knapp and Walter Benn Michaels connect meaning to authorial intent in “Against Theory”? Yes, but the fusion of text and intent is so complete in “Against Theory” that the article ends up resembling Barthes more than “On the Dangers of Stochastic Parrots.” The authors of “On the Dangers of Stochastic Parrots” think words require grounding in intent because they see meaning as separate from intent. That separability is exactly what Knapp and Michaels contest.[2]

As large language models begin to cast empirical light on these debates, they may also add nuance. For instance, if authors were really as dead as Barthes once claimed, there would be no need for the rhetorical reconfiguration that turned GPT-3 into ChatGPT. Predicting the next word in a corpus would be sufficient training. In practice, however, models perform better when they are trained to distinguish their own language from that of an interlocutor. The process is called “instruction tuning” because it encourages the model to interpret a prompt as an instruction (not just another piece of text that needs to be continued). Even language models, it turns out, need to make inferences about speakers and intentions. If GPT-3 dispensed with authors entirely, ChatGPT has been compelled to reconstruct a provisional “author-function”—rather as Foucault eventually did.

While the parallels to twentieth-century theory are fascinating, nothing I have said proves that humanists should welcome language models. To make a judgment of that kind we would need to predict their effects, and the future is hard to predict. This was a story about the past. I am not arguing that language models are beneficial, only that their refusal to ground language in an experiencing subject should be familiar to cultural theorists, and impossible to fully disown.

Ted Underwood is a professor of information sciences and English at the University of Illinois, Urbana-Champaign. His most recent book is Distant Horizons: Digital Evidence and Literary Change (2019), and his current research explores the affordances of language models for teaching and scholarship.

[1] See, for instance, Rodney A. Brooks, “Intelligence Without Representation.”

[2] Steven Knapp and Walter Benn Michaels, “Against Theory”: “But once it is seen that the meaning of a text is simply identical to the author’s intended meaning, the project of grounding meaning in intention becomes incoherent.”