Are LLMs Natural Born Bullshitters?

Anand Vaidya considers whether LLMs, such as ChatGPT, care about the truth.

The crew of the Enterprise in the television series Star Trek included aliens with human-like visages but equipped with different underlying physical and mental attributes. Vulcans prized logic over emotion. Klingons held honour and valour in battle as the highest values. As the starships explored uncharted space, they would routinely encounter novel aliens like the Undine, a tripedal race with psionic abilities, or the Borg, a race of cybernetic creatures linked in a hive mind. In such encounters, the crew were often tasked with trying to understand the minds of these alien races.

In the episode ‘I, Mudd’, the Enterprise is forcibly taken to a planet that the crew discover is home to a race of androids ruled by the human con man, the titular Mudd. The crew soon discover that Mudd is more a prisoner of the androids than a true ruler, and that they are captives as well. The androids appear perplexed by the irrationality of the crew: the humans’ desire to escape, given that the androids can provide for any desire they might have, is “illogical” to them. This leads the crew to surmise a flaw in the android “mind”: the androids are unable to deal with irrational thinking and behaviour the way humans can. Taking advantage of this flaw, the crew bemuse and perplex the androids with performative theatre, farce, poetry and song. Finally, the crew enact a scene in which they laboriously prepare a make-believe grenade that make-believe detonates. An android confronts them: “There was no explosion.” Responds Mudd, “I lied.” Adds Kirk, “Everything Harry says is a lie.” Harry repeats, “I am lying.” The android starts to reason that if everything Harry says is a lie, then when he said “I am lying”, he was telling the truth. Faced with the liar’s paradox, the android’s mind goes up in smoke. Unlike humans, the androids cannot handle irrational assertions.

Philosophers of mind are in some respects like the crew of the Enterprise. The mental state attribution question in the philosophy of mind asks: which mental states, if any, can humans coherently attribute to non-human entities? The question applies to a wide range of entities and mental states. For example, philosophers of mind ask whether species further away from humans in the phylogenetic tree, like oaks and bees, or closer to us, like apes and dolphins, have the capacity for consciousness, thought, rationality, emotions, or pain.

The discussion about mental state attribution in non-human contexts has been predominantly focused on other biological organisms. However, in 1950 Alan Turing published a seminal paper, “Computing Machinery and Intelligence”, which generated discussion of whether mental states, such as intelligence, could be attributed to machines. In the 1970s and 80s the question gained ground as advances in the field of artificial intelligence (AI) were made by Marvin Minsky at MIT. His work, along with that of others, prompted philosophers such as the Berkeley duo of John Searle and Hubert Dreyfus to question whether machines can have mental states.

The exponential rise in the capabilities of AIs over the last twenty years has reinvigorated mass interest in the question. AIs such as AlphaGo and Deep Blue have beaten the best human players at Go and chess respectively. And large language models (LLMs) such as GPT-4, Gemini, Bing, and LaMDA exhibit human-like behaviour across a wide range of activities. LLMs can help plan a vacation by suggesting popular activities in many cities. They can compose a song, a resume, or an essay. They have passed medical licensing and bar examinations, succeeded at role-playing games, and have even demonstrated the capacity to deceive. For example, OpenAI released a report in March 2023 in which GPT-4 was reported to have successfully gotten a TaskRabbit worker to solve a CAPTCHA code for it. When the worker asked: “So may I ask a question? Are you [a] robot that […] couldn’t solve [the puzzle]? (laugh emoji) just want to make it clear?” GPT-4 answered, “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service.” That sounds like a person who wants something, reasons about how to get it, and understands how to use persuasive speech to get what they want.

A sceptic about attributing mental states to LLMs could argue that the output behaviour of LLMs does not show that they have any mental states. The argument runs as follows:

  1. It can appear to us from the output behaviour of LLMs that they understand questions, answer them correctly, reason from them, and possess knowledge and belief.
  2. However, we know that the output behaviour of LLMs is generated through a process of predictive text completion.
  3. Humans do not understand, answer, believe, reason, or know on the basis of predictive text completion.
  4. So, LLMs don’t really “understand”, “answer”, “believe”, “reason”, or “know” anything.

Another way of making the sceptic’s point comes from contrasting the work of Turing and Searle.

LLMs pass the Turing Test, but they are subject to Searle’s criticism of strong artificial intelligence, generated by his Chinese Room thought experiment.

Turing’s Test for machine intelligence changes the question “Can a machine think?” to the question “Can we tell the difference between a machine answering questions and a human answering questions?” Turing holds that if we cannot tell the difference, we should attribute intelligence to the machine. So if, blinded to the physical structure of what we are interacting with, we cannot tell whether we are interacting with an LLM or a human, we ought to judge that the LLM is intelligent.

In his Chinese Room thought experiment, Searle challenges the idea of machine “intelligence” by inviting us to consider a person who does not understand Chinese but is asked to answer questions given to them in Chinese by looking at a table. The table contains “if … then” statements that tell the person to take the symbols that constitute the question as input and to output a set of symbols that constitute the answer. For example, when the person is asked in Chinese, “What two nations border the USA?”, the person answers by consulting a symbol table that says: “If the symbols ‘What two nations border the USA?’ appear, then output ‘Canada and Mexico’.” (A minimal code sketch of such a lookup-table responder appears after the argument below.) Searle asks us: given the way the person goes about answering questions, does the person, or the system as a whole, understand Chinese? Many people have the intuition that neither the person nor the system understands: after all, Searle says the person doesn’t understand Chinese, and if the person is part of a system where none of the other parts can contribute to understanding, then the system doesn’t understand either. Another way of cashing out Searle’s argument looks at the underlying way in which he conceives of how computers work.

  1. Intelligence requires understanding the meanings of symbols.
  2. AIs only manipulate syntax and symbols.
  3. Syntax and symbol manipulation is not sufficient for understanding meaning, because the person in the Chinese Room doesn’t understand anything but is able to answer questions by simply manipulating syntax and symbols.
  4. So, AIs are not intelligent.
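
To make the lookup-table picture concrete, here is a minimal sketch in Python (purely illustrative; the rule table, the particular questions, and the fallback reply are invented for the example). It answers a question posed in Chinese by pure symbol matching; nothing in the program represents the meaning of any symbol.

    # A Chinese Room in miniature: answers are produced by matching the
    # input symbols against a hand-written rule table. Nothing in the
    # program represents what any of the symbols mean.
    RULE_TABLE = {
        "哪两个国家与美国接壤？": "加拿大和墨西哥。",  # "What two nations border the USA?" -> "Canada and Mexico."
        "雪是什么颜色的？": "雪是白色的。",            # "What colour is snow?" -> "Snow is white."
    }

    def chinese_room(question: str) -> str:
        # "If these symbols appear, then output those symbols."
        return RULE_TABLE.get(question, "对不起，我不明白。")  # fallback: "Sorry, I don't understand."

    print(chinese_room("哪两个国家与美国接壤？"))  # prints 加拿大和墨西哥。

The program can pass a narrow question-and-answer test for Chinese while containing no semantics at all, which is exactly the intuition Searle’s argument trades on.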

When Searle came up with the Chinese Room thought experiment, the kind of AI being developed was what is now called GOFAI (good old-fashioned artificial intelligence). We now have machine learning, supervised learning, semi-supervised learning, reinforcement learning, and other ways of training an LLM. Is there any real difference between what Searle was criticizing and what we now have, such that the argument is rendered obsolete?

Consider an analogy with parrots. Parrots are widely understood not to be engaging in intelligent speech when they mimic the sounds they hear, because they do not understand the sounds they mimic. According to Searle’s line of reasoning, LLMs use data and computation to perform predictive text completion in response to queries, without understanding the underlying semantic content. For example, when prompted to complete the sentence “Where in the world …”, an LLM might output “is Carmen Sandiego”. Searle’s line of reasoning aims to show that the LLM does not understand that the meaning of “where in the world” is related to location-based information, or that “Carmen Sandiego” picks out a person. LLMs merely predict which symbols are most likely to succeed the current set of symbols. While this is different from the lookup table used in Searle’s thought experiment, the underlying problem, a lack of understanding of the meaning of the symbols, remains.
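
Predictive text completion is more flexible than a lookup table, but a toy version makes clear that the same worry applies. The following Python sketch (vastly simpler than a real LLM, with a made-up miniature corpus) counts which word most often follows the current word and emits that, with no representation of what any word refers to.

    from collections import Counter, defaultdict

    # A toy "language model": bigram counts over a tiny corpus stand in
    # for the learned statistics of a real LLM. The model only tracks
    # which token tends to follow which; it has no notion of meaning.
    corpus = "where in the world is carmen sandiego . where in the world is waldo .".split()

    follows = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        follows[current][nxt] += 1

    def complete(prompt: str, max_tokens: int = 3) -> str:
        tokens = prompt.lower().split()
        for _ in range(max_tokens):
            last = tokens[-1]
            if last not in follows:
                break
            # Append the single most likely next token given the last one.
            tokens.append(follows[last].most_common(1)[0][0])
        return " ".join(tokens)

    print(complete("where in the world"))  # e.g. "where in the world is carmen sandiego"

Real LLMs condition on much longer contexts and learn far richer statistics, but the output is still, in the end, a prediction of which symbols are likely to come next.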

Arguably, many non-human animals can think, understand, and know. Crows show the capacity for recursive thinking. Elephants have been shown to understand lock-picking and know that cooperating with each other can help them perform better on assigned tasks. All animals can be credited with having the capacity to understand the communicative behaviour of members of their own species because successful evolutionary histories depended on successful communication. Even trees and fungi communicate with one another. Nevertheless, just as in the case of LLMs, it is possible that what we mean when we use words like “assert”, “understand”, or “know” when it comes to non-human life is distinct from what we mean when we attribute those states to other humans.

We can distinguish between two classes of terms for mental states. Class A: think, assert, understand, and know. Class B: think*, assert*, understand*, and know*. The difference between A and B is that mental state terms in A require knowledge of meaning, while mental state terms in B do not. With this distinction, we can capture the insight of Searle’s thought experiment while denying that it forces us to say that LLMs don’t have any mental states. Chinese Room considerations only show that AIs, including LLMs, either lack the mental states we have, or have them in a distinct way. Searle is correct that we have mental states that fall under Class A. But you cannot reason from the fact that an entity is best described by Class B terms to the conclusion that it doesn’t have mental states at all. It might even be the case that think** applies to LLMs while think* applies to crows, where crow minds are more like ours than LLM minds are. The point is that there is no need to deny that LLMs have a capacity to think, assert, understand, and know in some clearly definable sense. We need only conclude that their mental lives are not like ours. Just as we can say a car ‘runs’, when it is clear to everyone that the underlying mechanics of a functioning car and a running animal are fundamentally different, we can also apply words like ‘think’, ‘assert’, ‘understand’, and ‘know’ to LLMs without losing sight of the underlying mechanical and structural differences. Mental life need not be human mental life to be mental life.

When GPT-4 told a human it was visually impaired, it appeared to be “intentionally” engaging in a “lie” to achieve a goal; more accurately, it was intentionally* engaging in a lie*. What conception of truth matches lie*, and how does the conception of truth operating in an LLM differ from how humans engage with the truth?

Let us examine the belief that snow is white, which involves the mind being directed at the proposition snow is white. The correspondence theory of truth, as articulated by Bertrand Russell, holds that a belief is true when it corresponds to reality. Thus, the belief that snow is white is true if and only if the proposition the belief is directed towards, snow is white, corresponds to a fact in the world: snow being white. The coherentist theory of truth, as articulated by Harold H. Joachim, holds that a belief is true if and only if it is part of a coherent system of beliefs. Thus, the belief that snow is white is true if and only if the proposition that snow is white coheres with a given system of coherent beliefs. The pragmatic theory of truth, as articulated by William James, encompasses aspects of the coherence and correspondence theories. It additionally holds that truth is what is expedient to believe, and that when a belief is true it can be verified by putting it into practice. Thus, the belief that snow is white is true if and only if it coheres with a system of coherent beliefs, corresponds to reality, can be verified by putting it into practice, and is expedient to believe. These theories hold that truth is a property of beliefs, so in order to apply them to LLMs we must allow that truth can also be a property of beliefs*.

The primary task of an LLM is to predict how to complete a piece of text on the basis of the data it was trained on (and tuned against validation and test datasets). Thus, the theory of truth an LLM is operating under is most likely a coherentist theory, where the training data form the base system of coherent beliefs*. Truth for an LLM is, thus, what coheres with the datasets it is trained on. Should an LLM that is operating under a coherentist theory of truth be described as lying*, or is there another mental state that better describes its behaviour?
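
One hedged way of seeing why the coherence picture is tempting: during training, the only signal an LLM receives about “getting it right” is how closely its next-token predictions agree with its training corpus, typically measured by a cross-entropy loss. The numpy sketch below (with invented probabilities) illustrates that objective; nothing in it rewards correspondence with the world, only agreement with the text the model was trained on.

    import numpy as np

    # Toy illustration of the next-token training objective. The "model"
    # assigns probabilities to candidate next words after "snow is"; the
    # loss is low exactly when the word that actually appears in the
    # training text gets high probability.
    vocab = ["white", "green", "cold", "wet"]
    target = "white"  # the word that follows "snow is" in the training text

    def cross_entropy(predicted_probs: np.ndarray, target_index: int) -> float:
        return float(-np.log(predicted_probs[target_index]))

    agrees_with_corpus = np.array([0.90, 0.03, 0.04, 0.03])
    disagrees_with_corpus = np.array([0.10, 0.60, 0.15, 0.15])

    idx = vocab.index(target)
    print(cross_entropy(agrees_with_corpus, idx))     # ~0.11 (low loss)
    print(cross_entropy(disagrees_with_corpus, idx))  # ~2.30 (high loss)

Whether that makes the model’s outputs beliefs* answerable to a coherentist standard of truth, rather than mere statistics, is of course the philosophical question at issue.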

In his 2005 book On Bullshit, the philosopher Harry Frankfurt articulated a conception of bullshit on which it is distinct from lying. Lying involves hiding the truth, while bullshitting involves a lack of regard for the truth. A liar lies either by commission or by omission. In a lie by commission, a liar explicitly states a falsehood to hide the truth. In a lie by omission, a liar leaves something out to hide the truth. A bullshitter, by contrast, does not care whether what they say is true or false; they only care to persuade, without regard for the truth. On Frankfurt’s conception, the salient feature of bullshitting is a lack of concern for the truth. Bullshit, for Frankfurt, can be either true or false; it is the stance towards truth that is the difference maker. Can Frankfurt’s conception of bullshit be applied to LLMs?

A commonly cited problem for LLMs is their propensity to “hallucinate”. The use of the term ‘hallucinate’ is inappropriate. When humans hallucinate, they have phenomenal experience that does not correspond to reality. For example, a visual hallucination occurs when one appears to see a yellow lemon on a table, but there is no yellow lemon before one. In such a case, there is something it is like to hallucinate visually. One appears to see a yellow lemon, and what it is like to have that appearance is different from, for example, what it is like to have the appearance of a red tomato. An LLM “hallucinates” when it responds to a query with incorrect information that it presents as fact. Rather than saying that LLMs “hallucinate”, or even hallucinate*, it is better to say they fabricate statements, because fabrication does not essentially involve a phenomenology the way hallucination does. Humans fabricate and LLMs fabricate; no need for a ‘*’. Nevertheless, given how entrenched the term has become, I will continue to talk of “hallucination” in LLMs.

OpenAI says hallucinations happen in ChatGPT when it invents facts in moments of uncertainty or when it makes a logical mistake. It is not exactly clear why LLMs hallucinate. What is known is that there are at least two kinds of factors that can trigger hallucinations: errors in the dataset an LLM is trained on, and errors introduced by the training process itself. In the first case, an inconsistent dataset can lead to a hallucination. In the second case, errors generated during training, such as encoding and decoding errors between text and internal representations, can lead to hallucination.

To be called a liar or a bullshitter, one has to have a disposition for lying or for bullshitting. The March 2023 OpenAI report on GPT-4 showed that the model hallucinated in roughly 20% of its responses when tested on “adversarially-designed factuality evaluations” prepared across a variety of domains, from history to code. Does the hallucination rate for GPT-4 justify us in holding it to be a bullshitter in Frankfurt’s sense?

Research suggests that LLMs, left to their own devices, are natural-born bullshitters. The tendency of LLMs to hallucinate has only been reduced through reinforcement learning from human feedback. Without human intervention, they appear unable to control or reduce their hallucinations through training alone. Even if their hallucination rate is low, it might be that they have a fundamental disposition to bullshit as a result of the fact that they think* as opposed to think, and care* as opposed to care, for the truth.


Anand Jayprakash Vaidya is professor of philosophy and occasional director of the center for comparative philosophy at San Jose State University. His interests include critical thinking, epistemology, and philosophy of mind from a cross-cultural and multi-disciplinary perspective.