All Generative AI output is essentially a hallucination

LLMs are designed to understand and generate human language by processing vast amounts of data, practically every scrap of text on the internet. They use deep-learning techniques to build sophisticated models that can perform a variety of language-related tasks.

The most prominent examples, such as OpenAI’s ChatGPT and Google’s Gemini, have demonstrated remarkable proficiency in tasks like text and image generation, summarization and, now, conversational AI. While this is laudable, the fact that they can hallucinate and produce untrustworthy output needs to be better understood.

We need to understand how GenAI works to understand the problem of hallucination. Because GenAI output seems to be made of words (or images and videos), we tend to assume that these tools have a vast data store of words and images they go back to for reference.

However, the decision to use a particular word in a GenAI sentence is not based on such a data store or encyclopaedia. It is rooted in the mathematics that underpins GenAI models (and other forms of AI, such as predictive AI).

Almost all AI is based on the ability to predict patterns. The mathematics behind these predictions lies in the realm of statistical analysis, which uses a tool called ‘regression analysis’ to predict how one (dependent) variable moves, given a specified change in another (independent) variable.

For example, every time variable X moves by 2, variable Y moves by 4, and so on. A relationship is thus established mathematically by observing how the two variables move in relation to each other. These observations form a set of plotted points on a graph that show how variable Y moves when variable X moves.

If there is a pattern (such as the 2:1 pattern described above), it becomes clear by ‘fitting’ a line to these plotted points that approximates the relationship. While the mathematics behind this takes some understanding, the basic idea of two variables moving on a graph in relation to each other is something most of us are taught in high school.
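To make the idea concrete, here is a minimal sketch in Python. The data points are made up for illustration and roughly follow the 2:1 pattern above; the use of numpy’s polyfit is just one convenient way to ‘fit’ a line, not part of any actual AI system.

```python
# A toy illustration (made-up numbers): points where Y moves by roughly 4
# whenever X moves by 2, with a straight line then 'fitted' through them.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable X
y = np.array([2.1, 3.9, 6.0, 8.2, 9.8])   # Y is roughly twice X

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line fit
print(f"fitted line: Y ≈ {slope:.2f} * X + {intercept:.2f}")
print(f"predicted Y when X = 6: {slope * 6 + intercept:.2f}")
```

The fitted slope of roughly 2 is exactly the kind of pattern a predictive model latches onto and then uses to guess values it has never seen.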

The mathematical ability to predict that movement is usually reserved for higher education, in undergraduate or postgraduate studies, where topics such as the possibility of making prediction errors, including false positives and false negatives, are also taught.

Simple predictive AI models have been in use for quite a while, performing tasks such as predicting that if I buy a pair of shoes, there is a significant (let’s say 80%) probability that I will also buy a few pairs of socks. This allows marketers to place pairs of socks in front of me, whether online or at a store, in the secure knowledge that there is an 80% chance I will take the bait.
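As a rough illustration of the arithmetic behind such a prediction, the sketch below counts made-up purchase records and estimates the conditional probability of buying socks given that shoes were bought; the baskets and the figure it prints are invented for this example.

```python
# A toy illustration with made-up purchase records: estimate the probability
# that a shopper buys socks, given that they bought shoes.
purchases = [
    {"shoes", "socks"},
    {"shoes", "socks", "belt"},
    {"shoes"},
    {"socks"},
    {"shoes", "socks"},
]

bought_shoes = [basket for basket in purchases if "shoes" in basket]
bought_both = [basket for basket in bought_shoes if "socks" in basket]

p_socks_given_shoes = len(bought_both) / len(bought_shoes)
print(f"P(socks | shoes) ≈ {p_socks_given_shoes:.0%}")  # 75% with this made-up data
```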

This is the mathematics behind the eerie ‘Google/Meta/Amazon seems to know everything about me’ feeling we sometimes have. They don’t. All they have are patterns of your browsing (and buying) behaviour, and the ability to mathematically predict what you would most likely want to buy/see/watch or search for next.

As artificial intelligence becomes more sophisticated, as in LLMs with ‘generative’ capabilities, this regression analysis extends to multivariate regression analysis. This means we are no longer simply predicting how Y moves with X; we are also predicting how A, B, C and D move as Y and X change. This broader method is mathematically sophisticated, involving stochastic calculus and systems of equations that are beyond the scope of this column.
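For the curious, here is a toy sketch of what ‘many variables at once’ looks like, fitting coefficients for A, B, C and D simultaneously with ordinary least squares. The numbers are invented, and real LLMs rely on far more elaborate machinery (neural networks), so treat this as an analogy rather than a description of how they are built.

```python
# A toy multivariate sketch (made-up numbers): predict Y from several
# variables at once instead of just one, using ordinary least squares.
import numpy as np

# Each row is one observation of the predictor variables A, B, C, D.
X = np.array([
    [1.0, 2.0, 0.5, 3.0],
    [2.0, 1.0, 1.5, 2.0],
    [3.0, 4.0, 2.5, 1.0],
    [4.0, 3.0, 3.5, 0.5],
    [5.0, 5.0, 4.5, 2.5],
])
y = np.array([10.0, 9.0, 18.0, 17.0, 27.0])  # the variable being predicted

X1 = np.column_stack([X, np.ones(len(X))])       # add an intercept column
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)  # fit all coefficients at once
print("fitted coefficients (A, B, C, D, intercept):", np.round(coeffs, 2))
```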

Suffice it to say that generative LLM models are based on sets of numbers, not on words or images.

These numbers predict the probability of the next word in a sentence being a particular one. If the LLM sees ‘the boy hit,’ it is likely to guess that the article ‘the’ comes next, followed by ‘ball,’ thereby producing ‘the boy hit the ball’ as its output, as the toy example below shows.
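A toy version of this guessing game can be written in a few lines. The sketch below builds word-to-next-word counts from a tiny invented corpus and always picks the statistically likeliest next word; real LLMs use neural networks with billions of parameters, but the underlying principle of probabilistic next-word prediction is the same.

```python
# A toy next-word predictor built from a tiny made-up corpus. It counts which
# word most often follows the current one and picks the likeliest, which is
# (in miniature) the probabilistic guessing described above.
from collections import Counter, defaultdict

corpus = "the boy hit the ball and the girl kicked the ball over the fence".split()

follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    # return the word most often seen after `word` in the corpus
    return follows[word].most_common(1)[0][0]

sentence = ["the", "boy", "hit"]
for _ in range(2):                       # extend the sentence by two words
    sentence.append(predict_next(sentence[-1]))
print(" ".join(sentence))                # prints: the boy hit the ball
```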

These guesses are based solely on the probability that a word should come next, not on an encyclopaedic look-up of all existing words. However, they are prone to the same mistakes as the smaller predictive AI models. For instance, Google’s Gemini once generated images depicting Nazi-era soldiers as multi-racial individuals, flipping historical accuracy on its head. (bit.ly/4evqjbQ)

So, what does all this mean?

It means all GenAI output, in essence, is a hallucination, since it is all dreamt up by machines based on probabilistic analyses. We call it out as a ‘hallucination’ only when we notice an inaccuracy, though we often gloss over it if we are not intimately familiar with the topic. 

Researchers and AI companies are, of course, working on reducing errors like the ones described above, but as long as artificial intelligence is based on probability theory, there will always be a chance, however slight, that it blunders.
