Over the past decade, neuroscience has revealed that rather than acting as a filter that simply maps sensation onto action, the brain behaves like an “inference machine” that tries to discover patterns within data by refining a model of how those patterns are likely to be generated. For instance, depending on whether the context is a crowded concert hall or a deserted forest, a sound can be perceived as either a human voice or the wind whistling through trees. The pioneering German physicist Hermann von Helmholtz articulated this idea as early as 1860, when he wrote of visual perception that “objects are always imagined as being present in the field of vision as would have to be there in order to produce the same impression on the nervous mechanism.” Now a unified understanding of how the brain makes and optimizes its inferences about the outside world is emerging from even earlier work — that of the 18th-century mathematician Thomas Bayes.
Bayes developed a statistical method for evaluating the probability that a given hypothesis is true as conditions change. The concept is straightforward: The probability of two things happening together is the probability of the first given the second, times the probability of the second. Because the same joint probability can be factored either way round, this product rule can be rearranged into Bayes's rule: the probability of a hypothesis given some evidence equals the probability of that evidence under the hypothesis, times the prior probability of the hypothesis, divided by the overall probability of the evidence. This allows confidence in an inference to be updated whenever new evidence arrives. The “Bayesian” approach has emerged in many guises over the past century and has proved very useful in computer science applications like machine learning.
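To make the rule concrete, here is a minimal Python sketch applying it to the voice-or-wind example from the opening paragraph. The probabilities are hypothetical numbers chosen for illustration, and the function name posterior is our own; the point is only that the same ambiguous sound is attributed to different causes when the prior expectations change with context.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes's rule: p(cause | sound) = p(sound | cause) * p(cause) / p(sound)."""
    # Product rule: p(sound, cause) = p(sound | cause) * p(cause).
    joint = {cause: likelihood[cause] * prior[cause] for cause in prior}
    # p(sound) is the sum of the joint probabilities over all candidate causes.
    p_sound = sum(joint.values())
    return {cause: j / p_sound for cause, j in joint.items()}


# Hypothetical p(sound | cause): how likely this ambiguous sound is under each cause.
likelihood = {"voice": 0.3, "wind": 0.2}

# Hypothetical context-dependent priors p(cause).
concert_hall = {"voice": 0.9, "wind": 0.1}
forest = {"voice": 0.05, "wind": 0.95}

print("concert hall:", posterior(concert_hall, likelihood))
print("forest:      ", posterior(forest, likelihood))
```

With these made-up numbers the concert-hall prior makes "voice" the favored cause (about 0.93), while the forest prior makes "wind" the favored cause (also about 0.93), even though the sound itself is identical.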
Since at least the 1980s, neuroscientists have speculated that the brain may use Bayesian inference to make predictions about the outside world. In this view, the brain estimates the most likely cause of an observation (that is, sensory input) by computing the probability that a particular series of events generated what was observed, much as a scientist constructs a model to fit his or her data. This probability is a mathematical quantity we call the “evidence.” But evaluating the evidence for most realistic models requires calculations so intricate and lengthy that they become impractical. This would be particularly problematic for the brain, which must constantly make split-second decisions. Fortunately, there is an easier way. In 1972 the American physicist Richard Feynman devised an elegant shortcut to estimate the evidence using something called a “free-energy bound.” Free-energy is a concept from statistical thermodynamics: it is essentially the energy in a system that is available to do work, once the unusable portion associated with the system’s entropy (its disorder) has been subtracted.
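Why is the evidence so costly? In a model with many hidden causes, computing it exactly means summing over every possible combination of those causes. The Python sketch below assumes a toy model of our own devising, with n binary causes whose count sets the mean of one noisy observation, and computes the evidence by brute-force enumeration so that the number of terms, 2 to the power n, is visible. This particular toy model could be summed more cleverly; the enumeration stands in for realistic models where no such shortcut exists.

```python
import itertools
import math

# Toy generative model, assumed for illustration only: n independent binary
# hidden causes, each active with probability PRIOR_ON, produce one noisy
# observation whose mean is the number of active causes.
PRIOR_ON = 0.5   # hypothetical prior probability that a cause is active
NOISE_SD = 1.0   # hypothetical standard deviation of the sensory noise


def likelihood(y: float, causes: tuple[int, ...]) -> float:
    """Gaussian density p(y | causes)."""
    mean = sum(causes)
    z = (y - mean) / NOISE_SD
    return math.exp(-0.5 * z * z) / (NOISE_SD * math.sqrt(2 * math.pi))


def prior(causes: tuple[int, ...]) -> float:
    """Prior probability p(causes) of one on/off configuration."""
    k = sum(causes)
    return PRIOR_ON ** k * (1 - PRIOR_ON) ** (len(causes) - k)


def evidence(y: float, n: int) -> tuple[float, int]:
    """Exact p(y): sum p(y | causes) * p(causes) over all 2**n configurations."""
    total, terms = 0.0, 0
    for causes in itertools.product((0, 1), repeat=n):
        total += likelihood(y, causes) * prior(causes)
        terms += 1
    return total, terms


for n in (4, 8, 16):
    p, terms = evidence(y=2.0, n=n)
    print(f"{n:2d} causes: {terms:6d} terms, evidence p(y) = {p:.4g}")
```

Each added cause doubles the work: 16 terms for 4 causes, 256 for 8 and 65,536 for 16.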
Feynman’s basic idea was simple: Instead of trying to compute the evidence explicitly, just start with a quantitative guess about the causes, which we will call a “representation,” and then adjust the representation until it minimizes the free-energy of the data. Feynman exploited the fact that the free-energy is, by construction, never less than the negative logarithm of the evidence, a mathematical quantity we will call “surprise.” In other words, the free-energy is an upper bound on surprise (remember this: we’ll come back to it later). So as the representation is changed to minimize free-energy, it approaches the most likely cause of whatever sensory inputs make up an observation, and the free-energy approaches the surprise itself, providing a tractable estimate of the evidence. The machine-learning community has used this approach with great success, leading many researchers to wonder: If minimizing free-energy is so effective in allowing statistical machines to perceive and “learn” about their surroundings, could the brain be taking similar shortcuts?
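Feynman's bound can be checked numerically on a small example. In its standard variational form, free-energy is F(q) = sum over z of q(z) * [ln q(z) - ln p(y, z)], which equals the surprise -ln p(y) plus a nonnegative divergence between the representation q and the true posterior over causes. The Python sketch below assumes three hypothetical causes with made-up joint probabilities, starts from a uniform guess and adjusts it by plain gradient descent on F; the softmax parameterization and step size are our own choices. It shows F starting above the surprise and descending toward it as q approaches the exact posterior.

```python
import math

# Hypothetical joint probabilities p(y, z) of one fixed observation y and each
# of three candidate causes z. The numbers are made up for illustration.
JOINT = {"voice": 0.12, "wind": 0.05, "music": 0.03}


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn unconstrained scores into a probability distribution q(z)."""
    m = max(logits.values())
    exps = {z: math.exp(v - m) for z, v in logits.items()}
    total = sum(exps.values())
    return {z: e / total for z, e in exps.items()}


def free_energy(q: dict[str, float]) -> float:
    """Variational free-energy F(q) = sum_z q(z) * [ln q(z) - ln p(y, z)]."""
    return sum(q[z] * (math.log(q[z]) - math.log(JOINT[z])) for z in q)


evidence = sum(JOINT.values())   # p(y), computable here only because the model is tiny
surprise = -math.log(evidence)   # -ln p(y), the quantity F bounds from above

logits = {z: 0.0 for z in JOINT}  # initial representation: a uniform guess
print(f"start: F = {free_energy(softmax(logits)):.4f}, surprise = {surprise:.4f}")

for step in range(500):
    q = softmax(logits)
    f = free_energy(q)
    # dF/dlogit(z) = q(z) * ([ln q(z) - ln p(y, z)] - F); step downhill.
    for z in logits:
        logits[z] -= 0.5 * q[z] * (math.log(q[z]) - math.log(JOINT[z]) - f)

q = softmax(logits)
posterior = {z: p / evidence for z, p in JOINT.items()}
print(f"end:   F = {free_energy(q):.4f}, surprise = {surprise:.4f}")
print("q        :", {z: round(v, 3) for z, v in q.items()})
print("posterior:", {z: round(v, 3) for z, v in posterior.items()})
```

With these numbers F starts at about 1.78, above the surprise of about 1.61, and ends equal to it once q matches the posterior (0.6, 0.25, 0.15). Note that the loop never uses the evidence; it is computed only to show what F converges to.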
In this formulation, a “representation” is simply a quantitative guess about the likely cause of a sensory observation. To understand representation in the brain, imagine you are in a bar having a conversation. On their own, the sounds you hear are just acoustic signals produced by someone speaking; they carry no meaning by themselves. Your brain must first represent the deeper cause of the sounds (in this case, the concepts and words that make up the speech) using internal variables such as the activity of neurons and the strengths of the connections between them. Only then can you infer any meaning. What would this process look like?
The emerging picture is that the brain makes its inferences by minimizing the free-energy of messages passing between hierarchical brain regions. Imagine the brain as an onion, where meaningful exchanges with the outside world take place on its surface (the outer sensory layer). Information from these exchanges passes on to “higher” levels (those responsible for cognitive functions) through “bottom up” connections. The higher levels respond with “top down” messages to the lower levels. This reciprocal exchange repeats itself hierarchically, back and forth, layer by layer, until the highest level (at the center of the onion, or front of the brain) becomes engaged. Only then will you consciously register a perception. In this scheme, the free-energy is essentially the collective prediction error over all levels of the hierarchy: Top-down cognitive messages provide predictions based on representations from above, and lower sensory levels reciprocate with bottom-up prediction errors. These “error messages” drive encoded representations (such as neuronal activity) to improve the predictions for lower levels (that is, to reduce free-energy).
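The message passing described above is often formalized as predictive coding. The Python sketch below assumes a simple linear-Gaussian version with three levels: a top-level representation mu2 predicts a middle-level representation mu1, which in turn predicts the sensory input Y. The weights, variances and input are hypothetical values. Each level sends up a precision-weighted prediction error, and each representation is nudged to explain the error from below while staying consistent with the prediction from above. This is gradient descent on free-energy, which under these Gaussian assumptions is, up to constants, the sum of squared prediction errors across levels.

```python
# Assumed linear-Gaussian hierarchy: mu2 -> mu1 -> Y. All values are hypothetical.
Y = 3.0                          # sensory input at the outermost level
W1, W2 = 1.5, 0.8                # top-down weights: mu1 predicts Y, mu2 predicts mu1
VAR0, VAR1, VAR2 = 1.0, 1.0, 1.0 # variances of the errors at each level
PRIOR_MU2 = 0.0                  # prior expectation for the top level


def errors(mu1: float, mu2: float) -> tuple[float, float, float]:
    """Precision-weighted prediction errors sent bottom-up at each level."""
    e0 = (Y - W1 * mu1) / VAR0    # sensory input vs. top-down prediction from mu1
    e1 = (mu1 - W2 * mu2) / VAR1  # mu1 vs. top-down prediction from mu2
    e2 = (mu2 - PRIOR_MU2) / VAR2 # mu2 vs. its prior
    return e0, e1, e2


def free_energy(mu1: float, mu2: float) -> float:
    """Gaussian free-energy up to constants: half the summed squared errors."""
    e0, e1, e2 = errors(mu1, mu2)
    return 0.5 * (VAR0 * e0 ** 2 + VAR1 * e1 ** 2 + VAR2 * e2 ** 2)


mu1, mu2 = 0.0, 0.0  # initial representations
rate = 0.1           # hypothetical update rate
for step in range(500):
    e0, e1, e2 = errors(mu1, mu2)
    # Each representation moves to explain the error from below (weighted by
    # its top-down weight) while being pulled toward the prediction from above.
    mu1 += rate * (W1 * e0 - e1)
    mu2 += rate * (W2 * e1 - e2)

print(f"mu1 = {mu1:.4f}, mu2 = {mu2:.4f}, F = {free_energy(mu1, mu2):.4f}")
```

At convergence the messages balance: at each level, the error arriving from below, scaled by the top-down weight, exactly cancels the error sent upward, so neither representation changes further and free-energy sits at its minimum.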



























