Page 2 of 2
Illustration by Andy Gilmore
For example, in your hypothetical bar conversation, no matter how ambiguous the acoustics, you are more likely to hear “credit crunch” than “credit brunch.” Here the high-level conceptual representation “credit crunch” provides contextual constraints on the words, which restrict the sounds predicted or heard, namely “c,” not “b.” If the bar is very noisy, you may find yourself watching your friend’s mouth closely. This is because the cause (speaking) allows you to make both acoustic and visual predictions. Hierarchical optimization allows sounds to help you see and sights to guide hearing, binding different sensations into a coherent perceptual framework. This recurrent message-passing leads to the self-organized brain dynamics that support perception and recognition.
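The inference in this example can be sketched as Bayes' rule over two candidate phrases. This is a minimal sketch, not the hierarchical scheme itself: it assumes just two causes, a prior supplied by the conversational context, and two sensory channels (sound and lip movement) that are independent given the phrase. Every probability in it is a made-up illustrative value.

```python
# Hypothetical probabilities for illustration only; none come from the article.

# Prior from the high-level context: how plausible each phrase is in a
# conversation about the economy.
prior = {"credit crunch": 0.95, "credit brunch": 0.05}

# Likelihood of the heard sound under each phrase. The bar is noisy, so the
# acoustics fit both phrases equally well.
p_sound = {"credit crunch": 0.5, "credit brunch": 0.5}

# Likelihood of the seen mouth movement under each phrase. A "b" closes the
# lips and a hard "c" does not; here the lips were seen to stay apart.
p_lips = {"credit crunch": 0.8, "credit brunch": 0.2}


def posterior(prior, *likelihoods):
    """Bayes' rule over a discrete set of causes, assuming the sensory
    channels are conditionally independent given the cause."""
    unnormalized = {}
    for cause, p in prior.items():
        for likelihood in likelihoods:
            p *= likelihood[cause]
        unnormalized[cause] = p
    total = sum(unnormalized.values())
    return {cause: p / total for cause, p in unnormalized.items()}


hearing_only = posterior(prior, p_sound)
hearing_and_seeing = posterior(prior, p_sound, p_lips)
```

With these values, the ambiguous sound alone leaves the belief in “credit crunch” at the context prior (about 0.95), and adding the visual evidence raises it to about 0.987: context does the work when the acoustics are uninformative, and sight sharpens what is heard.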
The perspective afforded by this hierarchical Bayesian formulation is especially important for neuroscience, because hierarchy is a key architectural principle of brain anatomy — our brains are organized in successive layers, and we can measure the neural activity encoding prediction errors and representations. But this is not the end of the story. What follows is a new theory that considers what would happen if the free-energy principle applied not just to perception but to action as well.
Let’s begin with the notion of an ensemble density: a probability distribution of the states you or I can occupy. Imagine I had 100 million copies of you, at different times in your daily life. If I could measure all your sensory states, I could construct a sample density or histogram that reflected the probability of your being in any particular state. Critically, for you to exist, the number of states you occupy must be small in relation to all possible states. For example, your temperature will always be in a certain range. Mathematically, this means your ensemble density has low entropy. Here, we meet a characteristic of adaptive biological agents (like you and me) in that they seem to resist the second law of thermodynamics (a universal tendency to disorder) by minimizing the entropy of their ensemble densities. What does minimizing entropy mean? It simply means that you will, on average, avoid surprising or improbable states (i.e., you will not find yourself at the bottom of the ocean or suddenly engulfed in flames). Though arcane, this implies something quite fundamental: To exist, you must avoid surprising states.
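The ensemble-density argument can be made concrete with a small numerical sketch. It assumes the sensory state is a single number (a hypothetical body temperature) and replaces the 100 million copies with simulated samples; the Gaussian spreads, bin edges and seed are illustrative choices, not physiological data. The histogram's entropy is the average surprise of the states visited, so an agent that keeps its temperature in a narrow range scores lower than one whose temperature wanders.

```python
import math
import random


def histogram_entropy(samples, lo, hi, n_bins):
    """Shannon entropy (in nats) of a histogram of samples on [lo, hi].

    Samples outside the range are clamped into the first or last bin.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        i = min(max(math.floor((x - lo) / width), 0), n_bins - 1)
        counts[i] += 1
    total = len(samples)
    # Entropy is the average surprise, -ln p, over the occupied bins.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical "copies" of an agent: body temperature in degrees Celsius.
regulated = [rng.gauss(37.0, 0.3) for _ in range(100_000)]    # stays near 37
unregulated = [rng.gauss(37.0, 5.0) for _ in range(100_000)]  # drifts widely

h_regulated = histogram_entropy(regulated, 20.0, 55.0, 70)
h_unregulated = histogram_entropy(unregulated, 20.0, 55.0, 70)
```

The regulated agent occupies only a handful of 0.5 °C bins and so has far lower entropy than the unregulated one; both stay below ln 70, the value reached if every bin were equally likely.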
Adaptive agents like us are open systems that interact with their environment. The environment acts on us, which produces sensory impressions, and we act on the environment to change its states: If you see an apple on a table, you can reach out to pick the apple up. If we can change the environment that causes sensory input, then, in principle, we can act to suppress surprising input. But there is a problem: How do we compute surprise? In fact, we do not need to compute surprise at all. Returning to Feynman’s elegant methodology, all we need to do is to minimize free-energy, because free-energy is an upper bound on surprise: driving free-energy down implicitly keeps surprise down. This means that free-energy can be used not only to optimize perception, but also to prescribe action. This is the basis of the free-energy principle, which states that all quantities associated with an agent will change to minimize free-energy. This line of reasoning prescribes an intimate relationship between perception and action, where both work in concert to suppress free-energy (that is, to minimize prediction errors or surprise) in our sensory experiences. In other words, we will actively sample sensory data so that it conforms to our expectations; we will constantly alter our relationship with our environment so that our expectations become self-fulfilling prophecies. A simple example of this is turning one’s head to get a better view of what seems to be a familiar face in peripheral vision, but this principle may encompass our entire navigation of the world to avoid the unexpected.
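The claim that free-energy bounds surprise can be checked numerically on a toy model. In the usual variational formulation, for a sensation y, hidden causes c and a recognition density q(c), free-energy is F = Σ_c q(c)[ln q(c) − ln p(y, c)], which equals the surprise −ln p(y) plus a Kullback–Leibler divergence between q and the true posterior, and so can never fall below the surprise. The sketch below assumes a hypothetical model with three causes and two possible sensations; in a model this small, surprise is easy to compute directly, which is what makes the bound checkable here, whereas in realistic models the sum over causes is intractable. The last lines treat action, very simply, as choosing which sensation to sample given current beliefs.

```python
import math

# Hypothetical generative model: three hidden causes, two possible sensations.
prior = [0.5, 0.3, 0.2]            # p(c)
likelihood = [[0.9, 0.1],          # p(y | c = 0)
              [0.4, 0.6],          # p(y | c = 1)
              [0.1, 0.9]]          # p(y | c = 2)


def joint(y, c):
    """p(y, c) = p(y | c) p(c)."""
    return likelihood[c][y] * prior[c]


def surprise(y):
    """-ln p(y), summing the joint over all hidden causes."""
    return -math.log(sum(joint(y, c) for c in range(len(prior))))


def free_energy(q, y):
    """F = sum_c q(c) [ln q(c) - ln p(y, c)] for a recognition density q."""
    return sum(qc * (math.log(qc) - math.log(joint(y, c)))
               for c, qc in enumerate(q) if qc > 0)


def exact_posterior(y):
    """p(c | y); with this q the bound is tight and F equals the surprise."""
    p = [joint(y, c) for c in range(len(prior))]
    total = sum(p)
    return [pc / total for pc in p]


# Perception: any belief q gives F >= surprise; the exact posterior gives F == surprise.
q = [0.7, 0.2, 0.1]  # hypothetical current beliefs about the cause
for y in (0, 1):
    print(y, surprise(y), free_energy(q, y), free_energy(exact_posterior(y), y))

# Action (toy version): sample the sensation that the current beliefs make
# least surprising, i.e. the one with the lower free-energy.
best_y = min((0, 1), key=lambda y: free_energy(q, y))
```

With beliefs that favor cause 0, the agent prefers to sample sensation 0, the one its beliefs predict, which is a crude picture of expectations becoming self-fulfilling.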
In terms of neuroscience, the key issue is not so much the information-theoretic principles above, but how the brain realizes them. Multiple predictions follow from these ideas. For example, brain systems should be deployed hierarchically and connected reciprocally. Forward connections should be largely linear in their influences, whereas backward connections should embody the nonlinearities inherent in the causal structure of the world. We would expect that predictable stimuli evoke smaller responses, and unexpected stimuli larger ones. Scientists are now starting to confirm these conjectures with brain mapping, by comparing brain responses to stimuli that are coherent or incoherent, predictable or unpredictable. This principle also has implications beyond neuroscience, in the sense that it applies to all biological agents. Could single-cell organisms use the concentration of metabolites and kinetic rate-constants (as opposed to neuronal activity and connection strengths) to encode their implicit representations? In this speculative case as well as with the brain, the great challenge is to find the mapping between the internal states of a phenotype and the representations that this theory mandates.
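The expectation that predictable stimuli evoke smaller responses can be illustrated with a deliberately minimal sketch. It assumes a single unit whose "response" is the size of its prediction error and whose prediction moves a fixed fraction (a hypothetical `learning_rate`) toward each input. This is a toy delta-rule update, not the hierarchical model described here; it only shows responses shrinking with repetition and growing for an unexpected stimulus.

```python
def prediction_error_responses(stimuli, learning_rate=0.5, initial_prediction=0.0):
    """Return |prediction error| for each stimulus in turn.

    The prediction is nudged toward each input by a fraction of the error,
    so a repeated stimulus becomes progressively better predicted.
    """
    prediction = initial_prediction
    responses = []
    for s in stimuli:
        error = s - prediction               # mismatch between input and prediction
        responses.append(abs(error))         # toy stand-in for the evoked response
        prediction += learning_rate * error  # update the prediction from the error
    return responses


# Hypothetical stimulus sequence: a repeated (predictable) tone, then an oddball.
stimuli = [1.0] * 8 + [3.0]
responses = prediction_error_responses(stimuli)
```

With `learning_rate = 0.5`, each repetition halves the response, and the oddball at the end evokes a response about twice as large as the very first tone, because it departs from a prediction that repetition has pulled close to 1.0.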
Returning to statistical machines, from which much of this work emerged, the theory suggests a profound revision of current approaches to reinforcement learning and optimal control in engineering artificial neural networks. It should be possible to teach automata (such as robots) complex adaptive behaviors simply by exposing them to a controlled environment (like a classroom), then returning them to their normal surroundings to seek out the new states they have learned to expect. The limitations of this approach are difficult to predict, but further synergy between theoretical neurobiology and machine learning, between a deeper understanding of our own minds and those we wish to create, appears inevitable.

Karl Friston is the scientific director of the Wellcome Trust Centre for Neuroimaging.
Originally published January 27, 2009