
Project AGI

Building an Artificial General Intelligence

This site has been deprecated. New content can be found at https://agi.io

Friday 17 October 2014

A Unifying View of Deep Networks and Hierarchical Temporal Memory


Browsing the NUPIC Theory mailing list, I came across a post by Fergal Byrne on the differences and similarities between Deep Learning and MPF/HTM. It gives good background on some of the pros and cons of each.

Given the popularity and demonstrated success of Deep Learning methods it's good to understand how they work and how they relate to MPF/HTM theory. For example, both involve construction of hierarchical data representations created via a series of unsupervised classifiers. Fergal rightly admonishes proponents of both methods for their reluctance to research the alternatives!

The article can be found here:

http://inbits.com/2014/09/a-unifying-view-of-deep-networks-and-hierarchical-temporal-memory/

Thursday 9 October 2014

On Predictive Coding and Temporal Pooling

Introduction

Predictive Coding (PC) is a popular theory of cortical function within the neuroscience community. There is considerable biological evidence to support the essential concepts (see e.g. "Canonical microcircuits for predictive coding" by Bastos et al).

PC describes a method of encoding messages passed between processing units. Specifically, PC states that messages encode prediction failures; when prediction is perfect, there is no message to be sent. The content of each message is the error produced by comparing predictions to observations.
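The messaging rule described above can be sketched in a few lines. This is only an illustration under our own assumptions: the function name `feedforward_message`, the vector representation and the `tolerance` parameter are hypothetical and are not taken from any particular PC model.

```python
import numpy as np

def feedforward_message(prediction, observation, tolerance=1e-6):
    """Return the prediction error, or None when there is nothing to send.

    Hypothetical helper: both arguments are vectors of equal length.
    """
    prediction = np.asarray(prediction, dtype=float)
    observation = np.asarray(observation, dtype=float)
    # The message content is the mismatch between observation and prediction.
    error = observation - prediction
    # A (near-)perfect prediction produces no message at all.
    if np.all(np.abs(error) <= tolerance):
        return None
    return error
```

A unit that predicted `[1.0, 0.0]` and then observed `[1.0, 0.0]` sends nothing; had it observed `[0.0, 1.0]`, the message would be the error vector `[-1.0, 1.0]`.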

A good introduction to the various theories and models under the PC umbrella has been written by Andy Clark ("Whatever next? Predictive brains, situated agents, and the future of cognitive science"). As Clark explains, the history of the PC concept goes back at least several decades to Ashby, who wrote: "The whole function of the brain is summed up in: error correction." Mumford pretty much nailed the concept back in 1992, before it was known as predictive coding (the cited paper gives a good discussion of how the neocortex might implement a PC-like scheme).

The majority of PC theories also model uncertainty explicitly, using Bayesian principles. This is a natural fit when providing explicit messaging of errors and attempting to generate predictions. Of course, it is also a robust framework for generative models.

Searching for articles on PC can be difficult because a similarly named concept exists in Signal Processing; the overlap seems to be coincidental, or at least any connection goes back beyond our reading. Unfortunately, many articles on the subject are written at a high level and do not include sufficient detail for implementation. However, we found work by Friston et al (example) and Rao et al (example, example) to be well described, although the former is difficult to grasp if one is not familiar with dynamical systems theory.

Rao's papers apply PC to visual processing. Friston's work covers the classification of birdsong and extends the concept to the control of motor actions. Friston et al wrote a paper titled "Perceptions as hypotheses; saccades as experiments" in which they suggest that actions are carefully chosen to optimally reduce uncertainty in internal predictive models. The PC concept throws up interesting new perspectives on many topics!

Comparison to MPF/CLA

There are significant parallels between MPF/CLA and PC. Both postulate a hierarchy of processing units with FeedForward (FF) and reciprocal FeedBack (FB) connections. MPF/CLA explicitly aims to produce increasingly stable FF signals in higher levels of the hierarchy. MPF/CLA tries to do this by identifying patterns via spatial and temporal pooling, and replacing these patterns with a constant signal.

Many PC theories create "hierarchical generative models" (e.g. Rao and Ballard). The hierarchy is enforced by restrictions on the topology of the model. The generative part refers to the fact that variables (in the Bayesian sense), in each vertex of the model, are defined by identification of patterns in input data. This agrees with MPF/CLA.

Both MPF/CLA and PC posit that processing units use FB data from higher layers to improve local prediction. In conjunction with local learning, this serves to reduce errors and therefore, in PC, it also stabilizes FF output.

In MPF/CLA it is assumed that cells' input dendrites determine the set of inputs the cell represents. This performs a form of Spatial Pooling - the cell comes to represent a set of input cells firing simultaneously, and hence the cell becomes a label or symbol representing that set. In PC it is similarly assumed that the generative model will produce objects (cells, variables) that represent combinations of inputs.

However, MPF/CLA and PC differ in their approach to Temporal Pooling, i.e. changes in input over time.

Implicit Temporal Pooling

Predictive coding does not expressly aim to produce stability in higher layers, but increasing stability over time is an expected side-effect of the technique. Assuming successful learning within a processing unit, its FF output will be stable (no signal) for the duration of any period of successful prediction.

Temporal Pooling in MPF/CLA attempts to replace FF input with a (more stable) pattern that is constantly output for the duration of some sequence of events. In contrast, PC explicitly outputs prediction errors whenever they occur. If errors do not occur, PC does not produce any output, and therefore the output is stable. A similar outcome has occurred, but via different processes.
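To make the PC side of this comparison concrete, here is a toy sketch of a single processing unit. It is a deliberately crude stand-in, not a real PC model: the unit learns first-order transitions between symbols (our assumption), stays silent while its predictions hold, and emits the observed symbol whenever a prediction fails. The name `run_unit` and the dictionary-based memory are hypothetical.

```python
def run_unit(sequence, transitions=None):
    """Feed a symbol sequence through a toy predictive unit.

    Returns (output, transitions). output[i] is None when symbol i was
    predicted correctly, otherwise the unexpected symbol itself.
    Note: a transitions dict passed in is updated in place.
    """
    transitions = {} if transitions is None else transitions
    output = []
    previous = None
    for symbol in sequence:
        predicted = transitions.get(previous)
        if predicted == symbol:
            output.append(None)             # prediction held: no FF signal
        else:
            output.append(symbol)           # prediction failed: send the error
            transitions[previous] = symbol  # local learning
        previous = symbol
    return output, transitions

first, learned = run_unit("ABCABC")
# first == ['A', 'B', 'C', 'A', None, None]
second, _ = run_unit("ABCABD", learned)
# second == [None, None, None, None, None, 'D']
```

Once the sequence has been learned the FF output is constant (silent) for as long as the input follows it, and a signal appears only at the point where the input departs from what was expected.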

Since the content of PC messages differs from that of MPF/CLA messages, the meaning of the variables defined in each vertex of the hierarchy also changes. In MPF/CLA the variables will represent chains of sequences of sequences, and so on; in PC, variables will represent a succession of forks in sequences, where prediction failed.

So it turns out that Predictive Coding is an elegant way to implement Temporal Pooling.

Benefits of Predictive Coding

Where PC gets really interesting is that the amplitude or magnitude of the FF signal corresponds to the severity of the error. A totally unexpected event will cause a signal of large amplitude, whereas an event that was considered a possibility will produce a less significant output.

This occurs because most PC frameworks model uncertainty explicitly, and these probability distributions can account for the possibility of multiple future events. Anticipated events will have some mass in the prior distribution; unanticipated events have very little prior probability. If the FF output is calculated as the difference between prior and posterior distributions, we naturally get an amplitude that is correlated with the surprise of the event.
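As a rough numerical illustration of this idea, the sketch below treats the prior as a discrete distribution over possible next events and assumes the observation is unambiguous, so the posterior puts all its mass on the observed event. Both the one-hot posterior and the choice of half the L1 distance as the "amplitude" are our simplifying assumptions; real PC models use richer distributions and error measures. The function name `ff_amplitude` is hypothetical.

```python
import numpy as np

def ff_amplitude(prior, observed_index):
    """Return (error vector, scalar amplitude) for one observed event."""
    prior = np.asarray(prior, dtype=float)
    # Assumption: the observation is certain, so the posterior is one-hot.
    posterior = np.zeros_like(prior)
    posterior[observed_index] = 1.0
    error = posterior - prior
    # Half the L1 distance lies in [0, 1]; with a one-hot posterior it
    # equals 1 - prior[observed_index].
    amplitude = 0.5 * np.abs(error).sum()
    return error, amplitude

prior = [0.70, 0.25, 0.05]            # three possible next events
_, expected = ff_amplitude(prior, 0)  # amplitude close to 0.30
_, possible = ff_amplitude(prior, 1)  # amplitude close to 0.75
_, surprise = ff_amplitude(prior, 2)  # amplitude close to 0.95
```

The event that carried most of the prior mass yields the weakest signal, and the event that was barely anticipated yields the strongest.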

This is a very useful property. We can distribute representational resources across the hierarchy, giving the resources preferentially to the regions where larger errors are occurring more frequently. These events are being badly represented and need improvement.
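One simple way to picture such a policy is to share a fixed budget of cells between regions in proportion to the error each region has accumulated. This is purely a sketch under our own assumptions: the function `allocate_cells`, the region names, the per-region minimum and the proportional rule are all hypothetical.

```python
import numpy as np

def allocate_cells(error_history, total_cells, min_cells=1):
    """Share total_cells between regions in proportion to summed FF error.

    error_history maps a region name to a list of FF signal amplitudes.
    """
    regions = list(error_history)
    load = np.array([np.sum(error_history[r]) for r in regions], dtype=float)
    spare = total_cells - min_cells * len(regions)
    if spare < 0:
        raise ValueError("not enough cells to give every region min_cells")
    if load.sum() == 0:
        share = np.full(len(regions), 1.0 / len(regions))  # no errors: split evenly
    else:
        share = load / load.sum()
    extra = np.floor(share * spare).astype(int)
    # Hand any cells lost to rounding to the largest fractional remainders.
    leftover = int(spare - extra.sum())
    order = np.argsort(-(share * spare - extra))
    extra[order[:leftover]] += 1
    return {r: min_cells + int(e) for r, e in zip(regions, extra)}

history = {"A": [1.0, 1.0, 1.0], "B": [0.5, 0.5], "C": [0.0]}
cells = allocate_cells(history, total_cells=11)
# cells == {'A': 7, 'B': 3, 'C': 1}
```

Summing the amplitudes means that both large errors and frequent errors increase a region's share, which matches the "larger errors occurring more frequently" criterion above.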

In biological terms this response would be embodied as a proliferation of cells in columns receiving or producing large or frequent FF signals.

Next post

In the next post we will describe a hybrid Predictive-Coding / Memory Prediction Framework which has some nice properties, and is appealingly simple to implement. We will include some empirical results that show how well the two go together.