This post is about a fantastic new paper by Michael R. Ferrier, titled:
Toward a Universal Cortical Algorithm: Examining Hierarchical Temporal Memory in Light of Frontal Cortical Function
The paper was posted to the NUPIC mailing list and can be found via:
The paper itself is currently hosted at:
It isn't clear whether the paper will be formally published in a journal at some point; if it is, we'll update the link.
So, what do we like about this paper?
Purpose & Structure of the Paper
The paper is mostly a literature review and is very well referenced, making it a great introduction to the topic.
The paper examines the evidence for the existence of a universal cortical algorithm - i.e. one that can explain the anatomical features and function of the entire cortex. It is unknown whether such an algorithm exists, but there is some evidence that it might; more likely, variants of the same algorithm are used throughout the cortex.
The paper is divided into three parts. First, it reviews some relevant & popular algorithms that generate hierarchical models. These include Deep Learning, various forms of Bayesian inference, Predictive Coding, Temporal Slowness and Multi-Stage Hubel-Wiesel Architectures (MHWA). I'd never heard of MHWA before, though some of the examples (such as convolutional networks and HMAX) are familiar. The different versions of HTM are also described.
It is particularly useful that the author puts the components of HTM in a well-referenced context. We can see that the HTM/CLA Spatial Pooler is a form of Competitive Learning and that the proposed new HTM/CLA Temporal Pooler is an example of the Temporal Slowness principle. The Sequence Memory component is trained by a variant of Hebbian learning.
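To make the Competitive Learning connection concrete, here is a minimal sketch in Python. This is our own illustration, not Numenta's code, and all names and parameters are hypothetical: columns compete for each input, and only the winner's weights are nudged toward it with a Hebbian-style update, so columns gradually specialize.

```python
import numpy as np

# Minimal competitive learning sketch (illustrative only, not the CLA Spatial Pooler).
# Each row of `weights` is one "column"; the best-matching column wins the input
# and its weights move toward that input - a Hebbian-style update.

rng = np.random.default_rng(0)
n_columns, n_inputs = 8, 16          # hypothetical sizes
weights = rng.random((n_columns, n_inputs))
learning_rate = 0.1

def train_step(x):
    overlaps = weights @ x           # how well each column matches the input
    winner = np.argmax(overlaps)     # winner-take-all competition
    # Only the winner learns: its weights move toward the current input.
    weights[winner] += learning_rate * (x - weights[winner])
    return winner

# Train on random binary inputs; columns gradually specialize on input clusters.
for _ in range(1000):
    train_step((rng.random(n_inputs) > 0.7).astype(float))
```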
These ties to existing literature are useful because they allow us to understand the properties of, and alternatives to, these algorithms: earlier research has thoroughly explored their capabilities and limitations.
Although not an algorithm per se, Sparse Distributed Representations (SDRs) are explained particularly well. The author contrasts three types of representation: Localist (a single feature or label represents a state), Sparse and Dense. He argues that Sparse representations are preferable to Localist ones because the former can be learnt gradually and are more robust to small variations.
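A toy example of why sparse codes are robust (our own illustration with made-up sizes, not from the paper): two SDRs are compared by the overlap of their active bits, and corrupting a few bits barely changes that overlap, while unrelated SDRs overlap almost not at all.

```python
import numpy as np

# Toy Sparse Distributed Representation (SDR) sketch - our own example.
# An SDR is a long binary vector with a small fraction of active bits;
# similarity is measured by the overlap (count of shared active bits).

rng = np.random.default_rng(1)
size, n_active = 2048, 40            # hypothetical dimensions (~2% sparsity)

def random_sdr():
    sdr = np.zeros(size, dtype=bool)
    sdr[rng.choice(size, n_active, replace=False)] = True
    return sdr

def overlap(a, b):
    return int(np.sum(a & b))

a = random_sdr()
noisy = a.copy()
flipped = rng.choice(np.flatnonzero(a), 4, replace=False)
noisy[flipped] = False               # corrupt 10% of the active bits

print(overlap(a, noisy))             # 36 of 40 bits still match
print(overlap(a, random_sdr()))      # unrelated SDRs overlap near zero
```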
Frontal Cortex
The second part of the paper reviews the biology of the frontal cortical regions. These regions are not normally described in computational theories; Ferrier suggests this omission is because they are less well understood, and so offer less insight and support for theory.
However, these cortical areas are of particular interest because they are responsible for the representation of tasks, goals, strategy and reward, and are the origin of goal-directed behaviour and motor control.
Especially relevant to us is the discussion of biological evidence for the hierarchical generation of motor behaviour, and for direct motor output from the cortex.
Thalamus and Basal Ganglia
The paper discusses the role of the Thalamus in gating messages between cortical regions, and reviews evidence that the Striatum and Basal Ganglia may decide which messages are filtered in the Thalamus. This filtering is suggested to perform the roles of attention and control (all of which matches our own understanding).
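To illustrate the gating idea (again, our own toy sketch, not a model from the paper): an inter-region message is filtered elementwise, with a separate selector - standing in for the Basal Ganglia - choosing which channels pass.

```python
import numpy as np

# Toy sketch of thalamic-style gating (our own illustration). A message
# between cortical regions is filtered elementwise; a separate selector
# (standing in for the Basal Ganglia) decides which channels pass.

rng = np.random.default_rng(3)
message = rng.random(8)                  # activity from one cortical region

def select_gate(salience, k=3):
    # Keep only the k most salient channels - a crude attention filter.
    gate = np.zeros_like(salience)
    gate[np.argsort(salience)[-k:]] = 1.0
    return gate

salience = rng.random(8)                 # hypothetical learned salience
gated = message * select_gate(salience)  # only attended channels get through
print(np.round(gated, 2))
```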
There is a brief discussion of Reinforcement Learning (specifically, Temporal-Difference learning) as a computational analogue of Thalamic filter weighting. This has been exhaustively covered in the literature, so it wasn't a surprise.
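For readers who haven't met it, the core of Temporal-Difference learning is a one-line update. The sketch below is standard textbook TD(0) on a toy chain of states, not code from the paper: each state's value estimate moves toward the observed reward plus the discounted value of the next state.

```python
import numpy as np

# Minimal tabular TD(0) sketch - standard textbook RL, not the paper's model.
# V[s] is the estimated long-term value of state s; after each transition
# (s, reward, s_next) the estimate moves toward the bootstrapped target.

n_states = 5                          # hypothetical toy chain of states
V = np.zeros(n_states)
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def td0_update(s, reward, s_next):
    target = reward + gamma * V[s_next]
    td_error = target - V[s]          # the "surprise" signal
    V[s] += alpha * td_error
    return td_error

# Example: repeatedly walk a chain where only the last transition is rewarded.
for _ in range(500):
    for s in range(n_states - 1):
        td0_update(s, reward=1.0 if s == n_states - 2 else 0.0, s_next=s + 1)

print(np.round(V, 2))                 # values rise toward the rewarded transition
```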
Towards a Comprehensive Model of Cortical Function
The final part of the paper links the computational theories to the referenced biology. There are some interesting insights, such as the observation that messages in the feedback pathway from layer 5 to layer 1 of hierarchically lower regions must be "expanding" time (personally, I think these messages are re-interpreted in expanded-time form on receipt).
Our general expectation is that feedback messages representing predicted state are selectively biased or filtered towards "predicting" that the agent achieves rewards; in that case, the biased or filtered predictions are effectively goal-seeking strategies.
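A toy sketch of what we mean (entirely our own illustration, not a model from the paper): take a predicted distribution over next states and re-weight it by each state's estimated value, so the "prediction" that is fed back favours rewarding outcomes.

```python
import numpy as np

# Toy sketch of reward-biased prediction (our own illustration of the idea
# above). A predicted distribution over next states is re-weighted by each
# state's estimated value, so the feedback "prediction" leans toward
# outcomes that achieve reward.

predicted = np.array([0.5, 0.3, 0.2])   # model's predicted next-state probs
value = np.array([0.1, 0.2, 2.0])       # estimated reward/value per state

biased = predicted * value               # bias predictions toward reward
biased /= biased.sum()                   # re-normalise to a distribution

print(np.round(biased, 2))               # [0.1 0.12 0.78] - goal-seeking bias
```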
Overall, the paper does a great job of linking the "ghetto" of HTM-like computational theories with relevant techniques in machine learning and neurobiology.
Yes, I agree, this paper was fantastic. Pulls together a good story and grounds it in existing fields.
Unfortunately, the last section on Abstract Representations, which was in many ways the most thought provoking, took as its starting point the following claim, which I think was a misconception based on mistakes in the Numenta CLA white paper:
"CLA generates [...] spatial invariance by overlaying the sparse active representation of a given visual feature with the predictive representations of those visual features that are subsequently active with the highest probability."
Though I suppose something analogous might arise from temporal pooling... with or without Predictive Coding.