Project AGI: November 2015

Figure 1: The physical architecture of general intelligence in the brain, namely the Thalamo-Cortical system. The system comprises a central hub (the Thalamus) surrounded by a surface (the Cortex, shown in blue). The Cortex is made up of a number of functionally-equivalent units called Columns. Each Column is a stack of cell layers. Columns share data, often via the central hub. The hub filters the data relayed between Columns.

Authors: Rawlinson and Kowadlo

This is part 2 of our series on how to build an artificial general intelligence (AGI). This article is about what we can learn from reverse-engineering mammalian brains. Part 1 is here.

The next few articles will try to interpret some well-established neuroscience in the context of general intelligence. We’ll ignore anything we believe is unrelated to general intelligence, and we’ll simplify things in ways that will hopefully help us to think about how general intelligence happens in the brain.

It doesn’t matter if we are missing some details, provided the overall picture helps us understand the nature of general intelligence. In fact, excluding irrelevant detail will help, as long as we keep all the important bits!

These articles are not peer reviewed. Do assume everything here is speculation, even when linked to a source reference (our interpretation may be skewed). There isn’t space to repeatedly add this caveat throughout these articles.

1. Physical Architecture

First we’ll review and interpret the gross architecture of the brain, focusing on the Thalamo-Cortical system, which we believe is primarily responsible for general intelligence.

The Thalamo-Cortical system comprises a central hub (the Thalamus and Basal Ganglia), surrounded by a thin outer surface (the Cortex). The surface consists of a large number of functionally-equivalent units, called Columns. The Cortex is wrinkled so that it’s possible to pack a large surface area into a small space.

Why are the units called Columns? The name comes from the physical structure of their connectivity. Cells within each Column are highly interconnected, but connections to cells in other Columns are fewer and less varied. Each Column occupies the full thickness of the surface, which has approximately 6 distinct layers of cells. Since these layers are stacked on top of each other, and the stacks are only loosely connected to one another, the surface appears to be made of Columns.

Confusingly, there are both Macro and Micro-Columns, and these terms are used inconsistently. In these articles we will simply say ‘Column’ when referring to a Macro-Column as defined in a previous post.

In the previous article we described the ideal general intelligence as a structure made of many identical units that have each learned to play a small part in a distributed system. These theoretical units are analogous to Columns in real brains.

Columns can be imagined to be independent units that interact by exchanging data. However, data travelling between Columns often takes an indirect path, via the central hub.

The hub filters messages passed between Columns. In this way, the filter acts as a central executive that manages the distributed system made up of many Columns.

We believe this is a fundamental aspect of the architecture of general intelligence.

In mammalian brains, the filter function is primarily the role of the Thalamus, although its actions are supported and modulated by other parts, particularly the Basal Ganglia.
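As a rough illustration of this idea, here is a minimal Python sketch of Columns that exchange data only through a gating hub. The `Column` and `Hub` classes, the table of open routes and all method names are hypothetical, invented for this example; it is a toy, not a model of real thalamic circuitry.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical stand-in for one cortical Column."""
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, sender: str, data: str) -> None:
        self.inbox.append((sender, data))


class Hub:
    """Toy central hub: relays messages between Columns, but only on open routes."""

    def __init__(self) -> None:
        self.columns = {}
        self.open_routes = set()  # (sender, receiver) pairs currently allowed

    def register(self, column: Column) -> None:
        self.columns[column.name] = column

    def set_gate(self, sender: str, receiver: str, is_open: bool) -> None:
        route = (sender, receiver)
        if is_open:
            self.open_routes.add(route)
        else:
            self.open_routes.discard(route)

    def relay(self, sender: str, receiver: str, data: str) -> bool:
        """Deliver data only if the route is open; return True if delivered."""
        if (sender, receiver) not in self.open_routes:
            return False
        self.columns[receiver].receive(sender, data)
        return True


def demo():
    hub = Hub()
    a, b = Column("A"), Column("B")
    hub.register(a)
    hub.register(b)
    hub.set_gate("A", "B", True)  # only A -> B is allowed
    delivered = hub.relay("A", "B", "edge detected")
    blocked = hub.relay("B", "A", "feedback")
    return delivered, blocked, b.inbox, a.inbox
```

In this sketch the hub never inspects or alters the data; it only decides which routes are open. That is enough to show how a single central component could manage a system of otherwise independent units.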

Other brain components, such as the Cerebellum, are essential for effective motor control but may not be essential for general intelligence. They are not within the scope of this article.

2. Logical Architecture

The Cortex has both a physical structure (a layered surface, partitioned into columns) and a logical structure. The logical structure is a hierarchy - a tree-like structure that describes which columns are connected to each other (see figure 2).

Connections between columns are reciprocal: “Higher” columns receive input from “Lower” columns, and return data to the same columns. This scheme is advantageous: Higher columns have (indirect) input from a wider range of sources; lower columns use the same resources to model more specific input in greater detail. This occurs naturally because each column tries to simplify the data it outputs to higher columns, allowing columns of fixed complexity to manage greater scope in higher levels, as data is incrementally transformed and abstracted.

Only Columns in the lowest levels receive external input and control external outputs (albeit often indirectly, via subcortical structures).

Figure 2: The logical architecture of the cortex, a hierarchy of Columns. The hierarchy gives us the notion of “levels”, with the lowest level having external input and output, and higher levels being increasingly abstract. The logical architecture is superimposed on the physical architecture. Note that inter-Column connections may be gated by the central hub (not shown).

Note that there are not necessarily fewer columns in each hierarchy level; there may be, but this is not essential. However, abstraction increases and scope broadens as we move to higher hierarchy levels.
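The hierarchy can be sketched as a small tree data structure. In the Python below, the `HColumn` class, its fields and the example names (`V-left`, `pixel0` and so on) are assumptions made purely for illustration. The sketch captures only the reciprocal links, the rule that only level-0 columns touch external input, and the way scope widens with level.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class HColumn:
    """Hypothetical column in the logical hierarchy."""
    name: str
    level: int
    sensors: Set[str] = field(default_factory=set)  # non-empty only at level 0
    lower: List["HColumn"] = field(default_factory=list)
    higher: Optional["HColumn"] = None

    def connect_lower(self, child: "HColumn") -> None:
        """Create a reciprocal link: child feeds forward, self feeds back."""
        self.lower.append(child)
        child.higher = self

    def scope(self) -> Set[str]:
        """All external inputs that reach this column, directly or indirectly."""
        reached = set(self.sensors)
        for child in self.lower:
            reached |= child.scope()
        return reached


def build_example() -> HColumn:
    left = HColumn("V-left", 0, sensors={"pixel0", "pixel1"})
    right = HColumn("V-right", 0, sensors={"pixel2", "pixel3"})
    top = HColumn("top", 1)  # no direct external input
    top.connect_lower(left)
    top.connect_lower(right)
    return top
```

Here the level-1 column has no sensors of its own, yet its scope covers all four inputs, while each level-0 column sees only two.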

We can jump between the physical and logical architectures of the Cortex. Moving over the surface implies moving within the hierarchy. It also implies that, as we move between areas, we will observe responses to different subsets of input data. Moving to higher hierarchy levels implies an increase in abstraction. We can observe this effect in human brains, for example by following the flow of information from the processing of raw sensor data to more abstract brain areas that deal with language and understanding (see figure 3).

Figure 3: Flow of information across the physical human Cortex also represents movement higher or lower in the logical hierarchy (increasing or decreasing abstraction). In fact, we can observe this phenomenon in human studies. Different parts of the hierarchy are specialised to conceptual roles such as understanding what, why and where things are happening. Image source.

One final point about the logical architecture. The hierarchical structure of the Cortex is mirrored in the central hub, particularly in the Thalamus and Basal Ganglia, where we see the topology of the cortical Columns preserved through indirect pathways via central hub structures (figure 4).

Figure 4: Data flows between different Columns within the Cortex either directly, or via our conceptual “central hub”. Our hub includes Basal Ganglia such as the Striatum, and the Thalamus. Throughout this journey the topology of the Cortex is preserved. Image source.

3. Layers and Cells

Each Column has approximately 6 distinct “layers”. As with every biological rule, there are exceptions, but this suffices for the level of detail we require here. The layers are visual artefacts resulting from variations in cell type, morphology and connectivity patterns, and therefore function, between the layers (figure 5).

Figure 5: Various stainings showing variation in cell type and morphology between the layers of the Cortex. Image source.

The Cortex has only 5 functional layers. Structurally, it has 6 gross layers, but one of them is just wiring and performs no computation. In addition, the functional distinction between layers 2 and 3 is uncertain, so we will group them together. This gives us just 4 unique functional layers to explain.

We will use the notation C1 ... C6 to refer to the gross layers:

C1 - just wiring, no computation; not functional
C2/3 (indistinct)
C4
C5
C6 (known as the “multiform” layer due to the variety of cell types)

The cortex is made of a veritable menagerie of oddly shaped cells (i.e. Neurons) that are often confined to specific layers (see figure 6). Neurons have a body (soma), dendrites and axons. Dendrites provide input to the cell, and reach out to find that data. Axons transmit the output of the cell to places where it can be intercepted by other cells. Both dendrites and axons have branches.

Figure 6: Some of the cell types found in different cortical layers. Image source.

An important feature of the Cortex is the presence of specialised Neurons with pyramidal Soma (bodies) (figure 7). Pyramidal cells are predominantly found in C2/3, C5 and C6, where they are both very large and very common.

Pyramidal cells do not behave like classical artificial neurons. We agree with Hawkins' characterisation of them. Pyramidal cells have two or three dendrite types: Apical (proximal or distal) dendrites and Basal dendrites. Each distal Apical dendrite seems to behave like a classical integrate-and-fire neuron in its own right, requiring a complete pattern of input to “fire” a signal to the body of the cell. As a consequence, each cell can respond to a number of different input patterns, depending on which Apical dendrites are activated by their input.

Hawkins suggests that the Basal dendrites provide a sequential or temporal context in which the Pyramidal cell can become active. Output from the cell along its axon branches only occurs if the cell observes particular instantaneous input patterns in a particular historical context of previous Pyramidal cell activity.

Within one layer of a Column, Pyramidal cells exhibit a self-organising property that results in sparse activation. Only a few Pyramidal cells respond to each input stimulus. The Pyramidal cells are powerful pattern and sequence classifiers that also perform a dimensionality-reduction function; when active, the activity of a single Pyramidal cell represents a pattern of input over a period of time.

The training mechanism for sparsity and self-organisation is local inhibition. In addition to Pyramidal cells, most of the other Neurons in the Cortex are so-called “Interneurons” that we believe play a key role in training the Pyramidal cells by implementing a competitive learning process. For example, Interneurons could inhibit Pyramidal cells around an active Pyramidal cell ensuring that the local population of Pyramidal cells responds uniquely to different input.
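To make the competitive learning idea concrete, here is a small Python sketch of k-winners-take-all learning. This is a standard technique that we are substituting for whatever the biology actually does, and the function names, the learning rate and the example weights are all assumptions. Each "cell" is just a list of weights. The cells with the highest overlap win, every other cell is treated as inhibited, and only the winners learn.

```python
def overlap(weights, pattern):
    """How strongly a cell responds to an input pattern."""
    return sum(w * x for w, x in zip(weights, pattern))


def k_winners(scores, k):
    """Indices of the k highest scores; all other cells count as inhibited."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def train_step(cells, pattern, k=1, rate=0.5):
    """Let the winning cells move their weights towards the pattern."""
    scores = [overlap(w, pattern) for w in cells]
    winners = k_winners(scores, k)
    for i in winners:
        cells[i] = [w + rate * (x - w) for w, x in zip(cells[i], pattern)]
    return winners


def demo():
    # Two cells with slightly different (hypothetical) starting weights.
    cells = [[0.6, 0.5, 0.4, 0.5], [0.4, 0.5, 0.6, 0.5]]
    pattern_a = [1, 1, 0, 0]
    pattern_b = [0, 0, 1, 1]
    for _ in range(5):
        train_step(cells, pattern_a)
        train_step(cells, pattern_b)
    # Query without further learning (rate=0 leaves the weights unchanged).
    return train_step(cells, pattern_a, rate=0.0), train_step(cells, pattern_b, rate=0.0)
```

After a few repetitions each cell has specialised on one of the two patterns, so the population responds uniquely to different input while only a fraction of cells is active at any time.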

Unlike Pyramidal cells, which receive input from outside the Column and transmit output outside the Column, Interneurons generally only work within a Column. Since we consider that Interneurons play a supporting role to Pyramidal cells, we won’t have much more to say about them.

Figure 7: A Pyramidal cell as found in the Cortex. Note the Apical and Basal dendrites, hypothesised to recognise simultaneous and historical input patterns respectively. The complete Pyramidal cell is then a powerful classifier that, when active, represents a particular set of input in a specific historical context. Image source.

Summary

That's all we feel is necessary to say about the gross physical structure of the Thalamo-Cortical system and the microscopic structure of its cells and layers. The next article will look at the circuits and pathways by which these cells are connected, and the computational properties that result.

The artificial neuron model used by Jeff Hawkins and Subutai Ahmad in their new paper (image reproduced from their paper, and cropped). Their neuron model is inspired by the pyramidal cells found in neocortex layers 2/3 and 5.

It has been several years since Jeff Hawkins and Numenta published the Cortical Learning Algorithm (CLA) whitepaper.

Now, Hawkins and Subutai Ahmad have pre-published a new paper (currently on arXiv; peer review will follow):

http://arxiv.org/abs/1511.00083

The paper is interesting for a number of reasons, most notably the combination of computational and biological detail. This paper expands on the artificial neuron model used in CLA/HTM. A traditional integrate-and-fire artificial neuron has one set of inputs and a transfer function. This doesn't accurately represent the structure or function of cortical neurons, which come in various shapes and sizes. The function of cortical neurons is affected by their structure and is quite unlike that of the traditional artificial neuron.

Hawkins and Ahmad propose a model that best fits Pyramidal cells in neocortex layers 2/3 and 5. They explain the morphology of these neurons by assigning specific roles to the various dendrite types observed.

They propose that each dendrite is individually a pattern-matching system similar to a traditional artificial neuron: the dendrite has a set of inputs to which it responds, and a transfer function that decides whether enough inputs are observed to "fire" the output (although in traditional artificial neurons, nonlinear continuous transfer functions are more widely used than binary output).

In the paper, they suggest that a single pyramidal cell has dendrites for recognising feed-forward input (i.e. external data) and other dendrites for feedback input from other cells. The feedback provides contextual input that allows the neuron to "fire" only in specific sequential contexts (i.e. given a particular history of external input).

To produce an output along its axon, the complete neuron needs both an active feed-forward dendrite and an active contextual dendrite; when the neuron fires, it implies that a particular pattern has been observed in a specific historical context.
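The firing rule described above can be written down in a few lines of Python. This is our own simplified sketch of the description in this post, not code from the paper or from Numenta. The `Dendrite` and `PyramidalCell` classes, the string input labels and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Dendrite:
    """One dendrite: a set of inputs it listens to, plus a binary threshold."""
    synapses: FrozenSet[str]
    threshold: int

    def is_active(self, active_inputs) -> bool:
        # "Fire" if enough of this dendrite's inputs are currently active.
        return len(self.synapses & set(active_inputs)) >= self.threshold


@dataclass
class PyramidalCell:
    feed_forward: Dendrite   # recognises external input
    context: List[Dendrite]  # each recognises one pattern of prior cell activity

    def fires(self, external, prior_activity) -> bool:
        """Fire only if the pattern is seen AND some known context is present."""
        ff_active = self.feed_forward.is_active(external)
        ctx_active = any(d.is_active(prior_activity) for d in self.context)
        return ff_active and ctx_active


# A cell that recognises input "B", but only after "A" or after "X".
cell = PyramidalCell(
    feed_forward=Dendrite(frozenset({"b1", "b2", "b3"}), threshold=2),
    context=[
        Dendrite(frozenset({"cellA1", "cellA2"}), threshold=2),
        Dendrite(frozenset({"cellX1", "cellX2"}), threshold=2),
    ],
)
```

With these definitions, `cell.fires({"b1", "b2"}, {"cellA1", "cellA2"})` is true, but the same external input following unrelated activity such as `{"cellC1"}` does not fire the cell. Having several context dendrites is what lets one cell take part in several different sequences.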

In the original CLA whitepaper, multiple sequential contexts were embodied by a "column" of cells that shared a proximal dendrite, although they acknowledged that this differed from their understanding of the biology.

The new paper suggests that basket cells provide the inhibitory function that ensures sparse output from a column of pyramidal cells having similar receptive fields. Note that this definition of column differs from the one in the CLA whitepaper!

The other interesting feature of the paper is its explanation of the sparse, distributed sequence memory that arises from a layer of the artificial pyramidal cells with complex, specialised dendrites. This is also a feature of the older CLA whitepaper, but there are some differences.

Hawkins and Ahmad's paper does match the morphology and function of pyramidal cells more accurately than traditional artificial neural networks. Their conceptualisation of a neuron is far more powerful. However, this doesn't mean that it's better to model it this way in silico. What we really need to understand is the computational benefit of modelling these extra details. The new paper claims that their method has the following advantages over traditional ANNs:

- continuous learning
- robustness of distributed representation
- ability to deal with multiple simultaneous predictions

We follow Numenta's work because we believe they have a number of good insights into the AGI problem. It's great to see this new theoretical work and to have a solid foundation for future publications.

Project AGI

Building an Artificial General Intelligence

Sunday, 29 November 2015

How to Build a General Intelligence: Reverse Engineering

Thursday, 12 November 2015

New HTM paper - “Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex”
