Project AGI

Building an Artificial General Intelligence

This site has been deprecated. New content can be found at https://agi.io

Saturday, 31 May 2014

TP 2/3: Jeff's new Temporal Pooler

By David Rawlinson and Gideon Kowadlo

Jeff's new Temporal Pooler

This is article 2 in a 3 part series about Temporal Pooling (TP) in MPF/CLA-like algorithms. You can read part 1 here. For the rest of this article we will assume you've read part 1.

This article is about the new TP proposed by Jeff Hawkins. The original TP was described in the CLA white paper. We will also assume you've at least had a quick read of the linked articles. Despite our best efforts this article is only an interpretation of those methods, and it may not be entirely correct or as Numenta intended.

Separation of internal and external causes

The first topic in Hawkins' proposal covers the possible roles of specific cortical layers and the separation of internal and external causes.

Hawkins suggests that cortical layers 3, 4, 5 and 6 all implement variants of the same algorithm, with minor differences. He also suggests that each layer performs all three functions: Spatial Pooling, Sequence Memory and Temporal Pooling (although Spatial Pooling may be absent in higher hierarchy levels). In CLA, these 3 components are implemented as a matrix of sequence-memory cells. Rules concerning how and when the cells activate each other in different input contexts implement the pooling and prediction features.

Hawkins also states that one distinction between layers 3 and 4 may be that cells in layer 4 are using copies of motor actions (internal causes), to predict better. In consequence, cells in layer 3 are left trying to predict the actions that layer 4 could not predict, i.e. relying more on historically-observed sequential patterns of activation. Although both layers will learn sequential patterns of activation, layer 3 will rely more heavily on history. External causes will more often generate input that can’t be explained by motor actions, so we might expect layer 3 to more often respond to external events.

This article will not discuss these ideas any further. We note them for clarity and to distinguish our focus. Instead, we will talk about how the new CLA TP could be used to construct a hierarchical representation of changes in input patterns over time.

Temporal Slowness

Both old and new temporal poolers exploit a principle known as "Temporal Slowness", which means that output activity varies more slowly than input activity. You can read more about this general principle here.

Another, related feature of the HTM and CLA temporal poolers is that they emit a constant pattern of cell activity to generate stability. This is achieved by marking cells as "active" regardless of whether they are active via prediction or via feedforward input. Although active-by-prediction and active-by-FF-input are distinguished within the region for learning purposes, this distinction is not visible to the next higher region in the hierarchy.

Old Temporal Pooler Cell Activity

For reference, this is an outline of the TP functionality in the "old" TP from the Numenta CLA white paper.

The diagrams in this article each have 3 parts. At the top is a graph showing a fragment (in graph terminology, a component) of the Sequence Memory encoded by the cells in the region. Arrows show the learnt transitions between cells. Below the graph is a series of observations (marked "FF input") and the corresponding pattern of cell activity when each Feed-Forward (FF) input is observed. Each column represents a single cell from the sequence memory and its activity over time. Each row in the lower part of the diagrams shows all cells' activity at one moment. Cells are filled white when they are active and black when not active:



Figure 1: Spatial pooler and temporal pooler cell activity in the original CLA white paper. This image compares two patterns of cell activity over time, shown left and right. The left subfigure shows cell activity in spatial pooler cells, where the active cell[s] are the ones whose input bits most closely match current FF input. The right subfigure shows the original CLA temporal pooling method, where cells become active when predicted far in advance of their FF input being observed, and remain active until after the associated FF input is observed. In this example, ⅔ of the active cells are identical after each FF input change. A ‘P’ denotes cells activated due to prediction. Each subfigure has 3 parts. The main part is a matrix showing sequence memory cell activity over time. Each row is one time-step, numbered 1 to 5. The FF input observed at each time step is shown in a column to the left. The top row shows a fragment of Sequence Memory formed by the cells, and the colours each cell responds to. In this simple example, the Sequence Memory graph is simply a sequence of states that are always observed in the same order.
Spatial Pooler cell activity is easiest to explain (figure 1, left). In figure 1, we see that the Sequence Memory has learnt that the colours Red, Yellow, Green, Blue and Black occur in order. One cell responds uniquely to each of these colours, creating the diagonal line of active cells over time. Each cell is only active for the duration of its FF input being observed.

The original temporal pooler premise was to drive cells to an active state via prediction (cells active via prediction are marked with a 'P') as early as possible. The cells would then remain active until either the prediction was no longer made (e.g. due to observation of an unexpected FF input pattern) or the cell became active via its FF input.

Time and Stability

You can see in the figure above that in a sequence of predictable inputs each cell is active over a period of 3 input changes (the only meaningful way to measure time in these examples). So let's assume each input change is one time step.

The cells are shown to be active for an arbitrary period of time - to keep the diagrams simple the minimum period is shown. In reality a fixed activation period is unlikely; it will depend on the activation of prior cells in the sparse distributed representation. However, it is still possible to make the point that at any time during a predictable sequence, a set of cells is active. Most of those cells do not change between time steps and will still be active after the next FF input change. This is the temporal pooler in action.

In the example above, each cell is predicted up to 2 steps before the corresponding input is observed. Therefore, in a predictable sequence 3 cells are simultaneously active, 2 of them due to prediction and one due to FF input.

Although the output of the temporal pooler is continuously changing, most of the active cells are not changed between inputs. In this case, 2/3 of the output is stable. With longer activation of TP cells, a larger fraction of the output becomes stable.
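The arithmetic above can be checked with a minimal sketch. It assumes a strictly ordered sequence with one cell per input and a fixed prediction horizon. The names `old_tp_active`, `stable_fraction` and `lookahead` are ours, chosen for illustration, and do not come from the CLA white paper.

```python
def old_tp_active(t, n_cells, lookahead):
    """Cells active at step t under the old TP in a fully predictable sequence.

    Cell c responds to the c-th input of the sequence. At step t the cell
    receiving FF input (c == t) is active, together with the cells predicted
    up to `lookahead` steps ahead (an assumed, fixed prediction horizon).
    """
    return {c for c in range(t, t + lookahead + 1) if c < n_cells}


def stable_fraction(before, after):
    """Fraction of the cells active before an input change that stay active."""
    return len(before & after) / len(before)


if __name__ == "__main__":
    for lookahead in (2, 4, 9):
        before = old_tp_active(3, 100, lookahead)
        after = old_tp_active(4, 100, lookahead)
        print(lookahead, len(before), stable_fraction(before, after))
```

With a horizon of 2 steps, 3 cells are active and 2/3 of them survive each input change; with horizons of 4 and 9 steps the stable fraction rises to 4/5 and 9/10 respectively.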

Noisy recognition and resource constraint assumptions

To simplify the problem observed by the next level in the hierarchy it is necessary to have cell activity changes in the lower level without corresponding cell activity changes in the higher level. Given that the TP output is continuously changing, how do we sometimes avoid cell activity changes in the higher level?

There are two parts to the answer. First, all cells' FF inputs are Sparse Distributed Representations (SDRs). These are large sets of input bits, of which only a few are active at any given time. The Spatial Pooler in CLA recognises FF inputs when only a fraction of the expected (synapsed) input bits are active. For example, a cell may become active from FF input when 80% of its FF input bits are active - any 80%. The set of active input bits can change while the cell is active. This means that cells' recognition is tolerant to noisy FF input.
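A minimal sketch of this thresholded recognition follows. It treats the SDR as a set of active bit indices and uses the 80% figure from the example above. The function name `ff_active` and the 10-bit cell are illustrative assumptions, not part of CLA.

```python
def ff_active(synapsed_bits, active_bits, threshold=0.8):
    """True if at least `threshold` of the cell's synapsed bits are active."""
    matched = len(synapsed_bits & active_bits)
    return matched / len(synapsed_bits) >= threshold


if __name__ == "__main__":
    cell = set(range(10))              # hypothetical cell synapsed to bits 0..9
    print(ff_active(cell, set(range(0, 8))))   # 8 of 10 bits: recognised
    print(ff_active(cell, set(range(2, 10))))  # a different 8 of 10: recognised
    print(ff_active(cell, set(range(0, 7))))   # 7 of 10 bits: not recognised
```

The first two inputs differ in four bits yet both activate the cell, which is the sense in which the active input bits can change while the cell stays active.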

Noisy recognition of TP output from lower hierarchy levels is one assumption necessary for increasing stability in higher levels. But this assumption is actually a useful feature, allowing classification to be tolerant to noise.

The other necessary assumption is a resource-constraint. If an infinite supply of cells were available, then after much slow learning every FF input pattern would have a dedicated cell (due to inhibition between cells). Cell activity changes would occur throughout the hierarchy after every input change, no matter how tiny. Obviously, resource constraints are physically necessary.

The finite size of a CLA region ensures that there aren't enough cells to represent each FF input pattern perfectly. Instead, some similar (probably successive) FF inputs will be represented by the same set of active cells (quantization error).

These are both very reasonable assumptions, but worth stating and understanding.

New Temporal Pooler Cell Activity

The new TP proposes that cells are only counted as active when confirmed by observation of the corresponding FF input, and that they stay active for a period of time after this:


Figure 2: Spatial pooler and temporal pooler cell activity as described by the new TP method. Each TP cell is active for a period of time after its corresponding FF input is observed. There may be no distinct SP or TP cells, but we show them separately to illustrate differences in activation behaviour.
This in itself doesn't change much, but because we are now building patterns forwards we can represent unpredicted events more accurately. Cells are no longer active until the corresponding FF input is observed or prediction is cancelled; instead they are never fully activated prior to the corresponding FF input. When a prediction error occurs, the results are immediate and lasting. Given noisy recognition of FF input, the old method would be more likely to have hidden prediction failures.

Temporal pooling “replaces” graph components (specifically sequences of vertices) with a single vertex that represents the component by being constantly active for the duration of those inputs. It is also worth noting that to simplify any graph, the minimum number of vertices in each replaced sequence is 3. In a temporal pooler, this means that the minimum number of FF input changes for which a predicted cell must remain active is 3. If temporal pooler output is constant for sequences of length 2, the next hierarchy level will encode transitions between cells instead of sequences of cells (i.e. no simplification or effective pooling has occurred).

Activity after a Successful Prediction

The new TP proposes that there are two cortical layers of cells. One layer of cells embodies the Spatial Pooler. The other layer forms the TP. In the TP layer, cells remain active for a long time after being successfully predicted in the SP layer, but for only a short time when not predicted in the SP layer.

Prediction failures will occur regularly, whenever there are multiple future states and the available data does not allow the correct future to be determined. This looks like a fork in the Sequence Memory graph:

Figure 3: In this non-deterministic sequence, Red is followed by Yellow or Green. When the prior Red is observed, Yellow is predicted. Since it was predicted, the Yellow cell stays active for 3 steps in total. Since Blue always follows Yellow and Green, and Black follows Blue, the other cells are all active for the full 3 steps.
In the example above, the Yellow cell is successfully predicted leading to a long activation of the sequence memory cell that responds to Yellow after Red.

Activity after a Failed Prediction

The new TP proposes that in the event of a failed prediction, cells only remain active briefly. This is shown in the example below, where Yellow was predicted but Green was observed:

Figure 4: New TP cell activity after a failed prediction. Yellow was predicted but Green was observed.
The pattern of activity after a failed prediction is initially different to the pattern after a correct prediction, with only a short activation of the Green cell and no full activation of the predicted cell at all. This means that now, cells are only active when their corresponding FF input is actually observed.
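The difference between the two cases can be sketched as follows. This is our reading of the proposal, not Numenta's implementation. It assumes one cell per colour, a "long" activation of 3 steps (as in Figure 3) and a "short" activation of 1 step (the proposal does not give a figure). The names `new_tp_activity`, `long_period` and `short_period` are hypothetical.

```python
def new_tp_activity(observed, predicted, long_period=3, short_period=1):
    """Return the set of active TP cells at each step.

    observed[t]  -- label of the FF input seen at step t (one cell per label)
    predicted[t] -- label that had been predicted for step t, or None
    A cell only becomes active when its FF input is observed. It then stays
    active for `long_period` steps if it was predicted, else `short_period`.
    """
    remaining = {}  # cell label -> steps of activity left, including this one
    history = []
    for seen, expected in zip(observed, predicted):
        # Age the activity carried over from the previous step.
        remaining = {c: n - 1 for c, n in remaining.items() if n > 1}
        remaining[seen] = long_period if seen == expected else short_period
        history.append(set(remaining))
    return history


if __name__ == "__main__":
    expected = ["Red", "Yellow", "Blue", "Black"]
    print(new_tp_activity(["Red", "Yellow", "Blue", "Black"], expected))
    print(new_tp_activity(["Red", "Green", "Blue", "Black"], expected))
```

In the first run Yellow was predicted and stays active alongside Blue and Black. In the second run Green appears for a single step and Yellow never activates, so the output around the failed prediction differs from the predictable case.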

The FF output of the TP after a prediction failure is quite different to the FF output during predictable sequences before and after. This helps to ensure that the unpredictable transition is modelled in higher hierarchy levels, passing the problem up the hierarchy rather than obscuring it. We anticipate that higher levels of the hierarchy will have the ability to understand and hence predict the problematic transition.

Analysis


Prediction with/without motor output distinguishes Cortex layers 3,4

Copying motor actions back to the cortex to help with prediction makes sense, especially in lower hierarchy levels. However, recent motor signals become increasingly irrelevant when trying to predict more abstract, longer term events. For example, getting sacked from your job is less likely due to the way you just now sipped your coffee, and more likely to do with some events that happened days or weeks ago. These older events will be hierarchically represented as more abstract causes.

At higher hierarchy levels, with greater abstraction, the "motor actions" that are necessary to explain and predict events are not simple muscle contractions, but complex sequences of decisions and behaviour with specific intents and expectations. The Feed-Forward Indirect and Feed-Back (FB) pathways carry this predictive information in a form that is appropriate and meaningful at each level of the hierarchy. If predictions and decisions are synonymous, then we can treat selected predictions as if they were actions.

For these reasons we are skeptical about the idea that the use of motor actions to aid prediction is enough to distinguish the functionality of different cortical layers. However, in support of the idea, layer 4 does disappear in higher hierarchy levels where motor actions would be less relevant.

Propagation of uncertainty to higher regions

The way that uncertainty (as prediction failure) is propagated up the hierarchy is vital to being able to reliably assemble a useful hierarchical representation of FF input. In fact, we believe that unpredictable events should be propagated until they reach a level of abstraction and invariance where they become predictable (see the Newtonian world assumption in the previous post). Therefore, we believe TP output should be highly orthogonal to prior and subsequent output in the event of a prediction failure. In the case of an SDR, highly orthogonal means that many bits should have dissimilar activity (a small intersection between the sets of active bits before and after).
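One simple way to quantify this notion of orthogonality is the size of the intersection between two sets of active bits. The sketch below is an illustration only. The name `overlap_fraction` and the choice to normalise by the smaller set are our assumptions.

```python
def overlap_fraction(before, after):
    """Share of active bits common to both SDRs, relative to the smaller one.

    0.0 means the two outputs are fully orthogonal (no shared active bits);
    1.0 means one output's active bits are contained in the other's.
    """
    if not before or not after:
        return 0.0
    return len(before & after) / min(len(before), len(after))


if __name__ == "__main__":
    stable = overlap_fraction({1, 2, 3, 4, 5, 6}, {2, 3, 4, 5, 6, 7})
    surprise = overlap_fraction({1, 2, 3, 4, 5, 6}, {20, 21, 22, 23, 24, 25})
    print(stable, surprise)
```

Under this measure, the desired behaviour is a high value across predictable transitions and a value near zero across a prediction failure.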

Only a fraction of synapsed input bits are needed to activate a cell, and therefore CLA features “noise-tolerant” recognition of FF input. Only a few output bits would be dissimilar between the outcomes of prediction success and failure. This seems to raise the risk that unpredictable events could be “hidden” by noise-tolerance, and not passed up the hierarchy for higher levels to solve. From the perspective of a higher level, the set of active cells has not been significantly affected by their failure to predict.

Some loss of uncertainty in propagation may be acceptable in a sufficiently complex system. These are toy examples with only a few cells, whereas real CLA regions have hundreds or thousands of cells. However, we are working through some simple examples to try to better understand the behaviour and limits of the CLA.

Another detail that is not fully described in Hawkins’ current TP proposal is how long cells should be active when predicted. Maximum stability is achieved when cells are active for long periods, but we are limited by the conflicting objective to not hide uncertainty. Should we truncate activity when other prediction failures occur? In the next article we will propose explicitly making TP cells active unless uncertainty is too high, thereby implementing an auto-tuning of activation period.

Representations of random sequences

There is one comment in Hawkins' proposal that we disagree with. He says: "One of the key requirements of temporal pooling is that we only want to do it when a sequence is being correctly predicted. For example, we don't want to form a stable representation of a sequence of random transitions." In fact, it may be necessary to build a framework of some random sequences, in order to build sufficiently complex representations to explain any of the simpler events. Although the random sequences may not be the right ones, we need to have a mechanism of assembling more complex hierarchical representations even when there is no incremental explanatory power in doing so (this was discussed in the previous article). This would mean looking for structure in randomness, on the assumption that it would eventually be worthwhile due to explanatory models in higher levels of the hierarchy.

Summary

To wrap up:
- Feed-Back or Feed-Forward indirect pathway data may be a more appropriate source of data than motor actions in the FF direct pathway, for predicting events based on internal causes at higher levels of abstraction

- Reliable propagation of uncertainty (prediction failure) up the hierarchy is critical to move unexplained events to a level of abstraction where they can be understood

- We would like to extend the activity period for maximum stability, balanced against the desire to avoid hiding prediction errors. How this should be done is not detailed in the proposal.

- It may be necessary to perform temporal pooling even when there are no predictable patterns, in order to construct higher-order representations that may be able to predict the simpler events.

The next and final article in our 3 part series will present some specific alternative temporal pooler ideas.

Tuesday, 27 May 2014

Thalamocortical architecture

by Gideon Kowadlo and David Rawlinson

Introduction

One of the keys to understanding the neocortex as a whole, and the emergence of intelligence, is to understand how the cortical hierarchical levels interconnect. This includes:
  • the physical connections,
  • the meaning of the signals being transmitted,
  • and possibly also the way the signal is encoded.
Physical connections: Physical connections refer to gross patterns of neuron routing throughout the brain. This is known as the connectome. Below is an image from the Human Connectome Project that beautifully illustrates many connections, including thalamocortical ones.
Figure 1. Courtesy of the Laboratory of Neuro Imaging and Martinos Center for Biomedical Imaging, Consortium of the Human Connectome Project - www.humanconnectomeproject.org

Meaning of signals: One classification that can be applied to thalamocortical neurons is drivers versus modulators. A driver can be thought of as a neuron that carries information, whereas a modulator modulates or alters the transmission of information in a driver. They have different functional and anatomical properties, as nicely described in (Sherman and Guillery 2011). If a neuron is a driver, what information does it encode, and if it is a modulator, is it inhibitory or excitatory and what effect does this have?

Signal Encoding: Signal encoding refers to the details of how the information is represented. This includes timing and amplitude information. The way the signal is encoded in the neurons may have a bearing on the properties of the system. Specific information has been added to the diagram where this looks relevant.

Our aim is to build AI with general intelligence characteristic of biological organisms such as primates. Therefore, we draw inspiration and insight from these working examples. Understanding the biology obviously gives us the best insight into how to do that. However, what level of abstraction do we need to capture the essential qualities?
  • at the lowest level: molecular structure, interactions and neurotransmitters,
  • above that, firing patterns and newly discovered molecular machinery (that excitingly shows this is more complex and interesting than previously thought - see paper and work by Seth Grant),
  • higher still, the brain as a set of modules that interact with each other,
  • or multi scale simulation of the whole brain (see the Human Brain Project).
For simplicity, we want to understand it at the highest level that is still capable of capturing the essential qualities, and drill down where necessary. Are factors such as the way that the signal is encoded therefore important? Not in and of themselves, but they may have a bearing on emergent qualities that are significant.

In order to understand the above, including drawing conclusions about the appropriate level of abstraction, we've elaborated on a figure first published in the CLA White Paper that was included in a previous post (in the section 'Regions'). In that article, we started to explore these topics in the context of Numenta's work. The figure shows the thalamo-cortical connections to specific cortical layers and is very useful for exploring the concepts described above. Here, we will expand on that figure, shown below. We will go over a first version, and we plan to make further posts as we develop it. Each of the initial annotations is explained in the subsections below.


Figure 2. Thalamocortical architecture including cortical layers and connections between hierarchy levels. This figure is an annotated version of a figure from the 'CLA White Paper'. Some information is added from the text of that document. Other sources used are Sherman and Guillery 2011, Grossberg 2007, Sherman 2006 and Sherman 2007.


We invite the community to make use of and contribute to this annotated diagram. The diagram is publicly available in a universal vector graphics format called SVG. Being vector based, it is easily modifiable. SVG is a common format, which many graphics packages are capable of editing.

The file is available from a git repository hosted on GitHub called cortico-thalamic-circuit. Anyone can download, clone, make a pull request or fork the repository.

Pull requests allow you to make modifications and then give them back to the shared repository so that they are available to everyone. This is the action to take if you share our purpose for the diagram - staying as high level as possible, filling in details where they contribute to a holistic view or emergent properties of the thalamocortical architecture. Forking allows you to create a new repository that diverges from the main one. Use this option if you’d like to use the diagram for a different purpose, such as documentation of all the neurotransmitters in the different pathways.

The first set of diagram additions are described below.

Diagram Additions

Cortico-Cortical Feedback

The illustrated feedback between levels from layer 6 in Level (n+1) to layer 1 in Level (n) is described briefly in the CLA white paper. We have included an additional illustration from Grossberg 2007 (see figure 3 below), that shows in more detail how internal neural circuitry completes the intra-cortical, inter-level, feedback loop from:

H[n+1]:C[6] → H[n]:d[1]C[5]
H[n]:d[1]C[5] → H[n]:C[6]
H[n]:C[6] → H[n]:C[4]

Note: The connections above are described in a notation we have adopted for succinctly describing cortical neural pathways. Refer to our post for more details.
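For readers who want to work with these pathways programmatically, one possible encoding is sketched below. It simply stores each endpoint as a hierarchy-level offset plus the site string copied verbatim from the notation. The `Site` type and the `is_chain` helper are hypothetical and are not part of the notation itself.

```python
from typing import NamedTuple


class Site(NamedTuple):
    level: int  # hierarchy level offset relative to n (0 = H[n], 1 = H[n+1])
    where: str  # site within the level, copied verbatim, e.g. "C[6]"


# The three pathways listed above, as (source, destination) pairs.
feedback_loop = [
    (Site(1, "C[6]"), Site(0, "d[1]C[5]")),
    (Site(0, "d[1]C[5]"), Site(0, "C[6]")),
    (Site(0, "C[6]"), Site(0, "C[4]")),
]


def is_chain(pathways):
    """True if each pathway starts where the previous one ended."""
    return all(a[1] == b[0] for a, b in zip(pathways, pathways[1:]))


if __name__ == "__main__":
    print(is_chain(feedback_loop))
```

The check confirms that the three pathways join end to end, from layer 6 of the higher level down to layer 4 of the lower level.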
Figure 3. Inter-level feedback loop, reproduced from Grossberg 2007. The circles and triangles are neuron bodies, with varying shape depicting different neuron types. Two hierarchy levels are shown (V1,V2 from the visual cortex). Each hierarchy level has 6 cortical layers (numbered 1 to 6 where relevant). You can see that feedback from V2 affects activation of neurons in V1 layer 4.
The feedforward/feedback architecture gives rise to at least three important qualities, the first of which has been explored in the MPF literature. They are described below, reproduced from Grossberg 2007:
  1. the developmental and learning processes whereby the cortex shapes its circuits to match environmental constraints in a stable way through time; 
  2. the binding process whereby cortex groups distributed data into coherent object representations that remain sensitive to analog properties of the environment; and 
  3. the attentional process whereby cortex selectively processes important events. 
We may elaborate with a summary of Grossberg 2007 in a future post.

Gating by the Thalamus

Our main references for this section are Sherman 2006 and Sherman 2007.

We've seen that the thalamus acts as a relay for information passing up the hierarchy between cortical levels, which we're referring to as the feedforward indirect pathway (FF Indirect). It has been postulated that via this gating, the thalamus plays an important role in attention.

What inputs and computations determine that gating? This is one of the questions we are attempting to learn more about, and so have explored inputs to the gating.

Cortical feedback

One of the significant inputs is FB from Layer 6 in the level above. That is to say that the gating from Level (n) to (n+1), is modulated by FB from Layer 6 in Level (n+1).

Thalamic feedback and TRN

There is a substructure of the Thalamus called the Thalamic Reticular Nucleus (TRN) that receives cortical and thalamic excitatory input, and sends inhibitory inputs to the relay cells of the thalamus.

These gating cells also receive inhibitory input from other Thalamic cells, labelled interneurons. Thalamic interneurons receive input from the very same relay cells, layer 6 of the cortex and the brainstem.

These circuits between TRN, BRF and thalamus are complex. They are simplified in the figure below, which appears in Sherman 2006 (Scholarpedia on the Thalamus), a version of which is found in Sherman 2007.

Figure 4. "Schematic diagram of circuitry for the lateral geniculate nucleus. The inputs to relay cells are shown along with the relevant neurotransmitters and postsynaptic receptors (ionotropic and metabotropic) Abbreviations: LGN, lateral geniculate nucleus; BRF, brainstem reticular formation; TRN, thalamic reticular nucleus." Caption and figure reproduced from Sherman 2006.

We are currently representing this complexity as a black box (as shown in the diagram) that receives input from the Thalamus, BRF and cortex, and inhibits the relay cells. The purpose and transfer function require analysis and exploration. It may be necessary to model the complexity explained above, or some simpler equivalent may provide the necessary functionality.

BRF

The BRF is the Brainstem Reticular Formation, which, as the name suggests, is a part of the brainstem. It has a number of functions that could be very important for attention and general functioning of the cortex, and therefore we have included it and its connections to the Thalamus. Some of these functions include:
  1. Somatic motor control
  2. Cardiovascular control
  3. Pain modulation
  4. Sleep and consciousness
  5. Habituation
The Wikipedia page for the BRF gives a very good summary.

Modulation Signal Characteristics

It is interesting to note that the firing mechanism for the BRF and Layer 6 modulation of the Thalamic relay is Burst Mode rather than the more common Tonic Mode. Tonic firing has a frequency that is proportional to the 'activation' of a neuron. The frequency can be interpreted as the "strength" of the signal. Some have interpreted it in the past as a probability or confidence value. For Burst Mode firing, after a 'silent' period, the initial firing pattern is a burst of activity. This "results in a very different message relayed to cortex, depending on the recent voltage history of the relay cell" (Sherman 2006). It is thought that this acts as a ‘wake up call’ to the cortex when there has been some external change. We plan to speculate and elaborate further on possible purposes of this in the future.
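The contrast between the two modes can be illustrated with a deliberately crude toy model. It is not a biophysical model and is not taken from Sherman 2006. The function `relay_response` and the parameters `silent_steps`, `gain` and `burst_rate` are all assumptions made for illustration.

```python
def relay_response(activations, silent_steps=3, gain=10.0, burst_rate=100.0):
    """Toy firing rate of a relay cell at each time step.

    Tonic mode: the rate is proportional to the activation (gain * activation).
    Burst mode: the first non-zero activation after at least `silent_steps`
    steps of silence produces a fixed high-rate burst instead.
    """
    rates = []
    quiet = 0  # number of consecutive silent steps seen so far
    for a in activations:
        if a <= 0:
            quiet += 1
            rates.append(0.0)
        elif quiet >= silent_steps:
            rates.append(burst_rate)  # history-dependent "wake up call"
            quiet = 0
        else:
            rates.append(gain * a)    # ordinary tonic response
            quiet = 0
    return rates


if __name__ == "__main__":
    print(relay_response([0.5, 0.5, 0.0, 0.0, 0.0, 0.2, 0.2]))
```

The same activation of 0.2 yields a burst immediately after the silent period and a much weaker tonic response on the following step, so the message relayed depends on the cell's recent history.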

Timing Information

The CLA White Paper makes mention of timing information being fed back from the thalamus to layer 5 via layer 1. This has been added to the diagram for visibility. It is thought to be important for prediction of the next state at the appropriate time.

Other Factors

There are a number of other significant brain components that may substantially affect the operation of the neocortex. Based on the literature, the most significant of these is probably the Basal Ganglia, which forms circuits with the Thalamus and Cortex. Other interesting and possibly important components are the Betz cells, which directly drive muscles from the cortex.

Conclusion

This post was a first attempt to create an enhanced diagram of cortical layers and thalamocortical connectivity in the context of MPF/HTM/CLA theory. We'll continue to elaborate on this in future posts.

Wednesday, 21 May 2014

Constraints on intelligence

by Gideon Kowadlo and David Rawlinson

Introduction

This article contains some musings on the factors that limit the increase of our intelligence as a species.

We speculate that ultimately, our level of intelligence is limited by at least two factors, and possibly a third:
  1. our own cultural development, 
  2. physical constraints, and
  3. an intelligence threshold.
We’ll now explore each of these factors.

Cultural Development

Natural Selection

Most readers are familiar with Natural Selection. The best known and dominant mechanism is that fitter biological organisms in a population tend to survive longer, reproduce more frequently and successfully, and pass on their traits to the next generation. Given some form of external pressure and therefore competition, such as resource constraints, the species on average is likely to increase in fitness. In competition with other species, this is necessary for species survival.

Although this is the mechanism we are focusing on in this post, there are other important forms of selection. Two examples are ‘Group Selection’ and ‘Sexual Selection’. Group selection favours traits that benefit the group over the individual, such as altruism, especially when the group shares common genes. Sexual selection favours traits that improve an individual’s success in reproducing by two means: being attractive to the other gender, and being able to compete with rivals of the same gender. Sometimes sexually appealing characteristics are highly costly or risky to individuals, for example by making them vulnerable to predators.

Culture

Another influence on ability to survive is culture. Humans have developed culture, and some form of culture is widely believed to exist in other species such as primates and birds (e.g. Science). Richard Dawkins introduced the concept of memes, cultural entities that evolve in a way that is analogous to genes. The word meme now conjures up funny pictures of cats (see Wired magazine’s article on the re-appropriation of the word meme), and no-one is complaining about that, but it's hard to argue that these make us fitter as a species. However, it's clear that cultural evolution, by way of technological progress, can have a significant influence. This could be negative, but is generally a positive, making us more likely to survive as a species.

Culture and Biology

A thought experiment regarding the effect on survival due to natural selection and cultural development, and due to their relationship with each other, is explored with a graph below.

Figure 1: A thought experiment: The shape of survivability vs time, due to cultural evolution, and due to natural selection. The total survivability is the sum of the two. Survivability due to natural selection plateaus when it is surpassed by survivability due to cultural evolution. Survivability due to cultural evolution plateaus when cultural development allows almost everyone in the population to survive.


For humans, the main biological factor contributing to survival is our intellect. The graph shows how our ability to survive steadily improves with time as we evolve naturally. The choice of linear growth is based on the fact that the ‘force’ for genetic change does not increase or decrease as that change occurs*. On the other hand, it is thought that cultural evolution improves our survivability exponentially. In recent years, this has been argued by well known authors and thinkers such as Ray Kurzweil and Eliezer S. Yudkowsky in the context of the Technological Singularity. We build on knowledge continuously, and leverage our technological advances. This enables us to make ever larger steps, as each generation exploits the work of the preceding generations. As Isaac Newton wrote, “If I have seen further it is by standing on the shoulders of giants” **. Many predict that this will result in the ability to create machines that surpass human intelligence. The point at which this occurs is known as the aforementioned Technological Singularity.

Cultural Development - Altruism

Additionally, cultural evolution could include the development of humanitarian and altruistic ideals and behaviour. An environment in which communities care for all their people would increase the survivability of (almost) everyone to the threshold of reproduction, leaving only a varied ability to prosper beyond survival. This is shown in the figure above as a plateau in survivability due to cultural evolution.

Cultural Development - Technology

Cultural factors dominate once survivability due to cultural evolution and technological development surpasses that due to natural selection. For example, the advantage given by a bow and arrow for hunting reduces the competitive advantage of being a faster runner. Having a supermarket at the end of your street renders faster running insignificant. The species would no longer evolve biologically through the same process of natural selection. Other forces may still cause biological evolution in extreme cases, such as resistance to new diseases, but this is unlikely to drive the majority of further change. This means that biological evolution of our species would stagnate***. This effect is shown in the graph as the plateau in survivability due to natural selection.

* On a fine scale, this would not be linear and would be affected by many unpredictable factors such as climate change, other environmental instability, and the successes and failures of other species.

** Although this metaphor was first recorded in the twelfth century and has been attributed to Bernard of Chartres.

*** Interestingly, removal of selective pressure does not allow species to rest at a given level of fitness. Deleterious mutations rapidly accumulate within the population, giving us a short window of opportunity to learn to control and improve our own genetic heritage.
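
The shape of the curves in Figure 1 can be sketched numerically. The following is only an illustration of the thought experiment, not a model fitted to any data: the function name, the starting values, the growth rates and the cap are all arbitrary assumptions chosen to make the two plateaus visible.

```python
import math


def survivability_curves(steps=120, n0=1.0, slope=0.5,
                         c0=0.01, rate=0.1, cap=100.0):
    """Return rows of (t, natural, cultural, total) for the thought experiment.

    All parameter values are illustrative assumptions, not measurements.
    """
    rows = []
    natural = n0
    for t in range(steps):
        # Cultural evolution: exponential growth, capped once almost
        # everyone in the population survives.
        cultural = min(c0 * math.exp(rate * t), cap)
        # Total survivability is the sum of the two contributions.
        rows.append((t, natural, cultural, natural + cultural))
        # Natural selection: linear growth that stops once cultural
        # survivability has surpassed it.
        if cultural < natural:
            natural += slope
    return rows


if __name__ == "__main__":
    for t, natural, cultural, total in survivability_curves()[::10]:
        print(f"t={t:3d}  natural={natural:6.2f}  "
              f"cultural={cultural:6.2f}  total={total:6.2f}")
```

With these assumed values, the natural-selection curve stops rising shortly after the cultural curve overtakes it, and the cultural curve later flattens at the cap, so the total also levels off.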

Physical Constraints

One current perspective in neuroscience, and the basis for our work and this blog, is that much of our intelligence emerges from, very simply put, a hierarchical assembly of regions of identical computational units (cortical columns). As explained in previous posts (here and here), this is physically structured as a sheet of cortex whose regions form connections with one another. The connected regions are conceptually at different levels in the hierarchy. The connections themselves form the bulk of the cortex. We believe that with an increasingly deep hierarchy, the brain is able to represent increasingly abstract and general spatiotemporal concepts, which would play a significant role in increasing intelligence.

The reasoning above predicts that the number of neurons and connections is correlated with intelligence. These neurons and connections have mass and volume and require a blood supply. They cannot increase indefinitely.

Simply increasing the size of the skull has its drawbacks. Maintaining a stable temperature becomes more difficult, and structural strength is sacrificed. The body would become disproportionately large in order to carry the extra mass, making the animal less mobile and raising its energy demands. Longer neuronal connections lead to slower signal propagation, which could also have a negative impact. Evidence of the consequences of such physical constraints is found in the fact that the brain folds in on itself, appearing wrinkled, in order to maximise surface area (and hence the number of neurons and connections) in the given volume of the skull. Evolution has produced a tradeoff between these characteristics that limits our intelligence to promote survival.

It is possible to imagine completely different architectures that might circumvent these limitations. Perhaps a neural network distributed throughout the body, such as exists for some marine creatures. However, it is implausible that physical constraints would not ultimately be a limiting factor. Also, reality is more constrained than our imagination. For example, it must be physically and biologically possible for the organism to develop from a single cell to a neonate, and on to a reproducing adult.

An Intelligence Threshold

There could be a point at which the species crosses an intelligence threshold, beyond which higher intelligence does not confer a greater ability to survive. However, since the threshold may be dictated by cultural evolution it is very difficult to separate the two. For example, the threshold might be very low in an altruistic world, and it is possible to envision a hyper-competitive and adversarial culture in which the opposite is true.

But perhaps a threshold exists as a result of a fundamental quality of intelligence, completely independent of culture. Could it be, that once you can grasp concepts at a sufficient level of abstraction, and have the ability to externalise and record concepts with written symbols (thereby extending the hierarchy outside of the physical brain), that it would be possible to conduct any ‘thought’ computation, given enough working memory, concentration and time? Similarly, a Turing Machine is capable of carrying out any computation, given infinite memory.

The topic of consciousness and its definition is beyond the scope of this post. However, accepting that there appears to be a clear relationship between intelligence and what most people understand as consciousness, this ‘Intelligence Threshold’ has implications for consciousness itself. It is interesting to ponder the threshold as having a corresponding crossing point in terms of conscious experience.

We may explore the existence and nature of this potential threshold in greater detail in the future.

Impact of Artificial General Intelligence (AGI)

The biological limitations to intelligence discussed in this article show why Artificial General Intelligence (AGI) will be such a dramatic development. We still exist in a physical world (at least perceptibly), but building an agent out of silicon (or other materials in the future) will effectively free us from all of these constraints. It also allows us to modify parameters and architecture, and to monitor activity. It will be possible to invest large quantities of energy into ‘thinking’ in a mind that does not fatigue. Perhaps this is a key enabling technology on the path to the Singularity.

Friday, 16 May 2014

A notation for intracortical neural connections

by Gideon Kowadlo and David Rawlinson

In a previous post (section 'Regions'), we discussed the distinction between cortical layers and levels. The literature often describes connections from cortical layers in one level to cortical layers in another level. We are not aware of a written convention for this purpose and propose one here.

The basic form describes the direction of signal transmission from the location of the soma (cell body) to the efferent synapse (where the nerve connects to another nerve or motor unit), separated by an arrow. One statement should be used per neuron.

soma  →  efferent synapse

The location is specified by:

H[n]:C[i], where:

H[n] = hierarchy level n
C[i] = cortical layer i

If one neuron branches and terminates in two locations, the locations are comma-separated. Where these are co-located in the same Hierarchy Level, the subsequent Level specifier is omitted for brevity.

e.g. Referring to the figure in the section 'Regions' in the aforementioned post, the FF direct pathway can be described as follows:

H[n]:C[5] → H[n+1]:C[4], C[2/3]

Dendrites are specified by prepending d[i]. Using the same example, a more complete statement is:

H[n]:d[2/3]C[5] → H[n+1]:C[4], C[2/3]

Importantly, this convention is easy to type into any text editor. Arrows can be typed with the standard keyboard characters hyphen and greater-than, i.e. "->". Also, many word processors will convert an arrow typed with two hyphens, i.e. "-->", into a special arrow character.
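
Because the notation is plain text, it is also straightforward to read by machine. The following is a minimal sketch of a parser, assuming the grammar is exactly as described above. The function names and the dictionary layout are our own illustrative choices and not part of the convention.

```python
import re

# Accept the Unicode arrow as well as the typed forms "-->" and "->".
ARROW = re.compile(r"\s*(?:→|-->|->)\s*")

# A location is an optional "H[level]:", an optional "d[layer]" for the
# dendrites, and a mandatory "C[layer]".
LOCATION = re.compile(
    r"^(?:H\[(?P<level>[^\]]+)\]:)?"
    r"(?:d\[(?P<dendrite>[^\]]+)\])?"
    r"C\[(?P<layer>[^\]]+)\]$"
)


def parse_location(text, inherited_level=None):
    """Parse one location, reusing the previous level if H[...] is omitted."""
    match = LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid location: {text!r}")
    return {
        "level": match["level"] or inherited_level,
        "dendrite": match["dendrite"],
        "layer": match["layer"],
    }


def parse_connection(statement):
    """Parse one 'soma -> efferent synapse' statement for a single neuron."""
    parts = ARROW.split(statement.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"no arrow found in: {statement!r}")
    soma = parse_location(parts[0])
    targets, level = [], None
    for item in parts[1].split(","):
        target = parse_location(item, level)
        level = target["level"]  # later targets may omit the level
        targets.append(target)
    return {"soma": soma, "targets": targets}


if __name__ == "__main__":
    print(parse_connection("H[n]:d[2/3]C[5] -> H[n+1]:C[4], C[2/3]"))
```

In this sketch a target that omits its Level specifier inherits the level of the target before it, following the brevity rule above.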