Project AGI: March 2016

A Minecraft API is now available to train your AGIs

Our News

We are working hard on experiments, and software to run experiments. So this week there is no normal blog post. Instead, here’s an eclectic mix of links we’ve noticed recently.

First, AlphaGo continues to make headlines. Of interest to Project AGI is Yann LeCun agreeing with us that unsupervised hierarchical modelling is an essential step in building intelligence with humanlike qualities [1]. We also note this IEEE Spectrum post by Jean-Christophe Baillie [2] which argues, as we did [3], that we need to start creating embodied agents.

Minecraft

Speaking of which, the BBC reports that the Minecraft team are preparing an API for machine learning researchers to test their algorithms in the famous game [4]. The Minecraft team also stress the value of embodied agents and the depth of gameplay and graphics. It sounds like Minecraft could be a crucial testbed for an AGI. We’re always on the lookout for test problems like these.

Of course, to play Minecraft well you need to balance local activities - building, mining etc. - with exploration. Another frontier, beyond AlphaGo, is exploration. Monte-Carlo Tree Search (as used in AlphaGo) explores in more limited ways than humans do, argues John Langford [5].
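To make the exploration trade-off concrete, here is a minimal sketch of UCB1, a selection rule commonly used inside Monte-Carlo Tree Search. It is not AlphaGo's exact rule (AlphaGo's search uses a policy-guided variant), and the function names and the exploration constant `c` are our own illustrative choices.

```python
import math


def ucb1_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for one child node: mean value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_value = total_value / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + bonus


def select_child(children):
    """children: list of (total_value, visits) pairs.

    Returns the index of the child with the highest UCB1 score.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(v, n, parent_visits) for v, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

The bonus shrinks as a child is visited more often, so exploration here is driven only by visit counts within the current search tree. That is a much narrower notion of exploration than a human wandering around a Minecraft world.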

Sharing places with robots

If robots are going to be embodied, we need to make some changes. Wired magazine says that a few small changes to the urban environment and driver behaviour will make the rollout of autonomous vehicles easier [6]. It’s important to meet the machines halfway, for the benefit of all.

This excellent paper on robotic grasping also caught our attention [7]. A key challenge in this area is adaptability to slightly varying circumstances, such as variations in the objects being grasped and their pose relative to the arm. General solutions to these problems will suddenly make robots far more flexible and applicable to a greater range of tasks.

Hierarchical Quilted Self-Organizing Maps & Distributed Representations

Last week I also rediscovered this older paper on Hierarchical Quilted Self-Organizing Maps (HQSOMs) [8]. This is close to our hearts because we originally believed this type of representation was the right approach for AGI. With the success of Deep Convolutional Networks (DCNs) it’s worth looking back and noticing the similarities between the two. HQSOM learning is purely unsupervised (a plus, see the comment from Yann LeCun above), whereas DCNs are trained by supervised techniques. However, both methods use small, overlapping, independent units - analogous to biological cortical columns - to classify different patches of the input. The overlapping and independent classifiers lead to robust and distributed representations, which is probably the reason these methods work so well.

Distributed representation is one of the key features of Hawkins’ Hierarchical Temporal Memory (HTM). Fergal Byrne has recently published an updated description of the HTM algorithm [9] for those interested.

We at Project AGI believe that a grid-like “region” of columns employing a “Winner-Take-All” policy [10], with overlapping input receptive fields, can produce a distributed representation. Different regions are then connected together into a tree-like structure (acyclic). The result is a hierarchy. Not only does this resemble the state-of-the-art methods of DCNs, but there’s a lot of biological evidence for this type of representation too. This paper by Rinkus [11] describes columnar features arranged into a hierarchy, with winner-take-all behaviour implemented via local inhibition.

Rinkus says: “Saying only that a group of L2/3 units forms a WTA CM places no a priori constraints on what their tuning functions or receptive fields should look like. This is what gives that functionality a chance of being truly generic, i.e., of applying across all areas and species, regardless of the observed tuning profiles of closely neighboring units.”
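As a rough illustration of the region idea above, here is a minimal sketch of columns with overlapping receptive fields, each applying a winner-take-all rule. It is not our implementation, nor Rinkus' model: the 1-D input, the squared-distance matching and the simple competitive update are all simplifying assumptions, and every name in it is hypothetical.

```python
import random


def make_region(input_size, field_size, stride, cells_per_column, seed=0):
    """Build columns with overlapping receptive fields and random cell weights."""
    rng = random.Random(seed)
    columns = []
    # A stride smaller than field_size makes neighbouring fields overlap.
    for start in range(0, input_size - field_size + 1, stride):
        weights = [[rng.random() for _ in range(field_size)]
                   for _ in range(cells_per_column)]
        columns.append({"start": start, "weights": weights})
    return columns


def encode(columns, x, field_size, learning_rate=0.0):
    """Return the index of the winning cell in each column."""
    winners = []
    for col in columns:
        patch = x[col["start"]:col["start"] + field_size]
        # Winner-take-all: only the best-matching cell in the column is active.
        dists = [sum((w - p) ** 2 for w, p in zip(cell, patch))
                 for cell in col["weights"]]
        win = dists.index(min(dists))
        winners.append(win)
        if learning_rate > 0.0:
            # Unsupervised competitive update: move the winner toward the patch.
            cell = col["weights"][win]
            for i, p in enumerate(patch):
                cell[i] += learning_rate * (p - cell[i])
    return winners


region = make_region(input_size=8, field_size=4, stride=2, cells_per_column=3)
pattern = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
code = encode(region, pattern, field_size=4, learning_rate=0.1)
```

The list of winners, one per column, is the distributed representation: no single cell describes the whole input, and neighbouring columns see overlapping patches of it.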

Reinforcement Learning

But unsupervised learning can’t be the only form of learning. We also need to consider consequences, and so we need reinforcement learning to take account of these. As Yann said, it is the “cherry on the cake” (this probably understates the difficulty of the RL component, but right now it seems easier than creating representations).

Shakir’s Machine Learning blog has a great post exploring the biology of reinforcement learning [12] within the brain. This is a good overview of the topic and useful for ML researchers wanting to access this area.

But regular readers of this blog will remember that we’re obsessed with unfolding or inverting abstract plans into concrete actions. We found a great paper by Manita et al [13] that shows biological evidence for the translation and propagation of an abstract concept into sensory and motor areas, where it can assist with perception. This is the hierarchy in action.

Long Short-Term Memory (LSTM)

One more tack before we finish. Thanks to Jay for this link to NVIDIA’s description of LSTMs [14], an architecture for recurrent neural networks (i.e. the state can depend on the previous state of the cells). It’s a good introduction, but we’re still fans of Monner’s Generalized LSTM [15].
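To show what "the state can depend on the previous state" means in practice, here is a minimal sketch of the standard LSTM update for a single unit with scalar input. It is the textbook formulation, not Monner's Generalized LSTM, and the parameter layout (a dict of `(w_x, w_h, bias)` triples) is our own assumption for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input and state."""
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # the new information itself
    c = f * c_prev + i * g            # cell state carries memory across steps
    h = o * math.tanh(c)
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, starting from a zero state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

Feeding the same input twice in a row gives two different outputs, because the second step also sees the state left behind by the first.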

Fun thoughts

Now let’s end with something fun. Wired magazine again, describing watching AlphaGo as our first taste of a superhuman intelligence [16]. Although this is a “narrow” intelligence, not a general one, it has qualities beyond anything we’ve experienced in this domain before. What’s more, watching these machines can make us humans better, without any nasty bio-engineering:

“But as hard as it was for Fan Hui to lose back in October and have the loss reported across the globe—and as hard as it has been to watch Lee Sedol’s struggles—his primary emotion isn’t sadness. As he played match after match with AlphaGo over the past five months, he watched the machine improve. But he also watched himself improve. The experience has, quite literally, changed the way he views the game. When he first played the Google machine, he was ranked 633rd in the world. Now, he is up into the 300s. In the months since October, AlphaGo has taught him, a human, to be a better player. He sees things he didn’t see before. And that makes him happy. “So beautiful,” he says. “So beautiful.”

References

[1] https://www.facebook.com/yann.lecun/posts/10153426023477143

[2] http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/why-alphago-is-not-ai

[3] http://blog.agi.io/2016/03/what-after-alphago.html

[4] http://www.bbc.com/news/technology-35778288

[5] http://cacm.acm.org/blogs/blog-cacm/199663-alphago-is-not-the-solution-to-ai/fulltext

[6] http://www.wired.com/2016/03/self-driving-cars-wont-work-change-roads-attitudes/

[7] http://arxiv.org/pdf/1603.02199v1.pdf

[8] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.1401&rep=rep1&type=pdf

[9] http://arxiv.org/pdf/1509.08255v2.pdf

[10] https://en.wikipedia.org/wiki/Winner-take-all_(computing)

[11] http://journal.frontiersin.org/article/10.3389/fnana.2010.00017/full

[12] http://blog.shakirm.com/2016/02/learning-in-brains-and-machines-1/

[13] https://www.researchgate.net/profile/Masanori_Murayama/publication/277144323_A_Top-Down_Cortical_Circuit_for_Accurate_Sensory_Perception/links/556839e008aec22683011a30.pdf

[14] https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-sequence-learning/

[15] http://www.overcomplete.net/papers/nn2012.pdf

[16] http://www.wired.com/2016/03/sadness-beauty-watching-googles-ai-play-go/

What's AlphaGo?

AlphaGo is a system that can play Go at least as well as the best humans. Go was widely cited as the hardest (and only remaining) game at which humans could beat machines, so this is a big deal. AlphaGo has just defeated a top-ranked human expert.

AlphaGo Nature paper (Silver et al 2016)

Why is Go hard?

Go is hard because the search-space of possible moves is so large that tree search and pruning techniques, such as those used to beat humans at Chess, won't work - or at least, they won't work well enough, with a feasible amount of memory, to play Go better than the best humans.

Instead, to play Go well, you need to have "intuition" rather than brute search power: To look at the board and spot local (or gross) patterns that represent opportunities or dangers. And in fact, AlphaGo is able to play in this way. It beat the next best computer algorithm "Pachi" 85% of the time without any tree search - just predicting the best action based on its interpretation of the current state. The authors of the AlphaGo Nature paper say:

“During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did in its chess match against Kasparov; compensating by selecting those positions more intelligently, using the policy network, and evaluating them more precisely, using the value network—an approach that is perhaps closer to how humans play.”
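The search-space argument above can be put into rough numbers. The sketch below estimates the size of a game tree as b^d (branching factor to the power of game length), using the approximate figures quoted in the AlphaGo Nature paper: about 35 and 80 for Chess, about 250 and 150 for Go. These are order-of-magnitude estimates only, and the function name is our own.

```python
import math


def log10_game_tree_size(branching_factor, depth):
    """log10 of b**d, the approximate number of possible move sequences."""
    return depth * math.log10(branching_factor)


# Approximate values quoted in Silver et al. 2016.
CHESS = {"branching_factor": 35, "depth": 80}
GO = {"branching_factor": 250, "depth": 150}

chess_digits = log10_game_tree_size(**CHESS)
go_digits = log10_game_tree_size(**GO)

print(f"Chess: roughly 10^{chess_digits:.0f} move sequences")
print(f"Go:    roughly 10^{go_digits:.0f} move sequences")
```

Working in logarithms avoids building astronomically large numbers, and makes the point directly: the Go tree is larger than the Chess tree by hundreds of orders of magnitude, so exhaustive search is hopeless.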

How does AlphaGo work?

AlphaGo is trained by both supervised and reinforcement learning. Supervised learning feedback comes from recordings of moves in expert games. However, these recordings are finite in size and, used naively, would lead to overfitting.

Instead, in AlphaGo a Supervised Learning deep neural network learns to model and predict expert behaviour in the recorded games, via conventional deep learning techniques. Then, a reinforcement learning network is used to generate reward data for novel games that AlphaGo plays against itself! This mitigates the limited size of the supervised learning dataset.

Of course, AlphaGo also wants to play better than the best play observed in the training data. To achieve this, the reinforcement learning network is further trained by playing pairs of networks against each other - mixing the pairs up to prevent policies overfitting each other. This is a really clever feature because it allows AlphaGo to go beyond its training data.
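The idea of mixing up the pairings can be sketched with a toy example. Below, a policy for rock-paper-scissors is trained by a simple policy-gradient (REINFORCE) update against opponents drawn at random from a pool of its own earlier snapshots. This is only an illustration of the training loop's shape: the game, the update rule, the hyperparameters and all names are our assumptions, and AlphaGo's networks and training procedure are far more elaborate.

```python
import math
import random

MOVES = 3  # 0 = rock, 1 = paper, 2 = scissors


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(logits, rng):
    return rng.choices(range(MOVES), weights=softmax(logits))[0]


def reward(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw; move k beats move k-1 (mod 3)."""
    return (0, 1, -1)[(mine - theirs) % MOVES]


def self_play(iterations=2000, snapshot_every=100, lr=0.05, seed=0):
    rng = random.Random(seed)
    current = [0.0] * MOVES   # logits of the policy being trained
    pool = [list(current)]    # earlier snapshots to play against
    for step in range(1, iterations + 1):
        # Mix up the pairings: pick a random earlier version as the opponent.
        opponent = rng.choice(pool)
        a = sample_move(current, rng)
        b = sample_move(opponent, rng)
        r = reward(a, b)
        probs = softmax(current)
        # REINFORCE: d log pi(a) / d logit_k = onehot(a)[k] - probs[k]
        for k in range(MOVES):
            grad = (1.0 if k == a else 0.0) - probs[k]
            current[k] += lr * r * grad
        if step % snapshot_every == 0:
            pool.append(list(current))
    return current, pool
```

Because the opponent is sampled from the whole pool rather than always being the latest policy, the learner cannot specialise against one particular opponent's quirks.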

Note also that the neural networks cannot possibly fully represent a sufficiently deep tree of board outcomes within their limited set of weights. Instead, the network has to learn to represent good and bad situations with limited resources. It has to form its own representation of the most salient features, during training.

The neural networks function without pre-defined rules specific to Go; instead they have learned from training data collected from many thousands of human and simulated games.

Key advances

AlphaGo is an important advance because it is able to make good judgments about play situations based on a lossy interpretation in a finitely-sized deep neural network.

What’s more, AlphaGo wasn’t simply taught to copy human experts - it went further, and improved, by playing against itself.

So, what doesn't it do?

The techniques used in deep neural networks have recently been scaled to work effectively on a wide range of problems. In some subject areas, narrow AIs are reaching superhuman performance. However, it is not clear that these techniques will scale indefinitely. Problems such as vanishing gradients have been pushed back, but not necessarily eliminated.

Much greater scale is needed to get intelligent agents into the real world without them being immediately smashed by cars or stuck in holes. But already, it is time to consider what features or characteristics constitute an artificial general intelligence (AGI), beyond raw intelligence (which AIs now have).

AlphaGo isn't a general intelligence; it's designed specifically to play Go. Sure, it's trained rather than programmed manually, but it was designed for this purpose. The same techniques are likely to generalize to many other problems, but they'll need to be applied thoughtfully and retrained.

AlphaGo isn't an Agent. It doesn't have any sense of self, or intent, and its behaviour is pretty static - its policies would probably work the same way in all similar situations, learning only very slowly. You could say that it doesn't have moods, or other transient biases. Maybe this is a good thing! But this also limits its ability to respond to dynamic situations.

AlphaGo doesn't have any desire to explore, to seek novelty or to try different things. AlphaGo couldn't ever choose to teach itself to play Go because it found it interesting. On the other hand, AlphaGo did teach itself to play Go…

All in all, it's a very exciting time to study artificial intelligence!

by David Rawlinson & Gideon Kowadlo

Project AGI

Building an Artificial General Intelligence

Wednesday, 23 March 2016

Reading list: Assorted AGI links. March 2016