
Project AGI

Building an Artificial General Intelligence

This site has been deprecated. New content can be found at https://agi.io

Thursday 10 March 2016

What's after AlphaGo?

What's AlphaGo?

AlphaGo is a system that can play Go at least as well as the best humans. Go was widely cited as the hardest classic board game - and the last one remaining - at which humans could still beat machines, so this is a big deal. AlphaGo has just defeated a top-ranked human expert.

Why is Go hard?

Go is hard because the search space of possible moves is so large that the tree-search and pruning techniques used to beat humans at Chess won't work - or at least, they won't work well enough, with a feasible amount of memory, to play Go better than the best humans. A Go position offers roughly 250 legal moves to Chess's 35, and games run longer, so the game tree is astronomically larger.
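To get a feel for the numbers, here is the back-of-envelope arithmetic, assuming the commonly quoted average branching factors and game lengths (ballpark figures, for intuition only):

    # Rough game-tree sizes from commonly quoted averages
    # (approximate figures for intuition, not exact combinatorics).
    chess_branching, chess_length = 35, 80      # ~35 legal moves, ~80 plies
    go_branching, go_length = 250, 150          # ~250 legal moves, ~150 plies

    chess_tree = chess_branching ** chess_length
    go_tree = go_branching ** go_length

    print(f"Chess: ~10^{len(str(chess_tree)) - 1} positions")  # ~10^123
    print(f"Go:    ~10^{len(str(go_tree)) - 1} positions")     # ~10^359

No amount of pruning closes a gap of a couple of hundred orders of magnitude, which is why brute search alone was never going to crack Go.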

Instead, to play Go well you need "intuition" rather than brute search power: the ability to look at the board and spot local (or global) patterns that represent opportunities or dangers. And in fact, AlphaGo is able to play in this way. Its policy network beat the strong computer Go program "Pachi" 85% of the time without any tree search at all - just predicting the best move from its interpretation of the current position. The authors of the AlphaGo Nature paper say:

“During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did in its chess match against Kasparov; compensating by selecting those positions more intelligently, using the policy network, and evaluating them more precisely, using the value network—an approach that is perhaps closer to how humans play.”
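To make the "no tree search" idea concrete, here is a minimal sketch of a policy network in PyTorch: a small convolutional net that maps the raw board position straight to a move. The feature planes and layer sizes are illustrative placeholders, nothing like AlphaGo's real architecture:

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        """Toy policy network: board feature planes in, one logit per
        board point out. Sizes are illustrative, not AlphaGo's."""
        def __init__(self, in_planes=4):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_planes, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 1),               # 1 logit per point
            )

        def forward(self, board):                  # (batch, planes, 19, 19)
            return self.body(board).flatten(1)     # (batch, 361) move logits

    # Play with no search at all: just take the most probable move.
    net = PolicyNet()
    position = torch.zeros(1, 4, 19, 19)           # dummy feature planes
    probs = torch.softmax(net(position), dim=1)    # distribution over 361 points
    move = probs.argmax(dim=1)                     # chosen board point

This is only the shape of the idea, of course - the real network was far deeper, fed much richer feature planes, and trained on millions of positions.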

How does AlphaGo work?

AlphaGo is trained by both supervised and reinforcement learning. The supervised feedback comes from recordings of moves in expert games. However, these recordings are finite in size and, used naively, would lead to overfitting.

Instead, in AlphaGo a supervised-learning deep neural network first learns to model and predict expert behaviour in the recorded games, via conventional deep learning techniques. Then a reinforcement-learning network, initialised from the supervised one, generates fresh training data by playing novel games against itself - the win/loss outcomes of these games supply the reward signal. This mitigates the limited size of the supervised learning dataset.
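Schematically, the two training stages might look something like the sketch below - an outline under assumptions, not DeepMind's code. Stage one is plain cross-entropy on (position, expert move) pairs; stage two is REINFORCE-style policy gradient, reusing the toy PolicyNet above, with the outcome being +1 for a win and -1 for a loss:

    import torch.nn.functional as F

    # Stage 1: supervised learning. Predict the expert's move in each
    # recorded position (cross-entropy on (state, move) pairs).
    def sl_step(policy, optimizer, states, expert_moves):
        loss = F.cross_entropy(policy(states), expert_moves)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    # Stage 2: reinforcement learning. Scale the log-probability of
    # each of the policy's own moves by the game's final result from
    # its perspective, so winning moves are reinforced.
    def rl_step(policy, optimizer, states, moves, outcome):
        log_probs = F.log_softmax(policy(states), dim=1)
        chosen = log_probs.gather(1, moves.unsqueeze(1)).squeeze(1)
        loss = -(outcome * chosen).mean()          # outcome: +1 win, -1 loss
        optimizer.zero_grad(); loss.backward(); optimizer.step()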

Of course, AlphaGo also wants to play better than the best play observed in the training data. To achieve this, the reinforcement-learning network is further trained by playing copies of itself against each other - sampling opponents from a pool of earlier iterations, to prevent the policies overfitting to each other. This is a really clever feature, because it allows AlphaGo to go beyond its training data.
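Continuing the sketch with the policy and optimizer from above, the opponent pool might look like this (play_game is a hypothetical helper standing in for a full Go engine, and the snapshot interval is arbitrary):

    import copy
    import random

    # Keep frozen snapshots of earlier policies and train against a
    # random one, not always the latest, so the current policy cannot
    # overfit to a single opponent's habits.
    opponent_pool = [copy.deepcopy(policy)]

    for iteration in range(10_000):
        opponent = random.choice(opponent_pool)
        # play_game (hypothetical) returns the trained policy's states,
        # moves and final outcome (+1/-1) for one game of self-play.
        states, moves, outcome = play_game(policy, opponent)
        rl_step(policy, optimizer, states, moves, outcome)
        if iteration % 500 == 0:                   # periodically snapshot
            opponent_pool.append(copy.deepcopy(policy))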

Note also that the neural networks cannot possibly represent a sufficiently deep tree of board outcomes within their limited set of weights. Instead, the network has to learn to represent good and bad situations with limited resources - forming its own representation of the most salient features during training.
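Compressing that judgment into limited resources is exactly the job of AlphaGo's value network, which boils a whole position down to a single estimate of who is winning. A toy version (sizes again illustrative, not the real architecture) might look like:

    import torch.nn as nn

    class ValueNet(nn.Module):
        """Toy value network: board feature planes in, one scalar out.
        tanh squashes the estimate into [-1, 1] (sure loss .. sure win)."""
        def __init__(self, in_planes=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_planes, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(32 * 19 * 19, 256), nn.ReLU(),
                nn.Linear(256, 1), nn.Tanh(),
            )

        def forward(self, board):                  # (batch, planes, 19, 19)
            return self.net(board)                 # (batch, 1) in [-1, 1]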

The neural networks function without pre-defined rules specific to Go; instead they have learned from training data collected from many thousands of human games and from self-play.

Key advances

AlphaGo is an important advance because it is able to make good judgments about play situations based on a lossy interpretation of the board held within a finite-sized deep neural network.

What’s more, AlphaGo wasn’t simply taught to copy human experts - it went further, and improved, by playing against itself.

So, what doesn't it do?

The techniques used in deep neural networks have recently been scaled to work effectively on a wide range of problems. In some subject areas, narrow AIs are reaching superhuman performance. However, it is not clear that these techniques will scale indefinitely. Problems such as vanishing gradients have been pushed back, but not necessarily eliminated.

Much greater scale is needed to get intelligent agents into the real world without them being immediately smashed by cars or stuck in holes. But already, it is time to consider what features or characteristics constitute an artificial general intelligence (AGI), beyond the raw, narrow intelligence that AIs now have.

AlphaGo isn't a general intelligence; it's designed specifically to play Go. Sure, it's trained rather than programmed manually, but it was designed for this purpose. The same techniques are likely to generalize to many other problems, but they'll need to be applied thoughtfully and retrained.

AlphaGo isn't an Agent. It doesn't have any sense of self, or intent, and its behaviour is pretty static - its policies would probably work the same way in all similar situations, learning only very slowly. You could say that it doesn't have moods, or other transient biases. Maybe this is a good thing! But this also limits its ability to respond to dynamic situations.

AlphaGo doesn't have any desire to explore, to seek novelty or to try different things. AlphaGo couldn't ever choose to teach itself to play Go because it found it interesting. On the other hand, AlphaGo did teach itself to play Go… 

All in all, it's a very exciting time to study artificial intelligence!

by David Rawlinson & Gideon Kowadlo

1 comment:

  1. Go is in many ways a very inhuman task. A Go board has a lot of detail, and those details can develop in unpredictable ways.

    Humans do poorly at details. So the computer's advantage on detail may still be what is carrying the day for it. It is doing better than brute force, but it might still be generalizing very poorly, and winning because it can remember more, and calculate detail further ahead. We still don't know if it is generalizing the same way, or as well, as humans, because it hasn't been tested in a domain where humans' known limitations of memory and detail matter less.

    Personally I think even deep hierarchies, trained top-down and so globally optimized, will prove inadequate at the generalization necessary to get some key features we associate with intelligence: our perceptual world, and especially consciousness.
