
Project AGI

Building an Artificial General Intelligence

This site has been deprecated. New content can be found at https://agi.io

Wednesday 25 February 2015

Eric Horvitz on the new era of AI




I found this video interview with Eric Horvitz, the head of Microsoft Research, to be a really interesting window into their work and vision - a large part of which is AI. As Eric states, there is a "growing consensus that the next, if not last, enduring competitive battlefield among major IT companies will be AI". Very well said.

The full interview, here, is worth the watch.

Saturday 21 February 2015

Mitigating the perceived future risks of AI




Just a few days ago, Jay posted here about the recent public attention to AI and the potential future dangers that it presents.

Discussion on the topic continues, but it's not all talk. There are at least two institutes focussed on practical steps to mitigate the risks.

There is an institute called MIRI (the Machine Intelligence Research Institute) that exists solely to "ensure smarter-than-human intelligence has a positive impact". Founded in 2000, it is a non-profit with many high-profile people associated with it, including Eliezer Yudkowsky, Nick Bostrom and entrepreneur Peter Thiel. It publishes research and holds workshops regularly.

The Future of Life Institute is also currently focussed on AI. It recently received a well-publicised US$10 million donation from Elon Musk, founder of SpaceX and Tesla. As mentioned in Jay's article, Musk is in the camp that believes AI could pose an existential threat to mankind.

Bostrom is director of another related institute called the Future of Humanity Institute, a part of Oxford University.

More of the discussion is set to become practical work, and these organisations are leading by example. Usually, consideration of the ethical issues lags behind the science. In this case, however, due to science fiction and misunderstandings of the technology, the perceived dangers are way ahead of the actual threat. That's a healthy situation to be in if we want to be prepared.

Tuesday 17 February 2015

The Real Risks of AI


In the past year or so, there has been a spate of high-profile pronouncements by respected scientists and engineers cautioning the world about the potential ill effects of AI. Stephen Hawking, one of the foremost experts on theoretical physics and astrophysics, told the BBC that "The development of full artificial intelligence could spell the end of the human race" [1], and Elon Musk, serial entrepreneur and founder of Tesla Motors and SpaceX, told an audience at MIT that AI could represent an existential threat to mankind [2].

AI has been around since the 1950s, and Hollywood movies like The Terminator have fired the public imagination about rogue AI, at least in the West, where the general public views AI with much suspicion. But the scientific community has always been very cautious about its ability to do anything beyond narrowly defined problems, like defeating a grandmaster at chess, as IBM's Deep Blue did in the 1990s. Just a few years ago, AI was hopeless at tasks that a six-year-old could easily do, like identifying people and objects in a family photograph, communicating with her elderly grandfather, or answering abstract questions about what she wants to be when she grows up.

What is it about AI now that is spurring these alarmist statements from respected scientists and engineers?

In the past couple of years, a type of learning algorithm called Deep Learning has taken the world of AI in general, and Computer Vision and Natural Language Processing (NLP) in particular, by storm. Computer Vision and NLP are both sub-fields of AI: one deals with making computers understand the world around them through vision, and the other with making computers understand human language and speech.

Deep Learning is a learning technique that uses a many-layered (hence the word "deep") Artificial Neural Network to learn representations of objects or faces (in images), or of words, phrases and sentences (in speech), from many known examples, called training data. Given a new image, it is then able to find and label objects in that image, or to understand the meaning of a spoken sentence.
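To make this concrete, here is a minimal sketch of the idea in Python using the PyTorch library (my choice of tooling, not something used or prescribed by the work described here). It stacks a few layers into a small network and runs a single learning step on a batch of labelled examples; the layer sizes and the random "images" are placeholders.

```python
import torch
import torch.nn as nn

# A tiny "deep" network: several stacked layers, each transforming the
# representation produced by the previous one. Sizes are illustrative only.
net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # raw pixels -> first-level features
    nn.Linear(256, 64), nn.ReLU(),    # -> more abstract features
    nn.Linear(64, 10),                # -> scores for 10 object classes
)

x = torch.randn(32, 784)              # a batch of 32 flattened 28x28 "images"
y = torch.randint(0, 10, (32,))       # their (made-up) labels: the training data
loss = nn.CrossEntropyLoss()(net(x), y)
loss.backward()                       # gradients for one step of learning
```

Repeating such steps over many labelled examples is what "training" means here; afterwards, the network's intermediate layers hold the learned representations.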

Deep Learning techniques are excelling at pretty much every AI-related application that engineers throw at them, and incremental advances that used to take years or decades to achieve are now happening in months. The rapid progress the field has made through Deep Learning is well illustrated by the example of ImageNet. Computer Vision engineers test the competence of each other's algorithms using large databases of images in which the positions and labels of objects have been painstakingly annotated by humans, and the biggest of these is ImageNet [3]. ImageNet contains 16 million images spanning 20,000 object classes, and every year, in the ImageNet Large Scale Visual Recognition Challenge, Computer Vision algorithms are compared against each other on how well they can detect the presence and location of objects in these images.

The sort of labelling a computer vision algorithm is expected to produce, on an example image from ImageNet [4]
   

The Error Rate (a measure of the fraction of images an algorithm labels incorrectly, and hence of its competence) has dropped precipitously since the introduction of Deep Learning algorithms into the competition, from about 40% in 2010 to 6.7% in 2014, achieved by Google's Deep Learning network GoogLeNet [5], whose name pays homage to LeNet, the early network of Yann LeCun, one of the pioneers of Deep Learning research. This is a big deal in the field: in the preceding decade, the improvement on ImageNet and similar databases had been about 1-2% per year. Engineers were using what are known as hand-engineered features and making very small, incremental advances. Image features are statistics describing what objects, and parts of objects, look like at the pixel level; they are supposed to be invariant to changes in viewpoint, lighting and so on, and engineers had been designing them by hand for the better part of a decade.
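As a rough illustration of how such an error rate can be computed, the short NumPy sketch below counts how often the true label is missing from an algorithm's five highest-scoring guesses (roughly the top-5 criterion used in the ImageNet classification task; the function name and the toy random data are my own).

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of images whose true label is NOT among the k highest-scoring guesses."""
    topk = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per image
    hits = np.any(topk == labels[:, None], axis=1)   # is the true label among them?
    return 1.0 - hits.mean()

# Toy example: random scores for 1000 "images" over 200 classes.
rng = np.random.default_rng(0)
scores = rng.random((1000, 200))
labels = rng.integers(0, 200, size=1000)
print(top_k_error(scores, labels))   # close to 0.975: random guessing rarely puts the true label in the top 5
```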

A Convolutional Neural Network or CNN (a particular kind of Neural Network whose first few layers are convolutional and pooling layers) learns to identify a face through a series of representations whose level of abstraction increases through the layers of the network, from edge-like segments, to face parts such as ears, noses and mouths, to full faces [6]


Suddenly, in 2012, Geoff Hinton and his team from the University of Toronto used a CNN that learnt its own features and dramatically outperformed the second-best team, which used conventional, hand-engineered features [7]. By 2014, almost every competing algorithm was based on CNNs and Deep Learning, and the latest error rates of about 6-7% are closing in on the human error rate of about 5% for identifying objects in these images [8].
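The sketch below shows what such a network looks like in code: a minimal convolutional network in Python/PyTorch, with layer counts and sizes chosen purely for illustration (it is not AlexNet or GoogLeNet). The convolutional layers play the role of the learned features described above, replacing hand-engineered ones.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A deliberately small CNN; real ImageNet networks are far deeper."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learns low-level, edge-like filters
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling: shrink the feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # learns higher-level "part" detectors
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, n_classes)  # assumes 32x32 input images

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

model = TinyCNN()
logits = model(torch.randn(4, 3, 32, 32))   # class scores for a batch of 4 small RGB images
```

Each convolutional filter looks only at a small neighbourhood of the layer below it; that local connectivity is part of what makes these networks practical to train.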

The error rate on ImageNet over the years [9]


Not only can Deep Learning algorithms determine the type and location of objects in a scene, they have also been shown to infer relationships between those objects, i.e. a semantic understanding of the scene [10].

Semantic scene understanding using Deep Learning [11]


These algorithms are remarkably versatile. Instead of designing specialist algorithms, one for each class of AI problem, as was done throughout the 90s and the first decade of this century, engineers are applying the same Deep Learning techniques to other problems like medical diagnosis [12, 13] and Natural Language Processing (NLP), alongside Computer Vision. Speech recognition using these techniques has shown similar improvements (a reduction of about 30% in the error rate), and an astonishing demo, in which Microsoft's Chief Research Officer speaks to a Chinese audience in English while his speech is machine-translated into a Chinese voice in near real time, is available on YouTube [14].

Some of the factors driving this exponential improvement in the capabilities of Deep Learning algorithms now (the Neural Networks on which they are based have been around since the early 80s) are:

  • more data to learn from (billions of images and videos from the internet, some labelled, but mostly unlabelled)
  • unsupervised learning - previously, image recognition algorithms learnt from large amounts of labelled data, but variants of Deep Learning (auto-encoders [15] and Restricted Boltzmann Machines or RBMs [16]) can learn structure from the data itself, without labels. The Google "cat classifier" [17], for example, learnt what a cat was by looking at millions of YouTube videos; it wasn't told which videos had cats in them. Labelling data is a slow, error-prone and painstaking process, so the ability to learn from unlabelled data is critical (a minimal code sketch of this idea follows this list).
  • significantly more computing power than was available in the 80s and 90s, when Neural Networks were previously in vogue, makes it possible to train networks with many more (deeper) layers. In addition, unlike in a regular Neural Network, the neurons in one layer of a CNN (and of other modern Neural Network topologies) are not all connected to all the neurons in the next layer, which makes training faster and easier. Clusters with tens of thousands of processors are now being used for Deep Learning - Google's cat classifier had 1 billion neuronal connections.

An “average” cat image, learnt by Google’s cat classifier [18]


110 years of exponential growth in computing power [19]


  • The ability to harness Graphics Processing Units or GPUs, whose development has been accelerated by the computer gaming industry, is another recent advantage. A GPU is traditionally used by a computer to render images and has only recently been co-opted for parallelizing Deep Learning algorithms. GPUs, with thousands of processor cores (as opposed to a handful in a CPU), can process the thousands of pixels or pixel-groups in an image in parallel - a CPU would have to do that sequentially - drastically speeding up the learning process. The Google learning system required 1,000 computers and cost 1 million dollars to build, whereas a similar system was recently built for 20,000 dollars using GPUs in just 16 computers [20]. (The sketch after this list also shows how the computation can be moved onto a GPU when one is available.)

GPU growth (in terms of the number of floating point operations per second, or FLOPS) has surpassed CPU growth in the last decade [21]
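As a concrete (and deliberately tiny) illustration of two of the points above - learning from unlabelled data, and offloading the work to a GPU when one is available - here is a Python/PyTorch sketch of an autoencoder trained purely to reconstruct its inputs. The sizes, random data and training schedule are placeholders, not the architecture of the Google cat classifier.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"   # use a GPU if one is present

# An autoencoder is trained only to reproduce its input, so no labels are needed.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())     # compress to a 64-number code
decoder = nn.Sequential(nn.Linear(64, 784))                # reconstruct the input from the code
model = nn.Sequential(encoder, decoder).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(256, 784, device=device)                    # a batch of unlabelled "images"
for _ in range(100):                                       # a few reconstruction steps
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)             # how badly did we reconstruct x?
    loss.backward()
    opt.step()

codes = encoder(x)   # the learned 64-dimensional representation of each image
```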


Quantum Computing, a field that is slowly taking off, is another reason why this exponential growth in the performance of these algorithms may continue. Regular computers store information in the form of binary bits, which can be either 0 or 1. Quantum computers, on the other hand, take advantage of the strange laws of Quantum Physics and store information in qubits, which can be both 1 and 0 at the same time. This is especially useful for one specific type of problem, called optimization. Most learning algorithms, including Deep Learning algorithms, need to perform optimization: the task of finding the minimum in a "landscape" whose peaks and troughs represent a cost function describing the learning problem, which could be learning to recognize a particular object in an image or a word in an audio recording. Typically this landscape is large and multi-dimensional, and it is not possible to search every part of it exhaustively in a finite amount of time. Instead, algorithms explore a few candidate hills and "walk" down to the valleys closest to them, and the lowest point amongst these valleys is taken to be an acceptable compromise.

A classical computer running a sequential algorithm can explore only one part of this optimization landscape at a time. Because there is no time to search the entire landscape exhaustively, algorithms on classical computers settle for the "compromise solution" and often get stuck in a local minimum - a local valley that might not be the absolute lowest point on the landscape. Thanks to superposition - the possibility of qubits being both 1 and 0 at the same time - an algorithm running on a quantum computer can, in principle, explore multiple points in the optimization landscape simultaneously and find the true minimum much more quickly than a classical computer.
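The local-minimum problem described above is easy to demonstrate on a toy, one-dimensional cost landscape. In the plain Python/NumPy sketch below (the particular cost function, step size and starting points are arbitrary choices of mine), gradient descent walks downhill from several starting points, each run settles into whichever valley is nearest, and the classical "compromise" is simply to keep the best valley found.

```python
import numpy as np

# A toy 1-D cost "landscape" with several valleys (purely illustrative).
cost = lambda x: 0.1 * x**2 + np.sin(3 * x)
grad = lambda x: 0.2 * x + 3 * np.cos(3 * x)

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent: walk downhill into the nearest valley."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Different starting points get stuck in different local minima ...
minima = [descend(x0) for x0 in np.linspace(-6.0, 6.0, 10)]
# ... and a classical algorithm typically settles for the best valley it happened to find.
best = min(minima, key=cost)
print(round(best, 3), round(cost(best), 3))
```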

A quantum computer from D-Wave [22], reminiscent of the black "monolith" from 2001: A Space Odyssey


It is this exponential advance in the capability of AI, likely to be sustained in the coming years by GPUs and quantum computing, that has eminent scientists worried. However, long before the humanity-ending doomsday scenarios being bandied about have any chance of becoming a possibility (scientists disagree on how long it will take for AI to become self-aware and surpass human intelligence - estimates vary from decades to centuries), there is a much more immediate problem: job losses, and the social upheaval that follows, brought about by AI algorithms and machines that are very good at doing some of the jobs done by humans today.

The most common jobs in the U.S. in 2014 - truck driving is the most common job in a majority of states [23]


Like driving. Google's autonomous cars have been zipping around the laneways and highways of the Bay Area for some time now, and Google is looking to team up with major players in the auto industry to bring its autonomous cars to market within the next 5 years [24]. Uber, the taxi company that pioneered real-time ride-sharing - users hail cabs and share rides with their GPS-equipped smartphones - has just announced a collaboration with Carnegie Mellon University to bring autonomous taxis to market [25]. You would use your app to hail (and possibly share) a ride in an autonomous taxi, just as you do today with human-driven Uber taxis.

Self-driving cars, driven by never-tiring robots with eyes in the back of their heads (actually the cars ARE the robots, equipped with sensors and AI software), have the potential to completely disrupt the taxi and trucking industries (truck driving is the most common job in much of the U.S. today - see the figure above). Highway driving is easier than city driving (in spite of the higher speeds, highways are constrained environments and there are fewer things to look out for); autonomous driving on highways was in fact demonstrated much earlier, in the mid 90s [26], and is likely to get over the legal hurdles sooner.



The services sector (blue), which occupies a major chunk of the employment pie in developed economies, is most at risk from AI [27]


Drivers are part of the services sector, and it is their jobs, along with others in the service industry, that occupy a disproportionately large slice (80%) of the employment pie in advanced economies (dark blue in the figure above). These are the jobs that AI is currently getting better than humans at, and where the biggest potential for disruption lies [28].

This upcoming upheaval in the labour market is different from the changes in productivity (and resulting fall in employment) during the industrial revolution of the 18th and 19th centuries. Change during that revolution was more gradual, allowing workers 50 years or more to transition from hard manual labour to less labour-intensive jobs in mills and factories. Current trends, by contrast, indicate an exponentially increasing capability for AI to take over services jobs - jobs that form the backbone of western economies - within the next 10 years.

Employment trends in the US since 1975, divided between routine and non-routine jobs [29].


Even before these latest advances in AI, the job market in advanced economies had become increasingly polarized since the 1970s, with job growth at two ends of the spectrum: highly skilled, non-routine jobs for engineers, managers and medical practitioners, and low-skilled, non-routine jobs that require physical activity, like those of waiters, mechanics and security guards. This has come at the expense of jobs that are routine and repetitive, involving following a pre-defined set of instructions - a lot of assembly-line work, for example, has either been replaced by automation or moved to lower-income countries.

Economics researchers at the New York Federal Reserve have broken down the job market into the following matrix, with routine and non-routine jobs that require cognitive vs manual skills.

Job matrix: the employment market broken down into routine and non-routine jobs that require cognitive vs manual skills [30]


Their analysis of the data shows that non-routine jobs requiring cognitive skills have been increasing relative to non-routine jobs requiring manual skills, while routine jobs, both cognitive and manual, have been decreasing.

I would further divide routine, manual jobs into those that involve dextrous manipulation and some skill, like the work of a technician or a mechanic, and those that don't - the picking, placing and sorting of parts on a factory assembly line. With robots yet to achieve human levels of dexterity, the latter class of jobs is disappearing much sooner than the former.

Now, let's look at the top 10 occupations in Australia (a typical advanced Western economy, and a country I'm familiar with, having spent half a lifetime there), sorted in decreasing order of the number of people employed [31], and classify them according to the above matrix.

Sales Assistants (General): routine, cognitive
Registered Nurses: non-routine, cognitive and manual
Retail Managers: non-routine, cognitive
General Clerks: routine, cognitive
Receptionists: routine, cognitive
Truck Drivers: routine, manual
Accountants: routine, cognitive
Commercial Cleaners: non-routine, manual
Primary School Teachers: non-routine, cognitive
Accounting Clerks: routine, cognitive

At risk from AI are the jobs of Sales Assistants, General Clerks, Truck Drivers, Receptionists, Accountants and Accounting Clerks - that's 1.4 million out of 2.2 million jobs in the top-10 list. In other words, more than half of the jobs in Australia's most common occupations are at risk of disappearing over the next decade, and the figure is roughly the same for other developed economies.

What will these people do, and how will they earn a livelihood? Thankfully, it is not all bad news for employment in the developed world. There are a number of ways in which AI is also likely to generate new employment, and the industry where this is most likely to happen is healthcare.

High-level architecture of Deep QA used in IBM’s Watson [32]



As populations in industrialized economies age, a far greater share of their GDP will go towards ensuring the well-being of their people (the U.S. spent 18% of its GDP on healthcare in 2012 [33]). However, the number of physicians either in practice today or in training for the future is nowhere near what is required to take care of an ageing population. The solution could be to use AI systems, like IBM's Watson, to help medical assistants take over some of the primary health-care duties of physicians.

Watson, a DeepQA (question answering) system named after the company's founder, can understand a natural-language query (a question you would pose to a person, as opposed to one tailored to find answers on a search engine), search very quickly through a vast database and give you a response, also in natural language. Watson shot to fame in 2011, when it defeated two former champions of the TV game show Jeopardy!, in which participants are quizzed on a variety of topics with questions "posed in every nuance of natural language, including puns, synonyms and homonyms, slang and jargon" [34]. Watson is designed to scan documents from a vast corpus of knowledge (medical journals, for example) very quickly - currently at the rate of 60 million documents per second - and come up with a range of hypotheses for a question, which are then ranked by confidence and merged to give a final answer, itself accompanied by a confidence level.

Given a set of symptoms, a system like Watson would be able to list the top diagnoses, ranked by confidence. These diagnoses, relying on information gleaned from a far larger body of knowledge than a human mind can ever absorb in a lifetime, and improving all the time (Watson, like Deep Learning systems, is a continually learning system that improves with feedback), would be free from the anchoring bias - the human tendency to rely on a limited number of pieces of evidence that account for the symptoms and discount everything else [35]. It would enable an army of physician assistants, nurses and nurse practitioners to administer primary health care, and free up physicians to do work that requires a higher level of skill.
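To illustrate the "generate hypotheses, score them, rank by confidence" pattern described above, here is a deliberately naive Python sketch. The tiny symptom "knowledge base", the word-overlap scoring and the resulting confidence values are all hypothetical placeholders; Watson's actual evidence retrieval and ranking are far more sophisticated.

```python
from collections import Counter

# A toy knowledge base: condition -> typical symptoms (hypothetical, for illustration only).
knowledge = {
    "flu":       "fever cough fatigue aches",
    "cold":      "cough sneezing sore throat",
    "allergies": "sneezing itchy eyes congestion",
}

def diagnose(symptoms):
    """Return candidate diagnoses ranked by a crude confidence score."""
    words = Counter(symptoms.lower().split())
    # Generate a hypothesis per condition and score it by shared symptom words.
    scores = {name: sum(words[w] for w in text.split()) for name, text in knowledge.items()}
    total = sum(scores.values()) or 1
    # Merge into a ranked list with rough confidence values that sum to 1.
    return sorted(((score / total, name) for name, score in scores.items()), reverse=True)

print(diagnose("cough and fever with aches"))   # e.g. [(0.75, 'flu'), (0.25, 'cold'), (0.0, 'allergies')]
```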

So, the large numbers of middle class job losses in the services industry could potentially be replaced with jobs in healthcare.

The Baxter manufacturing robot [36]

                                

AI and robotics are also, ironically, likely to create jobs in manufacturing, after having previously taken some of those jobs away. The previous generation of robots that replaced humans, in automobile manufacturing for example, were huge, unwieldy and unsafe to be around - they had to be installed in cages to prevent injury to their human co-workers. Requiring experts to program them, they were efficient at one specific type of task, like spot-welding or spray-painting, but once programmed, they could not deviate from their routines. A new generation of manufacturing robots like Rethink Robotics' Baxter is smarter, smaller and safer to be around, because it can sense and interact with the humans around it. Baxter can be trained interactively by someone who is not a robotics engineer - the robot is hand-guided through a series of movements in response to sensor inputs - and can be quickly reconfigured to perform different sets of tasks.

Repetitive assembly-line work, like putting together the components that go into an iPad or stitching fabrics into a garment - work that has largely gone to the developing world over the past 30 years - is likely to come back to the West, thanks to robots like Baxter. Small to medium enterprises, with smaller but smarter workforces armed with robotic assistants, will be able to turn out custom-made consumer electronics and compete with devices mass-produced in China.

AI is also likely to create new jobs that we are unable to predict today, just as the rise of Information Technology gave rise to software developers and data analysts, jobs that didn’t exist just 30 years ago.

Over the next decade, it is very likely that AI will result in an unprecedented churn in the job market, and potentially threaten the livelihoods of hundreds of millions of people in the developed world. Protectionism or government regulation against AI will only delay the inevitable, because ultimately, the economic gains to be had from AI will outweigh the negatives.

The increased productivity and revenue from increasing levels of automation brought about by AI in developed economies could be used to provide Basic Income, a guaranteed minimum income to all citizens. Such an income would help people weather the storm of unemployment and would have the added benefit of allowing people to pursue activities out of interest, and not out of economic necessity.

However, governments and policy makers need to be informed and cognizant of the disruptive power of AI and craft policy to help their people deal with it. Otherwise, they will be left confronting a situation of mass-scale unemployment and social unrest that will make the Global Financial Crisis look like a walk in the park.