Hundreds of new learning algorithms are invented every year, but they’re all based on the same few basic ideas. Far from esoteric, and quite aside even from their use in computers, they are answers to questions that matter to all of us: How do we learn? Is there a better way? What can we predict? Can we trust what we’ve learned? Rival schools of thought within machine learning have very different answers to these questions. The main ones are five in number:
- Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic.
- Connectionists reverse engineer the brain and are inspired by neuroscience and physics.
- Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology.
- Bayesians believe learning is a form of probabilistic inference and have their roots in statistics.
- Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
There are three types of learning
Imagine an organism or machine which experiences a series of sensory inputs:
- Supervised Learning: The machine is also given desired outputs, and its goal is to learn to produce the correct output given a new input.
- Unsupervised Learning: The goal of the machine is to build a model of its inputs that can be used for reasoning, decision making, predicting things, communicating, etc.
- Reinforcement Learning: The machine can also produce actions, which affect the state of the world, and receives rewards (or punishments). Its goal is to learn to act in a way that maximizes rewards in the long term.
- Probabilistic Machine Learning
- Excellent textbooks by Kevin P. Murphy; Chris Bishop; David MacKay; Hastie, Tibshirani, and Friedman
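As a rough illustration of the first two settings listed above, the sketch below contrasts a supervised learner (inputs with desired outputs) and an unsupervised one (inputs only) on a handful of one-dimensional points. The data, the labels, and the function names are invented for illustration, and nearest-neighbor and two-means are just convenient stand-ins, not methods the text prescribes. Reinforcement learning needs an environment to act in, so it is not shown here.

```python
# Toy contrast between supervised and unsupervised learning on 1-D inputs.
# All names and data below are illustrative assumptions.

def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: return the desired output of the training input closest to x."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def two_means(xs, iterations=10):
    """Unsupervised: model unlabeled inputs as two clusters (1-D k-means, k=2)."""
    lo, hi = min(xs), max(xs)  # start the two centres at the extremes
    for _ in range(iterations):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        if not left or not right:  # all inputs identical: nothing to split
            break
        lo = sum(left) / len(left)
        hi = sum(right) / len(right)
    return lo, hi


inputs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]

# Supervised: the labels are available, so a new input can be classified.
prediction = nearest_neighbor_predict(inputs, labels, 4.9)

# Unsupervised: only the inputs are used; the structure is discovered.
centres = two_means(inputs)
```

The supervised function cannot run without `labels`, while `two_means` never sees them; that difference in what the learner is given is the whole distinction.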
The Multiplier and the Singularity
AI makes interesting reading, but physics will limit just how far it can go and how quickly.
In 1993, Vernor Vinge, a computer scientist and science fiction writer, first described an event called the Singularity—the point when machine intelligence matches and then surpasses human intelligence. And since then, top scientists, engineers and futurists have been asking just how far away we are from that event.
In 2005, Ray Kurzweil published a book, “The Singularity Is Near,” in which he extended the hypothesis that artificial intelligence (AI) would enter a ‘runaway reaction’ of self-improvement cycles. He suggested that each new and more intelligent generation would appear more and more rapidly, causing an intelligence explosion and resulting in a powerful superintelligence that qualitatively surpasses all human intelligence. Various dates have been assigned to when this would happen, with the current consensus being 2040, but even Elon Musk fears that in five years “AI will have become dangerous.”
But it’s not clear that AI and the march toward the Singularity are even close to reality. In five years, we may have one more technology node under our belt, meaning that we can expect twice the number of transistors that we have today. But while power per transistor may drop a little, heat will continue to limit what can be done with those chips. Many chips today cannot use all of their compute power at the same time due to thermal limitations.
If we rewind the clock a few decades, we can trace what got us to this point.
The heart of computing
At the heart of every advance has been an advance associated with the multiply operation, along with the ability to move data into and out of those multipliers and to have an element of programmability associated with them.
“The multiply is the most noticeable arithmetic operation, and plays a central role in the computation of many essential functions — filters, convolutions, transforms, weightings,” says Chris Rowen, CEO of Cognite Ventures. However, Rowen always warns against ignoring the other aspects mentioned.
The first major advance was wireless communications and the rise of the Digital Signal Processor (DSP). It provided single-cycle multiply operations, which until then only had been available in fixed-function hardware. “Wireless communications used to be seen as the epitome of hard compute problems,” says Samer Hijazi, senior architect in the IP Group of Cadence. “It has been and continues to be one of the hardest compute problems. It is an NP-complete (nondeterministic polynomial-complete) problem. The DSP gave you a wide array of multipliers, specifically an array of fixed-point multipliers. How many bits can you trust and use? As people learn more about what is needed, the type of accuracy needed is evolving.”
As applications get more complex, they tend to use a rich variety of arithmetic. “The computation often uses a mix of bit precisions (8b, 16b, 32b, and sometimes odd bit-lengths) and a mix of data formats (integer, fixed point, floating point),” explains Rowen. “This means that an implementation typically needs sufficient flexibility to cover a mix of arithmetic operations, bit precisions and data formats — not just a single form of multiply — to handle the necessary computation without compromising accuracy, efficiency or programmer productivity too much.”
The birth of AI
Artificial intelligence has always been an element of science fiction, and this, like many other things in the technology world, does have an impact on the course of development. “For AI, there is one algorithm that has made a big comeback and has enabled the whole industry to rise again,” says Hijazi. “It is an algorithm from the late ’90s called Convolutional Neural Networks (CNN).”
At the crux of it, convolution is just a 3D filter. “It performs a repeated filter that is applied to an entire scene,” explains Hijazi. “It is looking for a specific pattern that you are correlating with every location in the scene and trying to see if it exists. You are doing multiples of patterns at a time and you are doing it in layers. In the first layer, you are looking for some pattern and creating a pattern correlation map or a feature map and then running another correlation map on the first map produced, and so on. So, I am building sequential pattern layers on top of each other. Each of them is limited in some field of view.”
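The layered correlation Hijazi describes can be sketched in a few lines. This is a simplified 2D, single-channel version (real CNN layers filter across a third, channel dimension and learn their patterns from data); the scene, the two hand-written patterns, and the function names are assumptions made for illustration.

```python
# Minimal sketch: correlate a small pattern with every location in a scene,
# then run a second pattern over the resulting feature map.

def correlate2d(image, kernel):
    """Slide kernel over image (fully overlapping positions only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses only, as CNN layers commonly do."""
    return [[max(0, v) for v in row] for row in feature_map]


# A 5x5 scene: dark (0) on the left, bright (1) on the right.
scene = [[0, 0, 1, 1, 1] for _ in range(5)]

# Layer 1 pattern: responds where dark sits immediately left of bright.
vertical_edge = [[-1, 1],
                 [-1, 1]]
layer1 = relu(correlate2d(scene, vertical_edge))

# Layer 2 pattern: responds where layer-1 responses stack vertically.
vertical_run = [[1],
                [1]]
layer2 = relu(correlate2d(layer1, vertical_run))
```

Each 2x2 kernel only sees four pixels at a time, which is the limited field of view mentioned in the quote; the second layer sees a larger region of the original scene only because it operates on the first layer's map. Every output value costs one multiply per kernel entry, which is why multiplier count dominates the hardware discussion.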
Convolutional Neural Networks were first developed by Yann LeCun, who later became founding director of the NYU Center for Data Science. He is currently director of AI research for Facebook. The first application was an attempt to recognize the zip codes on letters. “It did not become mainstream because they did not have the necessary compute power,” points out Hijazi. “It was only with the availability of massive GPUs that it became possible to show the superiority of the algorithms over the ones that had been developed by the experts.”
But while the multiplier may be important, it is just one piece of a system. “Even an extreme vision processor, built to sustain hundreds of multiplies per cycle for convolutional neural network inner loops, dedicates little more than 10% of the core silicon area to the multiply arrays themselves,” says Rowen. “The other area is allocated to operand registers, accumulators, instruction fetch and decode, other arithmetic operations, operand selection and distribution and memory interfaces.”
The modern-day graphics processing unit (GPU), which is being used a lot for implementation of CNNs, also has an extensive memory sub-system. “Another piece that is essential for graphics is the massive hierarchical memory sub-system where data is moving from one layer to another layer in order to enable smooth transitions of pixels on the screen,” says Hijazi. “This is essential for graphics but not as needed for AI tasks. It could live with a memory architecture that is less power hungry.”
Another solution being investigated by many is the Field Programmable Gate Array (FPGA). “FPGAs have many DSP slices and these are just an array of fixed point multipliers,” continues Hijazi. “Most of them are 24-bit multipliers, which is actually three or four times what is needed for the inference part of deep learning. Those DSP slices have to be coupled to the memory hierarchy that would be utilizing the FPGA fabric to move the data around. The power consumption of an FPGA may not be that much different from a GPU.”
Rowen provides another reason for favoring programmable solutions. “Very few applications are so simple and so slowly evolving that they can tolerate completely fixed-function implementations. Programmability may come in the form of FPGA look-up tables and routing, or in the form of processors and DSPs, but some degree of programmability is almost always required to keep a platform flexible enough to support a set of related applications, or even just a single application evolving over time.”
But those DSP slices in the GPU and FPGA may not be ideal for AI. “It may be possible that only 4-bit multiplication is necessary,” says Hijazi. “So the race to reduce the cost of the multiplier is at the core of how we can advance AI. The multiplier is expensive, and we need a lot of it. It limits the flexibility of this newfound capability.”
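To make the bit-width trade-off concrete, the sketch below computes the same dot product with 8-bit and with 4-bit integer multiplies and compares both to the floating-point result. The quantization scheme (symmetric, one shared scale per vector), the sample values, and the function names are assumptions chosen for illustration; production inference schemes differ in the details.

```python
# Minimal sketch of low-precision multiply-accumulate.

def quantize(values, bits):
    """Map floats to signed integers of the given width with one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def quantized_dot(xs, ws, bits):
    """Dot product using only narrow integer multiplies, rescaled at the end."""
    qx, sx = quantize(xs, bits)
    qw, sw = quantize(ws, bits)
    acc = sum(a * b for a, b in zip(qx, qw))  # integer multiply-accumulate
    return acc * sx * sw


activations = [0.9, -0.4, 0.1, 0.7]
weights = [0.6, 0.25, -0.75, 1.0]

exact = sum(a * w for a, w in zip(activations, weights))
error_8bit = abs(quantized_dot(activations, weights, 8) - exact)
error_4bit = abs(quantized_dot(activations, weights, 4) - exact)
```

On these sample values the 4-bit error is noticeably larger than the 8-bit error but still small relative to the result, which is the kind of accuracy-for-cost exchange the quote is pointing at. Whether a given network tolerates it has to be checked on that network.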
It would seem likely that chips dedicated to AI will be produced. “2017 will see a number of chips targeted at AI and several demonstrable technologies by year end,” predicts Jen Bernier, director of technology communications for Imagination Technologies. “As companies develop chips for AI, they need to consider the increased demands to process data locally and relay data to the cloud for onward processing and data aggregation.”
The reality today
So how close to the Singularity are we? “The algorithm that we are using today was created in the ’90s and has created a lot of hype in the media,” says Hijazi. “But this all stems from one algorithm and its ability to solve one interesting problem — computer vision. The hype about extrapolating this capability has created a lot of enthusiasm, and the media loves the original premise of AI from the ’50s that may be coming to roost. AI did not make a significant leap. One algorithm was developed that enabled one advance.”
People are finding ways to use that algorithm for other tasks, such as Google using it to play the game of Go. Another example is related to voice recognition. “Virtual assistants will be virtually everywhere,” says Bernier. “Voice recognition and interaction will be incorporated into an increasing number of devices and we’ll see new classes of hearable devices. The technology will continue to evolve for more and more interactivity.”
Other advances expected in this area are discussed in the Predictions for 2017.
But does any of this directly lead us to the Singularity? Would an AI have been able to invent the algorithms or the hardware structures that got us to this point? AI may well be able to help us optimize what we have, but that is not the Singularity. Engineers, it would seem, see the future in a more rational manner.
What’s missing in Deep Learning?
Neural networking is the rising star in the world of computer science. How does the landscape currently shape up?
It is impossible today to be unaware of deep learning/machine learning/neural networks — even if what it all entails is not even clear yet.
Someone who is intimately familiar with this area, and has some thoughts on it, is Chris Rowen, founder of Tensilica (now part of Cadence), who is now a self-described hat juggler. He is still active at Cadence several days a month, working technically on new architectures, working with selected customers in key strategic relationships, and providing strategic advice to some major initiatives at a corporate level.
The next hat he wears is that of Cognite Ventures, which he established as a vehicle for his early-stage investing: seed-class investments in startups in the deep learning area.
We had a chance to speak this week by phone, and he said he has a number of ideas about some of the major holes in terms of what’s being offered in this space. He is considering developing some technical ideas that may end up being spun out, worked on or funded in some fashion. “I’m taking a more proactive approach, not just waiting for things to come to me but creating the things that I think are probably going to be the most interesting technically there.”
Rowen’s third hat is strategic advice to Stanford’s SystemX Alliance, where in April he will conduct a few workshops on advanced computing architectures, and next generation design productivity.
His passion for deep learning is infectious.
“Certainly there is this tremendous potential and tremendous enthusiasm around deep learning methods. It is nothing less than a revolution in thinking about how to do computing, so it is, at a minimum, the shiny new hammer that everyone has in hand. They look around for nails to hit with it, but they are also banging away on screws and everything else so that it is this technique or this philosophy, which appears to have very broad potential, but nobody really knows what it will ultimately look like or which problems are ultimately a good fit or a bad fit. But just hanging out at a university, the number of students, and the number of researchers that are plunging into deep learning and neural network is really staggering. You show up at a graduate course with an obscure title about natural language interpretation or vision, and you will find hundreds of students that are signed up for these courses because they see this as, A) the most interesting; and B) the hottest area for increasing their economic value,” he said.
Rowen pointed out that, interestingly, computer science has become the number one major at Stanford, and guessed that neural networks is the number one topic in computer science. “It is all the buzz, and I think it’s not very different in the major technical universities around the world. You have this intellectual curiosity, and you have this entrepreneurial spirit, which is so well honed around the world, but especially in American universities, and most especially in places like Stanford that immediately connects some breakthrough idea with people who are saying, ‘Let’s start a company to exploit it.’ It’s no big surprise that we’ll see hundreds of companies spring up over the course of a few years to try and take advantage of it.”
He explained he’s just been trying to get his arms around it, but said it’s clear that it’s quite hard to do because there are so many companies (http://www.cogniteventures.com/the-cognitive-computing-startup-list/), and because there’s so much hype around it. “Everybody puts, ‘AI,’ or ‘machine learning,’ or some other phrase in their description of what they are doing even if they’re using it only in a small way. Part of the task is to sort through and figure out which are deeply exploiting it, and for whom it is a strategic element of their technology portfolio, or what they are doing is likely to have lots and lots of interactions. For example, people in robotics are likely to be big users of it so I’ve been pretty generous in including robotics-related companies in my list, whereas there are lots of people doing business intelligence, and predictive marketing, and customer relationship management, which may have some data mining piece to it but which I’m just a little bit less generous in assuming they are serious deep learning companies. I’ve applied this filter of which are likely to be the ones for whom deep learning, neural networks are really critical to their success.”
…and there are still 190 of these companies…
There’s no question that there is a flood of activity, and of course there will be some winners and some losers, Rowen continued. “Companies will evolve as they figure out what they are doing but the number of companies working on it, the number of people working on it, the rate of progress in terms of people coming up with genuinely clever ways to apply these learning based algorithms I think will be very substantial.”
Some of them are even moving along pretty well. “The lowest barrier to entry is probably people who are deploying a cloud-based service for some kind of recognition activity where you can go to one of these recognition-as-a-service sites, and they provide an API so that you can call the service from within your own application, and the service will return information on what’s in the picture, what’s the sentiment of the people in the images, what are some standard characteristics of obvious people or objects in these image streams. That’s something where the amount of effort to deploy one of these services is pretty moderate that people know how to take advantage of them, and where there’s kind of an existing pay-as-you-go business model that people are generally comfortable with thanks to Amazon web services, and the like. There, there’s a big variety of things, and people are doing it,” he observed.
There’s also a good bit of activity around things like monitoring and surveillance, where someone can buy smarter and smarter cameras that provide additional information, Rowen noted. “There’s a very rich set of things in terms of identifying patterns in documents or text where again, via services or installable applications, lots of specialization — ones for looking automatically at contracts or ones for looking at social media interactions or looking at datasheets and product specifications, or customer service dialogue. All these different specialized forms of text-based interaction where you can extract out information and compare it to standard patterns and use that as a mechanism within some larger application.”
Medical is another area of particular focus for deep learning development.
Still, he admitted a particular fondness for what’s going on in embedded systems, as those will be the things that touch us directly, and which go into the real time interactions that we want to have. “Sure, there are these quasi-real time things like Siri and Alexa, which are using neural networks in a big way but which are not really quite real-time. You pose a question and get an answer back some seconds later, where, if you’re driving a car or you’re interacting with a home device (a television or refrigerator or cell phone), you’re going to want something that is even quicker, and has an even better understanding of context, and who you are. I think there is a lot of progress to be made there because it will change a lot of how human-machine interactions really feel.”
5 Big Predictions for Artificial Intelligence in 2017
Expect to see better language understanding and an AI boom in China, among other things.
Last year was huge for advancements in artificial intelligence and machine learning. But 2017 may well deliver even more. Here are five key things to look forward to.
Reinforcement learning
Reinforcement learning takes inspiration from the ways that animals learn how certain behaviors tend to result in a positive or negative outcome. Using this approach, a computer can, say, figure out how to navigate a maze by trial and error and then associate the positive outcome, exiting the maze, with the actions that led up to it. This lets a machine learn without instruction or even explicit examples. The idea has been around for decades, but combining it with large (or deep) neural networks provides the power needed to make it work on really complex problems (like the game of Go). Through relentless experimentation, as well as analysis of previous games, AlphaGo figured out for itself how to play the game at an expert level.
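The maze example can be reduced to a few lines of tabular Q-learning, one classic reinforcement learning method. This is only a sketch: the "maze" is simplified to a straight corridor with the exit at the far end, there is no neural network involved, and the function names and the learning-rate, discount, and exploration settings are assumptions picked for illustration.

```python
import random

# Tabular Q-learning on a corridor: start at cell 0, exit at the last cell.
# Actions: 0 = step left, 1 = step right. Reward 1.0 only on reaching the exit.

def train_corridor(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    exit_state = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != exit_state:
            if rng.random() < epsilon:
                action = rng.randrange(2)       # explore: try something random
            else:
                best = max(q[state])            # exploit, breaking ties randomly
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            nxt = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if nxt == exit_state else 0.0
            # Credit the action with the reward plus the discounted value
            # of the best action available in the state it led to.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Preferred action in each non-exit cell after training."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


q_table = train_corridor()
policy = greedy_policy(q_table)
```

No one tells the agent that "right" is correct. The only signal is the reward at the exit, which the update rule propagates backward one cell at a time until every cell prefers the action that leads toward it.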
The hope is that reinforcement learning will now prove useful in many real-world situations. And the recent release of several simulated environments should spur progress on the necessary algorithms by increasing the range of skills computers can acquire this way.
In 2017, we are likely to see attempts to apply reinforcement learning to problems such as automated driving and industrial robotics. Google has already boasted of using deep reinforcement learning to make its data centers more efficient. But the approach remains experimental, and it still requires time-consuming simulation, so it’ll be interesting to see how effectively it can be deployed.
Dueling neural networks
At the banner AI academic gathering held recently in Barcelona, the Neural Information Processing Systems conference, much of the buzz was about a new machine-learning technique known as generative adversarial networks.
Invented by Ian Goodfellow, now a research scientist at OpenAI, generative adversarial networks, or GANs, are systems consisting of one network that generates new data after learning from a training set, and another that tries to discriminate between real and fake data. By working together, these networks can produce very realistic synthetic data. The approach could be used to generate video-game scenery, de-blur pixelated video footage, or apply stylistic changes to computer-generated designs.
Yoshua Bengio, one of the world’s leading experts on machine learning (and Goodfellow’s PhD advisor at the University of Montreal), said at NIPS that the approach is especially exciting because it offers a powerful way for computers to learn from unlabeled data—something many believe may hold the key to making computers a lot more intelligent in years to come.
China’s AI boom
This may also be the year in which China starts looking like a major player in the field of AI. The country’s tech industry is shifting away from copying Western companies, and it has identified AI and machine learning as the next big areas of innovation.
China’s leading search company, Baidu, has had an AI-focused lab for some time, and it is reaping the rewards in terms of improvements in technologies such as voice recognition and natural language processing, as well as a better-optimized advertising business. Other players are now scrambling to catch up. Tencent, which offers the hugely successful mobile-first messaging and networking app WeChat, opened an AI lab last year, and the company was busy recruiting talent at NIPS. Didi, the ride-sharing giant that bought Uber’s Chinese operations in 2016, is also building out a lab and reportedly working on its own driverless cars.
Chinese investors are now pouring money into AI-focused startups, and the Chinese government has signaled a desire to see the country’s AI industry blossom, pledging to invest about $15 billion by 2018.
Language understanding
Ask AI researchers what their next big target is, and they are likely to mention language. The hope is that techniques that have produced spectacular progress in voice and image recognition, among other areas, may also help computers parse and generate language more effectively.
This is a long-standing goal in artificial intelligence, and the prospect of computers communicating and interacting with us using language is a fascinating one. Better language understanding would make machines a whole lot more useful. But the challenge is a formidable one, given the complexity, subtlety, and power of language.
Don’t expect to get into deep and meaningful conversation with your smartphone for a while. But some impressive inroads are being made, and you can expect further advances in this area in 2017.
Backlash to the hype
As well as genuine advances and exciting new applications, 2016 saw the hype surrounding artificial intelligence reach heady new heights. While many have faith in the underlying value of technologies being developed today, it’s hard to escape the feeling that the publicity surrounding AI is getting a little out of hand.
Some AI researchers are evidently irritated. A launch party was organized during NIPS for a fake AI startup called Rocket AI, to highlight the growing mania and nonsense around real AI research. The deception wasn’t very convincing, but it was a fun way to draw attention to a genuine problem.
Turns out anyone can make a multi-million dollar company in 30 minutes
…with a website editor whilst in a Spanish mansion found on AirBnB. ‘Temporally Recurrent Optimal Learning’ is a combination of buzzwords, conjured up over breakfast, that we put together to spell out TROL(L). If we hadn’t put significant effort into making sure people realized it was a joke, Rocket AI would be in the press right now.
One real problem is that hype inevitably leads to a sense of disappointment when big breakthroughs don’t happen, causing overvalued startups to fail and investment to dry up. Perhaps 2017 will feature some sort of backlash against the AI hype machine—and maybe that wouldn’t be such a bad thing.