On the third day of Google I/O 2019, Geoffrey Hinton, the most recent Turing Award winner and a senior researcher at Google Brain, sat down for a conversation with Nicholas Thompson, the current editor-in-chief of Wired magazine. Although it was the last day of the developer conference and the interview was scheduled over lunch, it was still the most interesting event of this year's Google I/O apart from the official keynote on the first day.
In the 1980s, Hinton proposed using artificial neural networks as the cornerstone of machine learning research, but for a long time academia and industry regarded his views as ‘marginal’, or even as ‘wishful thinking’. It was not until the new century, when computers had become far faster and deep neural networks had large amounts of data to train on, that artificial intelligence finally entered a new era.
In 2012, Hinton and two of his students proposed AlexNet, a deep convolutional neural network that achieved a major breakthrough in image recognition, and the importance of his many years of research was recognized by the entire industry. In addition to wearing the crown of ‘Godfather of Deep Learning’, Hinton received the 2018 Turing Award two months ago, together with Yoshua Bengio and Yann LeCun.
Yann LeCun, Geoff Hinton, and Yoshua Bengio (photo: WIRED)
Hinton, who for many years has been unable to sit because of back pain, stood throughout the conversation. As he joked in the live video, he was ‘far ahead of the current trend’. The same is true in his field of expertise. In this conversation after the Turing Award, the rarely interviewed genius talked about his own research, his confidence in and expectations for machine intelligence, and what it may reveal about the future world and about dreams.
The following is the interview as recorded on site by Geek Park's frontline reporters. It has been edited and abridged by Geek Park.
Q: Nicholas Thompson
A: Geoffrey Hinton
Geoff Hinton appears at Google I/O (photo: Geek Park frontline reporter)
Q: Twenty years ago, you published some influential papers. Everyone said it was a clever idea, but that we could not actually design computers that way. Tell us why you persisted, and why you believed that what you had found was important.
A: Actually, it was 40 years ago. To me, there is only one way the human brain can work: by learning the strengths of the connections between neurons. If you want a device to do something intelligent, you have two choices. You can program it yourself, or you can let the machine learn by itself. Programming was clearly not the way to go, so we had to find ways to let the machine learn. So (I thought) this had to be the right way.
Q: Most people here are familiar with neural networks, but please explain your original idea and how it took shape in your mind.
A: You have relatively simple processing elements, loosely modeled on neurons, that are connected together. Each connection has a weight, and learning happens by changing those weights. What a neuron does is multiply the activity on each incoming connection by its weight, add the results up, and decide whether to send an output. If the sum is large enough, it sends an output; if the sum is negative, it sends nothing. All you have to do is wire up an enormous number of these units and find a way to adjust the weights, and then the neural network can do anything. So it is just a question of how you adjust the weights.
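Editor's illustration (not part of the conversation): the unit Hinton describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The function name `neuron_output`, the `threshold` parameter, and the example numbers are hypothetical and chosen only for illustration. Real networks use many such units and learn the weights instead of setting them by hand.

```python
def neuron_output(activities, weights, threshold=0.0):
    """Multiply each incoming activity by its weight, add the results up,
    and send an output only if the sum exceeds the threshold."""
    total = sum(a * w for a, w in zip(activities, weights))
    # Below the threshold (for example, a negative sum) the unit sends nothing.
    return total if total > threshold else 0.0


# Hypothetical activities on three incoming connections.
activities = [1.0, 0.5, 2.0]

# With these weights the sum is 0.5 - 0.5 + 0.5 = 0.5, so the unit fires.
print(neuron_output(activities, [0.5, -1.0, 0.25]))   # 0.5

# Changing one weight makes the sum negative, so the unit stays silent.
print(neuron_output(activities, [-0.5, -1.0, 0.25]))  # 0.0
```

The only thing that differs between the two calls is one weight, which is the point of the answer above: what the unit does is determined by its weights, so learning reduces to adjusting them.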
Q: So, when did you first know that it works like a brain?
A: Neural networks have always been designed this way: to simulate how the brain works.
Q: So at some point in your career, you started to understand how the brain works. Maybe it was when you were 12 years old, maybe when you were 25. When did you decide to use a computer to simulate how the brain works?
A: That is the key point. The whole idea of neural networks is to have a device that learns the way the brain does, that is, the way people think the brain learns: by changing the strengths of connections. It was not my idea. Turing had the same idea. Although he invented much of the foundation of standard computer science, he believed that the brain was an unorganized device with random weights that used reinforcement learning to change its connections, and that it could learn everything this way. He believed this was the best route to intelligence.
Q: So you were following Turing's idea: the best way to build a machine is to imitate the human brain. This is how the human brain works, so let us build a machine like that.
A: It was not just Turing's idea. Many people thought so.
Q: So you had this idea, and many people had this idea. You received a lot of acclaim in the late 1980s, when your published work became famous, right?
Q: When was the darkest moment? When did the people who had supported Turing's idea begin to back away while you kept pushing forward?
A: There were always people who kept believing in it, especially in psychology. But among computer scientists, I think it was the 1990s, when datasets were very small and computers were not that fast. On small datasets, other methods such as support vector machines worked better and were less affected by noise. This was very frustrating, because in the 1980s we had developed backpropagation, which we thought would solve everything, and it turned out not to. It was really just a question of scale, but we did not understand that at the time.
Q: Then why did you think it was not working?
A: We thought it was because we did not have quite the right algorithm or quite the right objective function. For a long time I thought it was because we were trying to do supervised learning, where you have to label the data, when we should have been doing unsupervised learning, where you learn from unlabeled data. It eventually turned out that it was mainly a question of scale.
Q: That is very interesting. The problem was that you did not have enough data. You thought you had enough data but had not labeled it correctly. So you simply misdiagnosed the problem?
A: I thought that using labels at all was the mistake. You should do most of your learning without labels, by trying to model the structure in the data. I still believe that. As computers get faster, for any given dataset, if the computer is fast enough you are better off doing unsupervised learning, and once you have done the unsupervised learning you will be able to learn from fewer labels.
Q: So in the 1990s you were still doing research and still publishing in academia, but there was no big breakthrough. Did you ever think about giving up on deep learning and doing something else?
A: This kind of thing has to work. I mean, the connections between neurons in the brain are learned somehow, and we have to figure out how. There are probably many ways of learning connection strengths. The brain uses one of them, and there may be others. But you have to have something that learns these connection strengths. I never doubted that.
Q: So you never doubted it. When did your persistence start to pay off?
A: In the 1980s, if you built a network with many hidden layers, you could not train it. Yann LeCun's convolutional neural networks (CNNs) could be trained, but only on fairly simple tasks such as machine reading of handwriting. For most deep networks, we did not know how to train them.
In 2005, I developed an unsupervised training method for deep networks. You take your input, say pixel values, and learn a set of feature detectors that explain why the pixel values are the way they are. Then you treat those feature detectors as data and learn another set of feature detectors that explain why the first set have the correlations they do. You keep learning layer by layer. Interestingly, you can do the math and prove that each time you learn another layer, you do not necessarily get a better model of the data than before, but you do get a better bound on how good the model is, so you keep moving forward.
Q: I see. You are making observations, and the results are not correct yet, but they keep getting closer. It is as if I were making generalizations about the audience: not right at first, but better and better each time. Is that roughly what it means?
Q: So in 2005 you made the mathematical breakthrough. When did you start getting the right answers, and on what kind of data? Your first breakthrough was on speech data, right?
A: It was just a lot of data and very simple measurements. Around the same time, GPUs were being developed, and people doing neural network research started using GPUs around 2007. I had a very good student who used GPUs to find roads in aerial images. He wrote some code that other students then reused to recognize phonemes in speech. They used the pre-training idea: after pre-training, they put labels on top and used backpropagation. It turned out that with pre-training you could get a good deep network and then fine-tune it with backpropagation. The results did beat the benchmarks for speech recognition at the time. At first, it was only by a small margin.
Q: Did it beat the best commercially available speech recognition, or the best academic work on speech recognition?
A: On a relatively small dataset called TIMIT, it did slightly better than the best academic work, and it was also better than IBM's. Soon, people realized that this 30-year-old technology was beating the standard models, and that with further development it would do even better.
So my graduate students went off to Microsoft, IBM, and Google, and Google turned it into a production speech recognizer. By 2012, that work, which had started in 2009 and taken three years, was showing up in Android, and Android suddenly became much better at speech recognition.
Q: So you came up with this idea 40 years ago and had been publishing on it for 20 years, and finally you were ahead of your peers. How did that feel?
A: I had had the idea for 30 years.
Q: Haha, yes, 30 years. So it is still a ‘new’ idea.
A: It had finally reached the state of the art on a real problem, which felt good.
Q: Once you realized it worked for speech recognition, when did you start applying it to other problems?
A: Let me give you a few examples. George Dahl, one of the first people to work on speech recognition, applied deep learning to molecules, where you want to predict whether a molecule will bind to something and make a good drug. There was a competition at the time. He applied the standard techniques we had designed for speech recognition to predicting drug activity, and it won. That victory was a sign that deep learning could be applied very generally. I had a student called Ilya Sutskever who said to me, ‘Geoff, you know what? Deep learning should be applied to image recognition. Fei-Fei Li has created the right dataset, and there is an open competition. We have to do it.’ So we developed an approach based on Yann LeCun's ideas. Another of my students, Alex Krizhevsky, is a real wizard at programming GPUs. In 2012 we got much better results than standard computer vision.
Q: Speech, chemistry, and vision: those are three areas where it succeeded. In which areas did it fail?
A: Failure is only temporary.
Q: In which area did it fail? (laugh)
A: Machine translation, for example, I thought would take a long time to succeed. If you have a string of symbols coming in and another string of symbols going out, it seems quite plausible that in between you are manipulating strings of symbols. That is classical AI. In fact, it does not work like that. Symbols come in, you turn them into huge vectors in your brain, these vectors interact, and then you convert them back into a string of symbols that goes out. If you had told me in 2012 that within the next five years we would be translating between many languages with the same technology, recurrent nets trained with nothing but stochastic gradient descent from random initial weights, I would not have believed you. It happened much faster than we expected.
Q: So what distinguishes the fields where it works fastest from the ones that take the longest? Visual processing and speech recognition are core human activities based on sensory perception. Are those the first barriers to fall?
A: There are other things, like motor control. We humans are very good at motor control, but deep learning will eventually win there too. Abstract reasoning, I think, is one of the last things we will learn to do.
‘What humans can do, neural networks can do too’ (photo: Google I/O)
Q: So you keep saying that neural networks will eventually win at everything?
A: We have our own neural network, right? What humans can do, neural networks can do too.
Q: The human brain is not necessarily the most efficient computer ever built. Is there a way to build machines that is more efficient than modeling the human brain?
A: Philosophically, I have no objection to the idea that there could be a completely different way to do all this. It could be that you start with logic, try to automate logic, make some good improvements, do reasoning, and then decide to do visual perception by reasoning. That approach might have succeeded, but as it turned out, it did not. I have no philosophical objection to it winning. It is just that we know the brain can do it.
Q: But there are also things that our brains do not do well. Will neural networks also do those things badly?
A: It's quite possible.
Q: There is a separate problem. We do not entirely know how these things work. We do not understand how top-down neural networks work.
A: Look at today's machine vision systems. Most of them are basically feed-forward; they do not use feedback connections. Another thing about current machine vision systems is that they are very vulnerable to adversarial examples. You can change a few pixels in, say, a picture of a panda. To you it still looks like a panda, but the machine suddenly says it is an ostrich. The problem is that you know it is a panda. At first we thought these systems worked well, but then cases like the panda and the ostrich appeared, and we began to worry.
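Editor's illustration (not part of the conversation): the effect described above can be reproduced on a toy scale. The sketch below assumes a hypothetical linear classifier over 100 made-up pixel values with hand-picked weights. It is not a real vision model, and the names `score`, `classify`, and `perturb` are invented for this example. For a linear score, stepping each pixel against the sign of its weight is the same idea as the fast-gradient-sign style of attack, because the gradient of the score with respect to the pixels is simply the weight vector.

```python
def score(weights, pixels):
    """Linear score: positive means 'panda', otherwise 'ostrich'."""
    return sum(w * p for w, p in zip(weights, pixels))


def classify(weights, pixels):
    return "panda" if score(weights, pixels) > 0 else "ostrich"


def perturb(weights, pixels, eps):
    """Shift every pixel by at most eps in the direction that lowers the score."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [p - eps * sign(w) for w, p in zip(weights, pixels)]


# Hypothetical 100-pixel image and hand-picked weights (assumptions, not real data).
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
image = [0.52 if i % 2 == 0 else 0.50 for i in range(100)]

# Each pixel moves by only 0.02, yet the label flips.
adversarial = perturb(weights, image, eps=0.02)

print(classify(weights, image))        # panda
print(classify(weights, adversarial))  # ostrich
```

Each individual change is tiny, but because all 100 changes push the score in the same direction, they add up to enough to cross the decision boundary.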
I think part of the problem is that they do not try to reconstruct from their high-level representations. They try to do discriminative learning, where you just learn feature detectors layer by layer, and the whole goal is to change the weights so that you get better at producing the right answer. They do not, at each level of feature detectors, check whether the underlying data can be reconstructed from the activities of those feature detectors.
Recently, in Toronto, we found out, or Nick,
Q: Let's talk about a larger topic. Now that neural networks can solve a variety of problems, is there any mystery in the human brain that neural networks can't capture?
Q: No? So emotions, love and consciousness can be reconstructed through neural networks?
A: Of course, once you understand what these things are. We are neural networks, aren't we?
As for consciousness, I am very interested in it. People do not really know what it is, and there are all sorts of interpretations of it. I think it is a term that has not been scientifically defined. A hundred years ago, if you asked people what life is, they would say that all living things have a vital force, and once they die, the vital force drifts away. That was the difference between being alive and being dead: whether or not you had this vital force.
And now? Now we do not say that we have any vital force. We think that is superstition. Now we understand biochemistry and molecular biology, and we do not need a vital force to explain life. I think the same is true of consciousness. I think
Nicholas Thompson and Geoffrey Hinton (photo: Geek Park frontline reporter)
Q: We talk about studying the human brain in order to improve computers. What about the other direction? Can we learn from computer research how to improve our brains?
A: I think what we have learned in the last 10 years is that if you take a system with billions of parameters and do stochastic gradient descent on some objective function, which might be getting the right labels or filling in the gaps in a string of words, any old objective function, it works much better than you would expect. You might have thought, as most people in traditional AI did: take a system with a billion parameters, start them at random values, measure the gradient of the objective function, that is, for each parameter work out how the objective function would change if you changed that parameter slightly, and then change it in the direction that improves the objective function. You might think that is a hopeless algorithm that will get stuck, but it turns out to be a very good algorithm. The more you scale it up, the better it works. It is really just an empirical discovery. There is some theory, but for now it is still an empirical discovery. And because we have discovered it, it becomes much more plausible that the brain is computing the gradient of some objective function and updating the strengths of its synapses to follow that gradient. We just need to figure out how it gets the gradient and what the objective function is.
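Editor's illustration (not part of the conversation): the recipe in the answer above (start from random values, measure the gradient of an objective, nudge each parameter in the direction that improves it) fits in a few lines. This is a minimal sketch with two parameters instead of billions. The function name `sgd_fit_line`, the squared-error objective, the learning rate, and the synthetic data are all assumptions made for the example.

```python
import random


def sgd_fit_line(data, steps=5000, lr=0.05, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    # Start from random parameter values, as described in the interview.
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    for _ in range(steps):
        # "Stochastic": each step looks at one randomly chosen example.
        x, y = rng.choice(data)
        error = (w * x + b) - y
        # For the objective 0.5 * error**2, the gradient is
        # error * x with respect to w and error with respect to b.
        w -= lr * error * x
        b -= lr * error
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1 (an assumption for the demo).
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]

w, b = sgd_fit_line(data)
print(round(w, 3), round(b, 3))
```

With these settings the learned parameters end up very close to the slope 2 and intercept 1 that generated the data. The sketch says nothing about why the same procedure keeps working at the scale of billions of parameters, which the answer above describes as an empirical discovery.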
Q: But we don't understand the brain.
A: It was a theory. A long time ago, people thought it was a possibility. But in the background there were always traditional computer scientists saying: yes, but this idea that everything is random and you just learn it all by gradient descent will never work for a billion parameters. You have to wire in a lot of knowledge. We now know that this is wrong. You can just put in random parameters and learn everything.
Q: Let's expand on that. As we test these large-scale models built on how we think the brain works, we will presumably learn more and more about how the human brain actually works. Once we understand it better, could we fundamentally rewire our brains to be more like the most efficient machines, or change the way we think? The relationship it uses should be simple, but not in simulation.
A: You would think that if we really understand what is going on, we should be able to make education better, and I think we will. It would be strange if you could finally understand what your brain is doing and how it learns, and yet not be able to adapt the environment so that you learn better.
Q: Let's not go too far into the future. In the next few years, how do you think we will use our knowledge of the brain and of deep learning to change how education works? How would you change the classroom?
A: In a few years, I am not sure how much we will have learned. I think it will change education, but it will take longer. But if you think about it, virtual assistants are getting smarter and smarter. Once such a system can really understand conversation, virtual assistants will be able to talk to children and educate them. In fact, most of the new knowledge I acquire comes from wondering about something, typing it into Google, and having Google tell me. If you could simply have a conversation, you could acquire knowledge even better.
Q: So in theory, as we come to understand the brain better, you could improve how virtual assistants are programmed, so that once they have learned, they can have better conversations with children.
A: Yes, I haven't thought much about it. It's not my field of expertise, but it seems reasonable to me.
Q: We will also be able to understand how dreams work, which is one of the biggest mysteries. So robots really could dream of electric sheep. Last question. I heard you on a podcast, where you said that what you treasure most are the ideas of young graduate students who have just joined your lab, because they are not trapped in old ideas, they have many new ideas, and they also know a lot. Do you look for inspiration beyond your own research? Do you think you have limitations? Are there new graduate students working with you, or even people in this room, who would come over and say they disagree with you?
A: Well, everything I have said (is something people have objected to). (Laughter from the audience)
Q: We have a separate question. Deep learning used to be a term in its own right, but now it has become synonymous with artificial intelligence, and artificial intelligence has become a marketing term: anyone who uses a machine in any way dares to call it artificial intelligence. As a pioneer in this field, what do you think of this change in terminology?
A: Artificial intelligence used to mean that you were inspired by logic and manipulated strings of symbols. Neural networks meant that you wanted learning to happen in a neural network. These were two completely different enterprises that did not get along well and often fought over funding. That is the environment I grew up in, and I was much happier then. Now I see many people who used to trash neural networks and who now cannot open their mouths without talking about them.
Q: You mean that your field has succeeded and has to some extent absorbed other fields, which gives researchers from those fields a chance to ride on its success, and that this annoys you a little?
A: Well, it's not fair to say that, because many of them have changed.
Q: I see I still have time for one more question. In the podcast I mentioned, you said that AI is like an excavator: it can dig a hole, or it can kill you with its shovel. The key is to design the excavator so that it digs the hole successfully instead of hitting you. When do you think about this in your own work?
A: I would never deliberately work on making weapons. You could design an excavator that is very good at knocking people's heads off. I think that would be a bad use of an excavator, and I would never work in that direction.