Johannes Schmidt-Hieber is waiting with bated breath. Almost every week, the media write about the extraordinary capabilities of self-learning algorithms: computer programmes that can learn from the vast quantities of data they are being fed. They can recognise faces, distinguish tumours, speak new languages, decipher handwriting, engage in trade and so on. The possibilities are seemingly endless and the expectations are high – too high, if you ask Schmidt-Hieber. In early October, the government announced its plan to work together with businesses to invest two billion euros in artificial intelligence over the next seven years. Good news, you might think, yet the professor is worried. ‘No one is asking what the result of these investments should be. It is primarily being done because other countries are doing it too. That is not a very good reason.’
Schmidt-Hieber fears that the government has allowed itself to be swept away by lofty expectations that will eventually go unrealised. ‘Over the past five years, quite a few problems have been solved. People believe you can extrapolate that rising trend into the future, but it will probably level out instead. If expectations are too high and go unrealised, no one will want to invest in the field anymore. The result will be an “AI winter”, just like the ones we saw in the ’70s and ’90s.’
Deep learning
The breakthroughs in the field of deep learning are clouding our view of the problems that continue to plague it. While computer scientists and tech companies from all over the world are working on myriad applications, no one has an overview of the field as a whole. ‘Even experts cannot see the big picture. Too many articles are being published and it is impossible to keep up,’ he says in his office. ‘You'll probably end up repeating someone else's work. Doctoral candidates need time to conduct their research, but others may publish two or three articles about the same topic in the meantime.’
Developments occur so rapidly that peer review is often skipped. This may compromise the quality of the publications. Schmidt-Hieber: ‘During a summer school on self-learning algorithms at Berkeley, a panel discussion among people at the very top of the field was devoted entirely to this question: how can we reduce the output and increase the quality of publications?’
'We must acquire knowledge at a higher, more abstract level'
The professor's solution: statisticians must return to their ivory towers. That might come as a shock at a time when valorisation and the social application of theoretical knowledge are key concerns of the scientific community. However, the problem his field faces is that there is no solid theoretical foundation yet. ‘We must acquire knowledge at a higher, more abstract level, which we can then impart to the next generation. After all, our descendants cannot read thousands of articles or explore five hundred different variations of a method. It has practical use too, because how can we hope to improve our methods in five years’ time if no one can read even one percent of everything that is being published?’
Neural networks
Self-learning algorithms, also known as neural networks, are loosely based on the human brain. They contain several layers of ‘neurons’: countless units that solve sub-problems and share their results with each other. Although the output is visible, as are the data you put in, the computation that happens in between remains opaque. Schmidt-Hieber does not, as is commonly done, compare the algorithms to black boxes. Instead, he likens them to croquettes: we can examine what is inside, but that mass of numbers does not tell us a whole lot.
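To make that layered structure a little more concrete, here is a minimal sketch in Python of a network with a few layers of ‘neurons’. Every choice in it (the sizes, the random weights, the activation function) is an arbitrary assumption made purely for illustration, not a description of any particular system.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A common activation: keep positive values, set the rest to zero.
    return np.maximum(x, 0.0)

# Three layers of weights: 4 inputs -> 8 "neurons" -> 8 "neurons" -> 1 output.
# The sizes and random values are illustrative assumptions only.
layer_weights = [
    rng.normal(size=(4, 8)),
    rng.normal(size=(8, 8)),
    rng.normal(size=(8, 1)),
]

def forward(x):
    # Each layer of neurons combines the previous layer's results
    # and passes its own results on to the next layer.
    for w in layer_weights[:-1]:
        x = relu(x @ w)
    return x @ layer_weights[-1]  # the final layer produces the visible output

example_input = rng.normal(size=(1, 4))
print(forward(example_input))
```

Running it reproduces the situation described above: data go in, a result comes out, and in between sits a mass of numbers that, on its own, does not say very much.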
Nevertheless, the professor believes there are ways to assess the value of these algorithms. ‘At the moment, this is commonly done by feeding them new data and empirically assessing the method's performance. We are doing just that, but at the theoretical level. We do not have any real data, but we can predict in a theoretical manner how the data will be processed and how well a method will perform. That allows us to calculate the margin for error.’ To go back to the croquette: it is possible to predict how it will taste based on the recipe, even without actually trying one.
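The article does not spell out what such a theoretical guarantee looks like, but purely as an illustration of the kind of statement statisticians prove, a classical error bound in nonparametric regression has roughly this shape (the symbols are generic, not taken from the interview):

```latex
% Generic illustration only: an error bound derived on paper, without real data.
\[
  \mathbb{E}\!\left[\big(\hat f_n(X) - f_0(X)\big)^2\right]
  \;\le\; C \, n^{-\frac{2\beta}{2\beta + d}}
\]
% Here n is the number of data points, f_0 the unknown truth, \hat f_n the
% method's estimate of it, \beta the smoothness of f_0, d the number of input
% variables and C a constant: the margin for error can be read off before any
% real data are collected.
```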
'Scientists love new phenomena that cannot be explained with standard theories'
Another matter that the professor would like to examine is why self-learning algorithms appear to circumvent a key principle of statistics. According to this principle, the more variables you incorporate into a mathematical method used to describe a phenomenon, the better it works. Here's the croquette again: if you try to make one with just some information about the frying fat and the cooking time, you won't get very far. At the same time, using too many variables increases the number of errors. In other words, a ten-page recipe needlessly complicates the process of making a croquette. Mathematicians are therefore always looking for the perfect balance. However, this principle does not seem to apply to self-learning algorithms: once we get up to thousands or even millions of variables, more appears to be better. Mathematicians have no clue why that is, which makes the question all the more appealing to inquisitive minds. ‘Scientists love new phenomena that cannot be explained with standard theories,’ the professor says enthusiastically. ‘This is a very exciting topic, especially for young people.’
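To see the classical balance in action, here is a small sketch with invented data: polynomials of growing degree play the role of methods with more and more variables, and their error is measured on fresh data. Every ingredient (the curve, the noise level, the degrees) is an arbitrary choice for illustration; it shows the balance the principle predicts, not the puzzling behaviour of networks with millions of variables.

```python
import numpy as np

rng = np.random.default_rng(1)

def truth(x):
    # The invented "phenomenon" we pretend to describe: a smooth curve.
    return np.sin(np.pi * x)

# A small noisy data set to learn from, plus fresh data to test on.
x_train = rng.uniform(-1, 1, 20)
y_train = truth(x_train) + rng.normal(scale=0.3, size=20)
x_test = rng.uniform(-1, 1, 200)
y_test = truth(x_test) + rng.normal(scale=0.3, size=200)

for degree in (1, 3, 9, 15):
    # A higher degree means more variables (coefficients) in the method.
    coeffs = np.polyfit(x_train, y_train, degree)
    error = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: error on fresh data {error:.3f}")

# Typically the error first falls and then rises again as the degree grows:
# too few variables miss the shape, too many start chasing the noise.
# Very large neural networks appear to escape this pattern.
```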
Looks like things are about to get interesting in that ivory tower.