Machine learning and deep transfer learning


2013.07.05

This short text explains the basic idea behind deep transfer learning, an ambitious attempt to build machine learning algorithms capable of exploiting prior knowledge. If you're looking for a technical treatment, I highly suggest Mihalkova's Mapping and Revising Markov Logic Networks for Transfer Learning, which describes one of the best algorithms for deep transfer. Also, I do not dwell on the distinctions between deep and shallow transfer, or on the various subtypes of machine learning algorithms (supervised vs. unsupervised, online vs. batch): I want to provide a strategic overview of what deep transfer learning is about and why it's important.

The standard approach to machine learning

Machine learning is straightforward: data is fed to an algorithm that builds a model and, hopefully, generates good predictions:

[Figure: machine learning — data is fed to an algorithm, which builds a model]

The data can be pretty much anything, from ecological data to movie preferences. Machine learning algorithms can build effective models because the models are tailored to the input data. It is hard, if not impossible, to build by hand the right mathematical model for complex problems such as handwriting recognition, spam detection, language processing, and many, many other problems where no simple equation can be found. In these cases, we have to step back and, instead of building the model ourselves, design algorithms to do it in our place. That's the essence of machine learning. This approach has proven incredibly powerful for solving a wide array of difficult problems in pretty much all fields of inquiry: it's the unreasonable effectiveness of data.
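To make this concrete, here is a minimal sketch of the standard approach. The example is purely illustrative: I assume scikit-learn, its bundled handwritten-digits dataset, and an arbitrary classifier; any other data and learning algorithm would do.

# Standard machine learning: feed labeled data to an algorithm,
# get a model back, and hope it generates good predictions.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # handwriting recognition data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0)   # the learning algorithm
model.fit(X_train, y_train)                      # the data builds the model
print("accuracy:", model.score(X_test, y_test))  # hopefully, good predictions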

Building models this way is good, but it has a few problems. What can we do when we have little data? If the situation has changed since we collected our data, is our model still good? When we face a similar situation, can we reuse our previous model or do we need to build a new one?

Deep transfer learning algorithms

Machine learning algorithms use a tabula rasa approach: they start with nothing and build the model from the supplied data alone. It's simple, but it's also inefficient. Deep transfer learning is about transferring knowledge between different tasks. Instead of starting from scratch, deep transfer algorithms can exploit accumulated knowledge to learn faster (we also have good reasons to think deep transfer is a key ingredient for building reliable models, but that's a more complicated topic). It looks like this:

[Figure: deep transfer learning — the algorithm exploits prior knowledge in addition to the input data]

The algorithm, instead of simply reading the input data, will exploit data from a large data-set of prior knowledge. This, in itself, is tricky. The algorithm must make a judgment call: what is relevant to the task at hand, what can be used, and what should be discarded? Certainly, our model for US presidential elections will be awful if we try to bring in data from, say, football games. So there are risks to deep transfer learning, but the benefits are huge.
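Mihalkova's algorithm makes this judgment call with Markov logic networks, which is well beyond a short sketch. The toy example below only illustrates the general idea with scikit-learn, under assumptions of my own: a model trained on a data-rich source task is reused as the starting point for a data-poor target task, but only after a crude check that the prior knowledge is relevant at all.

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# One synthetic problem split into a data-rich "source" task and a
# data-poor "target" task (they share the same generator here, which is
# the most favourable case for transfer).
X, y = make_classification(n_samples=5200, n_features=20, random_state=0)
X_src, y_src = X[:5000], y[:5000]            # prior knowledge: plenty of examples
X_tgt, y_tgt = X[5000:5040], y[5000:5040]    # new task: only 40 labeled examples
X_test, y_test = X[5040:], y[5040:]          # held out to evaluate the target model

# Build the source model: this is the accumulated knowledge.
source = SGDClassifier(random_state=0)
source.fit(X_src, y_src)

# Judgment call: is the prior knowledge relevant to the new task?
# A crude test: does the source model beat chance on the target examples?
if source.score(X_tgt, y_tgt) > 0.5:
    # Transfer: keep training the source model on the target data
    # instead of starting from scratch.
    model = source
    model.partial_fit(X_tgt, y_tgt)
else:
    # Conservative fallback: reject the prior and learn tabula rasa.
    model = SGDClassifier(random_state=0)
    model.fit(X_tgt, y_tgt)

print("target accuracy:", model.score(X_test, y_test))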

To make an analogy with human learning, imagine you need to learn to run. Of course, running is very similar to walking, so you won't start from zero. You're able to see that running and walking are similar tasks, and thus you can transfer your knowledge of walking to running. This allows you to learn much faster, and it also yields interesting information on how the two tasks are related to each other. If you need to learn Mandarin, though, running and walking won't serve you. Deep transfer is thus a more general approach: a very conservative deep transfer learning algorithm could choose to always reject prior information, and it would build the model just as before.

Standard machine learning starts from zero. Big data is nice, but it would be much nicer if we could build models with more than a tiny fraction of it. Deep transfer is about determining what is relevant in previous data-sets and using this information to design better models, faster! My thesis focuses on doing just that, using the complex, heterogeneous data-sets found in ecology.
