Both sides now

I’ve looked at clouds from both sides now
From up and down and still somehow
It’s cloud illusions I recall
I really don’t know clouds at all.

-Joni Mitchell

I had a wonderful conversation with a colleague this week about machine learning. Not about the specific algorithms and mathematics, but about the philosophy that makes ML tick — the general approach that makes it work as well as it does.

“Machine learning”, as some of you know, is an approach to heuristic algorithms (sometimes known by the sexier term “artificial intelligence”). When a problem is too difficult for a computer to solve by straightforward computation, sometimes we resort to sneakier methods: approaches that look for shortcuts to a solution, and usually (but not always) find them.

Learning from data, which these days often lives “in the cloud”, makes heavy use of such shortcuts. The idea is to look at lots of examples of “things like this” by sifting through large amounts of data, and then use those examples to make better guesses about new things. For example, if you want your machine learning algorithm to recognize faces, you can “train” it by showing it lots of photos “in the cloud” that somebody has already labeled as pictures of faces.
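For readers who like to see things in code, here is a minimal sketch of "learn from labeled examples, then guess about new things". It is a toy nearest-centroid classifier, not how real face recognizers are built. The function names `train` and `predict`, the two-number "features", and the labels are all made up for illustration.

```python
def train(examples):
    """Average the feature vectors for each label.

    examples: list of (feature_vector, label) pairs (hypothetical data).
    Returns a dict mapping each label to its mean feature vector.
    """
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Guess the label whose average example is closest to the new one."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Invented "photos", each boiled down to two numbers and labeled by a person.
labeled = [
    ([0.9, 0.8], "face"),
    ([0.8, 0.9], "face"),
    ([0.1, 0.2], "not face"),
    ([0.2, 0.1], "not face"),
]
model = train(labeled)
guess = predict(model, [0.7, 0.75])  # a new, unlabeled example
```

The more labeled examples you feed to `train`, the better the averages tend to get, which is the whole appeal of having lots of data to sift through.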

The conversation I had this week was about something a little more subtle: The fact that machine learning usually works because it uses information about big things to figure out something about small things, but also information about small things to figure out something about big things.

For example, early techniques for recognizing faces usually started by looking at a low resolution version of a picture and saying “hey, here’s a fuzzy blob that might be a face.” Then they looked at a higher resolution version of the same picture to check for things like eyes, nose and mouth in the proper places.

This didn’t work very well, because in a low res picture there are lots of fuzzy blobs that might be a face, but when you look more closely, most of them turn out not to be faces. Machine learning ups the game by going in both directions at once.

Not only does it look for faces, and check whether there are eyes and noses and mouths inside, but it simultaneously looks for smaller features like eyes, noses and mouths, and checks to see whether they are inside bigger features that look like faces.

The big power-up here is that we’re checking both “big to small” and “small to big”, looking in particular for connections that work in both directions.
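To make the two directions concrete, here is a toy sketch. It assumes some earlier step has already produced candidate boxes for faces and for parts, which is a big assumption, and real systems learn these relationships rather than hard-coding them. The `Box` class, `REQUIRED_PARTS`, and `mutually_supported` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A labeled rectangular candidate region (hypothetical representation)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Box") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


# Assumption for this toy: a face needs at least one of each kind of part.
REQUIRED_PARTS = {"eye", "nose", "mouth"}


def mutually_supported(faces, parts):
    """Keep only candidates that are confirmed from both directions."""
    # Big to small: a face survives only if the required parts are inside it.
    kept_faces = [
        face for face in faces
        if REQUIRED_PARTS <= {p.label for p in parts if face.contains(p)}
    ]
    # Small to big: a part survives only if it sits inside a surviving face.
    kept_parts = [p for p in parts if any(f.contains(p) for f in kept_faces)]
    return kept_faces, kept_parts


faces = [Box("face", 0, 0, 10, 10), Box("face", 20, 20, 30, 30)]
parts = [
    Box("eye", 2, 2, 4, 3), Box("eye", 6, 2, 8, 3),
    Box("nose", 4, 4, 6, 6), Box("mouth", 3, 7, 7, 8),
    Box("eye", 22, 22, 24, 23),   # lone eye inside the second blob
    Box("nose", 50, 50, 52, 52),  # stray "nose" inside no face at all
]
kept_faces, kept_parts = mutually_supported(faces, parts)
```

Here the second fuzzy blob is dropped because it holds only a single eye, and the stray nose is dropped because no face surrounds it. Each kind of evidence vouches for the other.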

It seems pretty simple when you put it like that. Yet this simple change in thinking has had a huge impact on our ability to use computers to recognize things.

2 Responses to “Both sides now”

  1. Ben says:

    > The big power-up here is that we’re checking both “big to small” and “small to big”, looking in particular for connections that work in both directions.

    > It seems pretty simple when you put it like that. Yet this simple change in thinking has had a huge impact on our ability to use computers to recognize things.

    This doesn’t seem to me like the most important contribution of machine learning to computer vision. My candidate for that distinction would be the fact that modern machine learning algorithms automatically learn high-level sub-concepts of a “face” like “eyes” or “nose” or “mouth” themselves, rather than having to have a human specify them. That means that (a) they can learn what “eyes” are more precisely than if a human had to write it down for them; and (b) they can learn other high-level features of a face that a human might not think to make it look for, if those turn out to be helpful.

  2. admin says:

    I completely agree. We are saying different parts of the same thing. Sorry if I wasn’t clear in my attempt to say it in a way that would be clear to non-experts.

    ML indeed does not know a priori that something is a “nose” or an “eye”. Rather, it responds to consistent configurations between things.

    Indeed, as you say, ML is looking at possible relationships between features. It’s not the identification of particular features themselves that makes the approach so powerful, but the way that ML is able to discover how those features relate to each other.

    There were earlier techniques that attempted to, for example, recognize faces without knowing what a nose is. My point, to say it in a somewhat more technical way, was that with ML, the graph of identifiable relationships between parts is being built from all directions at once.
