Making the future happen

August 9th, 2018

I gave a talk last night in which I showed a vision for the future. Everything I literally demonstrated on my computer is clearly possible today — after all, people were seeing me do the demo right there in front of them.

Yet I was telling a tale of things that are not yet possible, but one day will be. This relationship between what was real and what was merely suggested created a dramatic tension, and that tension was the heart of the story.

I think we feel a similar tension whenever we experience a play or a movie, attend an opera or read a novel. We are being introduced to a world that clearly does not exist, in any literal sense.

Yet it could exist, at least in our collective imaginations. The task of the players is not to fool their audience, but rather to invite that audience to willingly enter a shared land of make-believe.

There is an extra dimension to this invitation when the “land of make-believe” is presented as though it is real. We feel this dimension when we watch a movie filmed in the style of cinéma vérité, or a performance by a stage conjurer. Logic tells us that the thing we are watching is clearly not happening, yet our senses tell us otherwise.

I think my talks about the future fit into this latter category. I want people to experience an exciting and positive future as though it is already here.

My goals in doing so are quite specific: I’m not trying to fool my audience. Rather, I am inviting them to join me in making the future happen.

South of the border

August 8th, 2018

Since I will be traveling to Canada tomorrow, my mind is turning to the mystery of borders. How is it that we can take a single step, and end up in an entirely different country?

When we take such a step, our body may move only a matter of inches, yet the human laws governing our body radically change, sometimes in unexpected ways. What a tortured definition of “reality” we humans must have, for such a thing to make any sense to us at all.

I think of this when I ponder the concept of my tax dollars going to build a border wall between our country and another. What if I don’t want such a wall? Should I still be required to help pay for it?

Unfortunately it seems to be official U.S. policy that the nation south of such a wall is financially responsible for its construction. So when that border wall is finished, how much will our nation be required to pay?

I guess Justin Trudeau will tell us when he is good and ready.

One of those crazy days

August 7th, 2018

Today was one of those crazy days.

I mean, I knew going in that today was going to be crazy, but I had woefully underestimated the level of craziness. It all felt a bit like some alternate version of This Is Spinal Tap where Nigel says “This day goes to eleven.”

In addition to everyone on our team running around madly to get our big CAVE project ready for SIGGRAPH (starting in just four days!), there was an extremely large film crew from Oculus making a documentary about us.

They were really nice people, but they were a film crew. Interestingly, they had been told not to show anything with logos on it. Imagine trying to film a whole crew of computer science students at work without showing a single tee shirt with a logo. It isn’t easy.

I came in at seven in the morning to get all my programming done for the day. I knew that by 9am I would be spending all my time putting out fires, continually switching roles depending on who I was talking to.

Sure enough, most of today felt like a cross between the stateroom scene from A Night at the Opera and the birthday cake scene from Bugsy. I guess when you think about it, there are worse ways to spend your day than feeling like a cross between Groucho Marx and Warren Beatty.

With maybe just a dash of Nigel Tufnel thrown into the mix.

Super glasses!

August 6th, 2018

Speaking of eyeglasses, I’ve always wondered about Superman and Supergirl, ever since I was a little kid. In the comics — and more recently in the movies and TV shows — these two superheroes can always walk about unrecognized, just by putting on a pair of glasses.

Now, I don’t know about you, but if I put on a pair of glasses, I just look like me wearing a pair of glasses. This seems like a far more remarkable superpower than their ability to fly or to burn through walls with death-ray eyes, or even to bend steel with their bare hands.

There are lots of amazing wonders in the Marvel Universe, including a billionaire who flies around in an iron suit, a guy who can shrink down really really small, a giant green id monster, another guy who’s part spider, a talking raccoon, and a whole family of Norse gods — the list goes on.

But not one of these people has that far more awesome and eerily inexplicable superpower possessed by our heroes from DC Comics: Just let these folks slip into an ordinary pair of glasses, and nobody will ever recognize them.

How cool is that?

Notes on Future Language, part 8

August 5th, 2018

At some point, implementation of future language will need to move past a discussion of principles, and into the empirical stage. This will require an actual hardware and software implementation.

Unfortunately, the hardware support to make all of this happen does not quite yet exist in a way that is accessible to a large population. Yet it can be created in the laboratory.

Kids don’t need to be wearing future augmented reality glasses to be able to hold visually augmented conversations with other kids. They just need to be able to have the experience of doing so.

For this purpose we can use large projection screens that allow kids to face each other, placing cameras directly behind those screens so that our young conversants can look directly into each other’s eyes. We can also place a number of depth cameras behind and around each screen, and use machine learning to help us convert that depth data into head, hand and finger positions.
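
Here is a rough sketch, in Python, of what one tick of that capture loop might look like in software. Everything in it is hypothetical: the camera class, the learned pose model and the field names are placeholders standing in for whatever depth hardware and trained models an actual lab build would use.

    from dataclasses import dataclass, field

    @dataclass
    class Pose:
        # Estimated positions, in meters, relative to the center of the screen.
        head: tuple = (0.0, 0.0, 0.0)
        hands: list = field(default_factory=list)    # one (x, y, z) per hand
        fingers: list = field(default_factory=list)  # one (x, y, z) per fingertip

    class FakeDepthCamera:
        # Placeholder for a depth camera mounted behind or beside the screen.
        def read_frame(self):
            return []  # a real camera would return a depth image here

    class LearnedPoseModel:
        # Placeholder for a machine-learned model that turns depth frames into a Pose.
        def estimate(self, frames):
            # A trained network would fuse the frames; here we just return a dummy pose.
            return Pose(head=(0.0, 1.5, 0.6), hands=[(0.2, 1.2, 0.5)])

    def capture_step(cameras, model):
        # One tick of the loop: grab a frame from every depth camera around the screen,
        # then let the model convert that raw data into head, hand and finger positions.
        frames = [cam.read_frame() for cam in cameras]
        return model.estimate(frames)

    pose = capture_step([FakeDepthCamera(), FakeDepthCamera()], LearnedPoseModel())
    print(pose.head, pose.hands)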

When this setup is properly implemented, the effect to each participant will be as though they are facing their friend, while glowing visualizations float in the air between them. They will be able to use their own gaze direction and hand gestures to create, control and manipulate those visualizations.

What we learn from this experimental setup can then be applied to next-gen consumer-level wearables, when that technology becomes widely available. At that point, our large screen will be replaced by lightweight wearable technology that will look like an ordinary pair of glasses.

Little kids will simply take those glasses for granted, just as little kids now take SmartPhones for granted. All tracking of head, eye gaze and hand gestures will be done via cameras that are built directly into the frames.

The eye-worn device itself will have only modest processing power, sufficient to capture 3D shapes and to display animated 3D graphical figures. Those computations will be continually augmented by a SmartPhone-like device in the user’s pocket, which will use Deep Learning to rapidly convert those 3D shapes into hand and finger positions. That intermediate device will in turn be in continual communication with the Cloud, which will perform high-level tasks of semantic interpretation.
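
To make that division of labor concrete, here is a hedged sketch of how a single frame might flow through the three tiers. The class names and the messages they pass between them are invented for illustration, not a description of any existing device or service.

    class Glasses:
        # Tier 1: modest on-device power, just enough to capture 3D shapes and draw.
        def capture(self):
            return {"depth_points": [(0.10, 1.20, 0.40), (0.12, 1.21, 0.41)]}
        def display(self, graphics):
            print("drawing:", graphics)

    class PocketDevice:
        # Tier 2: a SmartPhone-like companion running a deep learning model.
        def to_hand_pose(self, capture):
            # A real model would infer hand and finger positions from the captured shapes.
            return {"gesture": "point", "target": (0.10, 1.20, 0.40)}

    class Cloud:
        # Tier 3: high-level semantic interpretation.
        def interpret(self, hand_pose, spoken_word):
            # e.g. pointing while saying "that" becomes a reference to a nearby object.
            return {"draw": "highlight the object near " + str(hand_pose["target"])}

    # One frame through the pipeline:
    glasses, pocket, cloud = Glasses(), PocketDevice(), Cloud()
    hand_pose = pocket.to_hand_pose(glasses.capture())
    glasses.display(cloud.interpret(hand_pose, spoken_word="that"))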

The transition to a widely available lightweight consumer-level platform will take a few years. Meanwhile, nothing prevents us from starting to build laboratory prototypes right now, and thereby begin our empirical exploration of Future Language.

Notes on Future Language, part 7

August 4th, 2018

As we discussed yesterday, ideally we want our augmentation of language to be naturally learnable. But how can we do such a thing?

Suppose we were to put together a committee of the smartest and most aware linguists. Alas, any gestural vocabulary or grammatical rules proposed by such a committee would be doomed to failure.

The adult mind simply cannot determine what is naturally learnable. Otherwise, millions of people would now be speaking that carefully constructed and beautifully rational language Esperanto.

The key is to allow our language extensions to be designed by little children. One might object that little children are not equipped to program computers.

Yet we can get around that objection as follows: Let us assume, thanks to forthcoming augmented reality technology, that little children can see the results of gestures they make in the air.

We can then observe the corpus of gestures those children make as they converse with one another, using machine learning to help us categorize that corpus. Initially we put only the most basic of behaviors into the system.

For example, a spoken word might generate a visual representation of that word (e.g., saying the word “elephant” would generate a cartoon image of an elephant). Also, as children point or reach toward various virtual objects floating in the air, we might highlight those objects for them.

We then gather information from how they use this basic system to tell stories. As we do this, we aim to interfere as little as possible.

As we observe patterns and correlations between how children relate meaning and gestures, we may periodically add “power-ups”. For example, we might observe that children often grab at an object and then move their hand about. From such an observation, we may choose to make objects move around whenever children grab them.

We then observe over time whether children make use of any given power-up. If not, we remove it. Our goal is not to add any specific features, but to learn over time what children actually do when they have the opportunity to communicate with each other with visually augmented gesture.
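
Here is a minimal sketch of that add-and-prune loop. The power-up names, the usage counts and the threshold are all made up for illustration:

    # Hypothetical record of candidate power-ups and how often children actually use them.
    power_ups = {
        "grab_moves_object": {"enabled": True, "uses": 0},
        "circle_makes_clock": {"enabled": True, "uses": 0},
    }

    def record_use(name):
        # Called whenever observation (human or machine-learned) sees a child
        # successfully use a given power-up in conversation.
        power_ups[name]["uses"] += 1

    def prune(min_uses=10):
        # Periodically disable power-ups that the children simply don't adopt.
        for info in power_ups.values():
            if info["uses"] < min_uses:
                info["enabled"] = False

    # Simulated observation period: the kids keep grabbing objects to move them,
    # but never bother turning circles into clocks.
    for _ in range(25):
        record_use("grab_moves_object")
    prune()
    print({name: info["enabled"] for name, info in power_ups.items()})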

By taking such an approach, we are guaranteed that the language which evolves out of this process will be naturally learnable by future generations of children.

Tomorrow: The last chapter in our story.

Notes on Future Language, part 6

August 3rd, 2018

The foregoing gesture examples may all seem plausible, but that doesn’t make them correct. “Correctness” in this case means whatever is naturally learnable.

Linguists have a very specific definition for the phrase “naturally learnable”. It doesn’t mean something that can be learned through conscious practice and study. Rather, it means something that one learns even without conscious practice or study.

For example, one’s native spoken language is naturally learnable. We didn’t need to go to school to learn our first spoken language — we began to gradually speak it when we were still young children, simply by being exposed to it.

In contrast, written language is not naturally learnable. Most people need to put in the effort required to consciously study and practice before they can read or write effectively.

Attempts to create a synthetic “natural language” generally fail, in the sense that children will not learn them. For example, when children are exposed to Esperanto, they will spontaneously try to alter it, because its rules violate their innate instinct for natural language.

There is now a general consensus amongst evolutionary linguists that natural language and children below around the age of seven co-evolved: natural language evolved to be learnable (and modifiable over time) by little children, while little children simultaneously evolved to learn and modify natural language.

Tomorrow we will discuss what this means for our topic of future language.

Notes on Future Language, part 5

August 2nd, 2018

In addition to relying on speech to fill in the meaning of iconic gestures, it will also be useful to provide each conversant the option to sketch specific shapes in the air while speaking. This would be helpful in situations where a physical shape contributes strongly to the intended meaning of the speech act.

For example, saying the word “time” while drawing a rectangle might result in a calendar, whereas saying the same word while drawing a circle might result in a clock. In each case, the drawn shape acts as a modifier on the spoken word, lending it a more context-specific meaning.
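
In software, that kind of shape-as-modifier lookup could be as simple as a table keyed on the spoken word and the recognized shape. The entries below are just the two examples above, with a hypothetical fallback when no shape modifier applies:

    # A spoken word plus a sketched shape picks out a more specific visualization.
    visualizations = {
        ("time", "rectangle"): "calendar",
        ("time", "circle"): "clock",
    }

    def visualize(word, shape):
        # Fall back to a generic icon of the spoken word if no shape modifier applies.
        return visualizations.get((word, shape), "generic icon of " + word)

    print(visualize("time", "rectangle"))  # calendar
    print(visualize("time", "circle"))     # clock
    print(visualize("elephant", None))     # generic icon of elephant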

It will also be useful to distinguish between three ways to gesture with the hands: one-handed, symmetric two-handed, and asymmetric two-handed.

An example of a one-handed gesture would be: Close the fist on a visual icon, move the hand, and then release the fist, which could be a way to indicate: “I move this to over there.”

An example of a symmetric two-handed gesture would be: Hold the two hands open with palms facing each other so that an icon is positioned between them, then spread the hands further apart, which could be a way to indicate: “Let’s zoom in on this to see more detail.”

An example of an asymmetric two-handed gesture would be: Close the fist of one hand on an object, then pull the other hand, with fist closed, away from the first hand, which could be a way to indicate: “Let’s make a copy of this.”
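
Here is one hedged sketch of how a prototype might sort an incoming gesture into those three families and map each to the meanings suggested above. The inputs (how many hands are involved, and whether their motion mirrors each other) are assumptions about what the tracking layer could report:

    def classify(hands_used, motion_is_mirrored):
        # Sort a gesture into the three broad families described above.
        if hands_used == 1:
            return "one-handed"
        return "symmetric two-handed" if motion_is_mirrored else "asymmetric two-handed"

    # Illustrative meanings, taken from the examples in this post.
    meanings = {
        "one-handed": "I move this to over there.",
        "symmetric two-handed": "Let's zoom in on this to see more detail.",
        "asymmetric two-handed": "Let's make a copy of this.",
    }

    print(meanings[classify(1, False)])  # move
    print(meanings[classify(2, True)])   # zoom
    print(meanings[classify(2, False)])  # copy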

More tomorrow.

Notes on Future Language, part 4

August 1st, 2018

We’re talking about using gesture to create visible representations, as a way to augment conversational speech. To break this down, we should refer back to the reasons we already use various types of gesture.

Pointing is easy. Anything we have visually created can be pointed to, while saying words like “this” or “that”. Also, we can point at one item and then at another to establish a relationship between the two objects.

If we have drawn anything that has a process component, beat gestures are a natural way to iterate through that process. Beat gestures are essentially a way of saying “here is the next thing.”

There is an interesting relationship between pointing and beat gestures when it comes to describing time-varying processes: To go back to our cooking recipe example, we can use pointing to refer to a particular place in the recipe. Then we use beat gestures to advance step by step through the recipe instructions.

When used to augment speech, symbols essentially act as adverbs. For example, we can use symbolic gestures to make it clear that things are happening fast or slow, calmly or with agitation, definitively or with confusion, or in a friendly or hostile manner.

Lastly, icons, particularly when used in tandem with spoken words, can be used to create visual representations of actual topics of conversation — a chair, a tree, a calendar, the Sun. Because we are speaking while gesturing, we don’t actually need to draw the objects under discussion. Rather, we can use iconic gestures to indicate a location and size for the visual representation of each object or concept under discussion.
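
Put in prototype terms, each of the four gesture types triggers a different kind of operation on the shared visualization. The little dispatcher below is shorthand for the roles described in this post, not an actual API:

    def handle_gesture(kind, context):
        # Each gesture family plays a different role in the visual channel.
        if kind == "point":
            return "highlight " + context["target"]                # "this", "that"
        if kind == "beat":
            return "advance to step " + str(context["step"] + 1)   # "here is the next thing"
        if kind == "symbol":
            return "restyle the scene as " + context["manner"]     # acts like an adverb
        if kind == "icon":
            return "place a " + context["word"] + " at " + context["where"]  # a topic of conversation

    print(handle_gesture("point", {"target": "the calendar"}))
    print(handle_gesture("beat", {"step": 2}))
    print(handle_gesture("symbol", {"manner": "agitated"}))
    print(handle_gesture("icon", {"word": "tree", "where": "upper left"}))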

More tomorrow.

Notes on Future Language, part 3

July 31st, 2018

So the gestural tools we have to work with are symbols, pointing, beats and icons. In addition, we will have the ability for all participants in a conversation to see the results of their gestures, perhaps as glowing lines floating in the air.

As we think about how to use our gestural tools, it is important to remember that we are not trying to replace verbal speech, but rather to augment it. The situation is somewhat analogous to having a conversation at a whiteboard. As each participant speaks, they draw pictures that help clarify and expand upon the meaning of their words.

One key difference is that if you have a computer in the loop, then the lines you draw can spring to life, animating or changing shape as needed. You are no longer stuck with the static pictures that a whiteboard can provide.

For example, if you are trying to convey “this goes there, then that goes there”, you can do better than just draw arrows — you can actually show the items in question traveling from one place to another. Which means that if you are trying to describe a process that involves asynchronous operations (for example, a cooking recipe), your visualization can act out the process, providing an animated representation of the meaning that you are trying to convey.
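
As a tiny illustration of that difference, here is what “this goes there” might look like with a computer in the loop: instead of a static arrow, the item’s position is interpolated over time, one frame per step. The coordinates and step count are arbitrary:

    def animate(item, start, end, steps=5):
        # Linearly interpolate the item from its starting position to its destination,
        # yielding one frame per step, which a whiteboard arrow can only imply.
        for i in range(steps + 1):
            t = i / steps
            x = start[0] + t * (end[0] - start[0])
            y = start[1] + t * (end[1] - start[1])
            yield (item, round(x, 2), round(y, 2))

    for frame in animate("mixing bowl", start=(0.0, 0.0), end=(1.0, 0.5)):
        print(frame)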

So how do we use symbols, pointing, beats and icons to make that happen? That’s a topic for tomorrow.