Passthrough, part 3

Actually, the future I described yesterday already exists for millions of people. But not for their eyes — for their ears.

Consumer audio devices such as the Apple Earbuds do something that could be called “audio passthrough”. They take reality (in the form of sound waves entering the ear), digitize it, modify that digital signal to taste, combine it with a synthetic digital signal (e.g., recorded songs), and then convert it back to sound waves for the user to hear.

This lets those audio devices do some pretty impressive things. For example, they can selectively filter out or enhance sound in the world around you, let you hear only sound in front of you but not from other directions (like when you are talking with a friend in a crowded restaurant), or block out sudden loud sounds that might damage your ears.

The key is those four steps: (1) digitizing sound, (2) modifying that digital signal to taste, (3) mixing with synthetic sound, and finally (4) turning that mixture back into sound waves. This is exactly what you would want (but cannot yet have) in visual passthrough.
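
To make those four steps concrete, here is a minimal sketch in Python of what one block of audio passthrough processing might look like. This is not the code that any actual earbuds run, and every name in it is hypothetical; it is only meant to show the shape of the pipeline.

```python
import numpy as np

def passthrough_block(mic_samples, synth_samples, ambient_gain=1.0, mix_gain=0.5):
    """One block of a hypothetical audio passthrough pipeline.

    mic_samples   -- real-world sound, already digitized by the ADC (step 1)
    synth_samples -- synthetic audio, e.g. a recorded song, same length
    """
    # Step 2: modify the digital signal to taste.
    # Here we just scale it; a real device would apply filters,
    # noise suppression, directional beamforming, and so on.
    processed = ambient_gain * mic_samples

    # Step 3: mix in the synthetic signal.
    mixed = processed + mix_gain * synth_samples

    # Step 4: clip to the valid range and hand the result to the DAC,
    # which turns it back into sound waves for the listener.
    return np.clip(mixed, -1.0, 1.0)

# Example: a 10 millisecond block of audio at 48 kHz.
rate, block = 48000, 480
mic = np.random.uniform(-0.1, 0.1, block)                       # stand-in for microphone input
song = 0.3 * np.sin(2 * np.pi * 440 * np.arange(block) / rate)  # a 440 Hz tone as the "song"
out = passthrough_block(mic, song)
```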

So why is that capability available for audio, but not for video? It’s because of Moore’s Law.

Moore’s Law, loosely stated, says that computers get approximately 100 times faster every decade. And it turns out that the computing power needed to interactively process an audio signal is roughly one hundredth of the power needed to interactively process a video signal.
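
To see where that factor of 100 comes from, take the common reading of Moore’s Law that computing power doubles about every eighteen months; over a decade that compounds to

\[ 2^{10/1.5} \approx 2^{6.7} \approx 100. \]

And if video needs roughly 100 times the computing power that audio does, that gap amounts to about one decade of Moore’s Law.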

I realized back in the 1980s, when I was developing the first procedural shaders for computer graphics, that some of my colleagues in the field of computer music had gotten there a decade earlier. In the 1970s, they had already been adding synthetic noise to audio signals, modifying frequencies, applying filters that turned one musical instrument into another, and much more — all in real time.

As I learned more about computer music synthesis, I gradually came to understand that I was following in their footsteps. And I think that principle, that audio leads and video follows about a decade behind, is just as valid today. If you want to understand future video passthrough, study present-day audio passthrough.

More tomorrow.

Passthrough, part 2

Today’s mixed reality headsets let you put computer graphics in front of a video capture of the real world around you. The video itself is not quite up to the resolution and color richness of actual reality, but over time, as technology continues to advance, that gap will close.

Today’s headsets only let you see the real world behind synthetic computer graphics. You are not given the ability to modify your view into reality.

But in the future you will be able to edit the world around you through your glasses. You will be able to zoom in, enhance colors, highlight objects, or selectively sharpen details of things that interest you.

More tomorrow.

Passthrough, part 1

The video passthrough feature of the Meta Quest 3 and Apple Vision Pro may seem like a novelty today, but I think there is a very profound principle at work here.

The concept of perceptual passthrough can be generalized in many ways. I think that these first devices are just the tip of the iceberg.

More tomorrow.

Leonardo and the Two Cultures

Today, on what would have been Leonardo da Vinci’s 572nd birthday, is a good time to talk about the two times that I saw the Leicester Codex.

The Leicester Codex is a folio that explains, with beautiful illustrations, various theories that Leonardo had about the physical world around us. Not surprisingly, many of his theories were absolutely correct, such as his surmise, based on the discovery of seashells on mountaintops, that those mountaintops were once at the bottom of the ocean.

Many years ago, soon after Bill Gates purchased the Leicester Codex for just over thirty million dollars, he lent it to the American Museum of Natural History for a public exhibition. The codex was presented together with Microsoft software that let you interactively explore its contents. Not surprisingly, the software was for sale.

That exhibit, which was wonderful, helped the public to experience the greatness of Leonardo the scientist.

Soon after, Gates lent the Leicester Codex to the Metropolitan Museum of Art. The curators of that worthy museum put on a public exhibition in which they placed the codex in the context of many other works of art by Leonardo.

That exhibit, which was wonderful, helped the public to experience the greatness of Leonardo the artist.

I saw both exhibitions, and was struck by the implicit war at play. Two of our city’s greatest temples of culture — facing each other across Central Park — seemed to be fighting over the meaning of Leonardo’s life and work.

I also saw that this is far from a new battle. In the second exhibition, there was a letter written by an important personage of the day, which the museum curators helpfully translated from the Italian. He was complaining that Leonardo was wasting time in the frivolous pursuit of science, when he could have been spending more time on something actually important — making more paintings.

As C.P. Snow observed in his brilliant essay The Two Cultures, none of this should come as a surprise.

Cherish that

One of the wonderful things about live theater is that you are seeing something completely unique in the history of humankind.

No performance that took place before the one you are attending, and no performance that will ever take place again, will be the same as what you are experiencing right now.

Cherish that.

Real-time AI

Right now it takes at least a few seconds from the time you give a text prompt to the time you get an image back from one of the currently available generative A.I. programs. This is a limitation of the current technology.

But Moore’s Law keeps marching onward, in one way or another. With increased computational parallelism and newer semiconductor technologies, response time will gradually trend downward. In another decade or so the response will reliably arrive in a fraction of a second.

From the user’s perspective, this means that images and videos and simulated 3D scenes will appear even while you are describing them. But even more important, it will mean that you can edit those images and videos and simulated 3D scenes simply by continuing to talk. As you speak, the changes will happen right before your eyes.

When we get to that point, generative A.I. will truly become a fundamental new mode of human expression.

Drawing in the air

At some point everybody will be able to wear lightweight and affordable high quality XR glasses. A really good version of this is still some years off, but it’s fun to think about it now.

One of the things you will be able to do with those glasses is simply point your finger and draw in the air. Everybody who is in the room with you will be able to see your drawing, and they will be able to make their own drawings as well.

Of course all of this will be tied to artificial intelligence. After you draw something, you will be able to say things like “What would a couch look like right here?” or “Show me what this would look like as a real vase.” Your drawing will then come to life for everyone as something that looks realistic.

I wonder how just this one feature will change communication.

40 Years an Eclipse

Yesterday I posted an animated eclipse implemented as a procedural texture, in honor of that day’s great celestial event. It’s not a movie clip — it’s a live simulation running on your computer or phone.

Interestingly, this eclipse simulation is actually something that I created exactly 40 years ago. In 1984 I introduced to the world what came to be known as procedural shader languages, in the course of which I created lots of examples.

One of those examples was this procedurally generated eclipse. Originally it ran in my own custom shader language. Then around 18 years ago I re-implemented it in Java. More recently I re-implemented it yet again as a WebGL fragment shader.

But the design and the algorithm have never changed. It’s the same procedural eclipse that I created back in April 1984. Except that back then it took 30 minutes per frame to compute. Now it just runs on your phone in real time, thanks to the wonder of Moore’s Law.
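
For readers who have never written a procedural texture, here is the flavor of the idea as a minimal sketch in Python rather than a shader language. It is not the original algorithm, just an illustration: each frame is computed from a formula at every pixel, with a dark disc sliding across a bright one as time advances.

```python
import numpy as np

def eclipse_frame(t, size=256):
    """A minimal procedural eclipse: brightness is computed from a formula
    at every pixel, so each frame is generated rather than stored.
    t is time in [0, 1]; the dark disc slides across the bright one."""
    y, x = np.mgrid[0:size, 0:size] / size - 0.5      # pixel coordinates in [-0.5, 0.5)

    # Bright disc (the sun) with a soft glow falling off around it.
    r_sun = np.sqrt(x**2 + y**2)
    brightness = np.clip(1.2 - 6.0 * np.maximum(r_sun - 0.2, 0.0), 0.0, 1.0)

    # Dark disc (the moon) whose center moves with time, occluding the sun.
    mx = -0.5 + t                                      # moon slides left to right
    r_moon = np.sqrt((x - mx)**2 + y**2)
    brightness[r_moon < 0.2] = 0.02                    # nearly black where occluded

    return brightness                                  # 2D array of values in [0, 1]

# One frame at mid-eclipse; display or save it with any image library.
frame = eclipse_frame(0.5)
```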

In another 40 years, I wonder what simulations will run in real time that now take half an hour to compute. I can hardly wait!