We are not yet at the point where the cameras that are available to everybody can recognize our hand and finger gestures with high accuracy and high speed. But sometime in the coming years we will get there.
When that happens, hand and finger gestures will start to become incorporated into the ways that we communicate to computers, and communicate with each other through those computers. Gestural language will gradually become more integrated with verbal speech.
Eventually what we call language will be different from what we call language today. We will start to think of hand gestures and verbal speech as simply different aspects of a much richer continuum of communication between people.
At some point, when the supporting technology is ready, someone will get serious about making this happen. I, for one, will appreciate the gesture. 🙂
I am building a geometric modeler for VR, and I am currently doing a lot of the development on a traditional 2D computer screen. That’s because I am still waiting on delivery of a pair of fancy gloves that will accurately track the movements of my hands and fingers.
I have a VR system here, but I don’t see a point in designing with VR controllers, since I would only make all the wrong choices. I would end up designing for the controllers, rather than for my hands and fingers, and it would then be very difficult to change things around.
I figure it is better for now to design with just the traditional mouse and keyboard. This way I will know for sure that I am not pretending to make something that will make use of the full power of hands and fingers.
To misquote Voltaire, the good is the enemy of the great.
There is nothing quite like a quiet day of programming computer graphics. I like writing, and I like drawing, but programming is different.
Using math to construct something visual and new, piece by piece, is at its best a highly contemplative act. It provides pleasure to many parts of your brain at the same time.
I think this is because there are two complementary forms of beauty involved. There is, of course, the visual beauty of the thing you are trying to create.
But there is also the intellectual beauty of the algorithm. There is a unique kind of joy in getting the math right, implementing things in an elegant and clear way.
I wonder whether there is a parallel here to the way a good song combines two different kinds of beauty, when the words and music complement each other just right.
Continuing the theme from yesterday…
When thinking about customized Web views, maybe we shouldn’t start with “how would we implement this?” Instead we might start with “how would someone interact with this?”
Suppose somebody is looking at the Wikipedia page listing all of the people who were born on this day of the year. They might want, for example, to say: “For all of the people on this list who are still alive, change the text color to green.”
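Whatever the interface turns out to be, the rule itself is simple to state in code. Here is a minimal sketch of that one rule, with the assumption (a hypothetical one, for illustration) that each list entry is a plain text line in which a deceased person's entry ends with a death note like "(d. 1977)"; the real page markup is of course richer than this:

```javascript
// Decide whether a list entry appears to describe a living person.
// Assumption for this sketch: deceased entries contain "(d. YYYY)".
function entryIsLiving(entry) {
  return !/\(d\.\s*\d{3,4}\)/.test(entry);
}

// Apply the user's rule: mark the entries for living people green.
// Returns { text, color } pairs rather than touching the page, so
// the rule itself can be tried outside a browser.
function applyColorRule(entries) {
  return entries.map(text => ({
    text,
    color: entryIsLiving(text) ? "green" : null,
  }));
}
```

The names here (`entryIsLiving`, `applyColorRule`) are invented for the sketch; the interesting question is what sits in front of such a rule, not the rule itself.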
How close could I come to letting people have an interface as simple as that? Is natural language really the best way to approach it?
After all, if you were asking another person to do that for you, you would just talk to them in plain English (or whatever is your shared language).
Is that really the right way to go? Or would it be better to provide some sort of visual drag-and-click interface with menu options?
It seems to me that there are at least three separate but related questions here: (1) What is the best way for a user to interact with such a system; (2) How do we get the system to properly interpret what the user wants; and (3) How do we really implement all this on the back end in the Web browser?
None of those questions are easily answered. Which might be one reason that this sort of thing is not yet readily available.
When I go to today’s date on Wikipedia, I see that a lot of famous people have birthdays today. I don’t recognize all of the names.
But I’ll bet that in many cases, I would recognize the faces. And that might make it more likely that I would click on that person to learn more about them.
The way things are now, in order to see the person’s face, I can do one of two things: Either I hover my mouse over their name for a bit (which may or may not show me their face) or I click to go to their actual Wikipedia entry.
In a more fully realized version of the Web, there should be a way for me, as an ordinary user, to specify custom visual filters for the page I am viewing. I might want to specify, for example, that every place a person’s name appears as a link, I want my browser to go to that page behind the scenes, fetch the first image of a person it finds, and paste it onto the page I am looking at.
You could do this now, but you would need serious programming chops. And that is not ideal.
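For the record, the core of those "programming chops" is not large. A hypothetical userscript helper, sketched under the assumption that we have already fetched the raw HTML of the page a link points to, might look like this (a real version would use the browser's DOM parser, or better yet the site's API, rather than a regex):

```javascript
// Hypothetical helper for the filter described above: given the raw
// HTML of a linked page, find the URL of the first image so it can
// be pasted next to the link on the page we are reading. This regex
// version only shows the core extraction step; inserting the image
// into the current page would be ordinary DOM manipulation.
function firstImageUrl(html) {
  const match = html.match(/<img[^>]*\bsrc="([^"]+)"/i);
  return match ? match[1] : null;
}
```

In a browser this would sit behind something like `fetch(linkUrl).then(r => r.text()).then(firstImageUrl)`. The hard part is not this function; it is wrapping it in something an ordinary user can invoke.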
Would it be difficult to create an easy way for non-programmers to have that kind of capability? I hope not, and I think it would be a cool thing for somebody to work on.
If you went to live in a cave for a year or two and then came out, the world would seem a bit different. Some things would be the same, but others would be oddly unfamiliar.
I think what is going on now in the U.S. is a little like that, except that everybody is coming out of the cave at the same time. Those of us who get the vaccine are walking into stores without masks, and sometimes seeing the faces of the people who work there for the first time.
We may have known these people for months, yet never really knew what they looked like. And it’s also the first time they know what we really look like.
There is still that weird moment of alarm when I touch an object or a surface that somebody else has touched in a public place. Habits of caution tend to run deep, especially after so many months.
I need to remind myself that the world is now safer than it was. But maybe it’s just as well that we are a little more thoughtful about things than we were before.
When you work in any specialized field you develop knowledge not shared by most of the population. This is true whether you are a pilot, a plumber, an accountant or a brain surgeon.
So when you explain what you are working on, there are two general ways you can do it, depending on your audience. If you are talking to people in your field, you can feel free to burrow down and use specialized shared knowledge. But if you are talking to nearly anybody else, you need to take a very different approach.
We deal with this all the time in our work in computer graphics and virtual reality. If we are talking with colleagues, we can cut to the chase and use all sorts of technical terms. There is a huge base of shared knowledge that lets us do this.
But trying to explain what we are doing to anybody else is a very different proposition. We can’t assume any sort of specialized knowledge, so we often need to start out by using broad strokes and analogies.
Even worse, we sometimes need to fight against misinformation in the popular culture. There is a lot of shorthand out there which is at best confusing, and at worst downright wrong.
The phrase “everyone knows” is often the enemy of real knowledge. Sometimes I find that if I talk about something we are developing in VR to a person not in the field, they will try to correct me (wrongly) because of something they read in a magazine or saw on a TV show.
I’m still not sure what is the best response when someone does that. But I suppose, on balance, it’s good that they are interested.
Yesterday I talked about the possibility of visualizing narrative, for example the plot and character arcs of a novel, as a kind of architectural space within a virtual room in VR.
But what would be the best way to use movement in such a scenario? We could of course just have a static space, and our own movement through the space would provide all of the motion needed to understand what we are seeing. That would be a kind of static VR sculpture.
But why not make use of the fact that the space itself can change in response to our movements through it? We could illuminate narrative and structure by creating a responsive virtual world that changes as we approach and move away from various places within it.
Eventually, we could create an entire language of interaction between our movements and mutable architectural space. I don’t think this would be easy, but it sure would be an interesting thing to explore.
People have done many wonderful things in virtual reality. But one thing I still haven’t seen anyone do well in VR is create a really good visualization of narrative.
Think, for example, of any novel you really love, one that completely drew you in. For me one example of this was Pride and Prejudice.
There are specific places where we are introduced to key characters, where relationships change, where important new information is revealed. There is a structure to all this, and in a great novel that structure is beautiful.
I would like to be able to walk into a virtual room in which all of that structure is arrayed all around me. And I would like to be able to learn from it, for key insights about the novel to immediately jump out at me.
I’m not sure how much of this is possible, but it seems like a great thing to work on. I might try my hand at it. If you are similarly inspired, I would love to hear how it goes!
The first time you watch a great TV series, you don’t know where it is going. Sure, the show’s creator had set up some basic arcs and relationships in the pilot episode, but you don’t know yet what the series is going to do with it all.
Yet when you rewatch the series from the beginning, you already know all that. Every planted conflict or pointed conversational barb has a purpose. And now you know just what that purpose is.
In some ways it is a much richer experience. You are being given the privilege of seeing inside the mind of a great writer.
I highly recommend it.