Counting to ten

Today, for the first time, I had a really satisfying video teleconferencing experience. I wasn’t expecting to. After all, like many people reading this blog, I have had years of unsatisfying and somewhat uncomfortable teleconferencing experiences. Skype chat, and before that iChat and its cousins, all promise more than they deliver. I’m sure I’m not the only one who has deliberately turned off his video, because the results are just too jarring and disappointing.

But today a friend of mine who works for the networking company Cisco invited me to a little demo, in which we talked for an hour with a friend and colleague in San Francisco over a high-quality video teleconferencing hookup. Having spent all these years being disappointed by video teleconferencing, I wasn’t expecting much.

Except this time it was different. This time, I felt as though I was having a completely comfortable conversation with somebody three thousand miles away. Not that everything was perfect. For example, you still don’t have the experience of looking into each other’s eyes. That would require each participant to look at the camera rather than at the image of the other person on the screen, or some technological fix to adjust the apparent aim point of the pupils, which would be quite difficult to do reliably.

But seeing somebody you know at life size, on a high-end 65″ diagonal LCD screen, through a high-quality video camera, with guaranteed 30 frames per second video and good-quality audio that is actually synchronized to the video, turns out to make all the difference. We all felt as though we were simply having a conversation, as opposed to “accessing technology”. We all agreed that we’d be perfectly comfortable having meetings over this medium on a daily basis, something none of us had expected to be able to say before the demo.

Of all of the aspects of this demo, the one that I think was the most important was the excellent synchronization of video and audio. There seemed to be zero delay. Intellectually, I knew this was highly implausible, given the current state of technology. And yet, there it was.

So I tried an experiment. I told my friend in San Francisco to count to ten with me. I told him I would say “one … three … five … seven … nine”, and I asked him to fill in with the even numbers as quickly as he could. The results were interesting. What I and the people in the room with me heard was:

one … two three … four five … six seven … eight nine … ten

But what my friend in San Francisco heard was:

one two … three four … five six … seven eight … nine ten

Based on the lengths of the pauses, we concluded that there was about a half-second delay in the transmission. And yet, in every other exchange, we simply could not detect this delay. Conversation seemed perfectly normal in every way.

For the remainder of the session we were all aware of this delay, and yet we could not perceive it. It was unquestionably there, and yet the experience we all had was that there was no delay, a surprising result that I found quite intriguing.

The conclusion I reached was that even a half-second delay is essentially unnoticeable if you have excellent time synchronization between audio and video. Given sight and sound signals arriving at exactly the same time, and in the absence of any other artifacts disrupting the flow of time, the human mind seems to simply gloss over the delay, as people automatically adjust their conversational turn-taking to compensate.

How surprising and delightful! For the first time, I am hopeful that in the future there will be truly useful video teleconferencing for everyone.
