The first five seconds, revisited

I found Rhema’s comment on yesterday’s post very insightful. It seems quite plausible that the key is the virtuous loop of instant, high-bandwidth feedback we get when meeting somebody in person: it lets our brains function at higher capacity as we feel each other out in real time.

Which suggests a more refined version of my original question: Could we systematically add and subtract various aspects of “being there” to figure out which elements of meeting in person are the most important?

For example, we could separate people with a pane of glass, or have them interact through video of varying latency or resolution, with varying sound fidelity, with or without stereo depth, and so on.
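The “add and subtract” idea maps naturally onto a factorial experiment. Each combination of factor levels is one condition, and comparing average presence scores across the levels of a single factor shows how much that factor matters. The sketch below is only an illustration: the factor names, the specific levels, and the `main_effect` helper are hypothetical, and the score stands in for whatever objective criterion is eventually chosen.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Hypothetical factors and levels for mediated encounters (illustrative values only).
FACTORS = {
    "latency_ms": [20, 100, 500],
    "resolution": ["480p", "1080p"],
    "audio": ["narrowband", "wideband"],
    "stereo_depth": [False, True],
}


def conditions(factors):
    """Enumerate every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def main_effect(results, factor):
    """Average presence score for each level of one factor.

    `results` is a list of (condition, score) pairs, where the score comes from
    whatever objective criterion is eventually adopted (still an open question).
    """
    by_level = defaultdict(list)
    for condition, score in results:
        by_level[condition[factor]].append(score)
    return {level: mean(scores) for level, scores in by_level.items()}


all_conditions = conditions(FACTORS)
print(len(all_conditions))  # 3 * 2 * 2 * 2 = 24 conditions
```

The factor whose levels produce the widest spread in average score would be a first candidate for the decisive dimension, although with real data one would also want to check for interactions between factors.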

At some point we might discover a decisive dimension that, more than any other, holds the key to the sense of “being there” when meeting somebody.

Of course, we still need to come up with an objective criterion: what question should we ask to measure how fully our two participants were “present” in their encounter?

I am open to suggestions!

6 thoughts on “The first five seconds, revisited”

sally says:

February 9, 2013 at 2:39 am

I don’t think this should be treated as a linear process; it’s a distributed one. So I don’t think you can “pull” out pieces to get at it. Think of it like holographic storage: the whole is in each bit.

admin says:

February 9, 2013 at 7:23 am

Fair enough, Sally. Given that, what sort of “objective criterion” would you suggest for evaluating a given attempt to approximate this hologram?

sally says:

February 9, 2013 at 11:52 am

I don’t think you can, which is why it’s such a challenging problem (and likely why no one has solved it yet).

Mostly, it’s because the things you want to remove in order to test this carry embedded cultural knowledge (ECK) within them. This is the hologram part: the ways the pieces are related and connected.

See M.D. Fischer:

http://anthropunk.com/xwiki/wiki/anthropunk/download/Miguel/AnthroSystems/MFischerAnth.pdf

Not all humans are the same. We have different cultural values and different preferences. As we say in our papers, “same sensors, different experiences.” Because the issue is so heterogeneous, it’s nearly impossible to tease out individual pieces when each one has its own cultural filter attached.

The new journal Multisensory Research may be of interest, though:

http://www.brill.com/publications/journals/multisensory-research

Francisco Cabrita says:

February 9, 2013 at 3:30 pm

While I agree to some extent with Sally’s point of view, I also believe that, beyond cultural differences, our common physiological and psychological background prevails in the first seconds of an encounter.

We are like leaves on a branch: far apart from each other, but if you shake the branch, we all react.

Things like latency, flicker, field of view, and resolution are definitely more important than cultural differences. Latency above 20 ms will definitely break the sense of presence: the feedback loop is tremendously important. This is all related to our understanding, interpretation, and extrapolation of events (or events to come). See the free energy principle: http://www.ncbi.nlm.nih.gov/pubmed/20068583

The limited field of view of screens and camera distortion also matter a lot to this feeling of “being there”.

A final example is camera offset: the camera is not located where your own eyes are, and we must solve that (perhaps by rendering a parallax-corrected version of the live-recorded meeting?) before we can decide which objective criterion is most important.

And to come back to your original question, I think objective criteria could be:
– The conversation duration (as in speed dating)
– The gazing time (for extraverted people :))
– The amount of change in your own behavior (knowing that someone is here changes the way we behave)

...and sorry if it’s not all that clear; let’s say jet lag is to blame 🙂

Kamyar says:

February 10, 2013 at 4:02 pm

Well, I believe the first-five-second impression is associated with a number of features of the person, probably physical ones (physical because in those first five seconds you are only exposed to the person’s apparent features), and with a method for mapping these features to the personality types we already know. This is a sort of classification problem, where the classes are typical characters we have met in the past; they differ from person to person and probably from culture to culture.

Given a considerable amount of experience, pattern recognition and machine learning methods may find both the reference classes and the most salient features (the decisive dimensions you are looking for). These could then be used in a machine, or, seen another way, are already used in our minds, to map features to personality classes.

After this stage, we can analyze these salient features to see whether available technology can convert them into digital information, convey them, and reconstruct them as physical features again.

As for the objective criterion, I think a good one would be to compare the personality type recognized after those first five seconds with the one recognized after a considerable amount of time. If the two results are the same, the person you met was present enough in those first five seconds for you to judge their personality correctly.

Furthermore, if we find the feature space and the classification references (typical personalities), we can use the distance between the first guess and the final perception as a measure of “being present” in the first five seconds.
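Kamyar’s two criteria can be sketched with nearest-prototype classification, one simple stand-in for the pattern-recognition step. Everything concrete below is an assumption for illustration: the three-dimensional feature vectors, the prototype names and coordinates, and the choice of Euclidean distance (`math.dist`, Python 3.8+) as the measure of how far the first guess is from the final perception.

```python
import math

# Hypothetical "typical personalities" as prototype points in a hypothetical
# three-dimensional feature space (illustrative values only).
PROTOTYPES = {
    "outgoing": (0.9, 0.8, 0.2),
    "reserved": (0.2, 0.3, 0.7),
    "analytical": (0.4, 0.2, 0.9),
}


def classify(features, prototypes=PROTOTYPES):
    """Nearest-prototype classification: return the closest typical personality."""
    return min(prototypes, key=lambda name: math.dist(features, prototypes[name]))


def presence_scores(first_impression, final_perception, prototypes=PROTOTYPES):
    """Compare the five-second perception of a person with the later one.

    Returns (same_class, gap): same_class is the yes/no criterion (did the first
    guess land in the same personality class?), and gap is the Euclidean distance
    between the two perceptions in feature space (smaller suggests more presence).
    """
    same_class = classify(first_impression, prototypes) == classify(final_perception, prototypes)
    gap = math.dist(first_impression, final_perception)
    return same_class, gap


# A first impression close to the final perception: same class, small gap.
print(presence_scores((0.85, 0.75, 0.25), (0.9, 0.8, 0.2)))
```

Averaged over many encounters under the same condition, the agreement rate and the mean gap would give two numbers per condition that could be compared across media.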
PhilH says:

February 14, 2013 at 5:15 am

I think the two keys are direct gaze and audio location.

People seem remote when their voice is tinny and “borgified” by audio codecs, and when you can’t look at them directly over a webcam.

Audio quality is the easier part: just use higher-quality codecs and decent, perhaps stereo, microphones. You can get a good microphone for £50 these days from Blue Microphones or Samson.

Gaze on video is hard. To do it properly, both people somehow need to look into the camera, but since the camera is not in the display and people move their heads around, that is very difficult to replicate.

We talk to people in the dark and don’t feel they are distant, so I think audio-only is fine if it sounds right. Adding gaze-less video makes it worse, I think; somehow you have to take away the cues that remind you someone is not actually here.
