Culture Matters -- A Study On Presence In An Interactive Movie

% $/!\$ The paper is well organized and interesting and it would be even more so if the import of the results was emphasized more effectively, as well as the hypotheses guiding the experiment.

\section{Introduction}

% $/!\$ introduction ambient intelligence and chinese market
% $/!\$ I would suggest to refine the incipit of the paper, which starts quite roughly with some disconnected statements (‘The user’s character is believed to influence the user’s feeling of presence. The user’s cultural background is often mentioned as such a characteristic [1, 2]. A few crosscultural presence studies are available [3], but none investigated the relationship between the user’s cultural background and presence directly’)

Characteristics of the user are believed to influence the user's feeling of presence, and the user's cultural background is often mentioned as one such characteristic \citep{IJsselsteijn+RidderETAL-Pres:00, Freeman+LessiterETAL-IntrPres:01}. A few cross-cultural presence studies are available \citep{Chang+WangETAL-Croscommmedilear:02}, but none has investigated the relationship between the user's cultural background and presence directly. This influence is therefore still more of a conjecture than an established fact, and we conducted an empirical study to investigate the relationship.

% $/!\$ If culture is the 'headline' result, the reader should be told more about the possible or actual differences in the cultural backgrounds of the two groups
% $/!\$ The introduction is very short for this manuscript. The author(s) go directly from intro into the methods. I understand there is not a lot of literature on presence and cultural differences, but I assume there is literature on cultural difference and use of technologies (I am familiar with citations).

In the absence of a clear definition of which cultural factors may influence presence, a good approach is to include participants from clearly different cultures. We therefore recruited Dutch and Chinese participants to obtain a large cultural contrast. \Citet{Hofstede-Geert:91} provides an empirical framework of culture in which the Dutch and Chinese cultures differ substantially.

At the same time, we were interested in distributed interactive media and their influence on the user's feeling of presence. We have entered a new media era: passive television programs become interactive with the red button on the remote control \citep{Bennett-ButtRevoPowePeop:04}. Video games come with many different control interfaces such as dancing mats, EyeToy\registered{} cameras, driving wheels and boxing Gametraks\texttrademark{} \citep{Games-Game:05}. The D-BOX\registered{} Odyssee\texttrademark{} motion simulation system even brings realistic motion experiences, which were originally designed for theme parks, into our living rooms \citep{D-BOX-D-BOMotiSimu:05}. In the vision of Ambient Intelligence \citep{Aarts+Marzano-EverViewAmbiInte:03}, the next generation of interactive media experiences will unfold not only on a computer or television, or in a headset, but in the whole physical environment. These environments involve multiple devices that enable natural interaction and adapt to the users and their needs.

% The meaning of embodiment and distribution should be better explained: whose embodiment are they referring to? And in which sense is the system ‘distributed’? Is there a reason to duplicate the mediation by posing another interface between the user and the mediated environment where the movie was projected? And given this peculiar ‘distribution’ onto different interfaces, what was the ‘ virtual space’ with respect to which presence was measured? If it was the distributed interactive environment including the screen and the robot, then the user diverting attention from the movie to the robot cannot be considered an explanation of low degree of presence (para 4).

Traditionally, the word \emph{distribution} in multimedia implies data or processes distributed over one or more networks. Ambient intelligent environments introduce a new type of distribution: the distributed and synchronized presentation of multimedia content over the networked devices in such an environment, and hence possibly distributed interaction with the content through more than one device. Distribution in this sense may increase the immersiveness of the system and the presented content, and thus enhance the end user's entertainment experience.

Distribution is not a new idea for enhancing the entertainment experience. Multichannel surround sound systems distribute sound all around the audience and hence provide a more realistic and natural sound experience. The ambient intelligence concept \citep{Aarts+Marzano-EverViewAmbiInte:03} goes beyond such sound distribution by distributing content through other channels in the user's environment. Displays in the room may show video clips, lamps may change their color and brightness, robots may dance and sing, and couches may vibrate. The virtual space, or the content, is then no longer confined to the audio and video material presented by a single TV set, but expands into the user's surroundings and covers more sensory modalities. The light color, the robotic behavior and the couch vibration are part of the delivered content, conveying a virtual experience with a direct physical embodiment.

However, distributing interactive content to multiple devices also increases the complexity of interaction. The environment as a whole may become difficult to understand and to control. To ease the situation, embodied characters may be used not only as one of the involved devices, presenting robotic behaviors that give the content a physical body (such as Tony \citep{}), but also to give such an environment a concrete face (such as eMuu \citep{}).

% $/!\$ The argument in the introduction that "the physical embodiment may transfer more attention from the virtual environment to the physical environment" requires more explanation of how in an ambient intelligence environment the physical environment is supposed to, but may not, seamlessly incorporate the virtual environment, and why it might not.

However, the influence of embodiment on the user's presence experience is unclear. On the one hand, embodiment extends the distributed content from an on-screen virtual environment to a physical environment. The physical embodiment improves the content's liveness and fidelity by stimulating more of the user's senses. This might result in an increased feeling of presence \citep{Lombart+Ditton-Hear:97}. On the other hand, the physical embodiment may draw attention away from the virtual environment and towards the physical environment. This might break the illusion of \emph{being there} and hence result in a weaker feeling of presence \citep{Freeman+LessiterETAL-IntrPres:01}.

To control interactive content, the user requires interaction devices. A physical embodiment would invite direct manipulation. A robot could, for example, ask the user to touch its shoulder to select an option. Interaction with a virtual on-screen character may favor the use of a remote control. Embodiment in interactive media can therefore not be studied without considering the interaction method, and we included two interaction methods in our study.

In this framework of interactive distributed media we defined the following three research questions:
\begin{enumerate}
\item What influence does the user's cultural background have on the user's presence experience?
\item What influence does the embodiment of a virtual character have on the user's presence experience?
\item Does directly touching the presented content objects create more presence than pressing buttons on a remote control?
\end{enumerate}

\section{Experiment}

We conducted a 2 (Interaction) $\times$ 2 (Embodiment) $\times$ 2 (Culture) mixed between/within experiment (see \fref{fig:TheInterview_conditions}). Interaction and Culture were the between-participant factors. Interaction had the conditions \expcondition{RemoteControl} and \expcondition{DirectTouch}, and Culture had the conditions Dutch and Chinese. Embodiment was a within-participant factor with the conditions \expcondition{ScreenAgent} and \expcondition{Robot}.

% $/!\$ Figure 1 is confusing because the connection between the big screen in all the conditions, and the remote control/touch and robot/screen agent isn't clear.

%attachment:figure1.png

\subsection{Measurements}

% $/!\$ and to clarify some aspects of the procedure, namely: a) validation of the modified presence questionnaire; the phrasing of the questionnaire has been modified (para 2.1) and this modification raises the issue of validity and reliability, which may have been changes since the original version of the questionnaire. The study may have actually carries out such validation but I couldn’t find any mention to it in the manuscript submitted.
% $/!\$ Likewise, given that two different ‘cultures’ were tested ( Dutch and Chinese), I suppose that different translations of the questionnaire have been used and again it would be necessary to specify whether some procedures have been followed to validate the two versions.

The questionnaire used was the ITC-SOPI \citep{Lessiter+FreemanETAL-CrosPresQues:01}. Only the definition of the \emph{Displayed Environment} in the introduction was adjusted to include the robot/screen character. The questions remained unchanged and are clustered into four groups: \begin{inparaenum}
\item \emph{Spatial Presence}, a feeling of being located in the virtual space; \item \emph{Engagement}, a sense of involvement with the narrative unfolding within the virtual space; \item \emph{Ecological Validity}, a sense of the \emph{naturalness} of the mediated content; and \item \emph{Negative Effects}, a measure of the adverse effects of prolonged exposure to the immersive content.
\end{inparaenum}

\subsection{Participants}

A total of $19$ Chinese and $24$ Dutch participants between the ages of $16$ and $48$ ($14$ female, $29$ male) took part in the experiment. Most of them were students and teachers from Eindhoven University of Technology, with various backgrounds in computer science, industrial design, electronic engineering, chemistry, mathematics and technology management. The Chinese participants had lived in the Netherlands for no longer than two years.

\subsection{Setup}

The experiment took place in a living room laboratory (see \fref{fig:TheInterview_setup}). The participants were seated on a couch in front of a table. The couch was $3.5$\,m away from the main screen, which was projected onto the wall in front of it. The projection had a size of $2.5$\,m $\times$ $1.88$\,m with $1400 \times 1050$ pixels. The secondary screen stood on the table, $0.5$\,m from the couch. It was a $30$\,cm $\times$ $23$\,cm LCD touch screen with $1280 \times 1024$ pixels (Philips DesXcape Smart Display).

% The labels for the two images in Figure 2 seem to be reversed.

%attachment:figure2.png

In the \expcondition{Robot} conditions, the secondary touch screen was replaced with a Lego robot of about the same height. In the \expcondition{ScreenAgent} conditions, the secondary screen displayed a full-screen agent representation of the robot.

The behavior of the screen-based agent and the Lego robot was identical. They played the role of a TV companion by looking randomly at the user and the screen, but always looking at the user while speaking. Speakers were hidden under the table and were used to produce the speech, which was based on the standard Apple Speech Synthesis software. At the start of every movie, the character introduced itself and its role.

% $/!\$ Much more description of the interactive movie in the "Setup" section would really help the reader - who made it? what are the production values? how long is it? who is the onscreen character depicted as being and what kind of job is he applying for? what are the two decisions? why/how is the experiment participant motivated to make the decisions? etc.
% $/!\$ What exactly does "The participants chose different options almost all the time" mean?
% $/!\$ Did all the subjects practice with the remote control even though some of them used touch interaction in the actual experiment, and if so, why?
% $/!\$ pg 2, para 3, The author(s) refer to the film as being designed “to be neither too exciting or too boring.” What were these decisions based on? Citations?
% $/!\$ Related to this issue above, was the film pretested?

The interactive movie was about a job interview in which the participants had to make decisions for the applicant. The story and movie cuts were designed to be neither too exciting nor too boring. The movie had two decision points, which resulted in four possible movie endings. The participants chose different options almost all the time. At every decision point the camera would zoom in on the applicant's forehead. The applicant then cycled through two options in his mind. He first looked to the left and thought aloud about one option, and then looked to the right and thought aloud about the second option. In the \expcondition{RemoteControl} condition the screen would show one icon on the left and a different icon on the right. The icons were identical to two icons on the remote control. In the \expcondition{Robot} condition, the participant had to touch the left or the right shoulder of the robot to make the decision.

% A figure should be here to show the decision points

%attachment:decision.png

\subsection{Procedure}

% $/!\$ What were the participants told they were doing? Were they told the purpose of the study or given a “cover” story?
% $/!\$ The person in the stimulus movie needs to be described. Was he Chinese? Or European? I was wondering about the participants identifying with the actor.
% $/!\$ How was the film presented? Perhaps the Chinese students felt the film was something important for them to learn about Dutch culture?
% $/!\$ The authors should explain why sex was not balanced between conditions

After reading an introduction that explained the structure of the experiment, the participants started with a training session. In this session, the participants watched an unrelated interactive movie that had only one decision point, at which they could make the decision using the remote control. Afterwards, they had the opportunity to ask questions about the process of the experiment. Next, the participants were randomly assigned to one of the between-participant conditions, each of which consisted of two movies with a questionnaire after each movie. Each participant received five euros for their efforts.

% $/!\$ did (or could) the researchers ask the participants for insights about the different levels of presence reported by members of the groups?

\section{Results}

% $/!\$ Don't use "nationality" instead of "culture" in the results
% $/!\$ Report reliability (Cronbach's alpha) for each of the 4 presence indices.
% $/!\$ The results would be a lot more interpretable if the means and SDs for each main effect were given and graphed separately.
% $/!\$ The ANOVA main effect results supercede the t-tests, so the latter should be cut unless there's a specific justification for them and the overall error rate isn't a problem.
% $/!\$ The direction of the differences in the significant results should be made more clear in the results section.

The mean scores for all measurements, including their standard deviations, are presented in \fref{tab:TheInterview_mean_stddev} and graphically in \fref{fig:TheInterview_means}.

%attachment:table1.png

%attachment:figure3.png

\subsection{Embodiment, interaction and culture effects}

A 2 (Embodiment) $\times$ 2 (Interaction) $\times$ 2 (Culture) repeated measures ANOVA was conducted. Interaction had no significant influence on any of the measurements. Embodiment and Culture both had a significant influence on almost all measurements (see \fref{tab:TheInterview_embodiment}).

%attachment:table2.png

Interaction was removed as a factor from the further analyses since it had no effect on the measurements. The means for all remaining conditions are summarized in \fref{fig:TheInterview_nationality_embodiment} and were used as the basis for the further analyses.

%attachment:figure4.png

Paired-samples t-tests were performed across both culture conditions. Spatial Presence scores were significantly higher in the \expcondition{ScreenAgent} condition than in the \expcondition{Robot} condition ($t(42)=2.235$, $p=0.031$). Negative Effects scores were significantly higher in the \expcondition{Robot} condition than in the \expcondition{ScreenAgent} condition ($t(42)=2.38$, $p=0.022$).

Independent-samples t-tests were performed to compare the Dutch and the Chinese participants. All measurements differed significantly between the two groups, except for Engagement in the \expcondition{ScreenAgent} condition, which just missed the significance level ($t(41)=2.007$, $p=0.051$).
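
As a consistency check on the reported degrees of freedom, and assuming that all $N = 43$ participants ($n_{\mathrm{Chinese}} = 19$, $n_{\mathrm{Dutch}} = 24$) entered both analyses and that the independent-samples tests used the standard pooled-variance form, the values would follow from
\[
  \mathit{df}_{\mathrm{paired}} = N - 1 = 43 - 1 = 42,
  \qquad
  \mathit{df}_{\mathrm{independent}} = n_{\mathrm{Chinese}} + n_{\mathrm{Dutch}} - 2 = 19 + 24 - 2 = 41,
\]
which agrees with the reported $t(42)$ and $t(41)$ statistics.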

\section{Discussion}

% $/!\$ Can the authors at least provide more (educated) speculation about what's behind the culture results?
% $/!\$ A final note: the authors’ interpretation of the reason why Chinese participants may have been rated higher in their sense of presence is very speculative (‘One might speculate that the long-term orientation in Chinese culture would result in more patience towards imperfections. They might have more easily tolerated the noise emitted by the robot and the occasional visibility of a microphone in the movie’, para 4); cultural profiles cannot be treated so simplistically, they risk to reflect prejudices more than scientifically viable hypotheses.

The participants' cultural background clearly influenced the measurements. Chinese participants perceived more presence than Dutch participants in all conditions. One might suspect that the Chinese participants were simply more polite in answering questions. However, our measurements show that they also gave higher scores to Negative Effects and therefore did not simply respond politely. The next question is which aspects of the cultural background have the greatest influence on presence. \Citet{Hofstede-Geert:91} suggested several categories through which cultures and organizations may be characterized, but none of them appears relevant to presence at first sight. One might speculate that the long-term orientation in Chinese culture would result in more patience towards imperfections. The Chinese participants might have more easily tolerated the noise emitted by the robot and the occasional visibility of a microphone in the movie. Further studies are necessary to investigate this issue.

% $/!\$ The argument that more presence should mean more negative effects is very arguable - e.g., simulation sickness can be the result of being 'in the environment' or of mismatching vestibular and audio/visual stimuli.
% $/!\$ The results for the use of screen agents versus robots are interesting. But the discussion only touches on the differences between the two for negative effects. Can the author(s) offer an explanation for the difference between the groups (both nationalities and experimental conditions).

The influence of embodiment on the measurements does not conform to the results expected from the construct of presence. If the user experiences high presence then all measurements should be high, including Negative Effects. Furthermore, Spatial Presence should be positively correlated with Negative Effects. However, in our results Spatial Presence and Naturalness were higher in the \expcondition{ScreenAgent} condition, while Negative Effects were higher in the \expcondition{Robot} condition. Negative Effects appear to have been affected by something other than presence.

During the experiment, the robot's motor emitted noise, which caused the participants to look at it. A moving physical object is potentially dangerous and hence attracts attention. The robot thereby emphasized the participants' feeling of being in the room rather than in the movie, which reduced the presence experience.

% $/!\$ Doesn't the argument that switching between the big screen and the robot also apply to switching between the big screen and a little screen?

The participants frequently switched between looking at the movie and looking at the robot and hence divided their attention. This switching made it hard for them to stay focused and might have caused the high negative experience. \Citet{Eggen+FeijsETAL-BreaFlowIntecomp:03} showed that a divided attention space reduced the users' immersion. Further research is necessary to determine whether divided attention increases the negative effects of multiple displays. The extra costs necessary to build and maintain a robot for an interactive movie appear unjustified in relation to its benefit.

The different interaction methods (using a remote control or touching directly) had no influence on the measurements. The participants did not experience more or less presence when they interacted with a remote control or with the screen/robot directly. This is to some degree surprising, since the participants had to lean forward to touch the screen/robot directly, while they could remain leaning back when using the remote control. The necessity to make a choice might have overshadowed the difference in physical movement. To create a compelling sense of presence it might be useful to pay more attention to the physical output than to the input.

% $/!\$ What are the implications of the findings? Are they just meaningful for ambient intelligent VEs? / What about future research? / Overall, a very interesting paper to read, but much more can and should be added to it.

\section{Acknowledgments}

The authors would like to thank Loe Feijs, Maddy Jansen and Kees Overbeeke for their help and support. In addition, we would like to thank all participants of the study.

References

Reviews

1

Very interesting study and results, but the paper as written is unnecessarily difficult to follow and leaves out some important detail.

If culture is the 'headline' result, the reader should be told more about the possible or actual differences in the cultural backgrounds of the two groups - did (or could) the researchers ask the participants for insights about the different levels of presence reported by members of the groups? Can the authors at least provide more (educated) speculation about what's behind the culture results?
The argument in the introduction that "the physical embodiment may transfer more attention from the virtual environment to the physical environment" requires more explanation of how in an ambient intelligence environment the physical environment is supposed to, but may not, seamlessly incorporate the virtual environment, and why it might not.
Figure 1 is confusing because the connection between the big screen in all the conditions, and the remote control/touch and robot/screen agent isn't clear.
The labels for the two images in Figure 2 seem to be reversed.
Much more description of the interactive movie in the "Setup" section would really help the reader - who made it? what are the production values? how long is it? who is the onscreen character depicted as being and what kind of job is he applying for? what are the two decisions? why/how is the experiment participant motivated to make the decisions? etc.
What exactly does "The participants chose different options almost all the time" mean?
Did all the subjects practice with the remote control even though some of them used touch interaction in the actual experiment, and if so, why? / Don't use "nationality" instead of "culture" in the results
Report reliability (Cronbach's alpha) for each of the 4 presence indices.
The results would be a lot more interpretable if the means and SDs for each main effect were given and graphed separately.
The ANOVA main effect results supercede the t-tests, so the latter should be cut unless there's a specific justification for them and the overall error rate isn't a problem.
The direction of the differences in the significant results should be made more clear in the results section.
The argument that more presence should mean more negative effects is very arguable - e.g., simulation sickness can be the result of being 'in the environment' or of mismatching vestibular and audio/visual stimuli.
Doesn't the argument that switching between the big screen and the robot also apply to switching between the big screen and a little screen?
What are the implications of the findings? Are they just meaningful for ambient intelligent VEs?
What about future research?

Overall, a very interesting paper to read, but much more can and should be added to it.

2

This is a good start at exploring other individual differences that may impact the levels of presence experienced by media users. The manuscript is terse, but to the point. I think the manuscript would make an interesting presentation but I do not recommend it for publication. The following are some areas/items to consider for revision.

The introduction is very short for this manuscript. The author(s) go directly from intro into the methods. I understand there is not a lot of literature on presence and cultural differences, but I assume there is literature on cultural difference and use of technologies (I am familiar with citations).
The images in Figure 2 are reversed
The description of the procedures left me with several questions?
1. What were the participants told they were doing? Were they told the purpose of the study or given a “cover” story?
2. The person in the stimulus movie needs to be described. Was he Chinese? Or European? I was wondering about the participants identifying with the actor.
3. How was the film presented? Perhaps the Chinese students felt the film was something important for them to learn about Dutch culture?
pg 2, para 3, The author(s) refer to the film as being designed “to be neither too exciting or too boring.” What were these decisions based on? Citations?
Related to this issue in #4, was the film pretested?
The results for the use of screen agents versus robots are interesting. But the discussion only touches on the differences between the two for negative effects. Can the author(s) offer an explanation for the difference between the groups (both nationalities and experimental conditions).

3

The paper presents the results of a study where 3 variables were manipulated (Culture, Interaction and Embodiment) in order to examine their influence on some dimensions of presence.

The paper is well organized and interesting and it would be even more so if the import of the results was emphasized more effectively, as well as the hypotheses guiding the experiment.
I would suggest to refine the incipit of the paper, which starts quite roughly with some disconnected statements (‘The user’s character is believed to influence the user’s feeling of presence. The user’s cultural background is often mentioned as such a characteristic [1, 2]. A few crosscultural presence studies are available [3], but none investigated the relationship between the user’s cultural background and presence directly’)
and to clarify some aspects of the procedure, namely: a) validation of the modified presence questionnaire; the phrasing of the questionnaire has been modified (para 2.1) and this modification raises the issue of validity and reliability, which may have been changes since the original version of the questionnaire. The study may have actually carries out such validation but I couldn’t find any mention to it in the manuscript submitted.
Likewise, given that two different ‘cultures’ were tested ( Dutch and Chinese), I suppose that different translations of the questionnaire have been used and again it would be necessary to specify whether some procedures have been followed to validate the two versions.
The authors should explain why sex was not balanced between conditions
The meaning of embodiment and distribution should be better explained: whose embodiment are they referring to? And in which sense is the system ‘distributed’? Is there a reason to duplicate the mediation by posing another interface between the user and the mediated environment where the movie was projected? And given this peculiar ‘distribution’ onto different interfaces, what was the ‘ virtual space’ with respect to which presence was measured? If it was the distributed interactive environment including the screen and the robot, then the user diverting attention from the movie to the robot cannot be considered an explanation of low degree of presence (para 4).

I appreciated the originality of the study, its structure and the insights that can derive from its results. With the abovementioned aspects clarified will it be possible to interpret the results.

A final note: the authors’ interpretation of the reason why Chinese participants may have been rated higher in their sense of presence is very speculative (‘One might speculate that the long-term orientation in Chinese culture would result in more patience towards imperfections. They might have more easily tolerated the noise emitted by the robot and the occasional visibility of a microphone in the movie’, para 4); cultural profiles cannot be treated so simplistically, they risk to reflect prejudices more than scientifically viable hypotheses.