Quality-of-Experience for XR (TS 26.928)

Immersiveness and Presence

Presence is the feeling of being physically and spatially located in an environment. Presence is divided into two types:

  • Cognitive Presence is the presence of one's mind. It can be achieved by watching a compelling film or reading an engaging book. Cognitive Presence is important for an immersive experience of any kind.
  • Perceptive Presence is the presence of one's senses. To accomplish perceptive presence, one's senses (sight, sound, touch and smell) have to be tricked. To create perceptive presence, the XR Device has to fool the user's senses, most notably the audio-visual system. XR Devices achieve this through positional tracking based on the user's movement. The goal of the system is to maintain the user's sense of presence and avoid breaking it.
    • Perceptive Presence is the objective to be achieved by XR applications and is what is referred to in the following.

Presence is achieved when the involuntary aspects of the reptilian corners of our brains are activated. There are four components relevant for feeling present:

  1. The Illusion of being in a stable spatial place
  2. The Illusion of self-embodiment
  3. The Illusion of Physical Interaction
  4. The Illusion of Social Communication

The most relevant component from the technical aspect is the first one. This part of presence can be broken down into three broad categories, listed in order of most important to least important for their impact on creating presence:

  1. Visual presence
  2. Auditory presence
  3. Sensory or haptic presence

Technical Requirements for visual presence:

  • Tracking: 6DoF tracking, 360 degrees tracking, sub-centimeter accuracy, quarter-degree-accurate rotation tracking, no jitter, comfortable tracking volume, above 1000 Hz tracking frequency.
  • Latency: < 20 ms motion-to-photon latency, minimize the time of pose-to-render-to-photon (< 50 ms), fuse optical tracking and inertial measurement unit (IMU) data, minimize loop (tracker -> CPU -> GPU -> display -> photons), minimize interaction delays and age of content depending on the application.
  • Persistence: Low persistence - Turn pixels on and off every 2 - 3 ms to avoid smearing / motion blur; 90 Hz and beyond display refresh rate to eliminate visible flicker.
  • Resolution: Spatial Resolution: No visible pixel structure - you cannot see the pixels; Temporal Resolution: a sustained frame rate of 90 Hz and beyond.
  • Optics: Wide Field of view (FOV) is the extent of observable world at any given moment and typically 100 - 110 degrees FOV is needed; Comfortable eyebox - the minimum and maximum eye-lens distance wherein a comfortable image can be viewed through the lenses; High quality calibration and correction - correction for distortion and chromatic aberration that exactly matches the lens characteristics.
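
As a rough illustration of how the refresh-rate and latency figures above relate, the following sketch computes the frame period for a given refresh rate and checks a motion-to-photon budget against the 20 ms target. The function names, the per-stage breakdown and the example durations are assumptions made for illustration; they are not measured values and are not taken from the specification.

```python
from typing import Mapping

# Targets taken from the requirements listed above.
MOTION_TO_PHOTON_LIMIT_MS = 20.0
MIN_REFRESH_RATE_HZ = 90.0


def frame_period_ms(refresh_rate_hz: float) -> float:
    """Return the duration of one display frame in milliseconds."""
    if refresh_rate_hz <= 0:
        raise ValueError("refresh rate must be positive")
    return 1000.0 / refresh_rate_hz


def motion_to_photon_ms(stage_delays_ms: Mapping[str, float]) -> float:
    """Sum the delays of the loop tracker -> CPU -> GPU -> display."""
    return sum(stage_delays_ms.values())


def meets_visual_presence_targets(
    refresh_rate_hz: float, stage_delays_ms: Mapping[str, float]
) -> bool:
    """Check the 90 Hz refresh target and the < 20 ms latency target."""
    return (
        refresh_rate_hz >= MIN_REFRESH_RATE_HZ
        and motion_to_photon_ms(stage_delays_ms) < MOTION_TO_PHOTON_LIMIT_MS
    )


# Hypothetical per-stage delays in milliseconds (illustrative only).
example_stages = {"tracker": 1.0, "cpu": 3.0, "gpu": 8.0, "display": 5.0}

if __name__ == "__main__":
    print(f"frame period at 90 Hz: {frame_period_ms(90.0):.1f} ms")
    print("targets met:", meets_visual_presence_targets(90.0, example_stages))
```

At 90 Hz the frame period is about 11.1 ms, so rendering alone already uses a large share of a 20 ms motion-to-photon budget. This is one reason the loop is listed as something to minimize.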

The sense of presence is not only important to VR experiences, but equally so to immersive AR experiences. To achieve Presence in Augmented Reality, seamless integration of virtual content and the physical environment is required. As in VR, the virtual content has to align with the user's expectations. For truly immersive AR and in particular MR, it is expected that users cannot discern virtual objects from real objects.

Also relevant for VR and AR, but in particular for AR, is awareness not only of the user but also of the environment. This includes:

  • Safe zone discovery
  • Dynamic obstacle warning
  • Geometric and semantic environment parsing
  • Environmental lighting
  • World mapping

Figure: Environmental Awareness in XR Applications

For AR, to obtain an enhanced view of the real environment, the user may wear a see-through HMD to see 3D computer-generated objects superimposed on his/her real-world view. This see-through capability can be accomplished using either an optical see-through or a video see-through HMD.

Interaction Delays and Age of Content

Beyond the sense of presence and immersiveness, the age of the content and the user interaction delay are of the utmost importance for immersive and non-immersive interactive experiences, i.e. experiences for which the user interaction with the scene impacts the content of the scene (such as online gaming).

  • User interaction delay is defined as the time duration between the moment at which a user action is initiated and the time such an action is taken into account by the content creation engine. In the context of gaming, this is the time between the moment the user interacts with the game and the moment at which the game engine processes such a player response.
  • Age of content is defined as the time duration between the moment content is created and the time it is presented to the user. In the context of gaming, this is the time between the creation of a video frame by the game engine and the time at which the frame is finally presented to the player.
  • The roundtrip interaction delay is therefore the sum of the Age of Content and the User Interaction Delay. If part of the rendering is done on an XR server and the service produces a frame buffer as the rendering result of the state of the content, then for raster-based split rendering in cloud gaming applications, the following processes contribute to such a delay:
    • User Interaction Delay
      1. capture of user interaction in game client,
      2. delivery of user interaction to the game engine, i.e. to the server (a.k.a. network delay),
      3. processing of user interaction by the game engine/server,
    • Age of Content
      1. creation of one or several video buffers (e.g. one for each eye) by the game engine/server,
      2. encoding of the video buffers into a video stream frame,
      3. delivery of the video frame to the game client (a.k.a. network delay),
      4. decoding of the video frame by the game client,
      5. presentation of the video frame to the user (a.k.a. framerate delay).
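
The decomposition above can be sketched as a simple sum over the listed processing steps. The class names, field names and example durations below are hypothetical and chosen only to mirror the list; actual values depend on the network, codec and devices involved.

```python
from dataclasses import dataclass


@dataclass
class UserInteractionDelay:
    """Steps from the user's action until the engine has processed it."""
    capture_ms: float             # capture of user interaction in the client
    uplink_network_ms: float      # delivery of the interaction to the server
    engine_processing_ms: float   # processing by the game engine/server

    def total_ms(self) -> float:
        return (self.capture_ms
                + self.uplink_network_ms
                + self.engine_processing_ms)


@dataclass
class AgeOfContent:
    """Steps from frame creation until the frame is shown to the user."""
    render_ms: float              # creation of the video buffer(s)
    encode_ms: float              # encoding into a video stream frame
    downlink_network_ms: float    # delivery of the frame to the client
    decode_ms: float              # decoding by the client
    presentation_ms: float        # presentation (framerate delay)

    def total_ms(self) -> float:
        return (self.render_ms
                + self.encode_ms
                + self.downlink_network_ms
                + self.decode_ms
                + self.presentation_ms)


def roundtrip_interaction_delay_ms(
    uid: UserInteractionDelay, aoc: AgeOfContent
) -> float:
    """Roundtrip delay = User Interaction Delay + Age of Content."""
    return uid.total_ms() + aoc.total_ms()


# Hypothetical example values in milliseconds (illustrative only).
example_uid = UserInteractionDelay(2.0, 10.0, 5.0)
example_aoc = AgeOfContent(8.0, 5.0, 10.0, 4.0, 11.0)

if __name__ == "__main__":
    print(roundtrip_interaction_delay_ms(example_uid, example_aoc), "ms")
```

With these assumed values the user interaction delay is 17 ms and the age of content is 38 ms, giving a roundtrip interaction delay of 55 ms. Note that the network contributes twice, once in each direction.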

The four following categories are considered with respect to roundtrip interaction delay:

  1. Ultra-low-latency applications: roundtrip interaction delay of at most 50 ms.
  2. Low-latency applications: roundtrip interaction delay of at most 100 ms.
  3. Moderate-latency applications: roundtrip interaction delay of at most 200 ms.
  4. Non-critical-latency applications: roundtrip interaction delay higher than 200 ms.
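
A minimal sketch of how a measured roundtrip interaction delay could be mapped to these categories is shown below. It assumes that a delay is assigned to the strictest category whose threshold it satisfies, which is one reading of the list above; the function name and the returned labels are hypothetical.

```python
def latency_category(roundtrip_delay_ms: float) -> str:
    """Return the strictest category whose threshold the delay satisfies."""
    if roundtrip_delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if roundtrip_delay_ms <= 50:
        return "ultra-low-latency"
    if roundtrip_delay_ms <= 100:
        return "low-latency"
    if roundtrip_delay_ms <= 200:
        return "moderate-latency"
    return "non-critical-latency"


if __name__ == "__main__":
    # Hypothetical sample delays in milliseconds.
    for delay in (30, 75, 150, 400):
        print(delay, "ms ->", latency_category(delay))
```

The thresholds are treated as inclusive ("at most"), so a delay of exactly 50 ms still counts as ultra-low-latency under this assumption.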