Potential Standardisation Areas (V16.1.0)

XR-Centric Device Types and Architectures

XR-centric devices are the key enablers for XR services. Key aspects for XR devices are:

  • Rendering-centric device architectures using sophisticated GPU functionalities
  • Support for Tracking, in particular inside-out tracking in the device
  • Heavily power-constrained at least for certain form factors
  • Support for multiple decoding formats and parallel decoding of media streams

Extensions to 5G Media Streaming for XR/6DoF Media

Streaming of XR and 6DoF content is considered in several use cases evaluated in this Technical Report. With the establishment of 5G Media Streaming in Release-16 in TS 26.501 and the stage-3 specifications, extensions of 5G Media Streaming to support XR experiences are useful. MPEG is developing several new formats and codecs specifically addressing 3D and XR, but proprietary formats also exist that make use of regular hardware-supported standardized media codecs and rendering functionalities. All of them have in common that they rely on existing and emerging device architectures that make use of existing video codecs and GPU rendering. In addition, multiple video codecs are commonly used in parallel. XR/6DoF streaming is based on CDNs and HTTP delivery; however, new functionalities are required.

Extensions to 5G Media Streaming may be made in order to support the delivery of different XR/6DoF media in 5G Systems. Relevant aspects are:

  • Additional media decoding capabilities, including higher profiles and levels, the use of multiple decoders in parallel to interface with rendering architectures, as well as more flexible formats
  • Support for more flexible delivery protocols allowing parallel download of different objects and parts of objects
  • Viewport-dependent streaming
  • Potentially new 5QIs and radio capabilities to support higher bitrate streaming
  • Biometrics and Emotion Metadata definition and transport
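
As an illustration of the viewport-dependent streaming item above, the following minimal sketch assigns a quality level to each tile of a 360-degree panorama depending on whether the tile overlaps the current viewport. The tiling by yaw only, the number of tiles, the field of view and the function names are assumptions made for this example and are not taken from any specification.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def select_tile_qualities(view_yaw_deg, fov_deg=90.0, num_tiles=8):
    """Return one quality label per yaw tile (hypothetical tiling scheme).

    A tile is requested in high quality when its yaw range overlaps the
    viewport, and in low quality otherwise.
    """
    tile_width = 360.0 / num_tiles
    qualities = []
    for i in range(num_tiles):
        tile_center = (i + 0.5) * tile_width
        # Two angular intervals overlap when the distance between their
        # centres is smaller than the sum of their half-widths.
        overlaps = angular_distance(tile_center, view_yaw_deg) < (fov_deg + tile_width) / 2.0
        qualities.append("high" if overlaps else "low")
    return qualities


# Looking straight ahead (yaw 0) with a 90-degree field of view.
print(select_tile_qualities(0.0))
```

In a real client the selection would also depend on pitch, on available throughput and on how quickly the user's head moves; the sketch only shows the basic dependency between pose and requested media.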

Raster-based Split Rendering with Pose Correction

Split Rendering is a promising technology to support online gaming on power- and resource-constrained devices. Split rendering requires the use of edge computing, as the pose-to-render-to-photon latency is expected to be below 50 ms. Rendering rasterized frame buffers in the network makes it possible to support XR devices with existing codecs using 5G System and radio capabilities. Relevant aspects for single-buffer split rendering include:

  • A simple XR split rendering application framework where a single frame buffer per media per eye is shared and final pose correction is done in the XR device
  • 2D video encoders and decoders that are capable of encoding and decoding 2K per eye at 90 fps, as well as typical 2D rasterized graphics-centric formats
  • Integration of audio into split rendering architectures
  • Formats and protocols for XR Pose information delivery and possibly other metadata in the uplink at sufficiently high frequency
  • Content Delivery protocols that support split rendering
  • 5QIs and other 5GS/Radio capabilities that support split rendering
  • Edge computing discovery and capability discovery based on work in SA2 and SA6
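
The figures quoted above (a budget below 50 ms, 2K per eye, 90 fps) can be put into perspective with some simple arithmetic. The sketch below computes the frame interval and the uncompressed pixel rate, and checks a set of per-stage latencies against the budget. The 2048 x 2048 resolution per eye, the 24 bits per pixel and all per-stage latency values are assumptions chosen for illustration; they are not measurements and are not taken from this report.

```python
FRAME_RATE_HZ = 90        # from the text above
LATENCY_BUDGET_MS = 50.0  # from the text above
EYES = 2
WIDTH = HEIGHT = 2048     # assumption: "2K per eye" read as 2048 x 2048
BITS_PER_PIXEL = 24       # assumption: 8-bit RGB without alpha


def frame_interval_ms(rate_hz=FRAME_RATE_HZ):
    """Time available per frame at the given frame rate."""
    return 1000.0 / rate_hz


def raw_bitrate_gbps():
    """Uncompressed bitrate of both eye buffers, showing why a codec is needed."""
    bits_per_second = WIDTH * HEIGHT * EYES * BITS_PER_PIXEL * FRAME_RATE_HZ
    return bits_per_second / 1e9


def within_budget(stage_latencies_ms, budget_ms=LATENCY_BUDGET_MS):
    """True if the summed stage latencies do not exceed the budget."""
    return sum(stage_latencies_ms.values()) <= budget_ms


# Hypothetical stage latencies in milliseconds, for illustration only.
example_stages = {
    "uplink_pose": 5.0,
    "render": 10.0,
    "encode": 5.0,
    "downlink": 15.0,
    "decode": 5.0,
    "pose_correction_and_display": 5.0,
}

print(round(frame_interval_ms(), 2), "ms per frame")
print(round(raw_bitrate_gbps(), 2), "Gbit/s uncompressed")
print("within budget:", within_budget(example_stages))
```

Under these assumptions the uncompressed frame buffers amount to roughly 18 Gbit/s, which is why rasterized split rendering depends on capable 2D video codecs and on low-latency delivery.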

XR Conference Applications

XR conversational applications within real or computer-generated virtual environments are considered in several use cases evaluated in this Technical Report. Some work is already ongoing in IVAS and ITT4RT; however, potential additional normative work includes:

  • Study the mapping of XR conference applications to the 5G system architecture including Media streaming and MTSI.
  • Support for media processing in the network (e.g. NBMP), such as foreground/background segmentation of the user capture or replacement of the user with a photo-realistic representation of their face
  • 6DOF metadata framework and a 6DOF capable renderer for immersive voice and audio.
  • Support of static/dynamic 3D objects’ formats and transport for real-time sharing
  • Transport of collected data from multiple sensors (e.g. for spatial mapping)
  • Format for storing and sharing spatial information (e.g. indoor spatial data).
  • Content Delivery protocols that support XR conversational cases
  • 5QIs and other 5GS/Radio capabilities that support XR conversational cases
  • Edge computing discovery and capability discovery based on work in SA2 and SA6

Augmented Reality for New Form Factors

Glass-type AR/MR UEs with standalone capability, i.e., UEs that can be connected directly to 3GPP networks, are interesting emerging devices. Such a type is classified as XR5G-A5 in Table 4.3-1. However, it would also be necessary to consider situations where XR5G-A5 UEs have to fall back to XR5G-A1 or XR5G-A2, i.e., to the wired or wirelessly tethered modes, e.g., from NR Uu to 5G sidelink or IEEE 802.11ad/y. Furthermore, an evolution path for devices under the category XR5G-A3 and XR5G-A4 should be considered. Further studies are encouraged on, among others:

  • Basic use cases: a set of use cases relevant for XR5G-A5 can be selected from Table 5.1-1. The preferred cases will be those capable of delivering experiences that previous or existing services could not support, e.g., real-time sharing or streaming of 3D objects. They should also be easier to realize within the constraints of glasses, which are more limited than phones.
  • Media formats and profiles: for the selected use cases, available formats and profiles of the media/data can be discussed. Sharing of XR and 3D Data is of interest for personal and enterprise use cases as documented in scenario 5.2. Properly generated XR data can be used in AR applications on smartphone devices as well as on AR glasses. Exchange formats for AR-based applications are relevant, for example for services such as MMS.
  • Transport technologies and protocols: in case the selected use cases include relocation or delivery of 3D or XR media/data over 3GPP networks, combinations of transport protocols, radio access and core network technologies that support the use cases at relevant QoS can be discussed. If existing technologies and protocols cannot serve the use cases properly, such gaps can be taken into account in the consideration of normative work.
  • Form factor-related issues: in Table 4.3-1, typical maximum transmit power of XR5G-A5 is 0.5-2 W while phone types transmit at 3-5 W. However, if XR5G-A5 is implemented in a form factor of typical glasses, i.e., smaller than goggles or HMDs and with a weight less than 100 g, its cellular modems and antennas are located near the face. In this case, XR5G-A5 UEs can have more constraints on transmit power and it would be necessary to develop solutions to overcome them, e.g. considering situations where XR5G-A5 UEs have to fall back to XR5G-A2 from NR Uu to 5G sidelink or IEEE 802.11ad/y. Furthermore, an evolution path for devices under the category XR5G-A3 (3-7 W) and XR5G-A4 (2-4 W) should be considered.

Traffic Characteristics and Models for XR Services

As identified in the course of the development of this report, there is significant interest in 3GPP radio and system groups in the traffic characteristics of XR services. Collecting realistic traffic characteristics for typical XR services should be a priority for 3GPP. Of specific interest for other groups in 3GPP is a characterization of the traffic of an XR service in the following domains:

  • Downlink data rate ranges
  • Uplink data rate ranges
  • Maximum packet delay budget in uplink and downlink
  • Maximum Packet Error Rate
  • Maximum Round Trip Time
  • Traffic characteristics on IP level in uplink and downlink in terms of packet sizes and temporal characteristics

Such characteristics are expected to be available for at least the following applications:

  • Viewport-independent 6DoF streaming
  • Viewport-dependent 6DoF streaming
  • Single-buffer split rendering for online cloud gaming
  • XR conversational services

The Technical Report on Typical Traffic Characteristics in TR 26.925 should be updated to address any findings to support the 3GPP groups.
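
The characterization domains listed in this clause could be captured in a simple record per application, which also makes it easy to check measured values against the documented limits. The following sketch is one possible shape for such a record. The class name, the field names and all numeric values are hypothetical placeholders for illustration; they are not traffic characteristics from this report or from TR 26.925.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrafficProfile:
    """Hypothetical container for the traffic domains listed above."""
    name: str
    dl_rate_mbps: Tuple[float, float]  # (min, max) downlink data rate
    ul_rate_mbps: Tuple[float, float]  # (min, max) uplink data rate
    max_pdb_ms_dl: float               # maximum packet delay budget, downlink
    max_pdb_ms_ul: float               # maximum packet delay budget, uplink
    max_per: float                     # maximum packet error rate
    max_rtt_ms: float                  # maximum round trip time


def violations(profile: TrafficProfile, dl_delay_ms: float, ul_delay_ms: float,
               per: float, rtt_ms: float) -> List[str]:
    """Return the names of the limits that a measurement exceeds."""
    checks = {
        "pdb_dl": dl_delay_ms > profile.max_pdb_ms_dl,
        "pdb_ul": ul_delay_ms > profile.max_pdb_ms_ul,
        "per": per > profile.max_per,
        "rtt": rtt_ms > profile.max_rtt_ms,
    }
    return [name for name, exceeded in checks.items() if exceeded]


# Placeholder values only; not measured or specified anywhere.
example_profile = TrafficProfile(
    name="split_rendering_example",
    dl_rate_mbps=(10.0, 50.0),
    ul_rate_mbps=(0.1, 1.0),
    max_pdb_ms_dl=20.0,
    max_pdb_ms_ul=10.0,
    max_per=1e-3,
    max_rtt_ms=50.0,
)

print(violations(example_profile, dl_delay_ms=25.0, ul_delay_ms=5.0, per=1e-4, rtt_ms=40.0))
```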

Social XR

Social XR is used as an umbrella term for combining, delivering, decoding and rendering XR objects (avatars, conversational, sound sources, streaming live content, etc.) originating from different sources into a single user experience. Social XR may be VR centric, but also may apply to AR and MR.

Social XR is expected to integrate multiple XR functionalities, such as 6DoF streaming with XR conversational services. Some normative work may include:

  • Social XR Components – Merging of avatar and conversational streams to original media (e.g., overlays, etc.)
  • Parallel decoding of multiple independently generated sources.
  • Proper annotation and metadata for each object to place it into scene.
  • Description and rendering of multiple objects into a Social XR experience.
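
To make the annotation and composition items above more concrete, the following sketch merges objects from independently generated sources into a single scene and derives the set of source streams that would need to be decoded in parallel. The class, field and function names are hypothetical and chosen only for this example; an actual scene description would carry considerably more metadata (orientation, timing, media references).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class XRObject:
    """Hypothetical per-object annotation needed to place an object into a scene."""
    object_id: str
    kind: str                             # e.g. "avatar", "audio_source", "live_video"
    source: str                           # originating participant or service
    position: Tuple[float, float, float]  # metres, in scene coordinates


def compose_scene(objects: List[XRObject]) -> Dict[str, XRObject]:
    """Merge objects from different sources into one scene, keyed by object id."""
    scene: Dict[str, XRObject] = {}
    for obj in objects:
        if obj.object_id in scene:
            raise ValueError(f"duplicate object id: {obj.object_id}")
        scene[obj.object_id] = obj
    return scene


def sources_to_decode(scene: Dict[str, XRObject]) -> List[str]:
    """Distinct sources in the scene, each assumed to need its own decoder instance."""
    return sorted({obj.source for obj in scene.values()})


scene = compose_scene([
    XRObject("avatar-1", "avatar", "participant-a", (1.0, 0.0, -2.0)),
    XRObject("voice-1", "audio_source", "participant-a", (1.0, 1.6, -2.0)),
    XRObject("screen-1", "live_video", "streaming-service", (0.0, 1.5, -4.0)),
])
print(sources_to_decode(scene))
```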

Generalized Split and Cloud Rendering and Processing

Edge/Cloud processing and rendering is a promising technology to support online gaming on power- and resource-constrained devices. Relevant aspects for generalized cloud/split rendering include:

  • A generalized XR cloud and split rendering application framework based on a scene description
  • Support for 3D formats in split and cloud rendering approaches
  • Formats and protocols for XR Pose information delivery and possibly other metadata in the uplink at sufficiently high frequency
  • Content Delivery protocols that support generalized split/cloud rendering
  • Distribution of processing across different resources in the 5G System network, in the application provider domain (cloud) and in the XR device.
  • Supporting the establishment of Processing Workflows across distributed resources and managing those
  • 5QIs and other 5GS/Radio capabilities that support generalized split/cloud rendering by coordination with other groups
  • Edge computing discovery and capability discovery based on work in SA2 and SA6
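
The item above on XR pose information delivery in the uplink can be illustrated with a compact binary encoding and the uplink bitrate it implies at a given reporting frequency. The message layout (a 64-bit timestamp, a 3D position and an orientation quaternion as 32-bit floats), the function names and the 500 Hz rate used in the example are assumptions for this sketch; no such format is defined in this report.

```python
import struct

# Hypothetical layout: little-endian, uint64 timestamp in microseconds,
# position (x, y, z) and orientation quaternion (x, y, z, w) as float32.
POSE_FORMAT = "<Q3f4f"
POSE_SIZE_BYTES = struct.calcsize(POSE_FORMAT)


def pack_pose(timestamp_us, position, orientation):
    """Serialize one pose sample into a fixed-size binary payload."""
    return struct.pack(POSE_FORMAT, timestamp_us, *position, *orientation)


def unpack_pose(payload):
    """Inverse of pack_pose: returns (timestamp_us, position, orientation)."""
    fields = struct.unpack(POSE_FORMAT, payload)
    return fields[0], fields[1:4], fields[4:8]


def uplink_payload_kbps(rate_hz):
    """Payload bitrate in kbit/s for a given pose reporting frequency.

    Transport and IP header overhead are not included.
    """
    return POSE_SIZE_BYTES * 8 * rate_hz / 1000.0


payload = pack_pose(123456, (0.5, 1.0, -2.0), (0.0, 0.0, 0.0, 1.0))
print(len(payload), "bytes per pose sample")
print(uplink_payload_kbps(500), "kbit/s at 500 Hz (assumed rate)")
```

The payload itself is small; at high reporting frequencies the per-packet header overhead and the delay of each sample are likely to matter more than the raw pose bitrate.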

Conclusions and Proposed Next Steps

In this study, frameworks for eXtended Reality (XR) have been analysed. XR refers to a larger concept for representing reality that includes virtual, augmented, and mixed realities. After defining key terms and outlining the QoE/QoS issues of XR-based services, the delivery of XR in the 5G system is discussed, following the architectural model of 5G Media Streaming defined in TS 26.501. In addition to the conventional service categories (conversational, interactive, streaming, and download), split compute/rendering is identified as a new delivery category. A survey of 3D, XR visual and audio formats is provided.

Use cases and device types have been classified, and processing and media-centric architectures are introduced. This includes viewport-independent and viewport-dependent streaming, as well as different distributed computing architectures for XR. Core use cases of XR include those unique to AR and MR in addition to those of VR discussed in TR 26.918, ranging from offline sharing of 3D objects, real-time sharing, multimedia streaming, and online gaming to mission critical applications and multi-party calls/conferences.

Based on the details in the report, the following is proposed:

  • In the short-term:
    • Develop a flexible XR-centric device reference architecture as well as a collection of device requirements and recommendations for XR device classes based on the considerations in clause 7.2. Device classes should include VR devices for 6DoF streaming and XR online gaming (XR5G-V4), as well as AR devices (XR5G-A1, XR5G-A4 and XR5G-A5).
    • Develop a framework and basic functionalities for raster-based Split Rendering for Online Gaming according to the considerations in clause 7.4.
    • Document typical XR traffic characteristics in 3GPP TR 26.925 based on the initial considerations in this report, in particular clause 7.7, and support other 3GPP groups in designing systems for XR services and applications.
    • Address simple extensions to MTSI to support XR conversational services based on the considerations in clause 7.5.
    • Study detailed functionalities and requirements for glass-type AR/MR UEs with standalone capability according to clause 7.6 and address exchange formats for AR-centric media, taking into account different processing capabilities of AR devices.
  • In the mid-term:
    • Based on the work developed in the short-term for raster-based split rendering, an extended Split and Cloud Rendering and Processing should be defined based on the considerations in clause 7.9, preferably preceded by a dedicated study.
    • Address simple extensions to 5G Media Streaming to support 6DoF Streaming based on considerations in clause 7.3. Stage-2 aspects related to TS 26.501 should be considered first before starting detailed stage-3 work.
    • Based on the work developed in the shorter time frame above, address in more detail the considerations in clause 7.5 and clause 7.8 on Social XR.

The work should be carried out in close coordination with other groups in 3GPP on XR-related matters, edge computing and rendering, as well as in communication with experts in MPEG on the MPEG-I project and with Khronos on their work on OpenXR, glTF and Vulkan/OpenGL.