Quality of experience

Quality of Experience (QoE) is a term often used in the information technology and consumer electronics domains to indicate users’ overall satisfaction with the service they receive. Institute of Digital Technologies researchers have longstanding experience in 3D audio-visual QoE. Their research portfolio covers fundamental research on human visual perception, modelling of the 3D audio-visual experience, and modelling of the impact of ambient environmental factors on perceived quality.

Traditionally, communication service providers have relied on service-level parameters such as Quality of Service (QoS) to quantify the level of users’ experience. However, service-level measures do not provide sufficiently accurate figures on the actual user experience for the ever-growing number of different services operated on the Internet and other communication platforms. QoE, on the other hand, provides more comprehensive measures that incorporate not only QoS but also content quality and service quality aspects such as reliability, availability, and usability. It is a largely subjective measure that depends heavily on the individual user. Hence, modelling QoE objectively is challenging. Industries that deal with audio-visual media processing and delivery in particular would benefit greatly from a cost-effective objective model that closely approximates the actual QoE of the service they provide. One of the main focuses of the Institute of Digital Technologies researchers has been the modelling of the 3D audio-visual experience.

Institute of Digital Technologies researchers have been actively researching the modelling of the spatial audio experience and the 3D visual experience. The spatial audio experience is modelled using four Key Performance Indicators (KPIs). These are the basic audio quality, front and surround spatial fidelity, and spatial and temporal correlation between the audio and video. In contrast, the visual experience can broadly be modelled using two KPIs: Image Quality (IQ) and Depth Perception (DP). These KPIs are modelled using psycho-acoustic and psycho-visual knowledge gained from the fundamental research. The theoretical models have been verified through extensive subjective experiments. In light of these 3D audio-visual QoE dimensions, the approach taken in the earlier EU-funded DIOMEDES project was to determine the KPIs at the sender side and to estimate the QoE at the receiver side, so that the necessary adaptation actions can be taken. The KPI models are used by the sender to extract KPI metadata, while the QoE models are used by the receiver to compute the QoE value of the received 3D multimedia content.
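The split between sender-side KPI extraction and receiver-side QoE estimation can be illustrated with a minimal sketch. Everything named here is hypothetical: the text does not give the form of the QoE model or the metadata format, so the JSON encoding, the `VisualKPIs` fields, the [0, 1] score range and the weighted mean are placeholders chosen only to show the flow of information.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VisualKPIs:
    """Hypothetical container for the two visual KPIs (scores assumed in [0, 1])."""
    image_quality: float      # IQ
    depth_perception: float   # DP


def pack_kpi_metadata(kpis: VisualKPIs) -> str:
    """Sender side: serialise the KPI values so they can travel with the content."""
    return json.dumps(asdict(kpis))


def estimate_qoe(metadata: str, iq_weight: float = 0.5) -> float:
    """Receiver side: turn received KPI metadata into a single QoE value.

    The weighted mean is a placeholder assumption, not the project's model.
    """
    if not 0.0 <= iq_weight <= 1.0:
        raise ValueError("iq_weight must lie in [0, 1]")
    kpis = VisualKPIs(**json.loads(metadata))
    return iq_weight * kpis.image_quality + (1.0 - iq_weight) * kpis.depth_perception


# Illustrative values only; they are not measurements.
metadata = pack_kpi_metadata(VisualKPIs(image_quality=0.8, depth_perception=0.6))
qoe = estimate_qoe(metadata)
print(f"estimated QoE: {qoe:.2f}")
```

In such an arrangement the receiver only needs the compact KPI metadata, not the original content, which is what allows it to trigger adaptation actions locally.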

As an example, the impact of the spatial correlation between the spatial audio and the 3D video on the QoE has been investigated through a set of extensive subjective experiments. In these experiments, the angular mismatch between the presented auditory scene and the 3D video was used as the parameter indicating Audio-Video Correlation (AVC), a mismatch that can be perceived when watching multi-view video content accompanied by multichannel spatial audio. Five different angular deviations (5, 15 and 25 degrees to the left, and 10 and 20 degrees to the right) were applied to the auditory scene to create the test conditions. The results indicate that the perceived QoE degradation is large and statistically significant as soon as an angular mismatch is introduced, even at only 5 degrees. For angular mismatches from 5 to 20 degrees, the perceived QoE appears to decrease further, although the trend may not be statistically significant. As the angular mismatch increases from 20 to 25 degrees, the QoE degradation is again statistically significant and is the most noticeable. This implies that AVC degradation, if unavoidable, should be limited so that the angular mismatch between audio and video stays below 20 degrees.

Also as part of the 3D audio-visual experience modelling, extensive research has been carried out to accurately model the QoE for stereoscopic 3D video consisting of a left view and a right view. For the developed metric, two main types of artefacts have been considered, namely structural distortions and blurring, which are common in stereoscopic video content encoded with block-based video codecs such as H.264 and HEVC. Structural distortions refer to the alteration of object structures in a video frame. For instance, the orientation of edges in a video could change due to compression. In block-based video codecs, each encoded video frame is divided into a collection of blocks; if a block is encoded independently, without considering the features of the surrounding blocks, block borders tend to be visible in the compressed video. Blurring, on the other hand, is a decrease in the amount of spatial information in a video. Video compression is one cause of blurring because quantisation suppresses high spatial frequencies. As the quantisation step size increases, a greater proportion of the high spatial frequencies is lost and the amount of blur increases. In general, the human visual system perceives blur as a loss of sharpness in the image.
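The link between quantisation step size and blur can be seen in a small, simplified sketch. It applies a textbook one-dimensional orthonormal DCT to a single row of eight samples containing a sharp edge, quantises the coefficients uniformly, and reports how much energy survives in the upper half of the spectrum. This is an illustration under stated assumptions, not the transform or quantiser of H.264 or HEVC (which use two-dimensional integer transforms); the sample values, the step sizes and the helper names are all made up for the example.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def quantise(coeffs, step):
    """Uniform quantisation: snap each coefficient to the nearest multiple of step."""
    return [step * round(c / step) for c in coeffs]


def high_freq_energy(coeffs):
    """Energy in the upper half of the spectrum (the fine spatial detail)."""
    half = len(coeffs) // 2
    return sum(c * c for c in coeffs[half:])


# One row of pixels with a sharp edge (illustrative values).
row = [10, 10, 10, 10, 200, 200, 200, 200]
coeffs = dct_1d(row)

for step in (1, 40, 160):
    q = quantise(coeffs, step)
    reconstructed = [round(v) for v in idct_1d(q)]
    print(f"step={step:3d}  high-frequency energy={high_freq_energy(q):8.1f}  row={reconstructed}")
```

As the step grows, more of the high-frequency coefficients round to zero, and the reconstructed edge becomes progressively softer, which is the loss of sharpness described above.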

The full-reference quality metric relies on estimates of the structural distortions and the amount of blur in the left and right views, together with the spatial and temporal indices of both videos and the disparity map estimated from the stereoscopic views. The performance of the metric has been compared against a number of state-of-the-art stereoscopic quality assessment metrics. The comparison suggests that its correlation coefficient with respect to the subjective QoE scores always exceeds 0.9 (1.0 meaning an exact match), whereas that of the Structural Similarity metric stays below that value.
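For readers unfamiliar with how such a comparison is scored, the sketch below computes a Pearson correlation coefficient between an objective metric's outputs and subjective scores. The text does not say which correlation coefficient was used, so Pearson is an assumption here, and the score lists are invented purely for illustration; they are not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: mean subjective scores and one metric's predictions
# for the same five test sequences.
subjective_scores = [1.5, 2.0, 3.0, 4.0, 4.5]
metric_outputs = [0.30, 0.42, 0.58, 0.81, 0.88]

r = pearson(subjective_scores, metric_outputs)
print(f"correlation with subjective scores: {r:.3f}")
```

A value close to 1.0 means the metric ranks and spaces the test sequences much as human viewers did; a value near 0 means it carries little information about perceived quality.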