3D-TV has been an active research topic for more than a decade. It gives viewers an enhanced impression of depth by presenting each eye with a view of the scene from a slightly different angle, a technique known as stereoscopy. The Institute of Digital Technologies (IDT) has been actively involved in the capturing, processing, compression, transmission and rendering of 3D-TV content.
Among the greatest challenges in 3D-TV is capturing a scene with multiple cameras: the cameras need to remain well calibrated throughout capture for accurate 3D scene rendering, and their positioning needs to be adjusted for the best scene reconstruction performance. The depth maps extracted from the captured multi-view videos play a crucial role in the success of a rendered 3D scene. Compression of the multi-view video content is another challenge, since exploiting inter-camera correlations to remove redundancy adds a further dimension on top of 2D video compression. It is also important to deliver the compressed multi-view data to clients under unfavourable network conditions, such as intermittent connection losses and packet erasures.
For 3D video to be a mass market success, it is imperative to make viewing 3D video at least as comfortable as watching traditional television. Hence, the visual discomfort associated with 3D viewing needs to be minimised by appropriate methods, and the dynamics of users’ quality of experience with 3D media need to be well understood.
The research work carried out by the Institute of Digital Technologies team has focused on the following areas of 3D-TV:
- 3D video capturing and post-production processing
- Depth map extraction, processing and coding
- Error resilient transmission of 3D video
The ability to render high quality views for multi-view displays is essential, and the Depth Image Based Rendering (DIBR) approach is prominent for this purpose. The quality of DIBR output depends directly on the accuracy of the depth information. Hence, high quality depth map estimation has been a key research item at the Institute of Digital Technologies. The depth maps are extracted from the multi-view videos and the corresponding camera calibration parameters, and are then post-processed to yield optimised depth maps. The post-processing involves adaptive filtering techniques that improve the spatial and temporal consistency of the depth maps, which reduces the depth map coding overhead while increasing the quality of the rendered views. Multi-view rectification is performed to align the epipolar lines of the cameras, and colour correction is applied to improve both stereo matching and video compression performance.
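The dependence of DIBR on depth accuracy can be seen in a minimal sketch of the warping step. The code below is an illustration only, not the Institute's renderer: it assumes rectified, parallel cameras, a virtual camera displaced to the right of the reference camera, metric depth values, and a single scanline. The function name `render_virtual_row` and its parameters are hypothetical.

```python
def render_virtual_row(colour_row, depth_row, focal_px, baseline):
    """Forward-warp one scanline to a virtual camera shifted by `baseline`.

    Assumption: rectified, parallel cameras, so a pixel at depth Z moves
    horizontally by disparity = focal_px * baseline / Z (in pixels).
    """
    width = len(colour_row)
    out = [None] * width                # None marks a hole with no source pixel
    nearest = [float("inf")] * width    # z-buffer: depth of the pixel kept so far
    for x, (colour, z) in enumerate(zip(colour_row, depth_row)):
        disparity = round(focal_px * baseline / z)
        target = x - disparity          # nearer pixels shift further left
        if 0 <= target < width and z < nearest[target]:
            out[target] = colour        # the nearer surface occludes the farther one
            nearest[target] = z
    return out


colour = [10, 20, 30, 40, 50, 60]
depth = [4.0, 4.0, 2.0, 2.0, 4.0, 4.0]   # the two middle pixels are closer
virtual = render_virtual_row(colour, depth, focal_px=4.0, baseline=1.0)
print(virtual)  # [30, 40, None, 50, 60, None]
```

The hole at index 2 is a disocclusion: background that the reference camera could not see. A depth error of even one disparity step moves a pixel to the wrong column, which is why inaccurate depth maps show up directly as artefacts in the rendered view.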
One approach to depth map coding is to compress the depth maps by minimising the rendering distortion rather than the block coding error of the depth map macroblocks themselves. To this end, the encoder’s rate-distortion optimised mode decision is modified so that distortion is calculated in the image rendered with DIBR from the reconstructed depth macroblock. In another method, the depth maps are downsampled to reduce the bit rate required for their transmission and are upsampled adaptively at the receiver side. This method is based on filters that adapt along the edges of the depth map. Edge-aware upsampling after decoding preserves critical object boundaries and allows them to be reconstructed more accurately.
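The difference between the two distortion measures can be illustrated with a toy mode decision. This is a sketch under stated assumptions, not the encoder modification itself: the renderer is a one-line warp in which a larger depth value means a nearer pixel and disparity is `depth // step`, and the names `warp_row`, `choose_mode`, the candidate modes and their rates are all hypothetical.

```python
def warp_row(texture_row, depth_row, step=16):
    """Toy DIBR for one scanline: shift each pixel left by depth // step."""
    width = len(texture_row)
    out = [0] * width      # unfilled positions (holes) stay at 0
    best = [-1] * width    # larger depth value means nearer, so it wins
    for x, (t, d) in enumerate(zip(texture_row, depth_row)):
        target = x - d // step
        if 0 <= target < width and d > best[target]:
            out[target], best[target] = t, d
    return out


def ssd(a, b):
    """Sum of squared differences between two equally long rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def choose_mode(texture_row, depth_row, candidates, lam):
    """Pick the mode minimising J = D_rendered + lam * rate.

    `candidates` maps a mode name to (reconstructed_depth_row, rate_bits).
    """
    reference = warp_row(texture_row, depth_row)
    costs = {}
    for mode, (recon_depth, rate) in candidates.items():
        rendered = warp_row(texture_row, recon_depth)
        costs[mode] = ssd(reference, rendered) + lam * rate
    return min(costs, key=costs.get), costs


texture = [10, 20, 30, 40, 200, 210, 220, 230]
depth = [20, 20, 20, 20, 60, 60, 60, 60]
candidates = {
    "coarse_quant": ([26, 26, 26, 26, 54, 54, 54, 54], 10),
    "edge_blur": ([20, 20, 20, 20, 44, 60, 60, 60], 10),
}
best_mode, costs = choose_mode(texture, depth, candidates, lam=1)
depth_only = min(candidates, key=lambda m: ssd(depth, candidates[m][0]))
print(best_mode, costs)   # coarse_quant {'coarse_quant': 10, 'edge_blur': 28910}
print(depth_only)         # edge_blur
```

In this example the `edge_blur` candidate has the smaller depth error (256 against 288), so a decision based on depth error alone would select it, yet its single wrong value at the object boundary displaces a foreground pixel in the rendered row. The `coarse_quant` candidate changes every depth value but leaves all disparities intact, so it renders without error.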
Exploiting the correlations that exist between colour images and their corresponding depth images leads to more error resilient video encoding schemes for 3D-TV. An error resilient 3D video coding approach that exploits the correlation of motion vectors in the colour and depth video streams has been developed. Motion estimation is performed jointly on the colour video and the corresponding depth video, such that there is only one set of motion vectors for both streams. If a packet is lost, the motion vector in one stream can easily be recovered from the other stream. Experimental results have shown that the developed joint motion estimation and motion vector sharing scheme achieves significant gains in the quality of rendered views under packet loss conditions.
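The recovery step can be sketched as follows. The code assumes that each stream delivers one motion vector per macroblock, that a lost vector is represented as `None`, and that zero motion is used when both copies are lost; the function name and the fallback policy are assumptions made for illustration, not details of the developed scheme.

```python
def recover_shared_mvs(colour_mvs, depth_mvs, fallback=(0, 0)):
    """Rebuild the single shared motion vector set from two lossy copies.

    Each list holds one (dx, dy) vector per macroblock, or None where the
    packet carrying that vector was lost.
    """
    recovered = []
    for c_mv, d_mv in zip(colour_mvs, depth_mvs):
        if c_mv is not None:
            recovered.append(c_mv)      # colour copy arrived
        elif d_mv is not None:
            recovered.append(d_mv)      # recover from the depth stream
        else:
            recovered.append(fallback)  # both lost: assumed zero-motion concealment
    return recovered


colour_rx = [(1, 0), None, (0, 0), None]
depth_rx = [(1, 0), (2, -1), None, None]
shared = recover_shared_mvs(colour_rx, depth_rx)
print(shared)  # [(1, 0), (2, -1), (0, 0), (0, 0)]
```

Because both streams carry the same vectors, a macroblock loses its motion information only when the corresponding packets are lost in both streams at once.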
The Institute of Digital Technologies researchers have also investigated the joint design of 3D content coding and the underlying media distribution network. In the context of two EU-funded projects, DIOMEDES and ROMEO, multi-view 3D video accompanied by depth maps is encoded in a scalable way, such that each camera view comprises more than one layer of compressed representation at various qualities. As long as the network conditions allow, all quality layers are delivered to the clients, resulting in the highest possible 3D scene reconstruction performance. A Peer-to-Peer (P2P) distribution overlay is coupled with the scalably encoded 3D multi-view video in the application domain. The quality layers of each camera view are encapsulated into different IP packet streams that can be discarded selectively based on the available network bandwidth and the 3D viewing quality constraints.
Furthermore, Multiple Description Coding (MDC) is exploited in combination with the P2P distribution overlay in order to retain resilience against IP link errors. The P2P distribution overlay developed in the ROMEO project is based on the creation of multiple application level multicast trees that disseminate 3D multimedia data packets across peers. Using multiple multicast trees inherently provides path diversity, which suits multiple description coding. The descriptions belonging to the same video or depth representation are therefore distributed to the same client across different paths, so that if one of the paths fails temporarily, the 3D video can still be reconstructed from the descriptions that arrive.
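The selective discarding of quality layers can be sketched with a simple greedy rule. This is a hypothetical policy, not the one used in DIOMEDES or ROMEO: it assumes each stream is described by a view name, a layer index and a bit rate, that a layer is only useful if the layer below it for the same view is kept, and that base layers of all views take priority over enhancement layers. The 3D viewing quality constraints mentioned above are not modelled.

```python
def select_streams(streams, bandwidth_kbps):
    """Greedily keep packet streams that fit into the available bandwidth.

    `streams` is a list of (view, layer, rate_kbps); layer 0 is the base layer.
    """
    kept = []
    highest_kept = {}          # view -> highest layer index kept so far
    budget = bandwidth_kbps
    # Visit all base layers first, then each enhancement level in turn.
    for view, layer, rate in sorted(streams, key=lambda s: (s[1], s[0])):
        dependency_met = layer == 0 or highest_kept.get(view) == layer - 1
        if dependency_met and rate <= budget:
            kept.append((view, layer))
            highest_kept[view] = layer
            budget -= rate
    return kept


streams = [
    ("cam0", 0, 500), ("cam0", 1, 300),
    ("cam1", 0, 500), ("cam1", 1, 300),
    ("cam2", 0, 500), ("cam2", 1, 300),
]
constrained = select_streams(streams, bandwidth_kbps=1900)
print(constrained)  # [('cam0', 0), ('cam1', 0), ('cam2', 0), ('cam0', 1)]
```

With 1900 kbps available, every view keeps its base layer and only one enhancement layer survives; with 2400 kbps or more, all six streams are delivered.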