Real-time scalable video coding based on 3-D discrete wavelet transform

1. Introduction

In multimedia communications there are many important applications in which live video is captured by a camera, compressed and transmitted over a channel that can be very unreliable, while the computational resources or battery capacity of the transmitting device are very limited. Examples include video transmission for space missions, vehicle-to-infrastructure video delivery, multimedia wireless sensor networks, wireless endoscopy, video coding on mobile phones and high-definition wireless video surveillance. Under these restrictions, developing efficient video coding techniques for such applications is a challenging problem.

The most popular video compression standards, such as H.264/AVC and H.265/HEVC, are based on the hybrid video coding concept, which is very efficient when video is encoded off-line (not in real time) and the pre-encoded video is played back. However, the high computational complexity of encoding and the high sensitivity of the hybrid video bit stream to losses in the communication channel are a significant barrier to using these standards in the applications mentioned above.

In this project a scalable video codec based on the 3-D discrete wavelet transform (3-D DWT) is developed. The proposed codec is much faster than existing software implementations of the H.264/AVC and H.265/HEVC standards, with compression performance close to that of a fast implementation of H.264/AVC. In addition, the video stream generated by the 3-D DWT codec is robust to packet losses in the communication channel.

2. Scalable video coding and its advantages

The main idea of scalable video coding is that the encoder forms the bit stream from several layers: a base layer and one or more enhancement layers. To encode or decode an enhancement layer, the previous layers (which may include the base layer) are needed. Each layer is characterized by its own bit rate and visual quality.
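The layer dependency described above can be sketched in a few lines of Python. This is only an illustration, not the codec's actual extractor: the `Layer` type, the layer names and the bit rates are hypothetical, chosen so that the full stream adds up to 6000 kbps.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str   # hypothetical label, for illustration only
    kbps: int   # hypothetical bit rate contributed by this layer


def extract_layers(layers, budget_kbps):
    """Keep the longest prefix of layers whose total rate fits the budget.

    A layer is useful only if every layer before it is present, so
    extraction stops at the first layer that does not fit.
    """
    kept, total = [], 0
    for layer in layers:
        if total + layer.kbps > budget_kbps:
            break
        kept.append(layer)
        total += layer.kbps
    return kept, total


# Layers are listed in dependency order: base layer first.
stream = [Layer("base", 500), Layer("enh1", 700),
          Layer("enh2", 1800), Layer("enh3", 3000)]
kept, total = extract_layers(stream, 3500)
print([layer.name for layer in kept], total)
```

With a 3500 kbps budget the sketch keeps the base layer and the first two enhancement layers (3000 kbps in total) and drops the last one, because skipping ahead to a later layer would leave it undecodable.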

In the following example, video captured by a laptop camera is compressed and transmitted to two laptops with different display resolutions and CPU performance. One laptop receives and decodes the whole stream, while the other receives only part of the stream and decodes the video at a smaller frame resolution. Thus, both devices are able to play back the video in real time.

The proposed 3-D DWT codec supports the following types of scalability:

Temporal scalability (different frame rates: 1, 1/2, 1/4, 1/8 and 1/16 frame rate);
Spatial scalability (different frame resolutions: 1, 1/2, 1/4, 1/8 frame resolution);
Visual quality scalability (up to 12 visual quality levels).
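The frame-rate ladder 1, 1/2, 1/4, 1/8, 1/16 is what a four-level dyadic temporal wavelet decomposition of a 16-frame group naturally provides: each level that the decoder leaves out halves the frame rate. The sketch below illustrates this under stated assumptions. The page does not say which temporal filter the codec uses, so a plain Haar filter without motion compensation is assumed here, frames are flattened lists of pixel values, and all function names are hypothetical.

```python
def haar_temporal_level(frames):
    """One temporal Haar level: pairwise average (low) and half-difference (high)."""
    pairs = list(zip(frames[0::2], frames[1::2]))
    low = [[(a + b) / 2 for a, b in zip(f0, f1)] for f0, f1 in pairs]
    high = [[(a - b) / 2 for a, b in zip(f0, f1)] for f0, f1 in pairs]
    return low, high


def inverse_haar_temporal_level(low, high):
    """Rebuild two frames from each (low, high) pair."""
    frames = []
    for l, h in zip(low, high):
        frames.append([a + d for a, d in zip(l, h)])  # even frame
        frames.append([a - d for a, d in zip(l, h)])  # odd frame
    return frames


def decompose(frames, levels):
    """Return the coarsest low-pass frames and the high-pass bands, finest first."""
    highs, low = [], frames
    for _ in range(levels):
        low, high = haar_temporal_level(low)
        highs.append(high)
    return low, highs


def reconstruct(low, highs, skip_levels=0):
    """Invert the decomposition, ignoring the `skip_levels` finest bands.

    Each skipped level halves the output frame rate.
    """
    for high in reversed(highs[skip_levels:]):
        low = inverse_haar_temporal_level(low, high)
    return low


# A toy group of 16 frames with 2 "pixels" each.
gop = [[float(t), float(2 * t)] for t in range(16)]
low, highs = decompose(gop, 4)
for skip in range(5):
    print(skip, len(reconstruct(low, highs, skip)))
```

Skipping 0, 1, 2, 3 or 4 levels yields 16, 8, 4, 2 or 1 frames per group, and with no levels skipped the original frames are recovered exactly. Spatial scalability follows the same idea applied to the 2-D transform of each frame.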

The following example illustrates the compression performance of the 3-D DWT codec when bit streams of different quality are extracted from a 6 Mbps video stream. One can see that the performance of the extracted streams (blue curves) is close to that obtained when the video is encoded separately for each bit rate (red curve).

3. Rate-distortion comparison with H.264/AVC and H.265/HEVC

Simulation results were obtained for the Full HD (1920x1080) video sequences 'Pedestrian area', 'Rush hour', 'Sunflower', 'Tractor', 'ElFuente' and 'Kimono1'. The proposed video coding algorithm is compared with the fastest software implementation of the H.264/AVC standard (the x264 codec) and the fastest software implementation of the H.265/HEVC standard (the x265 codec).

One can see that the proposed 3-D DWT codec provides similar or even better compression performance than the x264 codec. It is important to note that neither x264 nor x265 provides scalability, and introducing scalability significantly reduces compression performance.

4. Encoding speed comparison with H.264/AVC and H.265/HEVC

In all cases the codecs were run in constant bit rate mode, and the encoding speed was measured without assembly optimizations, multithreading or other program optimization techniques. The encoding speed in this project is defined as the number of frames (with resolution 1920x1088) that can be encoded in one second on a hardware platform with an Intel Core 2 Duo 3.0 GHz CPU.

One can see that the proposed 3-D DWT codec is 2-5 times faster than the fastest software implementation of the H.264/AVC standard and 15-20 times faster than the fastest software implementation of the H.265/HEVC standard.

5. Video multiplexing

In many cases N video sources must be compressed and transmitted simultaneously by a single device over a shared channel with rate R. One solution is to use N separate video encoders, each compressing a single video at bit rate R/N. However, because the videos differ in their statistical properties, some of them are then compressed with high quality and the others with low quality. Another solution is to use one joint encoder for all N video sources, so that each source is allocated a different portion of the channel rate and the achieved video quality is fairer across the sources. The proposed 3-D DWT codec provides real-time video multiplexing according to a quality fairness criterion. The following figures illustrate the difference between the separate and joint approaches described above. In this example, six videos ('Vassar_0', 'Ballroom_0', 'Exit_0', 'Race1_0', 'Football' and 'Vtc1nw') with frame resolution 640x480 are compressed.
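The difference between the two approaches can be sketched numerically. The example below is not the codec's rate control. It assumes that "quality fairness" means equal distortion for all sources, and it uses a textbook rate-distortion model D = variance * 2^(-2R) as a stand-in for the real behaviour of each video. The variances, the rate units and the function names are all hypothetical.

```python
import math


def rate_for_distortion(variance, d):
    """Rate needed to reach distortion d under the assumed model D = variance * 2**(-2R)."""
    return max(0.0, 0.5 * math.log2(variance / d))


def joint_allocation(variances, total_rate, iters=100):
    """Find a common distortion d whose per-source rates sum to total_rate."""
    lo, hi = 1e-12, max(variances)
    for _ in range(iters):
        d = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if sum(rate_for_distortion(v, d) for v in variances) > total_rate:
            lo = d  # budget exceeded: the common distortion must be larger
        else:
            hi = d
    return [rate_for_distortion(v, hi) for v in variances], hi


def separate_allocation(variances, total_rate):
    """Distortion of each source when every encoder gets total_rate / N."""
    r = total_rate / len(variances)
    return [v * 2 ** (-2 * r) for v in variances]


variances = [100.0, 400.0, 1600.0]  # hypothetical "difficulty" of three sources
rates, common_d = joint_allocation(variances, total_rate=6.0)
print("joint rates:", rates, "common distortion:", common_d)
print("separate distortions:", separate_allocation(variances, 6.0))
```

Under this toy model the separate encoders end up with distortions 6.25, 25 and 100, while the joint allocation gives the sources roughly 1, 2 and 3 rate units so that all of them reach a distortion of about 25. The easy source gives up some quality so that the hardest one is no longer the clear outlier.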

6. Robustness to packet losses

In this example we demonstrate the case in which 3-D DWT and H.264/SVC video bit streams are transmitted over a communication channel with packet losses. For both codecs the packet length was set to 800 bytes and a channel with independent packet losses was simulated. For H.264/SVC we used the JSVM 9.8 reference software with two spatial and five temporal scalable layers. The GOP size and intra-frame period were set to 16. For error-resilient coding we used flexible macroblock ordering with two slice groups and loss-aware rate-distortion optimized macroblock mode decision. The frame copy error concealment method was used at the decoder side.
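A channel with independent packet losses, as used in this experiment, can be modelled in a few lines. The sketch below is an assumed illustration rather than the simulator used for the results: only the 800-byte packet length comes from the text, while the function names, the fixed seed and the use of None to mark a lost packet are choices made for the example.

```python
import random

PACKET_BYTES = 800  # packet length used in the experiment described above


def packetize(bitstream: bytes, size: int = PACKET_BYTES):
    """Split a bit stream into packets of at most `size` bytes."""
    return [bitstream[i:i + size] for i in range(0, len(bitstream), size)]


def lossy_channel(packets, loss_ratio, seed=0):
    """Drop each packet independently with probability `loss_ratio`.

    Lost packets are returned as None so the receiver knows their position.
    """
    rng = random.Random(seed)
    return [None if rng.random() < loss_ratio else p for p in packets]


data = bytes(8000)                      # placeholder payload, not real video
packets = packetize(data)
received = lossy_channel(packets, loss_ratio=0.1)
lost = sum(p is None for p in received)
print(f"{lost} of {len(packets)} packets lost")
```

Because every packet is dropped independently of the others, this model has no burst losses, which matches the independent-loss setting described above.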

This movie shows the performance of both codecs at the same channel rate (3000 kbps) for different packet loss ratios. One can see that the proposed codec is significantly less sensitive to packet losses and provides much better visual quality, while H.264/SVC produces many frames with unrecognizable objects.

To achieve better visual quality, inter-packet loss protection based on Reed-Solomon error-correction codes can be applied. Moreover, the base scalable video layer can be protected by codes with a high redundancy level, while the remaining layers can be protected with a lower redundancy level or left unprotected. The following figures show the expected visual quality (E[Y-PSNR]) of the proposed 3-D DWT codec at different bit rates and packet loss rates when inter-packet loss protection is used.
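The effect of unequal protection on the expected quality can be sketched as follows. An (n, k) Reed-Solomon code used across packets recovers its k source packets whenever at most n - k of the n packets are lost, so under independent losses the recovery probability is a binomial sum. Everything else in the example is an assumption made for illustration: the code parameters, the PSNR values, the independence between layers and the function names are hypothetical and are not taken from the codec or its papers.

```python
from math import comb


def recovery_probability(n, k, p):
    """Probability that at most n - k of n packets are lost (loss probability p each)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n - k + 1))


def expected_psnr(layers, p, psnr_if_nothing=0.0):
    """Expected quality for layers given as (n, k, psnr_when_decoded_up_to_here).

    Layers are in dependency order; a layer counts only if it and all
    earlier layers were recovered. Layers are assumed to fail independently.
    """
    expected, prefix_ok, prev = psnr_if_nothing, 1.0, psnr_if_nothing
    for n, k, psnr in layers:
        prefix_ok *= recovery_probability(n, k, p)
        expected += prefix_ok * (psnr - prev)
        prev = psnr
    return expected


# Hypothetical protection: strong for the base layer, none for the last layer.
layers = [(20, 10, 30.0), (20, 16, 35.0), (20, 20, 38.0)]
for p in (0.0, 0.05, 0.1, 0.2):
    print(f"loss {p:.2f}: E[Y-PSNR] = {expected_psnr(layers, p):.2f} dB")
```

With no losses the expected value equals the full-quality figure, and as the loss rate grows the unprotected top layer is given up first while the heavily protected base layer remains decodable with high probability.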

Software

3-D DWT codec, Version 1.0 (September 2012) [download]
3-D DWT codec, Version 2.0 (January 2017) [download]
3-D DWT codec, Version 2.1 (April 2021) [download]
3-D DWT codec, Version 2.2 (January 2025) [download]

If you plan to use the 3-D DWT software, please also refer to the following papers:

References

[1] E. Belyaev, K. Egiazarian, M. Gabbouj, A Low-Complexity Bit-Plane Entropy Coding and Rate Control for 3-D DWT Based Video Coding, IEEE Transactions on Multimedia, 2013. [download]
[2] E. Belyaev, A. Vinel, A. Surak, M. Gabbouj, M. Jonsson, K. Egiazarian, Robust Vehicle-to-Infrastructure Video Transmission for Road Surveillance Applications, IEEE Transactions on Vehicular Technology, 2015. [download]
[3] E. Belyaev, K. Egiazarian, M. Gabbouj, K. Liu, A Low-Complexity Joint Source-Channel Video Coding for 3-D DWT Codec, Journal of Communications, 2013. [download]
[4] E. Belyaev, S. Forchhammer, M. Codreanu, Error Concealment for 3-D DWT Based Video Codec Using Iterative Thresholding, IEEE Communications Letters, 2017. [download]
[5] E. Belyaev, An Efficient Soft Decoding for Wavelet Video Compression with Global Motion Compensation, Digital Signal Processing and Its Applications — DSPA-2025, 2025. (submitted)