Drone HDR Infrared Video Coding via Aerial Map Prediction

1. Introduction

In recent years, more and more video data are acquired by infrared sensors supporting high dynamic range (HDR), where each pixel is represented by an integer value with a bit depth higher than 8 bits. Such sensors can be mounted on unmanned aerial vehicles (UAV) and used in many applications, such as inspection of district heating or other energy systems, search and rescue operations, and night video surveillance. Due to channel capacity limitations, such HDR videos must be compressed before transmission (or storage). This can be done with well-known image coding standards that support high bit depth formats, such as PNG, JPEG2000 and JPEG-XT, or with other methods. However, these standards have limited coding efficiency because they do not take the temporal redundancy between neighboring frames into account. In contrast, the latest video coding standard H.265/HEVC supports HDR video formats and provides high coding performance thanks to motion estimation and compensation.

However, UAV camera motion has several features that are not reflected well by the motion estimation model used in the H.265/HEVC standard:

UAV video often contains camera rotation, which cannot be estimated well by the block-based motion estimation used in HEVC.
In many cases, each frame of UAV video has an area (around the frame borders) that is not present in the previous frame(s). Such areas cannot be predicted well by motion compensation and are therefore encoded in Intra mode.
A UAV can fly over the same area many times, but the number of reference frames allowed by HEVC is too small to exploit the similarity between the current frame and frames captured a few seconds or minutes earlier.
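To make the first point concrete: a rotation by angle θ about the frame center c moves a pixel p by d(p) = (R(θ) - I)(p - c), so the displacement changes from pixel to pixel, while a block-based motion vector is a single translation per block. The minimal sketch below assumes pure rotation with no translation and measures how much the displacement differs between the corners of one square block. The function names and the 64x64 block size (the largest HEVC coding tree unit) are illustrative choices, not taken from the paper.

```python
import math

def rotation_displacement(x, y, theta_deg, cx, cy):
    """Displacement of pixel (x, y) when the frame rotates by theta_deg
    degrees about (cx, cy): d = (R - I)(p - c), no translation."""
    t = math.radians(theta_deg)
    rx, ry = x - cx, y - cy
    dx = (math.cos(t) - 1.0) * rx - math.sin(t) * ry
    dy = math.sin(t) * rx + (math.cos(t) - 1.0) * ry
    return dx, dy

def block_displacement_spread(x0, y0, size, theta_deg, cx, cy):
    """Largest difference between the displacements of two corners of a
    size x size block; 0 would mean one translational vector fits exactly.
    The field is affine in p, so its extremes over the block occur at corners."""
    corners = [(x0, y0), (x0 + size - 1, y0),
               (x0, y0 + size - 1), (x0 + size - 1, y0 + size - 1)]
    vecs = [rotation_displacement(x, y, theta_deg, cx, cy) for x, y in corners]
    return max(math.hypot(a[0] - b[0], a[1] - b[1]) for a in vecs for b in vecs)

if __name__ == "__main__":
    # 640x512 frame (as in the test videos), rotation about the frame center
    spread = block_displacement_spread(0, 0, 64, 5.0, 319.5, 255.5)
    print(f"{spread:.2f} px")  # 7.77 px for a 5-degree rotation
```

The spread equals 2·sin(θ/2) times the block diagonal and does not depend on where the block sits, so even a modest rotation leaves a residual misalignment of several pixels inside a single block.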

To address these features, two similar approaches utilizing historical data were proposed in [2,3]. First, before the flight, a large video set covering a wide area over a long time (historical data) is prepared and stored at both the encoder and the decoder. This data is then used to improve the HEVC prediction performance. This approach reduces the bit rate by approximately 30%. However, it requires a huge storage capacity and gives no benefit if historical data for a given area has not been collected. In this paper, we propose a novel HDR infrared video coding algorithm based on aerial map prediction (AMP), which operates without historical data. In this approach, we accumulate input frames in a buffer and use them to build an aerial map. The map is compressed by H.265/HEVC Intra and included in the overall bit stream. Then we apply global motion estimation to extract the frame most similar to the current frame from the reconstructed aerial map and use it as an extra reference frame. The motion model used for the extraction accounts for camera rotation, which helps to improve the overall coding performance. Experimental results show that when the aerial map bit size is not taken into account (assuming that the UAV flies many times over the same area), the proposed approach provides 20-60% bit rate savings compared to H.265/HEVC. If the map bit stream is included in the overall bit stream, then for a test video with camera rotation the proposed algorithm provides 3-35% bit rate savings compared to H.265/HEVC.
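A minimal sketch of the extraction step, assuming nearest-neighbor sampling and a three-parameter model (rotation angle plus horizontal and vertical displacement), the same parameters the paper uses to align frames when building the map. The helper names `extract_reference` and `global_motion_search`, the coordinate convention, and the search strategy are hypothetical: an exhaustive search over a coarse parameter grid stands in for whatever global motion estimator the authors actually use.

```python
import numpy as np

def extract_reference(aerial_map, theta_deg, tx, ty, height=512, width=640):
    """Sample a height x width reference frame from the aerial map.
    Hypothetical convention: frame pixel p maps to R(theta) (p - c) + (tx, ty),
    where c is the frame center, so (tx, ty) is the frame center in map coordinates."""
    t = np.radians(theta_deg)
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    mx = np.cos(t) * (xs - cx) - np.sin(t) * (ys - cy) + tx
    my = np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + ty
    # Nearest-neighbor sampling; pixels that fall outside the map are flagged invalid
    ix = np.rint(mx).astype(np.int64)
    iy = np.rint(my).astype(np.int64)
    valid = ((ix >= 0) & (ix < aerial_map.shape[1]) &
             (iy >= 0) & (iy < aerial_map.shape[0]))
    ref = np.zeros((height, width), dtype=aerial_map.dtype)
    ref[valid] = aerial_map[iy[valid], ix[valid]]
    return ref, valid

def global_motion_search(aerial_map, frame, thetas, txs, tys, min_valid=0.5):
    """Exhaustive search for the (theta, tx, ty) whose extracted frame has the
    lowest mean absolute difference to `frame` over valid pixels."""
    h, w = frame.shape
    best = None
    for th in thetas:
        for tx in txs:
            for ty in tys:
                ref, valid = extract_reference(aerial_map, th, tx, ty, h, w)
                if valid.mean() < min_valid:
                    continue  # too little overlap with the map to be reliable
                diff = ref[valid].astype(np.int64) - frame[valid].astype(np.int64)
                cost = float(np.mean(np.abs(diff)))
                if best is None or cost < best[0]:
                    best = (cost, th, tx, ty)
    return best  # None if no candidate overlapped the map enough

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    amap = rng.integers(0, 2**16, size=(200, 200), dtype=np.uint16)
    frame, _ = extract_reference(amap, 10.0, 100.0, 90.0, height=40, width=50)
    print(global_motion_search(amap, frame, [-10.0, 0.0, 10.0],
                               [90.0, 100.0, 110.0], [80.0, 90.0, 100.0]))
    # (0.0, 10.0, 100.0, 90.0)
```

An exhaustive loop is easy to read but expensive; a coarse-to-fine search around the previous frame's parameters would be a natural way to cut its cost.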

For more detailed information please see [1].

2. Performance comparison

Experimental results were obtained for two test video sequences captured by Drone Systems ApS with a FLIR Tau2 infrared camera at a frame resolution of 640x512 and 16 bits per pixel. The first test video includes 101 frames and corresponds to a general case, in which the drone flies in one direction, rotates and flies back, with a frame overlap of approximately 10-50%, i.e., the motion is expected to be captured well by H.265/HEVC motion estimation. The second test video includes 70 frames and corresponds to the case in which the drone rotates from 0 to approximately 360 degrees, i.e., the motion is too complex for the block-based motion estimation used in HEVC. The following figure shows the corresponding tone-mapped aerial maps for these videos. At the first stage, all frames were aligned using three parameters: rotation angle, vertical displacement and horizontal displacement. Each pixel of the aerial map was then computed as a weighted combination of the corresponding pixels in the overlapping frames. The weight of each frame was assigned according to the distance between the pixel and the center of that frame: a smaller distance gives a higher weight.
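The map construction described above can be sketched as follows, assuming that each frame's alignment parameters (rotation angle and displacements) are already known. The paper only states that a smaller distance to the frame center gives a higher weight, so the linear fall-off used here, the nearest-neighbor sampling, the parameter convention (the frame center's position in the map) and the name `build_aerial_map` are all assumptions.

```python
import numpy as np

def build_aerial_map(frames, params, map_h, map_w):
    """Blend aligned frames into one map by a distance-weighted average.
    frames: list of HxW arrays; params: list of (theta_deg, tx, ty), where
    (tx, ty) is the frame center in map coordinates (hypothetical convention)."""
    num = np.zeros((map_h, map_w), dtype=np.float64)
    den = np.zeros((map_h, map_w), dtype=np.float64)
    my, mx = np.mgrid[0:map_h, 0:map_w].astype(np.float64)
    for frame, (theta_deg, tx, ty) in zip(frames, params):
        h, w = frame.shape
        cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
        t = np.radians(theta_deg)
        # Inverse rotation: map pixel -> position inside this frame
        dx, dy = mx - tx, my - ty
        fx = np.cos(t) * dx + np.sin(t) * dy + cx
        fy = -np.sin(t) * dx + np.cos(t) * dy + cy
        ix, iy = np.rint(fx).astype(np.int64), np.rint(fy).astype(np.int64)
        inside = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
        # Assumed weight: decreases linearly with distance from the frame center
        dist = np.hypot(fx - cx, fy - cy)
        max_dist = np.hypot(cx, cy) + 1.0  # keeps every inside weight positive
        wgt = np.where(inside, 1.0 - dist / max_dist, 0.0)
        num[inside] += wgt[inside] * frame[iy[inside], ix[inside]]
        den += wgt
    out = np.zeros((map_h, map_w), dtype=np.float64)
    covered = den > 0
    out[covered] = num[covered] / den[covered]
    return np.rint(out).astype(np.uint16), covered

if __name__ == "__main__":
    truth = (np.arange(20 * 30).reshape(20, 30) * 50).astype(np.uint16)
    f1, f2 = truth[0:10, 0:16], truth[5:15, 10:26]
    # Frame centers: (7.5, 4.5) and (10 + 7.5, 5 + 4.5) in map coordinates
    amap, covered = build_aerial_map([f1, f2], [(0.0, 7.5, 4.5), (0.0, 17.5, 9.5)], 20, 30)
    print(covered.sum(), np.array_equal(amap[covered], truth[covered]))
```

Pixels near a frame border get low weight, so where frames overlap the blended value is dominated by the frame that sees that spot closest to its center, which is the behavior the text describes.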

The proposed video coding via AMP was embedded into HM 16.14, the reference software of the H.265/HEVC video coding standard. The software was compiled with high bit depth support and run in the monochrome16 profile, Tier=high, Level=8.5, with InputBitDepth=16, InternalBitDepth=16, InputChromaFormat=400, and ExtendedPrecision=1. The remaining parameters were set as in the configuration file encoder_lowdelay_main_rext.cfg available in the reference software. The GOP size was set to 4, i.e., 4 reference frames were used by default. In the case of AMP, the list of reference frames was extended by additional frames extracted from the reconstructed aerial map.
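For reference, the encoder settings listed above can be written as HM configuration entries. The Python snippet below only renders them in HM's `Key : Value` file syntax; the output file name `amp_overrides.cfg` is hypothetical, and the snippet assumes, as the text states, that all other parameters come from encoder_lowdelay_main_rext.cfg.

```python
# Settings stated in the text, keyed by their HM configuration names
HM_OVERRIDES = {
    "Profile": "monochrome16",
    "Tier": "high",
    "Level": "8.5",
    "InputBitDepth": 16,
    "InternalBitDepth": 16,
    "InputChromaFormat": 400,
    "ExtendedPrecision": 1,
    "GOPSize": 4,
}

def to_cfg_text(overrides):
    """Render the overrides in HM's 'Key : Value' config-file syntax."""
    return "".join(f"{key:<20}: {value}\n" for key, value in overrides.items())

if __name__ == "__main__":
    # Print instead of writing a file; redirect to amp_overrides.cfg if desired
    print(to_cfg_text(HM_OVERRIDES), end="")
```

Such a file could then be passed after the base configuration, e.g. `TAppEncoder -c encoder_lowdelay_main_rext.cfg -c amp_overrides.cfg`, assuming the HM build takes several `-c` options with later files overriding earlier ones. The monochrome16 profile and ExtendedPrecision also require the high bit depth build mentioned above.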

The following figure shows the rate-distortion performance of the HEVC codec with and without the proposed AMP.

For the proposed approach we show three scenarios.

In the first scenario, called HEVC+AMP (map is not embedded), we store the aerial map without lossy compression and do not include the map bit rate in the overall bit rate. This gives an upper bound on the AMP performance, or equivalently an estimate of the coding performance when a drone flies many times over the same area. One can see that 20-60% bit rate savings are provided for both test video sequences.
In the second scenario, called HEVC+AMP (map is embedded), the map is lossy encoded and embedded into the overall bit stream. For test video 1, the aerial map bit rate is higher than the bit rate savings provided by the AMP; as a result, the proposed approach gives no advantage over H.265/HEVC. For test video 2, the aerial map bit rate is lower than the bit rate savings provided by the AMP; as a result, 3-35% bit rate savings compared to H.265/HEVC are obtained. Here, the map takes up from 50% to 90% of the overall bit rate.
In the third scenario, we encode the first half of the video sequence with H.265/HEVC. The reconstructed frames are then used to build an aerial map, which is used for AMP of the frames in the second half of the video sequence. The main advantages of this approach are that encoding can be done without the additional frame accumulation latency needed to build the aerial map, and that the map does not need to be embedded into the overall bit stream, i.e., the encoding performance is expected to be the same as or better than that of H.265/HEVC. On the other hand, the map must also be built on the decoder side, and the coding efficiency is limited by the degree of overlap between the map and the frames in the second half of the video sequence. Experimental results show that this approach provides 0.5-2% bit rate savings for test video 1 and 3-9% for test video 2.
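The difference between the scenarios comes down to simple rate arithmetic at each operating point. With HEVC rate R_hevc, AMP rate R_amp and map rate R_map, the saving is 1 - R_amp / R_hevc when the map is not counted (first and third scenarios) and 1 - (R_amp + R_map) / R_hevc when it is embedded (second scenario), so embedding the map pays off only if R_map < R_hevc - R_amp. The helper names below are illustrative, and the rates in the usage lines are made-up numbers, not measurements from [1].

```python
def bitrate_saving(r_hevc, r_amp, r_map=0.0):
    """Relative saving of HEVC+AMP over HEVC at one operating point.
    r_map = 0 models the scenarios in which the map is not transmitted."""
    return 1.0 - (r_amp + r_map) / r_hevc

def map_share(r_amp, r_map):
    """Fraction of the total HEVC+AMP bit rate spent on the embedded map."""
    return r_map / (r_amp + r_map)

if __name__ == "__main__":
    # Hypothetical rates in kbit/s, chosen only to illustrate the arithmetic
    r_hevc, r_amp, r_map = 1000.0, 400.0, 450.0
    print(f"map not counted: {bitrate_saving(r_hevc, r_amp):.0%}")         # 60%
    print(f"map embedded:    {bitrate_saving(r_hevc, r_amp, r_map):.0%}")  # 15%
    print(f"map share:       {map_share(r_amp, r_map):.1%}")               # 52.9%
```

With these numbers the map is more than half of the transmitted bits, yet the embedded scheme still saves bits because R_map stays below R_hevc - R_amp; if R_map exceeded that gap, the saving would turn negative, which corresponds to the situation reported for test video 1.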

Software

HEVC with aerial map prediction, Version 1.0

If you plan to use the software, please also cite [1].

References

[1] E. Belyaev and S. Forchhammer, 'Drone HDR Infrared Video Coding via Aerial Map Prediction', 2018 IEEE International Conference on Image Processing (ICIP), 2018.
[2] B. Ma and A. Reibman, 'DashCam Video Compression Using Historical Data', Picture Coding Symposium (PCS), 2016.
[3] X. Wang, J. Xiao et al., 'Cruise UAV Video Compression Based on Long-Term Wide-Range Background', Data Compression Conference (DCC), 2017, p. 466.