AI Technology's Foray Into The Enchanting World of Video Compression

By Anubhav Singh, Raj Narayana Gadde, and Balvinder Singh, Samsung R&D Institute India-Bangalore
By Min Woo Park and Kwang Pyo Choi, Samsung Research

Images and videos have been at the forefront of digital media consumption for a long time. All aspects of video and image capture, transmission, and display have seen leaps of innovation in recent times. Content creation has evolved considerably over the years, for example, from 2D to 3D capture in mobile devices, AR/VR capture, point clouds, 3D meshes, and Generative AI based image/video generation. Resolution has also increased many-fold, from SVGA (800x600) to HD, FHD, and now beyond UHD. Additionally, display technology has evolved from LCD to OLED to AMOLED, with newer capabilities such as High Dynamic Range (HDR) and Wide Color Gamut (WCG). With these advancements, even the smallest of visual artefacts can lead to a poor user experience. The definition of the end user of encoded videos has also changed over time: use cases have emerged where content is consumed by Machine Learning (ML) algorithms rather than human eyes. A human eye may not notice very low-level details, but an ML algorithm might find them highly crucial for performing the desired tasks.

Video codecs have to adapt to the huge variation in all aspects of media consumption. Traditional video codecs have long relied on statistics and image processing techniques at their core. The success of AI in generic image processing tasks makes AI-powered video codecs a promising proposition. Exploration of such techniques in industry as well as in standards bodies such as the Moving Picture Experts Group (MPEG) and the Alliance for Open Media (AOM) has shown good evidence for their viability, but has also revealed challenges and unsolved technical problems. AI based pre- and post-processing have already been developed and deployed in many commercial solutions, and with the standardization of JPEG AI (Neural Network based image compression) [1], the prospects of AI in video compression seem optimistic.

Video Codecs: Standardization & Evolution over time

The popularity of a video codec is heavily reliant on its global standardization. Adoption of a common set of rules for interpreting a compressed media bitstream is achieved through rigorous collaborative development and consensus among global stakeholders. As a result, the standardization of video codecs has led to mass adoption and usage across the industry. H.264/AVC and HEVC are two of the most widely used codecs in video compression applications.

Video codecs have evolved from the days of H.261 to the most recent VVC to cater to new kinds of content, richer user experiences, and newer display devices, as shown in Fig. 1. The earliest video codecs were limited to handling 2D media. Over the years, content such as AR/VR, screen content, point clouds, and multi-camera capture has given rise to newer and more robust codecs with a wide variety of tools. Combined with this is the ever-increasing video consumption rate: according to estimates [2], roughly 75% of internet traffic is video content. With the advent of consumer-centric video delivery platforms, especially short video apps, there is a growing need to achieve higher quality at extremely low bitrates.

Figure 1. Display Resolutions, Display types and content variations over recent past

Advancements in related domains also feed into the constant evolution of newer compression methodologies. Quality consideration for bit budget allocation is an important factor for any compression method. Some quality metrics can even highlight inefficiencies in terms of subjective quality, necessitating coding tools that improve perceptual video quality. Increased compute power and advancements in hardware over the years have also enabled experimentation with, and implementation of, coding tools of higher complexity.

Combining AI, which has proven to work in these aspects, with traditional compression, which has a proven record of achieving huge levels of compression, seems to be the way forward.

AI-Powered Video Codecs: A Fresh Trajectory

The need for better compression, together with the opportunities opened up by deep learning, has made NN based video coding a promising proposition.

Neural Network (NN) based methods capture deeper data redundancies than those identified by traditional approaches, leading to greater compression. In addition to pixel redundancy, they also exploit perceptual redundancy in the spatio-temporal domain of videos, achieving gains over traditional codecs
Block-based, transform-guided processing in traditional codecs results in several artefacts. The emphasis on subjective quality and the need for differential encoding based on the human psycho-visual system have given NN based coding an edge over block-based codecs
NN based video coding provides flexibility and versatility, as the standard may only need to define the NN model architecture using basic operations. Content or use-case specific models can then be trained to generate new weights
Improved computational capabilities enable NN methods to run entirely on widely available AI hardware, such as NPUs, reducing the dependency on codec-specific hardware. Such advancements reduce the time-to-market for newer standards
NN based codecs can also be adapted to different and newer types of use-cases more easily than traditional codecs, for example, Video Coding for Machines (VCM) and video generated using Generative AI

The spectrum of using deep learning in video codec can be classified into three main categories as shown in Fig. 2:

NN based Pre-processing & Post-processing methods with traditional video codecs
Hybrid Codec with some NN based modules
End-to-End AI codec

Figure 2. Penetration of Deep Learning in and around the Traditional Video Codec

As standardization bodies are still exploring the avenues of an NN based codec standard, many in the industry have already started looking into ways to gain an early advantage. Solutions like AI ScaleNet, which down-scales the input to the encoder and up-scales the output of the decoder using AI methods, are already deployed in consumer use-cases to enable high-resolution video calling over low network bandwidth. End-to-End codecs with a Variational Auto-Encoder (VAE) at their core are in early exploratory stages, as they come with significant complexity.
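The scaler idea can be illustrated with a minimal sketch. The fixed average-pool and nearest-neighbour resamplers below are illustrative placeholders for the learned networks a solution like AI ScaleNet would use; only the pipeline shape (down-scale, code at reduced resolution, up-scale) is the point.

```python
import numpy as np

def downscale_2x(frame: np.ndarray) -> np.ndarray:
    """Average-pool 2x2 blocks; a learned encoder-side scaler would replace this."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale_2x(frame: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upscale; a learned decoder-side NN would replace this."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

# Simulated pipeline: downscale -> (conventional codec) -> upscale.
original = np.arange(16.0).reshape(4, 4)
low_res = downscale_2x(original)        # encoded/transmitted at 1/4 the pixels
reconstructed = upscale_2x(low_res)     # restored to full resolution at the decoder
assert reconstructed.shape == original.shape
```

Because the codec only ever sees the low-resolution frames, the bitstream stays standard-compliant; all the AI lives outside the codec.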

Figure 3. Major AI tools being explored in Hybrid codecs

The most promising way of combining AI and video coding currently is the hybrid approach, wherein a few of the codec's modules are replaced or augmented using AI. As these methods do not aim to replace the codec, they can be lightweight, and developers have better control over and understanding of such modules. Some of the technologies being explored using Neural Networks [3] are shown in Fig. 3.

AI based Intra prediction: Using NNs to better predict blocks from boundary pixels has been shown to achieve upwards of 3% BD-BR gain over conventional methods. Matrix-based Intra Prediction (MIP) is one such method; it has been adopted in VVC and has its origins in NNs
AI based Inter prediction: The use of AI for finding better motion candidates, motion prediction, and reference frame generation has been shown to achieve ~2% BD-BR gain
AI based In-Loop Filter: This is one of the most encouraging methods of using AI in the codec pipeline. It also draws on the plethora of research already done in image/video enhancement, to which it is closely related. Recent studies on AI based in-loop filters in JVET have shown a BD-BR gain of 5-10% at 17-500 kMAC/pixel (MAC: multiply-accumulate operations)
Scalers: AI based up-scaling and down-scaling of input as well as reference pictures has shown gains of the order of 2-3% BD-BR
Post Filtering: Post filtering is another way of incorporating NN methods without changing the codec itself. Supplemental Enhancement Information (SEI) messages for conventional video codecs are being developed to carry specific information about post-processing according to the intended purpose
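To make the intra prediction entry above concrete, here is a minimal sketch of a MIP-style predictor: the block is predicted as an affine map of its boundary pixels. The weight matrix and bias would be trained offline on real content; the DC-like averaging weights below are illustrative placeholders, not the actual VVC MIP matrices.

```python
import numpy as np

BLOCK = 4  # predict a 4x4 block from its top and left boundary pixels

def mip_like_predict(top: np.ndarray, left: np.ndarray,
                     W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Predict a block as an affine map of its boundary, as MIP-style tools do.
    W (BLOCK*BLOCK x 2*BLOCK) and b would come from offline training."""
    boundary = np.concatenate([top, left])          # shape (2*BLOCK,)
    return (W @ boundary + b).reshape(BLOCK, BLOCK)

# Placeholder weights: every predicted pixel is the mean of the boundary,
# i.e. a DC-like prediction expressed in matrix form.
W = np.full((BLOCK * BLOCK, 2 * BLOCK), 1.0 / (2 * BLOCK))
b = np.zeros(BLOCK * BLOCK)

top = np.array([100.0, 102.0, 104.0, 106.0])
left = np.array([100.0, 98.0, 96.0, 94.0])
pred = mip_like_predict(top, left, W, b)   # 4x4 prediction, all pixels = 100.0
```

The encoder would then code only the residual between the actual block and this prediction; a set of trained matrices lets the tool adapt to directional and textured content that a single DC matrix cannot capture.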

AI-Powered Video Codecs: Challenges & Way Forward

Though NN based video coding seems to be the obvious choice going forward, it has its own challenges. To name a few:

Complexity: The complexity of currently explored NN based video coding techniques is very high, which hinders full-fledged adoption. For instance, the simplest AI tools with marginal gain over VVC have more than 40x the complexity of VVC [3]
Quality metric: The widely used PSNR and BD-BR metrics are well suited to traditional codecs that use pixel-based processing. The lack of a widely accepted visual quality metric that accounts for the Human Visual System (HVS) has made it challenging to objectively portray the advantage of NN based codecs over traditional ones
Nature of Artefacts: Employing AI for video compression may change the pattern of compression artefacts. Many users have become familiar with blockiness, ringing, smoothing, etc. in videos, but the artefacts generated by AI methods may seem completely new
Generalizability of codecs: NN models are heavily data dependent. Traditional codecs rely only on the statistics of the frames and thus have the upper hand in terms of generalization capability. Comparable generalization across diverse content still needs to be demonstrated for NN based codecs
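The BD-BR figures quoted throughout this article come from the Bjøntegaard delta-rate method: fit a rate-distortion curve for each codec and average the bitrate difference at equal quality, with a negative result meaning bitrate savings. A minimal sketch, assuming four rate points per codec and PSNR as the quality metric:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test) -> float:
    """Bjoentegaard delta rate (%): average bitrate change of the test codec
    relative to the reference at equal PSNR. Negative = bitrate savings."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Cubic fit of log-rate as a function of PSNR for each codec.
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both curves over the overlapping PSNR range.
    lo = max(np.min(psnr_ref), np.min(psnr_test))
    hi = min(np.max(psnr_ref), np.max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100

# Hypothetical data: the test codec reaches the same PSNR at 10% fewer bits.
rates_ref = np.array([1000.0, 2000.0, 4000.0, 8000.0])  # kbps
psnr = np.array([30.0, 33.0, 36.0, 39.0])               # dB
savings = bd_rate(rates_ref, psnr, 0.9 * rates_ref, psnr)  # about -10%
```

The rate points and PSNR values here are made up for illustration; in practice the four points come from encoding a sequence at four quantization settings per codec.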

Though these challenges may seem daunting at first, the future seems bright for AI codecs. Recent trends and explorations clearly show that there is a huge opportunity for this technology, with strong interest from both standardization and product perspectives. The standardization of JPEG AI, the development of newer and more robust quality metrics, data-independent models, incessant advancements in hardware capabilities, and the hardware-friendly design of newer approaches are great strides towards solving each of these challenges.


[1] JPEG AI Common Training and Test Conditions, ISO/IEC JTC 1/SC29/WG1, 98th JPEG Meeting, 100421, Sydney, Australia, Jan. 2023.

[2] Video streaming to the extreme,

[3] E. Alshina, F. Galpin, Y. Li, D. Rusanovskyy, M. Santamaria, J. Ström, R. Chang, Z. Xie, “EE1: Summary report of exploration experiment on neural network-based video coding”, JVET-AG0023, Joint Video Exploration Team (JVET), January 2023