Why Per-Title Coding Ideas Still Play an Important Role

Per-title encoding is not a new technology: Netflix published this article in 2015. Even so, the per-title idea still plays an important role today and has since been carried further, to per-frame and even per-block granularity.

Finer-grained segmentation can achieve higher coding efficiency. The following is LiveVideoStack's abridged translation of the article.

Text / Anne Aaron, Zhi Li, Megha Manohara, Jan De Cock and David Ronca

Translation / Gold Song

It took us several years to develop this encoding method, called “per-title” encoding. Its core idea is to analyze each title independently and choose the best encoding settings according to its complexity. Imagine a very complex action scene, which needs more bits to capture its information, compared with static landscapes or animation, which need fewer. This approach lets us give users the same high-quality viewing experience while reducing bandwidth consumption. This is especially important in countries with lower bandwidth and where users often watch video on mobile networks.

Related background

In traditional television media such as terrestrial, cable or satellite TV, broadcasters have a fixed amount of available bandwidth. The video stream they provide (or the multiple programs provided at the same time) must fit within it. Broadcasters often use statistical multiplexing to allocate bit rates efficiently among programs that are broadcast at the same time, but the total bit rate is still capped by the available bandwidth. In many cases, the broadcaster even pads the stream with null packets so that a channel runs at exactly a fixed bit rate, wasting valuable transmission capacity. In addition, because channels are allocated in advance, less popular programs or genres may be assigned lower bit rates (and therefore poorer video quality), while popular programs are carried on channels with higher bit rates.

With Internet streaming, Netflix is not limited by pre-allocated channels. Instead, we can provide our members with the best-quality video stream for any program, tailored to each member's available bandwidth and viewing device. We pre-encode each video stream at multiple bit rates using optimized encoding settings. The client's adaptive streaming algorithm then selects the best stream on the fly, maximizing video quality while avoiding playback interruptions caused by rebuffering.

How do we choose the best encoding settings? The problem is not simple. For example, assuming the user's available bandwidth is 1 Mbps, at what resolution should we encode the H.264/AVC video: 480p, 720p or 1080p?

If we use 480p, the video will probably not show blocking or ringing artifacts at 1 Mbps. However, if the member is watching on a high-definition device, the upsampled video will not look sharp. Conversely, if we encode at 1080p, the transmitted resolution is higher, but the bit rate may be so low that most scenes show annoying encoding artifacts.
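A quick back-of-the-envelope calculation makes this trade-off concrete. The sketch below computes the average number of bits available per pixel at 1 Mbps for each candidate resolution. The frame rate of 24 frames per second and the 720×480 frame size used for 480p are assumptions made for illustration, and the function name is hypothetical.

```python
# Average bits available per pixel at a fixed bit rate.
# FRAME_RATE is an illustrative assumption, not a value from the article.
FRAME_RATE = 24  # frames per second (assumed)

RESOLUTIONS = {
    "480p": (720, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def bits_per_pixel(bitrate_kbps, width, height, fps=FRAME_RATE):
    """Return the average number of bits the encoder can spend per pixel."""
    return bitrate_kbps * 1000 / (width * height * fps)


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        print(f"{name}: {bits_per_pixel(1000, w, h):.3f} bits/pixel")
```

At the same bit rate, the 480p encode has six times as many bits per pixel as the 1080p encode, which is why it is less likely to show compression artifacts even though it carries less detail.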

Universal coding

We first deployed H.264/AVC encoding at the end of 2010, when our video engineers chose what was then the best encoding recipe for all the types of video we offered. They tested a variety of encoder configurations and tuned the encoder parameters through visual comparison so that different types of video would all achieve good quality. In the end, we chose the bitrate-resolution ladder listed in the table below, which ensures that encoded video is not severely distorted.

For most content, this "one size fits all" approach does achieve high-quality encodes within the bit rate limits. However, in some cases, such as scenes with heavy camera noise or film grain, even the highest-bitrate stream at 5800 kbps still shows blocking artifacts in the noisy areas. Conversely, for simple content such as animation, transmitting a 1080p stream at 5800 kbps is very wasteful. In addition, a user whose bandwidth is limited to 1750 kbps could watch the animation in HD rather than being restricted to the SD resolution that the fixed ladder prescribes.
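To illustrate how a fixed ladder constrains what a viewer receives, here is a minimal sketch of a ladder lookup. It contains only the rungs that this article quotes (1750 kbps at 480p, 2350 kbps at 720p, and 4300 and 5800 kbps at 1080p), so it is a partial ladder. The selection rule (take the highest rung that fits the bandwidth) and the function name are simplifying assumptions, not a description of the actual client algorithm.

```python
# Partial fixed ladder: only the rungs quoted in this article.
FIXED_LADDER = [
    (1750, "480p"),
    (2350, "720p"),
    (4300, "1080p"),
    (5800, "1080p"),
]


def pick_rung(bandwidth_kbps, ladder=FIXED_LADDER):
    """Return the highest-bitrate rung that fits the bandwidth, or None."""
    fitting = [rung for rung in ladder if rung[0] <= bandwidth_kbps]
    # Hypothetical rule: always take the highest bit rate that fits.
    return max(fitting, key=lambda rung: rung[0], default=None)


if __name__ == "__main__":
    for bandwidth in (1750, 3000, 6000):
        print(bandwidth, "->", pick_rung(bandwidth))
```

With this ladder, a viewer at 1750 kbps always gets 480p, however simple the content is.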

Titles in the Netflix catalog have highly diverse signal characteristics. The figure below shows this diversity across 100 randomly sampled titles. We used the x264 encoder to encode the 100 sources at 1080p resolution under constant quantization parameter (QP) rate control. For each QP value, we plot the resulting bit rate (in kbps) on the x-axis and the video quality, measured as PSNR (peak signal-to-noise ratio, in dB), on the y-axis to obtain the data points for each title.

These curves show that some titles reach a very high PSNR (45 dB or more) at 2500 kbps or less, while others need 8000 kbps or more to reach a minimum acceptable PSNR of 38 dB.

Given how much signal characteristics vary between titles and how much available bandwidth varies between users, it is clearly difficult to find one universal ladder that gives the best viewing quality for every title. In some cases, a universal ladder also allocates a bit rate far beyond what is needed to improve video quality, wasting storage space and transmission bits.

Note on video quality metrics: in the charts above and below, we use PSNR as the measure of video quality. PSNR is the most commonly used metric in video compression. Although it does not always reflect the quality perceived by the human eye, it is a simple measure of video fidelity (for example, 45 dB indicates very good fidelity, while 35 dB indicates visible distortion), and it captures the quality trend within each title well. Other quality metrics can be used for the same analysis, such as the perceptual metric VMAF, a new perceptual quality measure developed by Netflix in collaboration with researchers at the University of Southern California.
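For readers unfamiliar with PSNR, the following is a minimal sketch of the standard definition, 10·log10(MAX² / MSE), applied to two flat lists of 8-bit pixel values. It is illustrative only: a real measurement would run over the decoded frames of a whole video, and the function name and list-based interface are assumptions.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error between the reference and the distorted signal.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 120, 140, 160]
    decoded = [101, 119, 141, 159]  # every pixel is off by one
    print(f"{psnr(original, decoded):.2f} dB")
```

A larger error between the source and the decoded picture gives a lower PSNR, which is why heavier compression moves a title down the quality axis.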

The best encoding scheme for video content

Why Per-Title?

For a "simple" animated title, each frame consists mainly of flat regions with no camera noise or film grain, and there is very little motion between frames. Let's compare the quality curves obtained with the fixed bitrate ladder and with a ladder optimized for the title:

As the figure above shows, the clip is encoded at 1920×1080 at several bit rates. At 2350 kbps (A, the per-title ladder), the video quality is already high (PSNR of about 45 dB). Raising the bit rate to 4300 kbps (B, fixed ladder) or even 5800 kbps (C, fixed ladder) does not visibly improve quality, because at a PSNR of 45 dB or more the human eye perceives no significant distortion. With the fixed ladder, however, 2350 kbps is paired with 1280×720 (D). As a result, users with about 2350 kbps of bandwidth can only watch 720p video instead of higher-quality 1080p video.

An action movie, on the other hand, has far more inter-frame motion and spatial texture than animation: fast-moving objects, frequent scene changes, explosions and splashes. The figure below shows the quality curves we obtained for one action movie.

If these high-complexity scenes are encoded at 1920×1080 at 4300 kbps (A), encoding artifacts such as blocking, ringing and contouring may result. One quality trade-off is to encode at the lower resolution of 1280×720 (B), which removes the encoding artifacts at the cost of additional scaling. Encoding artifacts are usually more annoying and easier to notice than the blur caused by downsampling (before encoding) and upsampling (after decoding). For this action movie with highly complex scenes, the best option may be to encode at 1920×1080 at a bit rate above 5800 kbps, for example 7500 kbps, to eliminate the encoding artifacts entirely.

To give our members the best-quality video, each title should have a bitrate ladder designed specifically for its complexity. Over the past few years, Netflix's encoding team has devoted a great deal of research and development effort to answering the following questions:

Given a title, how many quality levels should be produced so that adjacent levels differ by exactly one just-noticeable difference (JND)?

Given a title, what is the best resolution-bitrate pair for each quality level?

Given a title, what is the highest bit rate required to achieve the best perceived quality?

Given an encode, how can we estimate the quality perceived by the human eye?

How do we design a robust and scalable production system that solves the problems above?

Algorithm

In designing an optimal per-title bitrate ladder, we are limited by some practical constraints when determining the total number of quality levels and the bitrate-resolution pair for each level. For example, we need to maintain backward compatibility (the streams must play on all previously certified Netflix devices), so the selectable resolutions are limited to 1920×1080, 1280×720, 720×480, 512×384, 384×288 and 320×240. The selectable bit rates are also limited to a finite set in which adjacent values differ by approximately 5%.
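One simple way to picture a finite set of bit rates spaced about 5% apart is a geometric grid. The sketch below is an assumption about how such a grid could be generated, not the actual set of bit rates Netflix uses; the function name and the example range are hypothetical.

```python
def bitrate_grid(min_kbps, max_kbps, step=0.05):
    """List candidate bit rates from min to max, each about `step` above the last."""
    if min_kbps <= 0 or max_kbps < min_kbps or step <= 0:
        raise ValueError("need 0 < min_kbps <= max_kbps and step > 0")
    grid = []
    rate = float(min_kbps)
    while rate <= max_kbps:
        grid.append(round(rate))  # report whole kbps values
        rate *= 1 + step          # next candidate is about 5% higher
    return grid


if __name__ == "__main__":
    print(bitrate_grid(1000, 2000))
```

Because the spacing is relative rather than absolute, the grid is denser at low bit rates, where a fixed number of kbps makes a larger difference to quality.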

We also considered the following optimization criteria.

- The selected bitrate-resolution pairs should be efficient, i.e. at a given bit rate, the encode should have the highest possible quality.

- Adjacent bit rates should be perceptually spaced. Ideally, the perceived difference between the encodes at two adjacent bit rates should be just below one JND. This keeps quality transitions smooth when switching between bit rates. Given the wide range of perceived quality that the bitrate ladder needs to cover, we should also ensure that the number of quality levels is not too large.
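The spacing criterion can be sketched as a greedy thinning of candidate encode points. Everything specific here is an assumption: the article does not say how a JND is measured, so the sketch expresses it as a hypothetical threshold in the same units as the quality score, and the greedy rule and function name are illustrative only.

```python
def space_rungs(points, jnd):
    """Thin (bitrate, quality) points, sorted by bitrate, so that each kept rung
    is the farthest one whose quality gain over the previous kept rung stays
    below `jnd`. If even the next point exceeds `jnd`, it is kept anyway."""
    if not points:
        return []
    kept = [points[0]]
    i = 0
    while i < len(points) - 1:
        j = i + 1
        # Advance while the following point is still within one JND of rung i.
        while j + 1 < len(points) and points[j + 1][1] - points[i][1] < jnd:
            j += 1
        kept.append(points[j])
        i = j
    return kept


if __name__ == "__main__":
    # Made-up quality scores for seven candidate bit rates.
    candidates = [(500, 30), (700, 31), (1000, 32), (1400, 33),
                  (2000, 34), (2800, 35), (4000, 36)]
    print(space_rungs(candidates, jnd=2.5))
```

With these made-up numbers, seven candidates are reduced to four rungs, each less than one assumed JND above the previous one.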

To illustrate the problem, in the following example we encoded one source at three different resolutions and a range of bit rates.

Encodes at various bit rates for three resolutions. Blue markers indicate encode points; the red curve indicates the PSNR-bitrate convex hull.

At each resolution, quality increases monotonically with bit rate, but beyond a certain threshold the curves begin to flatten (A and B), because each resolution has an upper limit on the perceived quality it can deliver. When video is downsampled to a low resolution for encoding and then upsampled to full resolution for playback, its high-frequency content is lost.

On the other hand, at the same bit rate a high-resolution encode may have lower quality than a lower-resolution encode (see C and D). This is because encoding more pixels with lower precision can produce a worse picture than encoding fewer pixels with higher precision and then upsampling and interpolating. In addition, at very low bit rates the overhead associated with each fixed-size coding block starts to dominate the bit budget, leaving very few bits for coding the actual signal. Encoding at high resolution with an insufficient bit rate produces artifacts such as blocking, ringing and contouring.

We can see that each resolution has a bit rate region in which it outperforms the other resolutions. If we collect these regions across all available resolutions, together they form a boundary called the convex hull. In economic terms, the convex hull is where the encode points achieve Pareto efficiency. Ideally, we would operate exactly on the convex hull, but because of practical constraints (e.g. the limited set of selectable resolutions), we choose the bitrate-resolution pairs that lie as close to it as possible.

Building a complete bitrate-quality map that covers the entire quality range for every title in our catalog is impractical. As a more practical solution, we run trial encodes at different quantization parameters (QPs) over a limited set of resolutions. The QPs are chosen so that adjacent values are about one JND apart. We measure the bit rate and quality of each trial encode. By interpolating curves through these sample points, we obtain a bitrate-quality curve for each candidate resolution. By choosing the points closest to the convex hull, we end up with the per-title bitrate ladder.
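The convex hull idea can be sketched with a standard monotone-chain pass over the encode points. The PSNR values below are made up for illustration and the function names are hypothetical; only the technique (keep the best point per bit rate, then keep the points on the upper hull) follows the description above.

```python
def cross(o, a, b):
    """Cross product of (a - o) and (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of (bitrate, quality, label) points, left to right."""
    # Keep only the best-quality point at each bit rate.
    best = {}
    for rate, quality, label in points:
        if rate not in best or quality > best[rate][1]:
            best[rate] = (rate, quality, label)
    hull = []
    for p in sorted(best.values()):
        # Drop the last hull point while it lies on or below the line hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    # Made-up (bitrate kbps, PSNR dB, resolution) encode points.
    encodes = [
        (500, 34.0, "480p"), (1000, 37.0, "480p"), (2000, 38.5, "480p"), (4000, 39.0, "480p"),
        (500, 32.0, "720p"), (1000, 36.5, "720p"), (2000, 40.0, "720p"), (4000, 41.5, "720p"),
        (500, 29.0, "1080p"), (1000, 34.0, "1080p"), (2000, 39.5, "1080p"), (4000, 43.5, "1080p"),
    ]
    for point in upper_hull(encodes):
        print(point)
```

With these sample values, the hull moves from 480p at low bit rates through 720p to 1080p at the highest bit rate, which is the kind of resolution switching the text describes.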
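The trial-encode procedure can be sketched as follows. The sample measurements are made up, linear interpolation is an assumption (the article does not say which interpolation is used), and picking the best-quality resolution at each target bit rate is a simplification of "closest to the convex hull". The function names are hypothetical.

```python
from bisect import bisect_left


def interpolate(curve, rate):
    """Linearly interpolate quality at `rate` on a curve of (bitrate, quality)
    points sorted by bitrate. Returns None outside the measured range."""
    rates = [r for r, _ in curve]
    if rate < rates[0] or rate > rates[-1]:
        return None
    i = bisect_left(rates, rate)
    if rates[i] == rate:
        return curve[i][1]
    (r0, q0), (r1, q1) = curve[i - 1], curve[i]
    return q0 + (q1 - q0) * (rate - r0) / (r1 - r0)


def build_ladder(curves, target_rates):
    """For each target bit rate, pick the resolution with the best interpolated quality."""
    ladder = []
    for rate in target_rates:
        candidates = []
        for name, curve in curves.items():
            quality = interpolate(curve, rate)
            if quality is not None:
                candidates.append((quality, name))
        if candidates:
            quality, name = max(candidates)
            ladder.append((rate, name, round(quality, 2)))
    return ladder


if __name__ == "__main__":
    # Made-up trial encodes: (bitrate kbps, PSNR dB) per resolution.
    curves = {
        "480p": [(500, 34.0), (1000, 37.0), (2000, 38.5), (4000, 39.0)],
        "720p": [(500, 32.0), (1000, 36.5), (2000, 40.0), (4000, 41.5)],
        "1080p": [(500, 29.0), (1000, 34.0), (2000, 39.5), (4000, 43.5)],
    }
    print(build_ladder(curves, [750, 1500, 3000]))
```

A handful of trial encodes per resolution is enough to estimate quality at any candidate bit rate in between, which is what makes the approach practical for a large catalog.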

Results

Take the animated series BoJack Horseman: its frames consist mainly of flat regions, and there is very little motion between frames. With the fixed bitrate ladder, the 1750 kbps stream is encoded at 480p. With the per-title ladder, we can already deliver 1080p video at 1540 kbps. Compare the screenshots of the two versions (top: old 1750 kbps; bottom: new 1540 kbps), assuming a 1080p display. The new encode is clearly sharper and has better visual quality.

The American drama Orange is the New Black has more complex video characteristics. At low bit rates, the per-title scheme did not significantly improve encoding quality. At the high end, the new scheme assigns 4640 kbps to the highest-quality 1080p encode. Compared with the 5800 kbps used in the fixed ladder, this saves 20% of the bit rate. For this title, we give our members excellent visual quality while avoiding wasted bits. Below are screenshots at 5800 kbps (top) and 4640 kbps (bottom).

The best coding solution for your device

The discussion above derived an optimized per-title bitrate ladder under the assumption that the viewing device can receive and play encodes at any resolution. However, because of hardware limitations, some devices can only play resolutions lower than the original resolution of the source. If the convex hull is built with 1080p as the highest resolution, users of a tablet limited to 720p may get a poor viewing experience. For example, for an animated title we may switch the stream to 1080p at 2000 kbps because its quality there is better than that of a 720p stream at 2000 kbps. The tablet, however, cannot play 1080p. Even if the available bandwidth could support a higher-quality (higher-bitrate) 720p encode, the stream the user receives is capped at about 2000 kbps.

To solve this problem, we design additional per-title bitrate ladders according to the highest resolution a device can play. More specifically, we design optimal per-title ladders for 480p-capped and 720p-capped devices. Although these additional encodes reduce the overall storage efficiency for each title, they guarantee our customers the best viewing experience.
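A device-specific ladder can be pictured as the same selection run over only the resolutions the device can play. The quality values below are made up for an animated title, and the function name and data layout are hypothetical; the sketch only illustrates the idea of restricting the candidate resolutions before choosing the rungs.

```python
# Hypothetical PSNR measurements (dB), keyed by (bitrate kbps, frame height).
QUALITY = {
    (1000, 480): 40.0, (1000, 720): 41.0, (1000, 1080): 40.5,
    (2000, 480): 41.0, (2000, 720): 43.5, (2000, 1080): 44.5,
    (3000, 480): 41.3, (3000, 720): 44.5, (3000, 1080): 46.0,
}


def ladder_for_device(quality, max_height):
    """Pick, per bit rate, the best-quality resolution the device can play."""
    ladder = {}
    for (rate, height), q in quality.items():
        if height > max_height:
            continue  # the device cannot play this resolution
        if rate not in ladder or q > ladder[rate][1]:
            ladder[rate] = (height, q)
    return [(rate, *ladder[rate]) for rate in sorted(ladder)]


if __name__ == "__main__":
    print("1080p device:", ladder_for_device(QUALITY, 1080))
    print("720p device: ", ladder_for_device(QUALITY, 720))
```

In this example the unrestricted ladder switches to 1080p from 2000 kbps upward, so a 720p-only device would have nothing to play above that point. The 720p ladder keeps offering higher-bitrate 720p rungs instead.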

What does this mean for Netflix users?

Per-title encoding lets us provide higher-quality video in two ways. Under low-bandwidth conditions, it gives you better quality for “simple” titles (e.g. BoJack Horseman), because they are now streamed at a higher resolution at the same bit rate. When the available bandwidth allows high-bitrate encodes, it gives you better quality for complex titles such as Marvel's Daredevil, because they are encoded at a higher maximum bit rate. Per-title encoding gives members a better viewing experience while reducing bandwidth consumption, making us better stewards of the Internet. This is what drives us to keep innovating.

