[Repost] Video Rendering with 8-Bit YUV Formats


http://msdn.microsoft.com/en-us/library/aa904813(VS.80).aspx

Summary

This article describes the 8-bit YUV formats that are recommended for video rendering in the Microsoft Windows operating system. This article presents techniques for converting between YUV and RGB formats, and also provides techniques for upsampling YUV formats. This article is intended for anyone working with YUV video decoding or rendering in Windows. (13 printed pages)

Introduction

Numerous YUV formats are defined throughout the video industry. This article identifies the 8-bit YUV formats that are recommended for video rendering in the Microsoft® Windows® operating system. Decoder vendors and display vendors are encouraged to support the formats described in this article. This article does not address other uses of YUV color, such as still photography.

The formats described in this article all use 8 bits per pixel location to encode the Y channel (also called the luma channel) and use 8 bits per sample to encode each U or V chroma sample. However, most YUV formats use fewer than 24 bits per pixel on average, because they contain fewer samples of U and V than of Y. This article does not cover YUV formats with 10-bit and 12-bit Y channels. 

Note: For the purposes of this article, the term U is equivalent to Cb, and the term V is equivalent to Cr.

This article covers the following topics:

  • Identifying YUV Formats in DirectShow: Explains how to describe Microsoft DirectShow® YUV format types.
  • YUV Sampling: Describes the most common YUV sampling techniques.
  • Surface Definitions: Describes the recommended YUV formats.
  • Color Space and Chroma Sampling Rate Conversions: Provides guidelines for converting between YUV and RGB formats, and for converting between different YUV formats.
  • Additional Information: Provides additional information.

Identifying YUV Formats in DirectShow

Each of the YUV formats described in this article has an assigned FOURCC code. A FOURCC code is a 32-bit unsigned integer that is created by concatenating four ASCII characters.

There are various C/C++ macros that make it easier to declare FOURCC values in source code. For example, the MAKEFOURCC macro is declared in Mmsystem.h, and the FCC macro is declared in Aviriff.h. Use them as follows:

DWORD fccYUY2a = MAKEFOURCC('Y','U','Y','2');  // MAKEFOURCC: Mmsystem.h
DWORD fccYUY2b = FCC('YUY2');                   // FCC: Aviriff.h; same value as fccYUY2a

You can also declare a FOURCC code directly as a character literal simply by reversing the order of the characters. For example:

DWORD fccYUY2 = '2YUY';  // Declares the FOURCC 'YUY2'

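Reversing the order is necessary because a FOURCC code stores its first character in the least significant byte, so that on the little-endian architecture used by Windows the four characters appear in order in memory, whereas the Microsoft compiler places the first character of a multi-character literal in the most significant byte. 'Y' = 0x59, 'U' = 0x55, and '2' = 0x32, so '2YUY' is 0x32595559. Because the value of a multi-character literal is implementation-defined in C and C++, this shortcut may not be portable to other compilers.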
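
To show the byte order without relying on Windows headers or multi-character literals, here is a minimal portable sketch of what MAKEFOURCC computes, plus the inverse. The names make_fourcc and fourcc_to_string are hypothetical helpers, not part of any Windows header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable equivalent of MAKEFOURCC: the first character lands
// in the least significant byte and the fourth in the most significant byte.
constexpr std::uint32_t make_fourcc(char a, char b, char c, char d) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a)) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Hypothetical inverse: read the characters back from the least significant
// byte upward, which is also their order in memory on a little-endian machine.
std::string fourcc_to_string(std::uint32_t fcc) {
    std::string s(4, '\0');
    for (std::size_t i = 0; i < 4; ++i) {
        s[i] = static_cast<char>((fcc >> (8 * i)) & 0xFF);
    }
    return s;
}

// 'Y' = 0x59, 'U' = 0x55, '2' = 0x32, so YUY2 packs to 0x32595559.
static_assert(make_fourcc('Y', 'U', 'Y', '2') == 0x32595559u,
              "YUY2 should pack to 0x32595559");
```

For example, fourcc_to_string(0x32595559) returns "YUY2", which is why the reversed literal '2YUY' and MAKEFOURCC('Y','U','Y','2') describe the same code.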

In DirectShow, formats are identified by a major-type globally unique identifier (GUID) and a subtype GUID. The major type for computer video formats is always MEDIATYPE_Video. The subtype can be constructed by mapping the FOURCC code to a GUID, as follows:

XXXXXXXX-0000-0010-8000-00AA00389B71 

where XXXXXXXX is the FOURCC code. Thus, the subtype GUID for YUY2 is:

32595559-0000-0010-8000-00AA00389B71 
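
The mapping can be sketched portably as follows. GuidLike is a hypothetical stand-in that mirrors the field layout of the Windows GUID type (a 32-bit Data1, two 16-bit fields, and eight bytes); Data1 carries the FOURCC code and the other fields hold the fixed base value 0000-0010-8000-00AA00389B71. In DirectShow code you would use the real GUID type or FOURCCMap instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in with the same field layout as the Windows GUID struct.
struct GuidLike {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Build the subtype GUID XXXXXXXX-0000-0010-8000-00AA00389B71,
// where XXXXXXXX is the FOURCC code.
GuidLike fourcc_to_subtype(std::uint32_t fourcc) {
    return GuidLike{fourcc, 0x0000, 0x0010,
                    {0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71}};
}

// Format a GUID as upper-case hex groups 8-4-4-4-12, without braces.
std::string guid_to_string(const GuidLike& g) {
    // printf's %X expects unsigned int, so widen every field explicitly.
    auto b = [&g](int i) { return static_cast<unsigned>(g.data4[i]); };
    char buf[37];  // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
                  "%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
                  static_cast<unsigned>(g.data1), static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3), b(0), b(1), b(2), b(3), b(4),
                  b(5), b(6), b(7));
    return buf;
}
```

With these helpers, guid_to_string(fourcc_to_subtype(0x32595559)) produces 32595559-0000-0010-8000-00AA00389B71, the YUY2 subtype given above.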

Many of these GUIDs are defined already in the header file Uuids.h. For example, the YUY2 subtype is defined as MEDIASUBTYPE_YUY2. The DirectShow base class library also provides a helper class, FOURCCMap, which can be used to convert FOURCC codes into GUID values. The FOURCCMap constructor takes a FOURCC code as an input parameter. You can then cast the FOURCCMap object to the corresponding GUID:

#include <streams.h>   // DirectShow base classes (FOURCCMap)
#include <aviriff.h>   // FCC

FOURCCMap fccMap(FCC('YUY2'));
GUID g1 = (GUID)fccMap;

// Equivalent:
GUID g2 = (GUID)FOURCCMap(FCC('YUY2'));

YUV Sampling

One of the advantages of YUV is that the chroma channels can have a lower sampling rate than the Y channel without a dramatic degradation of the perceptual quality. A notation called the A:B:C notation is used to describe how often U and V are sampled relative to Y:

  • 4:4:4 means no downsampling of the chroma channels.
  • 4:2:2 means 2:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for every two U or V samples.
  • 4:2:0 means 2:1 horizontal downsampling, with 2:1 vertical downsampling.
  • 4:1:1 means 4:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for every U or V sample. 4:1:1 sampling is less common than other formats, and is not discussed in detail in this article.
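
The storage cost of each scheme follows directly from these ratios. This sketch, with hypothetical names, models each scheme as a pair of horizontal and vertical chroma downsampling factors and counts only the 8-bit Y, U, and V samples, ignoring alpha, padding, and row alignment, so formats that store extra bytes per pixel will be larger than these figures.

```cpp
#include <cassert>
#include <cstddef>

// Chroma downsampling factors for the A:B:C notation:
// h = horizontal factor, v = vertical factor applied to U and V.
struct Subsampling {
    int h;
    int v;
};

constexpr Subsampling k444{1, 1};  // no chroma downsampling
constexpr Subsampling k422{2, 1};  // 2:1 horizontal only
constexpr Subsampling k420{2, 2};  // 2:1 horizontal and 2:1 vertical
constexpr Subsampling k411{4, 1};  // 4:1 horizontal only

// Average bits per pixel with 8-bit samples: one Y sample per pixel plus
// one U and one V sample for every h * v pixels.
constexpr double average_bits_per_pixel(Subsampling s) {
    return 8.0 + 2.0 * 8.0 / (s.h * s.v);
}

// Bytes of Y, U, and V samples in a width x height frame, without padding.
// Chroma dimensions round up so odd sizes keep a last column or row.
std::size_t payload_bytes(std::size_t width, std::size_t height, Subsampling s) {
    assert(width > 0 && height > 0);
    std::size_t cw = (width + s.h - 1) / s.h;
    std::size_t ch = (height + s.v - 1) / s.v;
    return width * height + 2 * cw * ch;
}
```

With these definitions, 4:4:4 averages 24 bits per pixel, 4:2:2 averages 16, and both 4:2:0 and 4:1:1 average 12; for example, payload_bytes(640, 480, k420) is 460,800 bytes.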

Figure 1 shows the sampling grid used in 4:4:4 pictures. Luma samples are represented by a cross, and chroma samples are represented by a circle.

Figure 1. YUV 4:4:4 sample positions

The dominant form of 4:2:2 sampling is defined in ITU-R Recommendation BT.601. Figure 2 shows the sampling grid defined by this standard.

Figure 2. YUV 4:2:2 sample positions

There are two common variants of 4:2:0 sampling. One of these is used in MPEG-2 video, and the other is used in MPEG-1 and in ITU-T recommendations H.261 and H.263. Figure 3 shows the sampling grid used in the MPEG-1 scheme, and Figure 4 shows the sampling grid used in the MPEG-2 scheme.

Figure 3. YUV 4:2:0 sample positions (MPEG-1 scheme)

Figure 4. YUV 4:2:0 sample positions (MPEG-2 scheme)

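Compared with the MPEG-1 scheme, it is simpler to convert between the MPEG-2 scheme and the sampling grids defined for 4:2:2 and 4:4:4 formats. In the MPEG-2 scheme, each chroma sample is horizontally aligned (co-sited) with an even-numbered luma column, just as in BT.601 4:2:2, so converting to 4:2:2 requires only vertical resampling. In the MPEG-1 scheme, chroma samples sit midway between luma samples in both directions, so they must be resampled horizontally as well. For this reason, the MPEG-2 scheme is preferred in Windows, and should be considered the default interpretation of 4:2:0 formats.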
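
Co-sited chroma is what makes horizontal upsampling straightforward. The following sketch doubles the horizontal chroma resolution of one line, as when going from 4:2:2 to 4:4:4, assuming each chroma sample is co-sited with an even-numbered luma column: even output positions copy the input, and odd positions, which fall halfway between two inputs, are interpolated. The 4-tap (-1, 9, 9, -1)/16 kernel, the edge replication, and the function name are assumptions of this sketch, not a prescribed filter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Doubles the horizontal resolution of one line of chroma samples, assuming
// each input sample is co-sited with an even-numbered luma column.
// The output has 2 * in.size() samples.
std::vector<std::uint8_t> upsample_chroma_line_2x(const std::vector<std::uint8_t>& in) {
    assert(!in.empty());
    const int n = static_cast<int>(in.size());

    // Read sample i, replicating the edge samples for indices outside 0..n-1.
    auto at = [&in, n](int i) {
        return static_cast<int>(in[std::min(std::max(i, 0), n - 1)]);
    };

    std::vector<std::uint8_t> out(2 * in.size());
    for (int i = 0; i < n; ++i) {
        // Even output position: co-sited with input sample i, so copy it.
        out[2 * i] = in[i];
        // Odd output position: halfway between samples i and i + 1.
        // Apply the (-1, 9, 9, -1) / 16 kernel; + 8 rounds to nearest.
        int sum = 9 * (at(i) + at(i + 1)) - (at(i - 1) + at(i + 2)) + 8;
        // Clamp to 0..255, since the negative taps can overshoot either way.
        out[2 * i + 1] = static_cast<std::uint8_t>(std::min(std::max(sum, 0) >> 4, 255));
    }
    return out;
}
```

Under MPEG-1 siting the copy step would not be valid, because no output position coincides with an input sample; every output would need interpolation, which is the extra work the paragraph above describes.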

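Surface Definitions

This section describes the 8-bit YUV formats that are recommended for video rendering. They fall into the following categories:

  • 4:4:4 Formats, 32 Bits per Pixel
  • 4:2:2 Formats, 16 Bits per Pixel
  • 4:2:0 Formats, 16 Bits per Pixel
  • 4:2:0 Formats, 12 Bits per Pixel

First, you should be aware of the following concepts in order to understand what follows: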

  • Surface origin. For the YUV formats described in this article, the origin (0,0) is always the upper-left corner of the surface.
  • Stride. The stride of a surface, sometimes called the pitch, is the number of bytes from the start of one line to the start of the next. Because lines can be padded for alignment, the stride can be larger than the width of the visible image in bytes. Given a surface origin at the upper-left corner, the stride is always positive.
  • Alignment. The alignment of a surface is at the discretion of the graphics display driver. The surface must always be DWORD aligned; that is, individual lines within the surface are guaranteed to start on a 32-bit (DWORD) boundary. The alignment can be larger than 32 bits, however, depending on the needs of the hardware.
  • Packed format versus planar format. YUV formats are divided into packed formats and planar formats. In a packed format, the Y, U, and V components are stored in a single array, and pixels are organized into groups called macropixels, whose layout depends on the format. In a planar format, the Y, U, and V components are stored as three separate planes.
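
The origin, stride, and alignment rules translate into simple address arithmetic. This is a sketch with hypothetical helper names, assuming 8-bit planar samples and a 4:2:0 chroma plane with its own stride. It computes only the minimum DWORD-aligned stride; a driver may choose a larger one, so code that touches a real surface should use the stride the driver reports.

```cpp
#include <cassert>
#include <cstddef>

// Smallest stride that starts every line on a DWORD (4-byte) boundary.
// The driver may pick a larger value; this is only a lower bound.
std::size_t min_dword_stride(std::size_t row_bytes) {
    return (row_bytes + 3) / 4 * 4;
}

// Offset of the sample at column x, row y of one plane. With the origin at
// the upper-left corner and a positive stride, row y starts y * stride bytes
// after the start of the plane. Assumes one byte per sample.
std::size_t sample_offset(std::size_t x, std::size_t y, std::size_t stride) {
    assert(x < stride);  // a sample must lie inside its line
    return y * stride + x;
}

// In a planar 4:2:0 surface, the U or V sample associated with luma pixel
// (x, y) is at column x / 2, row y / 2 of its own plane.
std::size_t chroma_offset_420(std::size_t x, std::size_t y, std::size_t chroma_stride) {
    return sample_offset(x / 2, y / 2, chroma_stride);
}
```

For example, a line of 641 one-byte samples needs a stride of at least 644 bytes, and the Y sample at (10, 2) in such a plane is at byte offset 1298.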

    Example: Converting RGB888 to YUV 4:4:4
  • Example: Converting 8-bit YUV to RGB888
  • Converting 4:2:0 YUV to 4:2:2 YUV
  • Converting 4:2:2 YUV to 4:4:4 YUV
  • Converting 4:2:0 YUV to 4:4:4 YUV
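
As a rough illustration of the first two conversions, the following sketch assumes the widely published 8-bit integer approximation of ITU-R BT.601 for studio-range video (Y from 16 to 235, U and V from 16 to 240). The coefficients are an assumption of this sketch; other standards, such as BT.709, use different ones. The type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };
struct Yuv { std::uint8_t y, u, v; };

// Clamp an intermediate result to the 0..255 range of an 8-bit sample.
static std::uint8_t clip(int x) {
    return static_cast<std::uint8_t>(std::min(std::max(x, 0), 255));
}

// RGB888 -> YUV 4:4:4 using BT.601 coefficients scaled by 256; + 128 rounds
// before the >> 8. Assumes arithmetic right shift of negative values, as on
// mainstream compilers (guaranteed from C++20).
Yuv rgb_to_yuv(Rgb p) {
    int r = p.r, g = p.g, b = p.b;
    int y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
    int u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
    int v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
    return Yuv{clip(y), clip(u), clip(v)};
}

// 8-bit YUV -> RGB888 with the matching integer inverse: remove the
// offsets, apply the scaled matrix, round, and clip to 0..255.
Rgb yuv_to_rgb(Yuv p) {
    int c = p.y - 16, d = p.u - 128, e = p.v - 128;
    return Rgb{clip((298 * c + 409 * e + 128) >> 8),
               clip((298 * c - 100 * d - 208 * e + 128) >> 8),
               clip((298 * c + 516 * d + 128) >> 8)};
}
```

Under these coefficients, black maps to (16, 128, 128), white to (235, 128, 128), and pure red, rgb_to_yuv(Rgb{255, 0, 0}), to Y = 82, U = 90, V = 240.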

Additional Information

  • DirectShow SDK Documentation