This section describes how specific audio and video information is stored in the Stream Properties Object. For more information, see sections 9.1 and 9.2.
11.1 Audio codec type-specific data in ASF
This section outlines how audio information is stored in the Stream Properties Object for the following codecs.
Codec name | Format tag | Notes |
---|---|---|
Windows Media Audio | 0x161 | Versions 7, 8, and 9 Series |
Windows Media Audio 9 Professional | 0x162 | 9 Series |
Windows Media Audio 9 Lossless | 0x163 | 9 Series |
GSM-AMR | 0x7A21 | Fixed bit rate, no SID |
GSM-AMR | 0x7A22 | Variable bit rate, including SID |
11.1.1 Windows Media Audio
The type-specific information for the Windows Media Audio codec is stored in the following format as part of the Type-Specific Data of the Stream Properties Object as outlined in section 9.1.
struct WMA_TYPE_SPECIFIC_DATA
{
    WAVEFORMATEX wfx;
    DWORD dwSamplesPerBlock;
    WORD wEncodeOptions;
    DWORD dwSuperBlockAlign;
};
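As an illustration of the layout above, the following C sketch reads the structure from a raw Type-Specific Data buffer. It assumes that the fields are byte-packed and little-endian, and that WAVEFORMATEX occupies its usual 18 bytes. The names WmaTypeSpecific, rd16, rd32, and parse_wma_type_specific are hypothetical and are not defined by this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of WMA_TYPE_SPECIFIC_DATA (WAVEFORMATEX flattened). */
typedef struct {
    uint16_t wFormatTag;        /* 0x161, 0x162 or 0x163 for Windows Media Audio */
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;            /* size of the data following WAVEFORMATEX */
    uint32_t dwSamplesPerBlock;
    uint16_t wEncodeOptions;
    uint32_t dwSuperBlockAlign;
} WmaTypeSpecific;

/* Assumed packed size: 18-byte WAVEFORMATEX + DWORD + WORD + DWORD. */
enum { WMA_TYPE_SPECIFIC_SIZE = 18 + 4 + 2 + 4 };

/* Little-endian readers. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is shorter than the packed structure. */
int parse_wma_type_specific(const uint8_t *data, size_t len, WmaTypeSpecific *out)
{
    assert(data != NULL && out != NULL);
    if (len < WMA_TYPE_SPECIFIC_SIZE)
        return -1;
    out->wFormatTag        = rd16(data + 0);
    out->nChannels         = rd16(data + 2);
    out->nSamplesPerSec    = rd32(data + 4);
    out->nAvgBytesPerSec   = rd32(data + 8);
    out->nBlockAlign       = rd16(data + 12);
    out->wBitsPerSample    = rd16(data + 14);
    out->cbSize            = rd16(data + 16);
    out->dwSamplesPerBlock = rd32(data + 18);
    out->wEncodeOptions    = rd16(data + 22);
    out->dwSuperBlockAlign = rd32(data + 24);
    return 0;
}
```

Reading field by field, rather than casting the buffer to a struct, avoids depending on compiler padding and host byte order.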
11.1.2 GSM-AMR
The type-specific information for GSM-AMR is stored in the following format as part of the Type-Specific Data of the Stream Properties Object as outlined in section 9.1.
struct GSMAMR_TYPE_SPECIFIC_DATA
{
    WAVEFORMATEX wfx;
    // bit[0] = SID is used (must be zero if wfx.wFormatTag == 0x7A21)
    // bit[1] = varying bit rate is used (must be zero if wfx.wFormatTag == 0x7A21)
    DWORD dwFlags;
};
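The constraint in the comments above can be checked mechanically. The following C sketch is one possible validation, assuming the format tag and flags have already been read from the stream; the type GsmAmrInfo, the macro names, and the function gsmamr_flags_valid are hypothetical. Bits above bit 1 are not described in this section, so the sketch ignores them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GSMAMR_TAG_FIXED    0x7A21u  /* fixed bit rate, no SID */
#define GSMAMR_TAG_VARIABLE 0x7A22u  /* variable bit rate, including SID */

#define GSMAMR_FLAG_SID 0x1u  /* bit[0]: SID is used */
#define GSMAMR_FLAG_VBR 0x2u  /* bit[1]: varying bit rate is used */

/* Hypothetical container for the two values the check needs. */
typedef struct {
    uint16_t wFormatTag;  /* from wfx.wFormatTag */
    uint32_t dwFlags;     /* from GSMAMR_TYPE_SPECIFIC_DATA */
} GsmAmrInfo;

/* Returns true if dwFlags is consistent with the format tag. */
bool gsmamr_flags_valid(const GsmAmrInfo *info)
{
    assert(info != NULL);
    switch (info->wFormatTag) {
    case GSMAMR_TAG_FIXED:
        /* Both flag bits must be zero for 0x7A21. */
        return (info->dwFlags & (GSMAMR_FLAG_SID | GSMAMR_FLAG_VBR)) == 0;
    case GSMAMR_TAG_VARIABLE:
        return true;
    default:
        return false;  /* not a GSM-AMR format tag */
    }
}
```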
11.2 MPEG-4 Video type-specific data in ASF
This section explains how a video decoder should decode the stream header information (decoder configuration) contained in ASF. Following this recommendation ensures that the decoder is compatible with current and future encoded content that uses the MPEG-4 video FourCC codes MP4S and M4S2. Achieving this compatibility requires only a simple test in the decoder software to decode the video headers correctly.
11.2.1 Background
There are two raw MPEG-4 video stream types, with FourCCs MP4S and M4S2. At the frame level, all MPEG-4 compatible decoders can decode both stream types, but the two differ in their initial header information. MP4S is the earlier format; it was based on MPEG-4 version 1 of August 1999 and was compatible with the MPEG ISO reference software of that time. Consequently, some of the newer version 2 header information is not included in this stream type. M4S2 is the newer format and is fully compatible with the MPEG-4 version 2 bit stream as now defined in ISO/IEC 14496-2.
To achieve maximum compatibility, video decoders should be able to decode both stream types. This is very simple to achieve with only a tiny modification to the decoding process.
11.2.2 Decoding process
The video stream is made up of two parts: the video bit stream header, contained in the extended BITMAPINFOHEADER (or “EBIH”), and the video bit stream itself, consisting of compressed frames. To handle both FourCC types, it is necessary to make a software test to correctly decode the EBIH. There are no changes to the method of decoding the compressed frames.
The EBIH contains decoder configuration information:
In the case of the MP4S stream type, it has the format described in section 11.2.3, “Decoding MP4S header information”. The first start code is video_object_start_code. All remaining syntax elements are fully compatible with MPEG-4 with visual_object_verid = 1.
In the case of M4S2, the header has the format described in ISO/IEC 14496-2. The first start code is visual_object_sequence_start_code, and all syntax elements are fully compatible with MPEG-4. However, to give flexibility to those generating short header streams with M4S2, the whole MPEG-4 header (from Visual Sequence to Video Object Layer, inclusive) may currently be replaced by a single 22-bit short_video_start_marker. This is not compatible with MPEG-4, but is allowed for convenience.
The following diagram shows the difference between MP4S and M4S2 EBIH headers:
The following code should be used to handle both header types. It is based on examining the first few bits of the EBIH header.
if ( peekbits(22) == 0x20 ) // short_video_start_marker
{
    // Ignore the rest of EBIH; decode as short header stream.
}
else if ( peekbits(32) == 0x1B0 ) // visual_object_sequence_start_code
{
    // M4S2
    DecodeVisualSequenceHeader(…);
    DecodeVisualObjectHeader(…);
}
else
{
    // MP4S
    // Assume default values for visual headers above.
    // (See section 11.2.3.2 of this document.)
    InitDefaultVisualSequenceAndObjectElements();
    // Set default value of visual object syntax element
    // to indicate version number of bit stream.
    visual_object_verid = 1;
}
DecodeVideoObjectHeader(…);
DecodeVideoObjectLayerHeader(…);
This pseudocode is simple to implement and provides full compatibility with existing and future encoders. An alternative to testing the bit stream is to test the value of the FourCC to decide which branch to take.
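The bit test at the start of the EBIH can be sketched in C as follows. This is a minimal illustration that only classifies the header; it does not decode it. It assumes the EBIH bytes are available in memory and that bits are read most significant bit first. The enum EbihKind, the two-argument peekbits helper, and classify_ebih are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    EBIH_TOO_SHORT,     /* fewer than 4 bytes: the 32-bit test cannot be made */
    EBIH_SHORT_HEADER,  /* starts with short_video_start_marker */
    EBIH_M4S2,          /* starts with visual_object_sequence_start_code */
    EBIH_MP4S           /* anything else: expected to start at the Video Object header */
} EbihKind;

/* Returns the first nbits (1..32) of buf, MSB first, without consuming them.
 * The caller guarantees that buf holds at least (nbits + 7) / 8 bytes. */
static uint32_t peekbits(const uint8_t *buf, unsigned nbits)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < nbits; i++) {
        unsigned bit = (buf[i >> 3] >> (7 - (i & 7))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

EbihKind classify_ebih(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return EBIH_TOO_SHORT;
    if (peekbits(buf, 22) == 0x20)    /* short_video_start_marker */
        return EBIH_SHORT_HEADER;
    if (peekbits(buf, 32) == 0x1B0)   /* visual_object_sequence_start_code */
        return EBIH_M4S2;
    return EBIH_MP4S;
}
```

A decoder would call classify_ebih once on the EBIH and then choose which headers to decode or to fill with default values.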
11.2.3 Decoding MP4S header information
Any stream containing the M4S2 FourCC includes the appropriate Visual Sequence and Visual Object headers.
To decode MP4S content, decoders should be able to handle a bit stream without Visual Sequence and Visual Object headers, and they should assume default values for these headers. (The default values are shown in section 11.2.3.2.)
Decoding of the header should then start from the Video Object header.
11.2.3.1 Format of MP4S EBIH (Video Object and Video Object Layer headers)
Syntax element | Number of bits | Value from Windows Media Encoder |
---|---|---|
video_object_start_code | 32 | 0x00000100 |
video_object_layer_start_code | 32 | 0x00000120 |
random_accessible_vol | 1 | 0 |
video_object_type_indication | 8 | 0x01 (= simple object) |
is_object_layer_identifier | 1 | 0 |
aspect_ratio_info | 4 | 1 (= square) |
vol_control_parameters | 1 | 0 |
video_object_layer_shape | 2 | 0 |
marker_bit | 1 | 1 |
vop_time_increment_resolution | 16 | resolution of the time line (see note 1) |
marker_bit | 1 | 1 |
fixed_vop_rate | 1 | 0 |
marker_bit | 1 | 1 |
video_object_layer_width | 13 | Frame Width |
marker_bit | 1 | 1 |
video_object_layer_height | 13 | Frame Height |
marker_bit | 1 | 1 |
interlaced | 1 | 0 |
obmc_disable | 1 | 1 |
sprite_enable | 1 (see note 2) | 0 |
not_8_bit | 1 | 0 |
quant_type | 1 | 0 |
complexity_estimation_disable | 1 | 1 |
resync_marker_disable | 1 | Can be 1 or 0. |
data_partitioned | 1 | 0 |
scalability | 1 | 0 |
Padding to next byte boundary | 6 | 0b011111 (see note 3) |
Note 1: This field indicates the number of evenly spaced subintervals, called ticks, within one modulo time. One modulo time represents the fixed interval of one second. For details, see the MPEG-4 Visual standard (ISO/IEC 14496-2).
Note 2: visual_object_verid is assumed to be equal to 1; therefore, sprite enable has 1 bit.
Note 3: A zero stuffing bit followed by a number of one stuffing bits shall be present until the current position is on a byte boundary. As the table above shows, the MP4S EBIH contains 138 bits before this field, so the 6 bits 0b011111 are stuffed to reach 144 bits (18 bytes).
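To make the bit layout and the stuffing rule concrete, the following C sketch packs the fields of the table into an 18-byte buffer, most significant bit first. It is an illustration only, not an encoder implementation: the BitWriter type and the functions put_bits and build_mp4s_ebih are hypothetical, and the fixed field values are those listed in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MP4S_EBIH_BYTES = 18 };  /* 138 header bits + 6 stuffing bits */

typedef struct {
    uint8_t *buf;      /* zero-initialized output buffer */
    size_t   bit_pos;  /* number of bits written so far */
} BitWriter;

/* Appends the low nbits of value, most significant bit first. */
static void put_bits(BitWriter *bw, uint32_t value, unsigned nbits)
{
    for (unsigned i = nbits; i-- > 0; ) {
        if ((value >> i) & 1u)
            bw->buf[bw->bit_pos >> 3] |= (uint8_t)(0x80u >> (bw->bit_pos & 7));
        bw->bit_pos++;
    }
}

/* Writes an MP4S EBIH into out and returns its length in bytes. */
size_t build_mp4s_ebih(uint8_t out[MP4S_EBIH_BYTES], unsigned width,
                       unsigned height, unsigned time_resolution,
                       unsigned resync_marker_disable)
{
    assert(out != NULL);
    assert(width < (1u << 13) && height < (1u << 13));
    assert(time_resolution < (1u << 16));
    memset(out, 0, MP4S_EBIH_BYTES);
    BitWriter bw = { out, 0 };

    put_bits(&bw, 0x00000100, 32);       /* video_object_start_code */
    put_bits(&bw, 0x00000120, 32);       /* video_object_layer_start_code */
    put_bits(&bw, 0, 1);                 /* random_accessible_vol */
    put_bits(&bw, 0x01, 8);              /* video_object_type_indication */
    put_bits(&bw, 0, 1);                 /* is_object_layer_identifier */
    put_bits(&bw, 1, 4);                 /* aspect_ratio_info (square) */
    put_bits(&bw, 0, 1);                 /* vol_control_parameters */
    put_bits(&bw, 0, 2);                 /* video_object_layer_shape */
    put_bits(&bw, 1, 1);                 /* marker_bit */
    put_bits(&bw, time_resolution, 16);  /* vop_time_increment_resolution */
    put_bits(&bw, 1, 1);                 /* marker_bit */
    put_bits(&bw, 0, 1);                 /* fixed_vop_rate */
    put_bits(&bw, 1, 1);                 /* marker_bit */
    put_bits(&bw, width, 13);            /* video_object_layer_width */
    put_bits(&bw, 1, 1);                 /* marker_bit */
    put_bits(&bw, height, 13);           /* video_object_layer_height */
    put_bits(&bw, 1, 1);                 /* marker_bit */
    put_bits(&bw, 0, 1);                 /* interlaced */
    put_bits(&bw, 1, 1);                 /* obmc_disable */
    put_bits(&bw, 0, 1);                 /* sprite_enable (1 bit, verid = 1) */
    put_bits(&bw, 0, 1);                 /* not_8_bit */
    put_bits(&bw, 0, 1);                 /* quant_type */
    put_bits(&bw, 1, 1);                 /* complexity_estimation_disable */
    put_bits(&bw, resync_marker_disable & 1u, 1);
    put_bits(&bw, 0, 1);                 /* data_partitioned */
    put_bits(&bw, 0, 1);                 /* scalability */
    assert(bw.bit_pos == 138);

    put_bits(&bw, 0, 1);                 /* zero stuffing bit */
    while (bw.bit_pos & 7)
        put_bits(&bw, 1, 1);             /* one stuffing bits to byte boundary */
    return bw.bit_pos / 8;
}
```

The internal assertion on bit_pos checks the count of 138 bits given in note 3 before the stuffing bits are written.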
11.2.3.2 Default Values for Visual Sequence and Visual Object Headers when decoding MP4S
Syntax element | Number of bits | Default value |
---|---|---|
visual_object_sequence_start_code | 32 | 0x000001B0 |
profile_and_level_indication | 8 | 0x01 |
visual_object_start_code | 32 | 0x000001B5 |
is_visual_object_identifier | 1 | 0 |
visual_object_type | 4 | 0x01 |
video_signal_type | 1 | 0 |
Padding to next byte boundary | 2 | 0x01 |
Note that because visual_object_verid is not present here, its value is assumed to be 1.
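One possible shape for the default initialization used in the MP4S branch of the decoding process is sketched below in C. The VisualHeaders structure and the function name are hypothetical; the values are the defaults from the table above, plus visual_object_verid = 1. Start codes and padding are omitted because they carry no decoder state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state for the Visual Sequence and Visual Object headers. */
typedef struct {
    uint8_t profile_and_level_indication;
    uint8_t is_visual_object_identifier;
    uint8_t visual_object_verid;
    uint8_t visual_object_type;
    uint8_t video_signal_type;
} VisualHeaders;

/* Fills in the values a decoder assumes when the MP4S EBIH omits these headers. */
void init_default_visual_sequence_and_object_elements(VisualHeaders *h)
{
    assert(h != NULL);
    h->profile_and_level_indication = 0x01;
    h->is_visual_object_identifier  = 0;
    h->visual_object_verid          = 1;    /* not present in the stream */
    h->visual_object_type           = 0x01;
    h->video_signal_type            = 0;
}
```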