Introduction to Audio Formats on CD, SACD, DVD, DVDA and Blu-Ray


High Definition Sound Formats on Plastic – The Sound of Music Plastic

Digital Audio – The Format Wars

Video and Audio Formats on Plastic – Part 3 is an introduction to audio formats on CD, SACD, DVD, DVDA and Blu-Ray.

Mono was the first of all the (analog) audio formats and formed the basis for radio, vinyl records and analog TV for many years. Alan Blumlein of EMI patented analog stereo in the 1930s; it was first used on vinyl and in films, and later on tape, FM radio and eventually TV. Blumlein is also credited with patenting surround sound. Stereo lasted for a long time, right up to the 1970s, when the first quadraphonic surround sound albums and tapes appeared. These made few inroads into the commercial world of audio due to costs and format incompatibilities, but their processing was used in films such as Star Wars in 1977.

So were unleashed the CD and DVD format wars. Developments in data formats for both DVD and Blu-ray discs, and in the supporting electronics, would eventually provide today’s 13 channels of audio (and still growing), together with the new object based immersive sound formats. With all this newfound digital bandwidth the Blu-ray, and in particular the UHD Blu-ray, can now carry up to 32 channels of studio quality audio.

A History Lesson – For those too young to remember.

Let me first disclose that I consider myself a bit of an audio ‘snob’. I am totally intolerant of poor audio, whether it’s stereo or multichannel, and I will not tolerate any form of compression unless it is lossless when listening to music in my purpose built A/V room. No apologies here for still being in love with vinyl. I have also accepted (a long time ago) the need for analog and/or digital signal processing in order to optimize the room’s acoustic performance, even if it was well designed acoustically in the first place.

First there was analog audio

I started my love affair with audio when mono was the norm for broadcasting and stereo records were in full swing. All the broadcast companies and recording studios I worked at were analog, supporting only two channels. Tremendous effort by many very skilled maintenance and design engineers and researchers went into analog audio circuit and system design in order to perfect it and achieve technical performance that approached what our ears could resolve. However, the bandwidth needed to achieve that performance limited the number of audio channels that could be cost effectively supported by the available technology. Furthermore, there was a definite limit to the ways you could manipulate analog audio in order to create sound effects such as echo and reverb. These issues, together with the development of digitization of analog signals and its subsequent drop in cost, moved the analog world of the studio to digital. This gave the producers more ‘tools’ to be creative in their productions… not too sure about that!! Yes, I still remember buying the very first studio quality digital processing systems from companies such as AMS (UK) and Lexicon (USA).

Many professional recording studios still use analog boards due to their ‘sound’ and use digital signal processing for outboard effects equipment. But you can still find studios using those old EMT plates for reverb and tape for echo; they have a unique sound and feel… something that digital equivalents have tried to copy.

Enter Digital Audio

Digital audio provided convenience at first, certainly NOT quality; jumping between CD tracks could now be done at the touch of a button. The original stereo 16 bit LPCM CDs released in 1982 sounded terrible when compared to a stereo ½” analog mastering machine. As with all new technologies, significant strides were then made in the technology and techniques used for the conversion between analog and digital signals, resulting in the Red Book performance of most of today’s CD players.

In order for digital audio to sound similar to analog audio, very high sample rates (at least 44,100 samples per second) and word sizes (at least 16 bits) must be used for each mono channel (there are many other technical issues besides these two that also have to be addressed). As with digital video, though on a much smaller scale, the resulting data rates, and consequently the bandwidths required to carry the signal, are very high (a ‘low quality’ stereo Red Book CD = 1.41 Mb/s). Our ears are our body’s most advanced processing system, still exceeding what we can achieve to this day with either analog or digital electronics. Compare that with our eyes, which can easily be fooled into believing that a full picture is there when actually only half of it is present at any one time, i.e. interlacing, to note just one visual trick.
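The Red Book figure quoted above follows directly from the sample rate, word size and channel count. Here is a minimal sketch of that arithmetic in Python; the function name lpcm_bit_rate is my own, chosen purely for illustration.

```python
def lpcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) LPCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Red Book CD: 44,100 samples per second, 16 bit words, 2 channels (stereo)
cd_rate = lpcm_bit_rate(44_100, 16, 2)
print(f"CD audio: {cd_rate:,} bits/s = {cd_rate / 1e6:.2f} Mb/s")
# prints: CD audio: 1,411,200 bits/s = 1.41 Mb/s
```

Note that this is the audio payload only; the extra data a disc carries for error correction and track information is not counted here.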

So CDs became the mainstay of audio, slowly pushing vinyl out; not on quality but on convenience and business models. A pet peeve!

The desire to transmit stereo over the air and between locations resulted in a number of techniques being developed in order to meet the transmission requirements of the medium. Enter that dreadful term compression, the bane of my life, together with IP technology; more on this in later posts on the Direction of Professional Broadcasting.

Compression++

Wow!! Well, just as with video, compression certainly solved a lot of bandwidth issues allowing us to squeeze a ‘quart into a pint pot’. The ubiquitous MP3 compression format can go as low as 96,000 bits per second for stereo; a compression ratio of approximately 15:1.
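For anyone wondering where the 15:1 comes from, it is simply the stereo CD data rate divided by the MP3 rate. A quick sketch of the sum (variable names are mine):

```python
cd_bit_rate = 44_100 * 16 * 2    # Red Book stereo LPCM, bits per second
mp3_bit_rate = 96_000            # the low-rate stereo MP3 figure quoted above

ratio = cd_bit_rate / mp3_bit_rate
print(f"Compression ratio is about {ratio:.1f}:1")
# prints: Compression ratio is about 14.7:1
```

In other words, at that rate the encoder is throwing away roughly 14 out of every 15 bits of the original CD signal.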

So looking at some of the technical challenges of digital audio such as clock jitter, DACs and ADCs, together with digital filters, op-amps and digital noise etc., it is not surprising that digital audio and compression sound different to the original analog sound, and that the difference can be readily detected by many audiophiles. Note that I did not say it sounds bad. It sounds different, and that is not a wholesale condemnation of digital audio; I have many digital audio CDs, DVDs and SACDs myself, many of which sound outstanding. (I still prefer the sound of well mastered and pressed vinyl.)

So while I understand and fully appreciate all of the benefits of compression, I shall not be discussing the various techniques and methods due to the complexity and length of such a series of posts. There are many papers, articles and standards documents available on the web that cover these techniques for those who are interested.

Disc Supported Audio Data Formats

  • CD – LPCM – Linear Pulse Code Modulation – uncompressed
  • DVD-A – DVD-Audio – LPCM or Meridian Lossless Packing (MLP)
  • SACD – DSD (Direct Stream Digital) – uncompressed
  • DVD – MPEG1 Layer 2, Dolby Digital AC3 and DTS – all compressed
  • Blu-Ray – LPCM, all DVD formats, Dolby TrueHD & DTS-HD Master Audio – Dolby & DTS are lossless compressed
  • UHD Blu-Ray – all Blu-Ray formats plus Dolby Atmos & DTS:X – both lossless compressed.

There have always been heated discussions between lovers of the Dolby and DTS formats, with DTS lovers arguing that it sounds ‘better’ than Dolby. There is some credence to this argument in that DTS is generally encoded at a higher bit rate than Dolby. However, there is more to the final sound than just the bit rate. The algorithms that are used to create the compressed audio also count a great deal towards the final sound quality. From my experience, on my system, I must say that I tend to find that DTS formats provide a slightly better ‘bang for your buck’ and I tend to select them over Dolby if they are available.

See below for many of the common Dolby and DTS (Digital Theater Systems or Dedicated To Sound) surround formats.


Dolby and DTS Audio Surround Formats


Digital Audio Formats in more Detail

In order not to bore all my readers to death and put you all to sleep, and for the less technical amongst us, I have only outlined the basics of each format. As always, some technical liberties have been taken in order to get a point across.

Linear Pulse Code Modulation – LPCM

LPCM, normally referred to simply as PCM, was the first standardized digital audio format to be widely used by the audio industry. It initially supported sampling a mono channel 44,100 times per second and converting the level at that instant to a 16 bit binary digital word. The fixed rate and word length chosen back then for CDs were at the limits of the available and cost effective technology, but are still used today in order to maintain compatibility and keep costs down. These days much higher sampling rates, up to 192,000 samples per second, and longer digital words, up to 24 bits, can be supported. This results in a much higher quality signal, approaching that of the studio master tape when done right. The higher sampling rates and longer words produce very high data rates that cannot be stored on CDs. These higher quality audio formats can be found on DVD-Audio, DVD and Blu-ray discs.
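To make the ‘sample the level, store it as a word’ idea a little more concrete, here is a rough sketch in Python. It is only an illustration: the function names are mine, the tone is an arbitrary 1 kHz sine, and a real converter also needs filtering and dither, which are left out. The dynamic range figure uses the usual textbook rule for an ideal converter, not a measurement of any real player.

```python
import math

SAMPLE_RATE = 44_100                 # samples per second (CD rate)
BITS = 16                            # word length in bits
FULL_SCALE = 2 ** (BITS - 1) - 1     # largest positive 16 bit value, 32767

def sample_tone(freq_hz, n_samples, amplitude=1.0):
    """Sample a sine tone and convert each level to a signed integer word."""
    words = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE                                   # time of this sample
        level = amplitude * math.sin(2 * math.pi * freq_hz * t)
        words.append(round(level * FULL_SCALE))               # quantize to a word
    return words

def theoretical_dynamic_range_db(bits):
    """Textbook figure for an ideal converter fed a full scale sine."""
    return 6.02 * bits + 1.76

print(sample_tone(1_000, 8))         # the first eight 16 bit words of the tone
for bits in (16, 24):
    print(f"{bits} bit: about {theoretical_dynamic_range_db(bits):.0f} dB")
```

On paper the longer 24 bit word buys roughly 48 dB more dynamic range than 16 bits, which is one reason the higher resolution formats can get closer to the master tape.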

DVD-Audio

These discs are basically video DVDs that may or may not also carry a significant amount of video or still images. Two fixed high bit rate formats may be found on these discs: LPCM or Meridian Lossless Packing (MLP). MLP is a proprietary technique used to compress LPCM and then recreate the original LPCM signal with bit for bit accuracy; this is known as mathematically lossless, hence the term lossless compression. The actual degree of compression is quite low at approximately 1.5:1. The additional data handling capacity of a DVD allows this format to be used for multi-channel audio, up to 6 channels at a maximum total bit rate of 9.6 Mb/s. MLP was also available on HD-DVD and is used on Blu-ray, where it supports up to 8 channels at a maximum bit rate of 18 Mb/s and forms the basis for Dolby TrueHD.
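A quick sum shows why lossless packing matters on DVD-Audio. The sketch below uses the 9.6 Mb/s ceiling and the approximate 1.5:1 ratio quoted above; the ratio is an assumption for illustration only, since the real figure varies with the programme material, and the function name is mine.

```python
DVD_AUDIO_MAX_BPS = 9_600_000    # maximum total audio bit rate quoted above
MLP_RATIO = 1.5                  # approximate MLP compression, varies in practice

def lpcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) LPCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

examples = [
    ("stereo, 192 kHz / 24 bit", 192_000, 24, 2),
    ("5.1, 96 kHz / 24 bit", 96_000, 24, 6),
]
for label, rate, bits, channels in examples:
    raw = lpcm_bit_rate(rate, bits, channels)
    packed = raw / MLP_RATIO
    print(f"{label}: raw {raw / 1e6:.2f} Mb/s "
          f"(fits: {raw <= DVD_AUDIO_MAX_BPS}), "
          f"packed about {packed / 1e6:.2f} Mb/s "
          f"(fits: {packed <= DVD_AUDIO_MAX_BPS})")
```

Under these assumptions high resolution stereo squeezes in as raw LPCM, but six channels at 96 kHz / 24 bit come to about 13.8 Mb/s and only fit once they have been packed.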

DSD – SACD

The audio on these discs is encoded using Direct Stream Digital (DSD). Invented by Sony and Philips, it never really gained much traction in the market, except with audiophiles, even though these discs are still available today. DSD uses pulse density modulation (what a mouthful). It samples the analog signal 2,822,400 times a second (64 times the rate of a CD) and creates just a single binary bit that indicates if the signal is larger or smaller than it was during the last sample. This information is then used to create a delta-sigma modulated bit stream (another mouthful), which is stored on the disc. This technique creates a fixed rate bit stream that has a dynamic range of up to 120 dB and a frequency response up to 100 kHz. The process equates to an LPCM signal sampled at 96 kHz using a 20 bit binary word.
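For the curious, the one bit idea can be sketched in a few lines of Python. This is a bare first-order delta-sigma modulator written purely for illustration; it is not the Sony/Philips encoder, which (as far as I understand it) uses much higher order noise shaping. The function name and the test levels are my own.

```python
DSD_RATE = 64 * 44_100     # 2,822,400 one bit samples per second

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: levels in [-1, 1] in, 0/1 bits out."""
    integrator = 0.0
    feedback = 0.0                        # previous output, as -1.0 or +1.0
    bits = []
    for x in samples:
        integrator += x - feedback        # accumulate the running error
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0   # feed the decision back
        bits.append(bit)
    return bits

# Steady input levels show the 'pulse density': the higher the level,
# the larger the share of 1s in the bit stream.
for level in (-0.5, 0.0, 0.5):
    bits = delta_sigma_1bit([level] * 1_000)
    print(f"level {level:+.1f}: density of 1s = {sum(bits) / len(bits):.2f}")
```

Averaging (low pass filtering) the bit stream gets you back towards the original level, which is essentially what a DSD decoder has to do.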

This format supports both stereo and multichannel surround sound up to 5.1 channels. The use of 0.1 channel (even in multi-channel DVD-Audio) often confuses users and apparently producers and engineers. The 0.1 channel was specifically designed by Dolby to support low bass in movies, typically below 80Hz, for which there are very definite level and mix standards. As there are no such standards for music production this channel is frequently incorrectly used, and contains both invalid levels and frequencies that should not be present. It should be used with care; on some discs it doesn’t even carry any significant information even though the disc is described as 5.1!

Double, quadruple and octuple sample rate versions of this technique were created by other manufacturers but they never caught on.

Lots of arguments abound as to whether these discs sound better than high bit rate DVD-Audio; I am still on the fence, leaning towards SACD.

Dolby Digital and DTS surround formats

Dolby Digital, or AC-3, was originally developed for use in cinemas and became the mandatory, de facto surround sound format for all DVD and Blu-ray players. Discs must always contain a version of Dolby Digital, while DTS is an optional added format. That’s licensing for you! The compression technique is proprietary to Dolby; its base format supports 5 satellite channels (20 Hz-20 kHz) and one low frequency effects (LFE) channel (20-120 Hz). Sample rates of up to 48 kHz are supported, creating a fixed bit rate of up to 448 kb/s for DVD and 640 kb/s for Blu-ray.

Over the years Dolby has continually increased the number of channels that the format can support by adding ‘extensions’ to it (see earlier Formats table), maintaining backward compatibility with older hardware while at the same time improving the audio quality of the channels. This resulted in support for up to 7.1 channels and bit rates of 3 Mb/s for Dolby Digital Plus (E-AC-3) and 18 Mb/s for Dolby TrueHD.

Dolby TrueHD is based upon MLP but differs significantly from the format used on a DVD-A. This 18 Mb/s stream can support up to 16 channels with sample rates up to 192 kHz and word lengths of up to 24 bits. The data stream has a variable bit rate, as opposed to the fixed bit rate used on CDs, DVD-As and SACDs. This means that the bit rate changes with the content of the audio channels, rising as needed in order to maintain lossless performance. Additionally the data stream carries metadata: additional information that is used for dialogue normalization (dialnorm) and dynamic range compression.

DTS is a proprietary codec developed by DTS Inc. (a company specializing in developing surround sound formats for both commercial and theatrical environments) in order to compete with Dolby Digital. The original core data stream supports the same number of channels as Dolby Digital, with extension streams used to support more channels and features. By doing this, backward compatibility was maintained, as only hardware that recognized the additional data streams would provide the additional features. The format grew alongside Dolby Digital to support the same number of channels and features. However, it can support far higher bit rates, up to 1.5 Mb/s, more than double Dolby’s 640 kb/s. This translates into sound quality and transparency that is generally considered to be higher than that of Dolby’s. Unfortunately the higher DTS bit rates were not often used in DVD production, the typical rate being 755 kb/s; this tended to level the quality playing field between the two formats. The data stream also carries metadata describing dialnorm and dynamic range compression.

DTS-HD Master Audio is an extension of the DTS Coherent Acoustics codec (coder/decoder); it is a combined codec that supports both lossy and lossless compression. Variable bit rates up to 24.5 Mb/s can be supported, with a maximum sample rate of 192 kHz and a maximum sample word length of 24 bits. These numbers are reduced depending upon the number of channels that are supported, in order to meet the disc’s maximum audio bit rate capacity. The format can support an almost unlimited number of channels.


Immersive Sound Formats on Blu-ray

Introduction

The three competing immersive sound formats found on Blu-ray discs are Dolby Atmos, Auro 3D and DTS:X. The Dolby Atmos and DTS:X formats are based upon the lossless versions of the audio codecs used for Blu-ray and use spatial encoding for the audio tracks. The main difference is that the audio tracks (or objects) are NOT necessarily hard assigned to each speaker channel, but are steered or rendered to and between the satellite channels by the use of metadata contained in an additional data stream within the HDMI connection. This, with the use of additional side and overhead channels, produces an audio ‘bubble’ in which the listener sits, allowing the sounds to be dynamically panned to and between speakers in order to match the action, thereby immersing the listener in the audio sound field.
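To give a feel for what ‘steering a sound between speakers’ means, here is a very simplified sketch: a constant power pan of one object between just two speakers. This is a generic textbook technique that I have chosen for illustration, not the actual Dolby or DTS renderer, and the position values stand in for hypothetical metadata.

```python
import math

def pan_between(position):
    """Constant power pan: 0.0 = all in speaker A, 1.0 = all in speaker B."""
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)   # (gain for A, gain for B)

# Hypothetical metadata: the object's position at three moments in time
for position in (0.0, 0.5, 1.0):
    gain_a, gain_b = pan_between(position)
    power = gain_a ** 2 + gain_b ** 2          # stays at 1.0 as the sound moves
    print(f"position {position:.1f}: A = {gain_a:.3f}, B = {gain_b:.3f}, "
          f"total power = {power:.3f}")
```

A real renderer does the same kind of thing across many speakers in three dimensions, using the speaker layout it has been told about, so that the perceived loudness stays steady as the object moves.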

Other immersive sound processing formats are in development, e.g. MPEG-H, but this format is not to be used in the production of Blu-ray discs and will not be found on any DVD or Blu-ray. It is being developed for the distribution and streaming of UHDTV. I very much doubt whether these formats will be backward compatible with the current generation of decoders, which would allow current receivers etc. to decode them. If not, yet another reason for an upgrade!

Dolby Atmos technology arrived in its theatrical form in 2012; it can support up to 128 audio tracks and 64 unique speaker feeds! These tracks can be rendered or assigned to the speaker layout according to a metadata sub stream. In 2014 Dolby announced the home theater version of the format, but it has restricted channel support and the metadata is used to render the channels in a different way to the theatrical version. A separate sub stream, representing an abbreviated version of the original object based mix used for the theater, is added to the TrueHD or Digital Plus audio stream. The home version supports up to 24.1.10 channels, meaning up to 24 discrete channels, one LFE channel and 10 overhead or Dolby enabled speakers; that’s a lot of flexibility!
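The dotted layout notation is easy to misread, so here is a small sketch that splits it into its three parts and totals the speakers. The function name and dictionary keys are my own; a classic two part layout such as 5.1 is treated as having no overhead speakers.

```python
def parse_layout(layout):
    """Split 'listener level.LFE.overhead' notation into speaker counts."""
    parts = [int(p) for p in layout.split(".")]
    if len(parts) == 2:            # e.g. classic "5.1" has no overhead layer
        parts.append(0)
    base, lfe, overhead = parts    # raises ValueError for any other shape
    return {"base": base, "lfe": lfe, "overhead": overhead,
            "total": base + lfe + overhead}

for layout in ("5.1", "5.1.2", "9.1.4", "24.1.10"):
    info = parse_layout(layout)
    print(f"{layout}: {info['base']} listener level, {info['lfe']} LFE, "
          f"{info['overhead']} overhead = {info['total']} speakers")
```

So the full 24.1.10 home layout would mean finding room (and amplifier channels) for 35 speakers.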

At the time of writing the range of available receivers and pre-amps that support Atmos will only drive at best a 9.1.4 configuration, as shown below. For most households and many high-end HT owners I have to assume that this will be sufficient for the time being, due to the associated costs and complexities of installing such a large array of speakers and at the same time getting an optimal placement for all speakers.

It should be noted that the speaker configuration selected, of which there are many, is unique to Atmos and will not be optimal for any other immersive formats – see DTS:X.


Image credit: Dolby Installation Guidelines

Dolby Atmos 9.1.4 Speaker Layout

There are many speaker configurations that Dolby Atmos can support ranging from the most basic 5.1.2 to the more complex 9.1.4 shown above.

For further information please click on: Dolby Atmos® Home Theater Installation Guidelines

DTS:X

SRS Labs, since purchased by DTS, spearheaded the setup of the 3D Audio Alliance. This resulted in a proposed new open source platform known as Multi Dimensional Audio (MDA). MDA was responsible for the concept of sounds, or objects, being located in a three dimensional space. This concept allows the engineers to concentrate on the placement of the object within that space regardless of the number of channels.

The home theater version of DTS:X supports up to 11 satellite speakers and two subwoofers and is object based in the same way Atmos is. However, its main advantage is that it can support up to 32 different satellite speaker locations. This allows DTS:X to run alongside ANY other immersive format, in that it is speaker and format agnostic, allowing greater speaker placement flexibility and easier content production. Once the system has been told what speaker positions are available it will ‘map’ the immersive audio channels appropriately in order to obtain the best immersive effect.

DTS has indicated that a range of additional functionality could be added to the DTS:X technology; for example an algorithm that will improve voice intelligibility. However, such additional features will only appear with the co-operation of the production and TV companies.

For further information click on: DTS:X.

AURO-3D

This format is designed around three layers of sound (surround, height and overhead ceiling), building on the single horizontal layer used in the 5.1 or 7.1 sound format. Auro-3D creates a spatial sound field by adding a height layer around the audience on top of the traditional 2D surround sound system. This additional layer reveals both localized sounds and height reflections complementing the sounds that exist in the lower surround layer. The height information that is captured during recording is mixed into a standard 5.1 surround PCM carrier, and during playback the Auro-3D decoder extracts the originally recorded height channels from this stream.


Image credit: Auro Technologies

 


Image credit: Auro Technologies

The above two images show the baseline Auro 3D speaker configurations for the 5.1 base layer and 5-speaker height and top layer. The top speaker is sometimes referred to as the Voice of God (VOG) being centrally located in the ceiling.

Large theaters use AuroMax, which expands on the basic layout used by Auro 11.1 and Auro 13.1 by dividing the side, rear and ceiling channels into “zones”. This allows for placement of sound at discrete points along the theater wall or ceiling as well as within the theater itself, supporting up to 26 channels. The principle of operation is the same as the other immersive formats, using object metadata to steer the sound between channels.


Image credit: Auro Technologies

The above shows a typical room speaker layout for an 11.1 Auro 3D system.

 


Image credit: Auro Technologies

Auro 3D implementation in a large 13.1 home theater system.

As with Dolby Atmos these speaker configurations are unique to the Auro 3D format.

For further information click on: Auro-3D.


Other Immersive Sound Processing Developments

MPEG-H is a new standard that is currently under development and is designed primarily as a broadcast format to support UHDTV and streaming with video resolutions of 3840×2160 (4K) and 7680×4320 (8K).

MPEG-H is a group of standards being created by the Moving Picture Experts Group (MPEG) covering a digital container, video and audio compression, and conformance testing; it is formally known as ISO/IEC 23008 – High efficiency coding and media delivery in heterogeneous environments.

MPEG-H consists of 13 parts; the most relevant part here is part 3:

3D Audio – an audio compression standard for three-dimensional audio that can support many loudspeakers.

This new standard supports coding audio as audio channels, audio objects, or higher order ambisonics. It can support up to 64 loudspeaker channels and 128 core audio channels.

The main profile of MPEG-H 3D audio has five levels:


Levels for the Main profile of MPEG-H 3D Audio

 


So where to now?

Audio was and is my true calling and generally speaking I can’t get enough of it. Stereo, multi-channel sound and now immersive audio ALL sound terrific when correctly set up with well produced material. However, I am finding the continuing development of standard after standard and the expansion of the number of satellite speakers required to effectively support these new formats trying. I even went to a demonstration in Las Vegas that provided 128 speakers in 4 different vertical levels surrounding the audience; yes it was a very impressive feat of ingenuity and sounded very impressive, but totally impractical for any home or HT.

Having visited and worked in many homes I am continually amazed that the majority of household audio systems are poor at best. Many homes are not even set up to maximize stereo, let alone multi-channel audio; and the room acoustics, well let’s not even get started on that one here. Whilst I realize that this IS NOT an argument to stop multi-channel development, it does look like the industry is pandering to itself in order to sell more “stuff” and relying on a smaller and smaller number of HT users to support their developments.

Business needs to sell in order to stay in business. Call me old fashioned, but I am no longer too sure how much further this multi-channel world can go.

See part 1 Video Formats here and part 2 The Dreaded HDMI & HDCP here.