The Audio Coding Revolution: Part 1 - Series 1

Shedrack eze

The Audio Coding Revolution: Three Decades of Innovation (1990-Present)

Introduction

The evolution of audio coding represents one of the most significant technological advances in digital media. Earlier systems existed, such as 2 Mbit/s codecs like G.703/G.704 and 384 kbit/s codecs like J.41, but the real breakthrough runs from the early days of MP2 and MP3, via low-latency apt-X and Opus, to today's sophisticated streaming codecs like xHE-AAC. This transformation has fundamentally changed how we consume and distribute audio content.

Part 1: The Pioneering Years (1990-2000)

Early Breakthroughs

The journey began with groundbreaking research into psychoacoustic modeling. In the late 1980s, Krahé, D., Brandenburg, K.-H., Grill, B., Theile, G. and Stoll, G. established fundamental principles of masking thresholds, laying crucial groundwork for perceptual audio coding. This work demonstrated how the limits of human hearing could be exploited for efficient audio compression.

Fig. 1: Fraunhofer researchers Jürgen Herre, Martin Dietz, Harald Popp, Ernst Eberlein, Karlheinz Brandenburg and Heinz Gerhäuser (left to right) with one of their ASPEC 19-inch studio devices in the year 1991 (photo: Fraunhofer IIS)

The MP2/MP3 Revolution

Based on earlier publications and patents, 1991 marked a pivotal moment when MP2 (MPEG-1 Layer II) transformed digital broadcasting. Operating at 256 kbit/s, it provided the first practical demonstration of high-quality audio compression for professional applications, in particular broadcasting. Brandenburg's team at Fraunhofer IIS, alongside Stoll and Theile's contributions from the Institut fuer Rundfunktechnik (IRT), pushed boundaries further with MP3 (MPEG-1 Layer III) in 1993, achieving remarkable quality at just 128 kbit/s.
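
To put these bitrates in perspective, the short sketch below compares them with uncompressed CD-quality stereo PCM. The CD parameters (44.1 kHz, 16 bit, two channels) are an assumption added here purely as a reference point, and the function and variable names are hypothetical.

```python
# Illustrative arithmetic only: CD-quality PCM is an assumed reference point,
# not something the article itself specifies.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bit)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(codec_kbps: float, reference_kbps: float) -> float:
    """How many times smaller the coded stream is than the reference stream."""
    return reference_kbps / codec_kbps


cd_kbps = pcm_bitrate_kbps(CD_SAMPLE_RATE_HZ, CD_BITS_PER_SAMPLE, CD_CHANNELS)

# Bitrates quoted in the text above.
codec_bitrates_kbps = {"MP2": 256, "MP3": 128}

ratios = {
    name: compression_ratio(kbps, cd_kbps)
    for name, kbps in codec_bitrates_kbps.items()
}

if __name__ == "__main__":
    print(f"CD PCM reference: {cd_kbps:.1f} kbit/s")
    for name, ratio in ratios.items():
        print(f"{name}: about {ratio:.1f}:1")
```

Under that assumption, MP2 at 256 kbit/s corresponds to a reduction of roughly 5.5:1 and MP3 at 128 kbit/s to roughly 11:1.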

Technical Deep-Dive

Fig. 2 (left): The MPMan F10 (1998) was the world's first MP3 player, with a storage capacity of 32 MB
Fig. 3 (right): The successor, the SaeHan MPMan F60 T12, already offered an expansion slot to extend the capacity. License: CC BY-SA 2.0

The original implementation required specialized hardware:

•    DSP boards costing $10,000+

•    Significant computational power

•    Real-time encoding capabilities

Today, these algorithms run on chips costing less than $1, demonstrating Moore's Law in action.

The Emergence of Advanced Audio Coding (AAC)

By 1997, Advanced Audio Coding (AAC) emerged as a quantum step in audio compression technology. Developed collaboratively by multiple research institutes including Fraunhofer IIS, AT&T, Dolby, and Sony, AAC represented not merely an enhancement to MP3 but a complete reimagining of perceptual audio coding principles.

The codec introduced sophisticated improvements in psychoacoustic modeling, fundamentally changing how we approach audio compression. Through more precise temporal masking calculations and enhanced frequency resolution in critical bands, AAC achieved unprecedented accuracy in modeling human hearing perception. The introduction of Temporal Noise Shaping (TNS) proved particularly revolutionary, allowing precise control over the temporal shape of quantization noise and significantly improving the handling of transient signals.

Key Technical Innovations:

•    Advanced pre-echo detection and control

•    Improved simultaneous masking modeling

•    Enhanced backward prediction for tonal signals

•    Sophisticated joint stereo coding techniques

The spatial representation capabilities of AAC marked another significant advancement. The codec brought remarkable improvements in stereo imaging accuracy and phase relationship preservation, resulting in:

•    More precise sound source localization

•    Enhanced ambient sound reproduction

•    Better preservation of the original soundstage

•    Improved channel separation

These innovations enabled AAC to achieve the same perceived quality as MP3 at approximately 70% of the bitrate. The codec's superior performance led to its widespread adoption, becoming the foundation for iTunes and digital broadcasting standards. AAC's success paved the way for future developments like HE-AAC and xHE-AAC, establishing a new benchmark in perceptual audio coding.
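
As a rough illustration of the 70% figure, the sketch below works out the AAC bitrate that would match a 128 kbit/s MP3 and the resulting file sizes. The four-minute track length and the helper names are assumptions chosen for the example, and the 70% factor is the approximate value quoted above rather than a measured result.

```python
AAC_TO_MP3_BITRATE_FACTOR = 0.70  # approximate figure quoted in the text
MP3_KBPS = 128                    # MP3 bitrate quoted earlier in the article
TRACK_SECONDS = 4 * 60            # hypothetical track length (assumption)


def equivalent_aac_kbps(mp3_kbps: float, factor: float = AAC_TO_MP3_BITRATE_FACTOR) -> float:
    """AAC bitrate assumed to give the same perceived quality as the MP3 bitrate."""
    return mp3_kbps * factor


def file_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Audio payload size in megabytes (1 MB = 10**6 bytes), ignoring container overhead."""
    total_bits = bitrate_kbps * 1000 * seconds
    return total_bits / 8 / 1_000_000


aac_kbps = equivalent_aac_kbps(MP3_KBPS)
mp3_size_mb = file_size_mb(MP3_KBPS, TRACK_SECONDS)
aac_size_mb = file_size_mb(aac_kbps, TRACK_SECONDS)

if __name__ == "__main__":
    print(f"Equivalent AAC bitrate: {aac_kbps:.1f} kbit/s")
    print(f"MP3 file: {mp3_size_mb:.2f} MB, AAC file: {aac_size_mb:.2f} MB")
```

For the assumed four-minute track this gives about 3.84 MB for MP3 versus about 2.69 MB for AAC at roughly 90 kbit/s.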

Literature

·  Brandenburg, K., Stoll, G., Dehery, Y.-F., & Theile, G. (1991). "The ISO/MPEG Audio Coding Standard." Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM), 2, 1017-1021.

This paper details the development of MPEG-1 Layer II (MP2) and Layer III (MP3), highlighting the collaborative efforts of Fraunhofer IIS and the Institut fuer Rundfunktechnik (IRT). It covers the psychoacoustic principles and masking thresholds that laid the groundwork for the audio coding revolution starting in 1991.

·  Johnston, J. D. (1988). "Transform Coding of Audio Signals Using Perceptual Noise Criteria." IEEE Journal on Selected Areas in Communications, 6(2), 314-323.

A foundational work on perceptual coding, this paper explores the use of psychoacoustic modeling and transform techniques (e.g., MDCT) that influenced the early research by Brandenburg, Krahe, and others in the late 1980s, providing the theoretical basis for MP2 and MP3.

·  Bosi, M., Brandenburg, K., Quackenbush, S., Fielder, L., Akagiri, K., Fuchs, H., ... & Niss, B. (1997). "ISO/IEC MPEG-2 Advanced Audio Coding." Journal of the Audio Engineering Society, 45(10), 789-814.

This seminal paper documents the development of AAC in 1997, detailing innovations like Temporal Noise Shaping (TNS), improved psychoacoustic modeling, and joint stereo coding. It reflects the collaborative efforts of Fraunhofer IIS, AT&T, Dolby, and Sony, as described in the article.

·  Theile, G., & Stoll, G. (1990). "Perceptual Coding of Digital Audio." Proceedings of the 89th AES Convention, Preprint 2981.

An early contribution from the IRT team, this paper discusses the principles of perceptual audio coding, including masking thresholds, which were refined in the late 1980s and early 1990s. It aligns with the pioneering work attributed to Krahe, Brandenburg, and Theile in the article.

·  Herre, J., & Brandenburg, K. (2003). "MPEG-4 High-Efficiency AAC Coding." IEEE Signal Processing Magazine, 20(6), 137-142.

This paper covers the evolution from AAC to HE-AAC, introducing techniques like Spectral Band Replication (SBR) and Parametric Stereo, which built on the foundations of AAC and paved the way for xHE-AAC, as noted in the article.

·  Schuller, G., Yu, R., & Bleidt, R. (2015). "Extended HE-AAC - Bridging the Gap Between High-Quality Media and Low-Bitrate Broadcasting." Proceedings of the 139th AES Convention, Preprint 9317.

A technical analysis of xHE-AAC, this paper discusses its advancements in low-bitrate audio compression while maintaining quality, reflecting the ongoing innovation in streaming codecs mentioned in the introduction.

·  Painter, T., & Spanias, A. (2000). "Perceptual Coding of Digital Audio." Proceedings of the IEEE, 88(4), 451-513.

A comprehensive review of perceptual coding technologies from the 1990s, including MP2, MP3, and the transition to AAC, with insights into the technical deep-dive aspects like pre-echo control and psychoacoustic modeling.

·  Grill, B., & Herre, J. (1996). "Advances in Perceptual Audio Coding." Proceedings of the 100th AES Convention, Preprint 4218.

This paper explores the technical innovations in AAC development, including temporal masking and joint stereo coding, providing a bridge between the MP3 era and the more advanced codecs that followed.

Detlef Wiese is a German sound expert, scientist and entrepreneur with a focus on audio processing, encoding and transmission. With more than 30 patent applications, Detlef is leading the industry towards innovative solutions in software and hardware. Beyond his professional focus, he is a musician with his own songs on various platforms, and he is also engaged in cultural, social and political activities. He is CEO and founder of Ferncast GmbH and Binaurics Audio GmbH.

Contact him via dw@detlefwiese.de
