Principles Of MP3
Written by Harry Fairhead   
Thursday, 11 May 2017

MP3 is just a file format for audio files but, judging by the revolution in the way music is listened to since it was introduced, you might think it was much more, and in a sense it is.

 

Forget for the moment the copyright issues that incense the music industry and consider the magic that allows you to take a high quality audio recording, convert it into digital form and then reduce the amount of storage needed by a factor of 10 or more without any loss of quality.

Well, in theory there is no audible loss of quality but, as will become all too clear, this is something of an “issue” when it comes to actually using MP3.

 


 

First, what does MP3 stand for?

The simple answer is that it’s short for MPEG Audio Layer 3.

A more informative answer has to include the fact that MPEG stands for “Moving Picture Experts Group”, but this then raises the question of what experts on moving pictures have to do with sound!?

The answer is that moving pictures have sound tracks and MP3 is just one of a number of possible ways of encoding them.

MP3 is in fact part of the MPEG-1 standard, which covers how to compress both video and audio. It proved to be so good, however, that it has been widely adopted for sound tracks that don’t have any pictures!

A slice of history

The story of MP3 starts back in the incredibly far distant past of 1987, when the Fraunhofer Institute started work on audio coding as part of the Digital Audio Broadcasting (DAB) project. The most important contribution to the work was from Professor Dieter Seitzer of Erlangen University – a name that will crop up again when it comes to the question of who owns the MP3 technology.

In 1988 the Moving Picture Experts Group was born and by 1992 the Audio Layer 3 standard was created, based on the work at the Fraunhofer. Soon after, a patent was granted to Dieter Seitzer for the MP3 method.

Fraunhofer produced the first MP3 player but it wasn’t very easy to use. Soon afterwards a free player – AMP – was developed by some university students and, after being given Windows and Mac user interfaces – WinAMP and MacAMP – this started the music swapping MP3 craze that culminated in Napster, iTunes and similar Internet services.

Psycho acoustics

So how does it work and what was the big breakthrough that made, and makes, MP3 so good?

You probably already know that there are two types of data compression - “lossless” and “lossy”. 

Lossless compression works by making the most efficient use of storage. For example, if you want to store a picture of a black and white page of text, the simplest way is to represent the colour of every point or pixel as either black or white. However, there’s a lot of white on a page and most of it occurs together. Suppose we have a run of fifty white pixels. Using the simple representation this would take fifty bits to store, but if we just write “50 white bits” we only need to store the number fifty and the fact that these bits are white, which takes much less storage.

This is called “run-length encoding” but it’s another story and all that really matters is that you can see that efficient coding schemes are possible and they save storage space without throwing away any information – hence “lossless”.
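
As a minimal sketch of the idea, here is run-length encoding in Python applied to one row of a black and white page. The function names and the 'W'/'B' pixel representation are illustrative assumptions, not part of any particular file format.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse a sequence of pixel values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# One row of a page: 'W' is a white pixel, 'B' is a black pixel.
row = list("W" * 50 + "B" * 3 + "W" * 20)

encoded = rle_encode(row)
decoded = rle_decode(encoded)

print(encoded)         # three runs instead of 73 separate pixels
print(decoded == row)  # decoding recovers every pixel, so nothing was lost
```

The round trip returns exactly the original row, which is what makes the scheme lossless.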

Lossy data compression, on the other hand, does throw away information to save storage space – but it tries to do it so that you just don’t notice.

If you have a digital camera you will know all about JPEG (this one is from the Joint Photographic Experts Group – the still picture version of MPEG!). When you save a picture using JPEG compression some of the detail in the picture is lost but in general you don’t notice.

If you want to see what I mean, simply try saving a picture with a high JPEG compression ratio selected and then look at part of the result at high magnification. This is lossy compression, because it loses information. It promises the highest compression ratios of any method and it works fine as long as you really do only throw away information that isn’t useful.

Another problem with lossy compression is that you can't repeat it without making things worse. For example, take a picture that has been compressed just enough using JPEG. If you decompress it, what you see looks fine but it still contains compression errors or artifacts. If you take this and compress it again, the result will lose more information and the compression errors will increase. If you compress an image over and over again the result eventually becomes unacceptable.

MP3 is a lossy compression scheme for sound that takes advantage of the way human audio perception works. It’s based on a psychoacoustic model of human hearing.

You might think that your hearing is great and you hear everything perfectly all of the time but there is a phenomenon known as auditory masking, which means you really don’t hear everything, you just think you do.

In auditory masking a loud tone in one frequency range masks quieter tones in nearby frequencies. Notice that we are not talking about tones far enough apart to be part of a chord – if we were then there would be no polyphonic music because the loud notes in a chord would mean you didn’t hear the quieter notes.

In addition to auditory masking there is also the simple fact that human hearing isn’t equally sensitive to all frequency ranges. It turns out that our hearing is most sensitive to the range 2kHz to 4kHz and the sensitivity falls off outside this range surprisingly rapidly. What this means is that you only need to use high quality representation in the frequency bands that we are really sensitive to.

 

Figure 1: A loud noise in one frequency band masks any sound in nearby bands and allows them to be ignored.

 

MP3 uses all of these ideas, together with the fact that the sound coming out of the left and right stereo channels is very similar, to reduce the number of bits per second needed to produce high quality sound.
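
To see why similar channels help, consider mid/side coding, one of the techniques used for joint stereo. The sketch below is only an illustration with made-up sample values and hypothetical function names, not the encoder's actual code. Instead of storing left and right, it stores their average (mid) and half their difference (side). When the channels are nearly the same, the side signal is tiny and needs very few bits.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Recover left/right samples from mid/side samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Made-up samples from two very similar channels.
left = [100, 120, 90, -40]
right = [98, 121, 90, -42]

mid, side = to_mid_side(left, right)
print(mid)   # about as large as the original channels
print(side)  # close to zero because the channels are so similar

restored_left, restored_right = from_mid_side(mid, side)
print(restored_left == left and restored_right == right)
```

The transform itself loses nothing. The saving comes from the side signal being so small that it can be coded with far fewer bits than a full channel.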

Processing

The exact details of MP3 coding are complicated but a rough outline of what happens is:

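  • First the digitised signal is split into 32 different frequency ranges or bands by a polyphase filter bank, and each band is then divided more finely using a modified discrete cosine transform (MDCT), a close relative of the Fourier Transform.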

  • The psychoacoustic model then determines the masking thresholds for each band. This sounds complicated but all that happens is that each band is rated according to how loud it is compared to the bands next door and whether or not it is more or less like a pure tone or noise.

  • Finally each band is coded using just the number of bits needed according to how sensitive the human ear is to the frequency range, how loud the band is and how masked it is by neighbouring bands.
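
The last two steps above can be sketched as a toy bit allocator. This is a deliberately simplified illustration, not the psychoacoustic model defined in the standard. It uses 8 bands instead of 32, ignores whether a band is tone-like or noise-like, and uses a flat threshold of hearing. Every constant and band level is an assumption chosen for the example. The only general fact it relies on is that each extra bit of resolution buys roughly 6 dB of signal-to-noise ratio.

```python
import math

DB_PER_BIT = 6.02        # each extra bit gives roughly 6 dB more signal-to-noise
SPREAD_DB_PER_BAND = 15  # assumed: masking weakens by 15 dB per band of distance
MASK_OFFSET_DB = 10      # assumed: a masker hides sounds 10 dB below its own level
QUIET_THRESHOLD_DB = 5   # assumed: flat threshold of hearing in quiet

def masking_thresholds(levels_db):
    """For each band, the level below which sound is hidden by the other bands."""
    thresholds = []
    for i in range(len(levels_db)):
        masked = max(
            level - MASK_OFFSET_DB - SPREAD_DB_PER_BAND * abs(i - j)
            for j, level in enumerate(levels_db)
            if j != i
        )
        thresholds.append(max(masked, QUIET_THRESHOLD_DB))
    return thresholds

def allocate_bits(levels_db):
    """Give each band just enough bits to push coding noise under its threshold."""
    bits = []
    for level, threshold in zip(levels_db, masking_thresholds(levels_db)):
        signal_to_mask = level - threshold
        if signal_to_mask > 0:
            bits.append(math.ceil(signal_to_mask / DB_PER_BIT))
        else:
            bits.append(0)  # fully masked: the band can be dropped
    return bits

# Made-up loudness per band in dB; band 2 is a loud tone.
levels = [20, 30, 80, 60, 25, 10, 40, 12]
bits = allocate_bits(levels)
print(bits)
```

With these numbers the loud band gets the most bits, most of its quieter neighbours get none at all, and a moderately loud band far enough away still gets a few bits of its own.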

As you might guess, the processing needed to implement this is quite a lot, but the good news is that while MP3 encoding takes a long time, MP3 decoding is quite quick and doesn’t need a powerful processor. This is lucky because without this property there would be no portable MP3 players or phone apps!

Good MP3 compression can achieve results that are 10 or 12 times smaller than the original file with no perceptible loss of quality, i.e. it converts the 1.4 Mbits you need for one second of CD quality stereo music to roughly 120 to 140 kbits. Of course this reduction is achieved by throwing away some of the music, but the argument is that you can’t hear it anyway.
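
A quick check of the arithmetic, assuming the standard CD audio parameters of 44,100 samples per second, 16 bits per sample and two channels, shows where the 1.4 Mbits figure comes from and what a 10 or 12 times reduction leaves. The helper name is just for illustration.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio samples per second, per channel
BITS_PER_SAMPLE = 16     # CD audio sample size
CHANNELS = 2             # stereo

cd_bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

def compressed_kbits_per_second(ratio):
    """Bit rate in kbits per second after shrinking CD audio by the given ratio."""
    return cd_bits_per_second / ratio / 1000

print(cd_bits_per_second)  # just over 1.4 million bits per second
for ratio in (10, 12):
    print(ratio, round(compressed_kbits_per_second(ratio), 1))
```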

Some would disagree however!

Of course it is important to realise that MP3 is designed to work best for music and how well it works depends a lot on the suitability of the acoustic masking principle to the sound source. For example there are much better compression algorithms than MP3 for speech, which doesn’t benefit much from the masking effect.

Many are also of the opinion that there are better compression methods for music but MP3 has the advantage of being widely used.

Even so, some audio enthusiasts insist on using lossless compression. Currently the most used lossless compression formats are FLAC, which is open source and licence free, and Apple's Apple Lossless format. Of course there are some audio enthusiasts who reject digital sound, compressed or uncompressed, completely.

 





Last Updated ( Thursday, 11 May 2017 )