Stereo usually means sound produced from two audio channels. These channels are known as Left and Right, and they are encoded separately into the audio file (each channel is treated as a completely separate entity), which increases the resulting bitrate.
Stereo is generally recommended if you’re encoding audio at a bitrate of 256 Kbps or higher. As a reminder, the word stereophonic is a coined word derived from the Greek stereos, meaning solid, and phone, meaning sound; it is normally abbreviated to stereo. In other words, we could translate it as "solid sound".
Joint stereo is a method used to save file space while still maintaining a stereo signal. Stereo files can be unnecessarily large, especially if the audio fed into the Left and Right channels is pretty much the same. Joint stereo alleviates this problem by mixing the Left and Right channels into a Mid channel. Because the audio is so similar, when you export your audio as a joint stereo file the encoder can find the average of the Left and Right channel data and merge it into a smaller file. Generally, joint stereo is more advantageous at lower MP3 bitrates, where fewer bits are available and avoiding duplicated information matters most. These advantages depend on your content, though (for example, they matter when you are deciding which method, stereo or joint stereo, to use to compress an uncompressed audio stream).
Consider that in MP3 audio the biggest determinant of sound quality is the bit-rate. The bit-rate defines the amount of data available to represent each second of audio.
Now, let's consider a 320 Kbit/sec MP3 file:
- A True-Stereo file will have two channels that share the bit-rate but are independent of one another. So 320 kbit/s becomes at most 160 kbit/s per channel, irrespective of how much information the two channels share (the centre, or mono, sound). Shared information is duplicated at the expense of detail.
- A Joint-Stereo file can mix (join) the parts of the audio that are close to centre/mono and carry very similar information. This allows more of the available bit-rate to represent detail, both for the mono parts and for the separate (stereo) parts.
In some cases, where there is a big difference between the channels, true stereo may give a broader sound-stage and somewhat improved dynamics.
However, Joint-Stereo will, in general, give a better overall sound than Stereo, and the following simple explanation shows why:
- Let's say we have X amount of bits.
- We have 2 channels (Left and Right).
- These 2 channels have similarities (same or very similar sound info/data), let's call these similarities A.
- Now let's consider the differences in both channels, BL and BR, for Left and Right respectively.
- JOINT STEREO: we use Y bits to code A (similarities), so we will have (X-Y) bits to code BL and BR (the actual differences, the details).
- NORMAL STEREO: we use Y bits to code Left's A, then another Y bits to code Right's A (basically encoding the same sound info twice), so we will have only (X-2Y) bits left to code BL and BR (fewer bits available to code the sound details properly).
- It's simple math: substitute X and Y with arbitrary figures of your own choosing. Which method lets us use more bits to code BL and BR (the details)? Right, Joint Stereo.
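The arithmetic in the list above can be tried out with a few lines of Python. This is only a sketch of the simplified model used in this article: the function name and the figures chosen for X and Y are made up for illustration, and a real encoder does not split its bits this neatly.

```python
def detail_bits(total_bits: int, shared_bits: int, joint: bool) -> int:
    """Bits left to code the per-channel differences (BL and BR)."""
    # Joint stereo codes the shared part A once; normal stereo codes it twice.
    copies_of_shared = 1 if joint else 2
    return total_bits - copies_of_shared * shared_bits


X = 320_000  # hypothetical: bits available for one second of audio
Y = 100_000  # hypothetical: bits needed to code the shared part A

joint_detail = detail_bits(X, Y, joint=True)    # X - Y
stereo_detail = detail_bits(X, Y, joint=False)  # X - 2Y

print("Joint stereo detail bits: ", joint_detail)
print("Normal stereo detail bits:", stereo_detail)
```

Whatever figures are chosen, joint stereo always ends up with exactly Y more bits for the details than normal stereo in this model.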
A bit more detail on Joint Stereo
A good question, at this point, would be: "...but why is Joint Stereo called that?".
The answer is that Joint Stereo supports more than one method of stereo coding, such as SS ("simple" or "L/R" stereo or DualMono), MS ("mid-side" stereo), or IS ("intensity" stereo). A joint stereo stream may still employ only a single coding method, but for the sake of efficiency or quality it may switch between methods on a frame or even sub-frame basis.
Simple Stereo is the most straightforward method of coding a stereo signal: each channel is treated as a completely separate entity. This can be inefficient and may adversely impact quality (compared to other modes) when both channels contain nearly identical signals (i.e., are mono or nearly so); this coding spends bits that could go to detail on redundant info (the same or very similar data from both sides). It is rarely used on its own in modern encoders.
Mid/Side stereo coding, also called Matrix Stereo, encodes one main channel (the mid channel) as the average of the left and right audio channels (L + R): this mid channel will contain the majority of the audio data in the MP3 file. A smaller side channel is then used to record the differences between the left and right channels (L – R).
If we take M as the sum of L and R (M = L + R) and S as the difference of L and R (S = L - R), then M is transmitted in the L channel and S is transmitted in the R channel; the L and R channels can be reconstructed by using: L = (M + S) / 2 and R = (M - S) / 2.
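A minimal Python sketch of this round trip follows, using the definitions from this article (M = L + R, S = L - R), from which L = (M + S) / 2 and R = (M - S) / 2 follow. The function names and sample values are illustrative only, and a real MP3 encoder applies this kind of transform to frequency-domain data with a different scaling rather than to raw samples.

```python
def encode_ms(left, right):
    """Turn Left/Right samples into Mid/Side samples (M = L + R, S = L - R)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def decode_ms(mid, side):
    """Rebuild Left/Right from Mid/Side: L = (M + S) / 2, R = (M - S) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right


# Hypothetical samples: the two channels are identical at first, then differ slightly.
left = [0.5, 0.25, -0.75, 1.0]
right = [0.5, 0.25, -0.5, 0.75]

mid, side = encode_ms(left, right)
print("Mid: ", mid)
print("Side:", side)  # zero wherever the channels are identical

restored_left, restored_right = decode_ms(mid, side)
print(restored_left == left and restored_right == right)
```

Where the two channels are identical, the side channel is exactly zero, which is why it costs so few bits to code.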
Finally, Intensity stereo coding is a method that saves bitrate by replacing the left and right signals with a single representative signal plus directional information. This replacement is psychoacoustically justified in the higher frequency range, since the human auditory system is insensitive to signal phase at frequencies above approximately 2 kHz.
Intensity stereo is by definition a lossy coding method, so it is primarily useful at low bitrates. For coding at higher bitrates, only mid-side stereo should be used.
In the LAME encoder, probably the most widely used encoder for converting a digitized WAV audio file into the MP3 format, the joint stereo coding used is Mid/Side stereo rather than Intensity stereo, because Intensity stereo coding has poor quality performance, especially at lower bitrates. To determine when to switch to Mid/Side stereo, LAME uses a much more sophisticated algorithm than the one described in the ISO documentation: during its frame-by-frame analysis, it decides for each frame whether to use Mid/Side coding or to leave the two channels separate.
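To give an idea of what a frame-by-frame decision looks like, here is a deliberately naive Python sketch. It is not LAME's algorithm (which is far more sophisticated and works with a psychoacoustic model); the energy comparison, the threshold value and the function names are all assumptions made up for this illustration.

```python
def choose_mode(left, right, threshold=0.1):
    """Pick 'MS' when the side signal is weak compared to the mid signal."""
    mid_energy = sum((l + r) ** 2 for l, r in zip(left, right))
    side_energy = sum((l - r) ** 2 for l, r in zip(left, right))
    # A weak side signal means the channels are similar, so Mid/Side pays off.
    return "MS" if side_energy <= threshold * mid_energy else "LR"


def choose_modes(left, right, frame_size=1152):
    """Decide the coding mode independently for each frame of samples."""
    modes = []
    for start in range(0, len(left), frame_size):
        frame_left = left[start:start + frame_size]
        frame_right = right[start:start + frame_size]
        modes.append(choose_mode(frame_left, frame_right))
    return modes


# Hypothetical signal: first frame is pure mono, second frame is fully out of phase.
left = [0.5, 0.4, 0.3, 0.2, 0.5, -0.5, 0.5, -0.5]
right = [0.5, 0.4, 0.3, 0.2, -0.5, 0.5, -0.5, 0.5]

# A tiny frame size is used here only to keep the example readable.
modes = choose_modes(left, right, frame_size=4)
print(modes)
```

The default of 1152 samples corresponds to one MPEG-1 Layer III frame; the per-frame result shows how a single stream can mix both coding methods.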
The official site for the LAME project is https://lame.sourceforge.io/.
Historically, Joint Stereo was recommended for MP3s with a bitrate of up to 192/224 Kbps, but nowadays nothing prevents you from using it at higher bitrates (such as 320 Kbps). Other audio formats that use Joint Stereo as their default coding include, for example, AAC/M4A, FLAC and WMA.
Personally, after any audio recording, if I also want to keep the audio files in MP3 format, I always use a bitrate of 320 Kbps, CBR, a 44.1 kHz sample rate, 24-bit depth and Joint Stereo coding.
In general, there is no longer any reason to use normal Stereo coding. The efficiency of Joint Stereo coding in terms of storage, and its overall better quality (the available bits per second are better managed during the encoding process), are the two major advantages of this coding type. That also explains why a lot of audio processing software, when encoding a RAW audio stream into a compressed one such as MP3, offers Joint Stereo as the default choice, even at the highest bitrate available.
Converting Stereo to Joint Stereo?
Even though this is technically possible, for example with open source professional software like Audacity, doing it from an already compressed audio stream such as an MP3 is not advisable: any kind of encoding or audio manipulation should start from a raw audio stream, for example an uncompressed version such as a WAV file.
In Audacity, if we still have the original Audacity project (the .aup file and its linked .au files, which hold the raw audio stream produced by a recording), or an uncompressed version of the recording (for example an AIFF/WAV file), we can make the new MP3 file from that rather than converting an existing MP3 (in that case, converting from Stereo to Joint Stereo brings no benefit, and you will probably lose more sound info, and therefore quality).
NOTE: if, after the actual recording phase, we want to export the audio stream from Audacity in RAW format (the highest possible quality for your digital recordings, with no headers or metadata), we can do that by simply selecting "Other uncompressed file types" in the Export Audio dialog, then choosing the type under the Format Options section.
Check if audio file is encoded as Joint Stereo
A first way to get some useful properties of MPEG audio files, including whether an audio file is Joint Stereo or not, is to use an old but useful piece of software called EncSpot (a mirror download will be available at the end of this article).
EncSpot is a feature-rich MPEG audio stream analyzer created by Jon Dee of Guerillasoft. It is quite an old piece of software (2005-2006), but that does not mean it no longer works!
It has a graphical interface built around an improved version of mp3guessenc (improved at the time: the current version of mp3guessenc, which is still being developed, is much more advanced than EncSpot ever was).
Its main feature is trying to guess which encoder was used to encode an MP3 file. To do that, it analyzes tags, flags and the usage of MP3 features that tend to be typical of one encoder or another.
This software is now freeware: https://web.archive.org/web/20080215055506/http://www.guerillasoft.co.uk/encspot/download.html.
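One of the flags such tools read is the channel mode stored in every MPEG audio frame header. The Python sketch below decodes it from the 4 header bytes of a single frame. The function name and the sample header bytes are made up for this example, and it is not how EncSpot itself is implemented. It also assumes you already have the bytes of a frame header: locating the first frame in a real file (for example, skipping an ID3v2 tag) is left out.

```python
CHANNEL_MODES = {0: "Stereo", 1: "Joint stereo", 2: "Dual channel", 3: "Mono"}


def parse_channel_mode(header: bytes) -> dict:
    """Decode the channel mode fields from a 4-byte MPEG audio frame header."""
    if len(header) < 4:
        raise ValueError("an MPEG audio frame header is 4 bytes long")
    value = int.from_bytes(header[:4], "big")
    if (value >> 21) & 0x7FF != 0x7FF:
        raise ValueError("frame sync not found (first 11 bits must all be 1)")

    is_layer3 = ((value >> 17) & 0x3) == 0b01  # layer field: 01 means Layer III
    mode = (value >> 6) & 0x3                  # channel mode field (2 bits)
    mode_ext = (value >> 4) & 0x3              # mode extension field (2 bits)

    info = {"channel_mode": CHANNEL_MODES[mode], "ms_stereo": None, "intensity_stereo": None}
    # For Layer III joint stereo, the mode extension says which methods are on.
    if mode == 1 and is_layer3:
        info["ms_stereo"] = bool(mode_ext & 0b10)
        info["intensity_stereo"] = bool(mode_ext & 0b01)
    return info


# Hypothetical header: MPEG-1 Layer III, joint stereo with Mid/Side enabled.
joint_header = bytes([0xFF, 0xFB, 0x90, 0x64])
print(parse_channel_mode(joint_header))

# Same header with the channel mode bits set to plain stereo.
stereo_header = bytes([0xFF, 0xFB, 0x90, 0x04])
print(parse_channel_mode(stereo_header))
```

Since the coding method can change from frame to frame, a complete check would repeat this for every frame in the file rather than only the first one.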
EncSpot Console
Like EncSpot itself, its command line version can be used, for example in scripts, to extract several properties from each audio/video file. The following example shows the output of the sample command "EncSpotConsole.exe {fullpath_filename}":
MediaInfo
Another good tool for finding out more about MP3 audio files (though it is not limited to them and also handles other audio and video formats) is MediaInfo, which can also extract the metadata in various formats.
Even though Stereo and Joint Stereo sound similar, Joint Stereo is far more efficient, in terms of both storage and quality.