Making noise - Theory

Of course, the next thing I want to do is make some noise. But first I need to know what kind of format the noise should be in. We know (from our experimenting with the simple client) how to put audio through our output port, but because we only copied what we had as input, we have no idea what format that was in.

Before we go any further, let's refresh our memories about how digital audio is stored.


We need to understand sample rate before we can make any noise. Or, rather, before we can make a noise of a specific pitch. But to understand sample rate, we need to know what a sample is. Thankfully, this is quite simple. A sample is a single value representing the height of the audio waveform at one point. An individual sample is the smallest unit we can split our audio into.

Sample rate

This is quite simply the number of samples per second. So a sample rate of 48000 samples per second means that one second of audio is made up of 48000 samples. This could be thought of as the resolution of the audio.

Now, let's assume for a moment that our samples are just signed bytes - that is, a single sample can have a value anywhere between -128 and +127. Now, if our sample rate is, say, 100 samples per second, then one second of audio is made up of 100 bytes. As an example, a square wave with a frequency of 1 cycle per second would be represented as 50 bytes with value 127 followed by 50 bytes with value -128.

Which is interesting. 1Hz (1 cycle per second) requires 100 samples per cycle for 100 samples per second audio.

Let's be a bit more formal about that. If we use s to represent our sample rate, h to represent the frequency of our desired sound, and spc for the required samples-per-cycle, we can write:

    spc = s / h


Let's just see if that works. A 2Hz signal should use half as many samples per cycle:

    spc = 100 / 2 = 50


Great. That will come in handy. Notice that the number of samples per cycle is dependent on the sample rate? And remember that JACK handily calls our callback when the sample rate changes? OK. Hold on to that thought until we get to making the noise.

Something else that we should talk briefly about here is the Nyquist frequency. Oooh - sounds complicated. Well, it is, but for now just remember this: Sounds with frequencies of half the sample rate or more are going to go funny. Very funny. A discussion of the Nyquist frequency is really beyond the scope of this article, but there is plenty of info on the interwebnet. Try Sampling Theory 101 if you really want to know more.

Sample depth

OK, we've looked at the horizontal resolution of sound, now let's think about the vertical resolution. Our previous examples used a single byte to store a sample. That's 256 possible values per sample - not very high. Imagine the difference between a 16 colour image, and an image using 16 bit colour. In fact, give your imagination a break - look at the images below. It's about time we had some images, anyway. The quality of the 16bit image is higher, even though they have the same resolution, because we can't see any banding like in the 16 colour image. The reason there is no banding is that each pixel can be one of 65536 colours (the number of distinct values a 16 bit number can hold), and so we can move smoothly from one colour to another.

16 bit:
A 16 bit image has lots of colours, making transitions between colours nice and smooth

16 colours:
An image with just 16 colours will have banding or dithering where the 16bit image has smooth transitions

And this is exactly the same with audio. 8bit audio has a much lower range of possible values for each sample than 16bit audio. At the end of the day, 8bit audio sounds worse than 16bit audio.

We can think of this as the vertical resolution of the audio. Higher resolution=better quality.

An example

To help all of this stick in your head - and to show that these principles apply to the real world, not just JACK - let's have a look at an example.

Compact discs. Everyone knows what they are - discs that store audio or data. But what format is the audio in? The answer is 16bit 44.1kHz stereo. Pardon?

Audio data on a CD is made up of 16bit samples. The sample rate is 44.1kHz, which is the same as saying 44100 samples per second. And it is stereo.

Knowing that a CD can hold 74 minutes of audio, let's work out how much storage that really is.

First, we need to know how many seconds we have. That's just a matter of multiplying the number of minutes by 60 - 74*60=4440 seconds.

Now, there are 44100 samples for every second of audio, so that's 4440*44100=195804000 samples in total. Whew! Big numbers!

Each sample is 16 bits, or 2 bytes. The total number of bytes used, then, is 195804000*2=391608000

There are 1024 bytes in 1 kilobyte - 391608000/1024=382430KB (ish, let's stick to integers).

And 1024 kilobytes in a megabyte - 382430/1024=373MB

Don't forget that this is stereo, so we have two channels. 373*2=746MB

Now, you're probably thinking "Hold on a moment! I can only get 650MB on a 74 minute CDR!". And that's correct. An audio CD has much less error correction than a data CD. If there are problems reading an audio CD, the player can interpolate - obviously, this is not an option with data. So, on a data CD, the missing storage is used for extra error correction.

Samples and frames

You've probably noticed that I keep talking about samples, but JACK talks about frames. Well, there is a difference. Take a look at the JACK capture client tutorial for a good description of the difference (Page 3, section 3.3).

Here's my quick explanation: frames are collections of samples, one for each channel. So, stereo has two channels and so each frame would have two samples.

But don't worry about that yet. We might be connected to the left and right speaker, but we're only really outputting the same mono signal to each, which means that for the purposes of our bleep, frames and samples are synonymous.

Making noise - Reality

We know that the sample rate for JACK is variable and we've seen how to deal with that. What about the sample depth? Well, looking through the metronome source code, we can see that individual samples are of type "sample_t". At the top of the code, we can see that this is really just a typedef using jack_default_audio_sample_t. The documentation on jack_default_audio_sample_t shows that this is also a typedef to float. Whew! So, we have samples represented as floats. This is handy, because the mathematical functions we might need (sin, for a start) work with floating point values: sin returns a double, and its sibling sinf returns a float. But what does this mean in terms of bits-per-sample? Let's have a look.

The code below will show us how big floats are. Remember that this might not be the same on everybody's system.

#include <stdio.h>

int main(void){
  /* sizeof gives a size_t, so print it with %zu rather than %d */
  printf("floats are %zu bytes.\n",sizeof(float));
  return 0;
}

On my system, compiling and executing this gives:

$ gcc -o floatsize floatsize.c
$ ./floatsize
floats are 4 bytes.

So, we have 4 bytes per sample, which is 32 bits. No need to worry about hissy radio-like audio, then.
