Declan Kolakowski

Analysing Spectrum Data in Unity

Added on by Declan Kolakowski.

I recently dived down the rabbit hole that is the Unity sound API. My aim: to create a game controlled solely by sound (and, more specifically, pitch). In my opinion this technology is extremely underused and its possibilities are really exciting. An instrument offers some very interesting control possibilities that a gamepad or even a keyboard can't match.

For example, the sheer number of easily accessible inputs: pretty much anyone who can vaguely play an instrument will have at least a grasp of where its notes lie, and this means they have access to a controller with potentially hundreds of inputs that are much easier to use than those on a standard keyboard. Imagine an RTS or RPG with its myriad controls and quick binds (awkward or unintuitive) - translate this complexity onto a violin or piano and all those binds become accessible and, more importantly, fluid. This frees these games to change their pace dramatically, moving their complexity into an action-oriented realm and maintaining their emergence and depth while processing user input fluidly and at breakneck speed.

The technicalities of doing this in Unity aren't actually that hard; however, the cryptic documentation that pervades most of the sound classes makes it seem like a much more difficult task.

Essentially we are going to use Unity's built-in FFT (Fast Fourier Transform) to get frequency data and then filter that data to find a distinct pitch that can be bound to an input.

Fourier Transform Recap: a Fourier transform takes audio data (in our case from a microphone) and fits a set of sine and cosine waves to the audio curve to deduce the magnitude of different frequency bands in that sound. In Unity's case the method we'll be using is AudioSource.GetSpectrumData(int,int,FFTWindow), which breaks the sound down into a given number of frequency bins (first argument), reads from a given channel (second argument - a single channel index, not a set of channels) and applies a particular FFT window type (third argument - I used BlackmanHarris but Hamming is fine as well; it sort of depends on what instrument you are going to be using). Newer Unity versions replace this overload with AudioSource.GetSpectrumData(float[],int,FFTWindow), which fills an array you supply; the length of that array sets the number of bins.
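To make the recap concrete, here is a minimal sketch in plain Python (not Unity code) of a naive discrete Fourier transform. It only illustrates the idea of correlating a signal with sines and cosines; Unity's FFT is a much faster algorithm and also applies the chosen window, which this sketch leaves out. The test signal and the function name dft_magnitudes are made up for illustration.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin from 0 up to (not including) Nyquist."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        # Correlate the signal with a complex sinusoid (cosine + i*sine) at bin k.
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


# Hypothetical test signal: a pure sine completing exactly 5 cycles in 64 samples.
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = dft_magnitudes(signal)
peak = max(range(len(spectrum)), key=lambda i: spectrum[i])
print(peak)  # the bin holding the sine's energy
```

Because the sine fits a whole number of cycles into the buffer, all of its energy lands in a single bin. Real instrument audio rarely lines up this neatly, which is where the window type comes in.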

You can copy the output of Unity's spectrum analysis method into a float array and then iterate through it to find the bin with the greatest amplitude and derive pitch from it.
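The search itself is just a scan for the largest value. A minimal Python sketch of that step follows (the logic ports directly to a C# loop over the float array). The noise_floor threshold is an assumption added here so that silence does not register as a note; its value would need tuning by ear.

```python
def find_loudest_bin(spectrum, noise_floor=0.01):
    """Return the index of the bin with the greatest magnitude, or None if too quiet."""
    if not spectrum:
        return None
    peak_index = max(range(len(spectrum)), key=lambda i: spectrum[i])
    # noise_floor is a hypothetical threshold: ignore peaks that are probably background noise.
    if spectrum[peak_index] < noise_floor:
        return None
    return peak_index


# Hypothetical spectrum values standing in for the array Unity fills.
example = [0.0, 0.002, 0.01, 0.35, 0.12, 0.004]
print(find_loudest_bin(example))
```

Note that the loudest bin is not always the fundamental: on some instruments an overtone can be stronger than the note being played, so further filtering may be needed.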

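Your input resolution must be a power of 2 (Unity accepts 64 up to 8192), and I've found that anything less than 8192 will not give clear pitch indications at lower frequencies (less than the F below middle C). [This is because the FFT bins are spaced linearly in frequency while musical pitch is logarithmic: low notes sit only a few Hz apart, so with too few bins neighbouring notes land in the same bin.]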
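The arithmetic behind that limit can be sketched in a few lines of Python. Two assumptions are baked in and are not taken from the Unity documentation: a 48 kHz sample rate, and bins spread evenly from 0 Hz up to half the sample rate.

```python
def bin_width_hz(sample_rate, num_bins):
    """Width of one bin, assuming bins evenly cover 0 Hz to sample_rate / 2."""
    return (sample_rate / 2) / num_bins


def semitone_gap_hz(frequency):
    """Distance in Hz from a note to the semitone above it (equal temperament)."""
    return frequency * (2 ** (1 / 12) - 1)


F3 = 174.61  # F below middle C, in Hz
SAMPLE_RATE = 48000  # assumed; check your project's actual output sample rate

for num_bins in (1024, 2048, 4096, 8192):
    width = bin_width_hz(SAMPLE_RATE, num_bins)
    bins_per_semitone = semitone_gap_hz(F3) / width
    print(num_bins, round(width, 2), round(bins_per_semitone, 1))
```

Under these assumptions a semitone around F3 is only about 10 Hz wide. At 2048 bins that is less than one bin per semitone, while at 8192 it is roughly three and a half, which fits the observation that lower resolutions blur the low notes.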

Unfortunately, due to the nature of the Unity documentation, I've not managed to calculate where the bins start in the audio spectrum, making it very difficult to ascertain the pitch of the output algorithmically. Currently I'm simply using a switch statement to find the pitch from the array index, then converting that into MIDI note numbers that can be used to drive Unity's character controller.
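One possible way around the switch statement rests on a commonly used assumption that is not spelled out in the Unity documentation: that the N bins are spread evenly from 0 Hz to half the output sample rate, so bin i sits at roughly i * (sampleRate / 2) / N Hz. If that holds, the MIDI note follows from the standard equal-temperament formula with A4 = 440 Hz = MIDI 69. The Python sketch below shows the arithmetic; the function names are hypothetical, and the octave naming assumes middle C is C4 (MIDI 60).

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def bin_to_frequency(bin_index, num_bins, sample_rate):
    """Approximate frequency of a bin, ASSUMING bins evenly cover 0 Hz to sample_rate / 2."""
    return bin_index * (sample_rate / 2) / num_bins


def frequency_to_midi(frequency):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = 69), or None for 0 Hz."""
    if frequency <= 0:
        return None
    return round(69 + 12 * math.log2(frequency / 440.0))


def midi_to_name(midi_note):
    """Note name using the convention that MIDI 60 is C4 (middle C)."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"


# Hypothetical example: loudest bin 150 out of 8192 at an assumed 48 kHz sample rate.
freq = bin_to_frequency(150, 8192, 48000)
note = frequency_to_midi(freq)
print(round(freq, 2), note, midi_to_name(note))
```

It is worth checking the assumption against a known pitch (a tuner app or tuning fork) before relying on it, since the real sample rate depends on the project's audio settings.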

Try it out. There's a wealth of possibilities here. You could try adding effects driven by how loudly the player is playing, or even by timbral effects like slides, trills, buzzes and clicks - fitted to a pre-defined frequency window, these can be detected very accurately and create some very exciting and immersive effects.
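As a starting point for those ideas, here is a minimal Python sketch of two measurements: overall loudness as the RMS of raw samples, and the energy inside a chosen frequency window of the spectrum. In Unity the raw samples could come from AudioSource.GetOutputData; the function names, the example values, and the even 0 Hz to Nyquist bin spacing are all assumptions made for illustration.

```python
import math


def rms_loudness(samples):
    """Root-mean-square level of raw audio samples; 0.0 for an empty buffer."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def band_energy(spectrum, low_hz, high_hz, sample_rate):
    """Sum of squared magnitudes for bins between low_hz and high_hz.

    ASSUMES the bins evenly cover 0 Hz to sample_rate / 2.
    """
    if not spectrum:
        return 0.0
    bin_width = (sample_rate / 2) / len(spectrum)
    low_bin = max(0, int(low_hz / bin_width))
    high_bin = min(len(spectrum) - 1, int(high_hz / bin_width))
    return sum(m * m for m in spectrum[low_bin:high_bin + 1])


# Hypothetical data: 8 bins at a 16 kHz sample rate gives 1000 Hz per bin.
spectrum = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
print(rms_loudness([1.0, -1.0, 1.0, -1.0]))
print(band_energy(spectrum, 2000, 3000, 16000))
```

A click or buzz could then be flagged when the energy in its window jumps above a threshold, while the RMS value could scale an effect with how hard the player is playing.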