BACKGROUND

My goal is to create a JavaScript-based web app to analyse and display frequency information in audio sources, both in-page sources (<audio> tag) and signals streamed from the client's microphone. I am well on my way :)

As a keen saxophonist, one of my goals is to compare the information inherent in the tone of different saxophonists and instruments by examining the distribution of upper partials in relation to a fundamental pitch. In short, I want to derive a representation of why different instrumentalists and instrument brands sound different even when playing the same pitch. Additionally, I want to compare the tuning and frequency distribution of various 'alternative fingerings' against traditional or standard fingerings by the same player/instrument.

Accessing and displaying frequency information is a fairly trivial matter using the Web Audio API's AnalyserNode, which I am using in conjunction with the HTML5 Canvas element to create a frequency map or 'winamp-style bargraph' similar to the one found in 'Visualizations with Web Audio API' on MDN.

PROBLEM

In order to achieve my goal I need to identify some particular information in the audio source: most significantly the frequency in Hertz of the fundamental tone, for direct comparison between instrumentalists/instruments, and the frequency range of the source, to identify the frequency spectrum of the sounds I'm interested in. That information is to be found in the variable fData below...

// example...
var APP = function() {
    // ...select source and initialise etc.

    var aCTX = new AudioContext(),
        ANAL = aCTX.createAnalyser(),
        // bound so the detached reference can't throw 'Illegal invocation'
        rANF = window.requestAnimationFrame.bind(window),
        ucID = null;

    ANAL.fftSize = 2048; // frequencyBinCount = fftSize / 2 = 1024

    function audioSourceStream(stream) {

        // route the microphone stream into the analyser
        var source = aCTX.createMediaStreamSource(stream);
        source.connect(ANAL);

        // one byte (0-255) of relative strength per frequency bin
        var fData = new Uint8Array(ANAL.frequencyBinCount);

        (function updateCanvas() {
            ANAL.getByteFrequencyData(fData);

            // using 'fData' to paint HTML5 Canvas

            ucID = rANF(updateCanvas);
        }());
    }
};
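
For context, the elided painting step looks something like this (simplified; it assumes a <canvas id="spectrum"> element, which is not in the example above):

// hypothetical painting step, assuming <canvas id="spectrum"> exists
var canvas = document.getElementById('spectrum'),
    c2d    = canvas.getContext('2d');

function paint(fData) {
    var barW = canvas.width / fData.length;
    c2d.clearRect(0, 0, canvas.width, canvas.height);
    for (var i = 0; i < fData.length; i++) {
        // each byte (0-255) scales to a bar height in pixels
        var barH = (fData[i] / 255) * canvas.height;
        c2d.fillRect(i * barW, canvas.height - barH, barW, barH);
    }
}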

ISSUES

While I can easily represent fData as a bar- or line-graph, etc., via the <canvas> API, such that the fundamental and upper partials of a sound source are clearly visible, so far I have not been able to determine...

  • The frequency range of fData (min-max Hz)
  • The frequency of each value in fData (Hz)

Without this I cannot begin to identify the dominant frequency of the source (in order to compare variations in tuning against traditional musical pitch names) and/or highlight or exclude regions of the represented spectrum (zooming in or out etc.) for more detailed examination.

My intention is to prominently display the dominant frequency by pitch (note name) and frequency (Hz) and to display the frequency of any individual bar in the graph on mouseover. N.B. I already have a data object in which all the frequencies (Hz) of the chromatic pitches between C0-B8 are stored.
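
For what it's worth, the lookup could also be done with the standard equal-temperament formula instead of a stored table (a sketch, assuming A4 = 440Hz):

// nearest 12-TET note name for a frequency, assuming A4 = 440Hz
var NOTE_NAMES = ['C','C#','D','D#','E','F','F#','G','G#','A','A#','B'];

function hzToNoteName(hz) {
    // MIDI note number: 69 = A4; Math.log(x)/Math.LN2 = log2(x)
    var midi   = Math.round(69 + 12 * (Math.log(hz / 440) / Math.LN2));
    var name   = NOTE_NAMES[midi % 12];
    var octave = Math.floor(midi / 12) - 1; // MIDI 60 = C4
    return name + octave;                   // e.g. 440 -> "A4", 261.63 -> "C4"
}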

Despite reading the AnalyserNode specification several times, and virtually every page on this site and MDN about this subject, I still have no firm idea about how to accomplish this portion of my task.

Basically, how does one go about turning the values in the Uint8Array fData into a representation of the amplitude at each frequency (in Hertz) which the fData array elements reflect?

Any advice, suggestions, or encouragement would be greatly appreciated.

BP

ACCEPTED ANSWER

So first, understand that the output of an FFT will give you an array of relative strength in frequency RANGES, not precise frequencies.

These ranges are spread out in the spectrum [0,Nyquist frequency]. The Nyquist frequency is one-half of the sample rate. So if your AudioContext.sampleRate is 48000 (Hertz), your frequency bins will range across [0,24000] (also in Hz).

If you are using the default value of 2048 for fftSize in your AnalyserNode, then frequencyBinCount will be 1024 (it's always half the FFT size). This means each frequency bin will span 24000/1024 ≈ 23.4Hz of range, so the bins will look something like this (off-the-cuff; rounding errors may occur here, and the sketch after the list turns this mapping into code):

fData[0] is the strength of frequencies from 0 to 23.4Hz.
fData[1] is the strength of frequencies from 23.4Hz to 46.8Hz.
fData[2] is the strength of frequencies from 46.8Hz to 70.2Hz.
fData[3] is the strength of frequencies from 70.2Hz to 93.6Hz.
...
fData[511] is the strength of frequencies from 11976.6Hz to 12000Hz.
fData[512] is the strength of frequencies from 12000Hz to 12023.4Hz.
...
fData[1023] is the strength of frequencies from 23976.6Hz to 24000Hz.
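
In rough code, reusing aCTX, ANAL and fData from your question, the mapping looks like this (a sketch, not battle-tested):

// map a bin index to Hz, and find the strongest bin
var nyquist  = aCTX.sampleRate / 2;               // e.g. 48000 / 2 = 24000
var binWidth = nyquist / ANAL.frequencyBinCount;  // e.g. 24000 / 1024 ~ 23.4Hz

function binToHz(i) {
    // lower edge of bin i; add binWidth / 2 for the bin's centre
    return i * binWidth;
}

function dominantBin(fData) {
    var peak = 0;
    for (var i = 1; i < fData.length; i++) {
        if (fData[i] > fData[peak]) { peak = i; }
    }
    return peak; // binToHz(peak) gives its frequency
}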

Make sense so far?

The next comment that usually comes up is "Wait a second - this is less precise, musically speaking, in the bass registers (where 23.4 Hz can cover a whole OCTAVE) than the treble registers (where there are hundreds of Hz between notes)." To that I say: Yes, yes it is. That's just how FFTs work. In the upper registers, it's easier to see tuning differences.
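
To put numbers on that (quick equal-temperament arithmetic):

// the gap between adjacent semitones is f * (2^(1/12) - 1)
var semitoneGap = Math.pow(2, 1 / 12) - 1;     // ~0.0595
console.log((55   * semitoneGap).toFixed(1));  // "3.3"   Hz around A1 (55Hz)
console.log((1760 * semitoneGap).toFixed(1));  // "104.7" Hz around A6 (1760Hz)
// so a ~23.4Hz bin swallows several bass semitones,
// but spans only a fraction of one treble semitone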

The NEXT next comment is usually "wow, I need a MASSIVE fftSize to be precise in the bass registers." Usually, the answer is "no, you probably shouldn't do it that way" - at some point, auto-correlation is more efficient than FFTs, and it's a lot more precise.
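
A bare-bones sketch of the autocorrelation idea, using the analyser's time-domain data (naive version only; real detectors add windowing, normalisation and peak interpolation):

// naive autocorrelation pitch estimate from the analyser's time-domain data
function autoCorrelateHz(analyser, sampleRate) {
    var buf = new Float32Array(analyser.fftSize);
    analyser.getFloatTimeDomainData(buf);

    var bestLag = -1, bestCorr = 0;
    // start at lag 24 to skip the trivial lag-0 peak
    // (at 48kHz that caps detection at 48000/24 = 2000Hz)
    for (var lag = 24; lag < buf.length / 2; lag++) {
        var corr = 0;
        for (var i = 0; i < buf.length - lag; i++) {
            corr += buf[i] * buf[i + lag];
        }
        if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
    }
    return bestLag > 0 ? sampleRate / bestLag : -1; // -1: no pitch found
}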

Hope this helps point you in the right direction, add a comment if there's a followup.

  • That single piece of info about the relationship between buffer size and sample rate is just what I was after, though it suggests that I might have to take a different approach, for the reasons stated. :( Guessing I need a way to analyse the analyser!! :D Got any links for 'auto-correlation' that might be useful? Thanks anyway. – Brian Peacock Jun 12 '17 at 17:19
  • en.wikipedia.org/wiki/Autocorrelation has a pretty nice summary. And also mentions that oftentimes the fastest way to compute the autocorrelation is to use an FFT. Another alternative for getting the bass register values is to use a bank of bandpass filters to get the energy in each band. Designing the filters might be a bit tricky, since the bandwidths will be small and presumably you want the bands pretty isolated from each other. – Raymond Toy Jun 13 '17 at 15:09
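
A filter bank along the lines Raymond describes might be wired up like this (a sketch only; the centre frequencies and Q value here are arbitrary placeholders, and 'source' is the MediaStreamSource from the question):

// one narrow BiquadFilterNode bandpass per band of interest
var bands = [55, 110, 220, 440].map(function (hz) {
    var bp = aCTX.createBiquadFilter();
    bp.type = 'bandpass';
    bp.frequency.value = hz; // band centre
    bp.Q.value = 30;         // narrow band; higher Q = more isolation
    source.connect(bp);      // feed every band from the same source
    return bp;
});
// each filter's output could then drive its own AnalyserNode
// to measure the energy in that band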
