# Fast pitch recognition

I need to detect pitch (measure signal frequency) while the musicians play music, giving a warning if they are out of tune, but music happens to be a bit too fast for FFT (Fast Fourier Transform).

Below I try to give a technical description of the problem.

Musicians play music at 90-140 bpm. This means that there are 90-140 groups of notes each minute, up to 8 (more frequently, up to 4) notes in each group (60/140/8 = 0.0536 sec, 60/90/4 = 0.167 sec), that is, notes may change at the rate of 6-19 notes per second.

The music uses a logarithmic scale (see the attached image): the range between, say, 440Hz and 880Hz is divided into 12 notes, only 7 of which are used for melody. (Basically, they use only the white keys on the piano; when they want to shift the starting frequency, they use some of the black keys and don't use some white keys.) That is, the frequency of each next note is multiplied by 2^(1/12) = 1.05946.

To make things more complicated, the A (La) frequency may vary from 438 to 446 Hz. The string instruments in theory can be tuned, while the wind instruments depend on the air temperature and humidity, so the frequency happens to be re-negotiated by the musicians during the sound check.
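The note grid and the movable reference A can be written down as a small out-of-tune meter. This is a minimal sketch, assuming equal temperament, a reference frequency agreed at sound check (the `a4_hz` parameter), and that the octave may be ignored; the function name and the note table are illustrative only.

```python
import math

# Note names for one octave, starting at A (sharps only, for brevity).
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents); the octave is folded away."""
    semitones = 12.0 * math.log2(freq_hz / a4_hz)  # signed distance from the reference A
    nearest = round(semitones)                     # index of the closest note on the grid
    cents = 100.0 * (semitones - nearest)          # 100 cents = one semitone
    return NOTE_NAMES[nearest % 12], cents

name, cents = nearest_note(446.0, a4_hz=440.0)
print(name, round(cents, 1))  # an A that is roughly a quarter of a semitone sharp
```

With `a4_hz=446.0` the same 446 Hz tone reads as an in-tune A, which is why the reference has to be a parameter rather than a constant.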

Sometimes musicians and vocalists make errors in frequency; they call this being "out of tune". They want a device that would inform them of such "out of tune" errors. They have tuners, but the tuners require playing the same sound for about 1 sec before they start showing anything. This works for tuning, but does not work while the music is played.

Most likely, the tuner is doing FFT, and due to the formula $df = 1/T$ waits for 1 second to get the 1Hz resolution.

For A=440Hz, the difference in frequency between two adjacent notes is 440*0.05946 = 26.16 Hz. To get that frequency resolution, one has to use an acquisition time of 1/26.16 = 0.038 sec. That is, at tempo=196bpm (with 8 notes per beat) FFT is just able to distinguish two notes, and at 98 bpm it is able to tell a 50% out-of-tune error, provided that it starts acquisition at the very moment that the pitch changes. If we allow the pitch to change in the course of an acquisition period, we get 49 bpm, which is just too slow. In addition, it is very desirable to be more precise about the frequency, say, to detect a 25% or 12% out-of-tune error.
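The arithmetic above can be reproduced with a few lines. This is only a sketch of the $df = 1/T$ estimate near A, assuming one acquisition per note and 8 notes per beat; the function name and parameters are made up for illustration.

```python
def fft_tempo_limit(a4_hz=440.0, notes_per_beat=8, resolution_semitones=1.0):
    """Acquisition time and fastest tempo for a given FFT frequency resolution near A."""
    semitone_hz = a4_hz * (2.0 ** (1.0 / 12.0) - 1.0)  # spacing between A and the next note
    df = semitone_hz * resolution_semitones            # required resolution in Hz
    t_acq = 1.0 / df                                   # from df = 1/T
    bpm = 60.0 / (t_acq * notes_per_beat)              # one full acquisition per note
    return df, t_acq, bpm

for frac in (1.0, 0.5, 0.25, 0.125):
    df, t_acq, bpm = fft_tempo_limit(resolution_semitones=frac)
    print(f"{frac:5.3f} semitone: df={df:6.2f} Hz, T={t_acq:.3f} s, max tempo={bpm:5.1f} bpm")
```

Each halving of the tolerated error doubles the acquisition time and halves the tempo, which is the trade-off the question is trying to escape.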

Is there a way to measure frequency (detect pitch) better than FFT, that is, with better resolution in less acquisition time? (At least 2 times better, ideally, 8-16 times better.) In exchange, I do not need to distinguish between notes of different octaves, e.g. both 440 and 880 may be recognized as A. I do not need the linearity of FFT output, a logarithmic scale would be better. (Probably, more trade-offs are possible, just nothing else comes to my mind right now.)

Here's a really good drawing:

• nice drawing. they should turn it upside-down so that the clef symbols (and musical staff) is right-side-up. but then all of the frequency and period and MIDI numbers would be upside-down. – robert bristow-johnson Nov 13 '15 at 9:11
• This sounds like you need polyphonic pitch detection instead of a monophonic (i.e. one note at a time) detection. Is that correct? – Jazzmaniac Nov 13 '15 at 13:51
• @Jazzmaniac Polyphonic would definitely be a plus, that is, monophonic would be a restriction. If I could use FFT, I would display several peaks on a 2D frequency-time diagram. On the other hand, if I understand correctly, the wind instruments are monophonic, and the violin is pretty close to that. – 18446744073709551615 Nov 13 '15 at 14:34
• Wind instruments definitely work with monophonic detection algorithms. String instruments (with more than one string) are tricky however, and most if not all monophonic detectors produce unreliable or even unusable results in the presence of decaying tones from not perfectly muted strings, open strings resonating or just crosstalk from the microphone. That said, polyphonic detection is hard. However, since you don't really need an accurate note detection but just an accurate in-tune-detection, you may very well find a suitable algorithm. It won't be a monophonic pitch detector however. – Jazzmaniac Nov 13 '15 at 14:57
• This may explain the downvote of RBJ's answer, or someone might have taken offense from his somewhat non-objective sales pitch. In any case, don't jump on his ship too soon. There are other options for what you want, and quite possibly better ones too. – Jazzmaniac Nov 13 '15 at 15:00

## 2 Answers

"Is there a way to measure frequency (detect pitch) better than FFT, that is, with better resolution in less acquisition time?"

yes there is. or are. there are multiple better ways to do musical pitch detection in real time that are far, far better than running an FFT.

consider :

Average Magnitude Difference Function (AMDF)

$$Q_x[k] = \sum_n |x[n] - x[n-k]|$$

Average Squared Difference Function (ASDF)

$$Q_x[k] = \sum_n (x[n] - x[n-k])^2$$

Autocorrelation Function (AF)

$$R_x[k] = \sum_n x[n] x[n-k]$$

note that i am playing fast-and-loose with the limits to the summation.
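to make the summation limits concrete, here is a minimal pure-Python sketch of the three functions. It assumes one particular choice of limits (the newest `n_win` samples compared against the same window delayed by `k`); the function names and the 8 kHz test tone are illustrative, not part of the formulas above.

```python
import math

def _window(x, k, n_win):
    """Index range of the newest n_win samples; checks the delayed window fits in x."""
    if k < 0 or len(x) < n_win + k:
        raise ValueError("need len(x) >= n_win + k")
    return range(len(x) - n_win, len(x))

def amdf(x, k, n_win):
    """Average Magnitude Difference Function at lag k (unnormalized sum)."""
    return sum(abs(x[n] - x[n - k]) for n in _window(x, k, n_win))

def asdf(x, k, n_win):
    """Average Squared Difference Function at lag k (unnormalized sum)."""
    return sum((x[n] - x[n - k]) ** 2 for n in _window(x, k, n_win))

def autocorr(x, k, n_win):
    """Autocorrelation Function at lag k (unnormalized sum)."""
    return sum(x[n] * x[n - k] for n in _window(x, k, n_win))

fs = 8000.0
x = [math.sin(2.0 * math.pi * 400.0 * n / fs) for n in range(200)]  # period = 20 samples
lags = range(10, 31)
print(min(lags, key=lambda k: amdf(x, k, 100)),      # null of AMDF
      min(lags, key=lambda k: asdf(x, k, 100)),      # null of ASDF
      max(lags, key=lambda k: autocorr(x, k, 100)))  # peak of AF
```

For this clean tone all three point at the same lag, the 20-sample period.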

note also that there are no assumptions made about waveform shape or zero-crossings or other threshold crossings. the only assumption is that when the lag $k$ is approximately a period (or two periods or some other integer multiple of the period length), $x[n]$ looks a lot like $x[n-k]$. so the only assumption is that pitch is related to fundamental frequency of a periodic or nearly periodic (what i like to call "quasi-periodic") function.

my favorite is ASDF (and that is a thinly-veiled trade secret i just announced to everyone, but folks on comp.dsp knew that already). these are all time domain, AMDF and ASDF look very similar and ASDF looks like an upside-down version of AF. you are looking for nulls in AMDF or ASDF or peaks in AF which would correspond to potential period lengths of the quasi-periodic input.
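the "upside-down" relation between ASDF and AF can be seen by expanding the square. This is a standard identity, not anything specific to this answer, and the last step assumes the windowed energy is about the same with and without the delay, so that both squared sums are close to $R_x[0]$:

$$Q_x[k] = \sum_n (x[n] - x[n-k])^2 = \sum_n x[n]^2 + \sum_n x[n-k]^2 - 2\sum_n x[n]\,x[n-k] \approx 2\left(R_x[0] - R_x[k]\right)$$

so a peak of $R_x[k]$ shows up as a null of the ASDF $Q_x[k]$ at the same lag.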

here are a couple other tricks:

1. you can always correlate the most current $N$ samples against some $N$ samples delayed by $k$. that way you are dealing with the most current data possible in the real-time application.

2. you need not calculate the correlation for every integer lag $k$. in fact, since you like log-frequency, the spacing for larger $k$ might be bigger than the spacing for smaller $k$.

3. when a potential null (AMDF or ASDF) or peak (AF) is found, you can compute the correlation for adjacent integer values of $k$.

4. between adjacent integer values of $k$, you can do interpolation to determine the peak location to a fractional-sample precision. i won't tell you how. use your imagination.

5. the whole trick (and this is the secret sauce where trade secrets and IVL patents apply) is to choose the correct peak or null when there are multiple candidates. choosing the incorrect peak or null will result in an "octave error". i'm not gonna tell you how to do that. use your imagination.
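tricks 1 to 4 can be sketched as follows. This is a hedged illustration, not the method hinted at above: the fractional-lag step uses ordinary parabolic interpolation as an assumed stand-in, the lag range is limited to one octave so that only one null is in view, and nothing here addresses trick 5 (choosing among several candidate nulls). All names and parameter values are hypothetical.

```python
import math

def asdf(x, k, n_win):
    """ASDF between the newest n_win samples and the same window delayed by k."""
    end = len(x)
    return sum((x[n] - x[n - k]) ** 2 for n in range(end - n_win, end))

def log_spaced_lags(k_min, k_max, steps_per_octave):
    """Integer lags spaced roughly evenly in log-period, duplicates dropped."""
    ratio = 2.0 ** (1.0 / steps_per_octave)
    lags, k = [], float(k_min)
    while k <= k_max:
        if not lags or round(k) != lags[-1]:
            lags.append(round(k))
        k *= ratio
    return lags

def estimate_period(x, k_min, k_max, n_win, steps_per_octave=24):
    """Period in fractional samples from the deepest ASDF null in [k_min, k_max]."""
    if k_min < 2 or len(x) < n_win + k_max + 1:
        raise ValueError("need k_min >= 2 and at least n_win + k_max + 1 samples")
    q = lambda k: asdf(x, k, n_win)
    # Tricks 1 and 2: newest data only, coarse search on a log-spaced lag grid.
    k = min(log_spaced_lags(k_min, k_max, steps_per_octave), key=q)
    # Trick 3: step to a neighbouring integer lag while that lowers the ASDF.
    while k > k_min and q(k - 1) < q(k):
        k -= 1
    while k < k_max and q(k + 1) < q(k):
        k += 1
    # Trick 4 (assumed method): parabola through the null and its two neighbours.
    q_m, q_0, q_p = q(k - 1), q(k), q(k + 1)
    denom = q_m - 2.0 * q_0 + q_p
    return float(k) if denom <= 0.0 else k + 0.5 * (q_m - q_p) / denom

fs = 8000.0
x = [math.sin(2.0 * math.pi * 446.0 * n / fs) for n in range(240)]  # 30 ms of a sharp A
period = estimate_period(x, k_min=12, k_max=23, n_win=160)
print(fs / period)  # estimated frequency in Hz
```

The example uses 30 ms of signal, less than the 38 ms needed in the question just to separate two notes by FFT bin spacing. A real instrument signal with noise and overtones will behave less cleanly than this pure tone.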

send me an email and we can discuss contracting terms if you want me to design you a kick-ass pitch detector. much better than YIN, which, in my opinion, works like shit.

• A comment on down-voting would be appreciated. If there's something wrong, I want to know that. This answer gives no recipe, but at least there's a list of what to read about (well, that does not sound as an easy reading, but it is something that is better than nothing). Please do not delete this answer. – 18446744073709551615 Nov 13 '15 at 13:10
• who would delete the answer? me? – robert bristow-johnson Nov 13 '15 at 22:31

I already answered your question here: https://stackoverflow.com/questions/33667275/fast-frequency-measurement/33678202#33678202

But, in summary, in certain circumstances, you can interpolate an FFT result to finer resolution than the FFT bin spacing, thus allowing you to use a shorter data window for better time resolution.
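As an illustration of the idea (the linked answer is not reproduced here, so the details are assumptions), one common form is parabolic interpolation of the log-magnitude around the peak bin of a Hann-windowed transform. The sketch below uses a naive DFT in place of an FFT to stay dependency-free; function names and the test signal are hypothetical.

```python
import cmath
import math

def dft_bin(x, k):
    """Single DFT bin, computed naively (stand-in for an FFT library call)."""
    n = len(x)
    return sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))

def interpolated_peak_hz(x, fs):
    """Frequency of the strongest spectral peak, refined between bins."""
    n = len(x)
    # Hann window to keep leakage from neighbouring bins low.
    windowed = [x[i] * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)) for i in range(n)]
    mags = [abs(dft_bin(windowed, k)) for k in range(n // 2)]
    # Strongest interior bin, so that both neighbours exist.
    k = max(range(1, n // 2 - 1), key=lambda i: mags[i])
    # Parabola through the log-magnitudes of the peak bin and its neighbours.
    a, b, c = (math.log(mags[j]) for j in (k - 1, k, k + 1))
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    return (k + offset) * fs / n

fs = 8000.0
x = [math.sin(2.0 * math.pi * 446.0 * i / fs) for i in range(320)]  # 40 ms window
print(fs / len(x), interpolated_peak_hz(x, fs))  # bin spacing in Hz, then the estimate
```

For this single clean sinusoid the estimate lands much closer to 446 Hz than the 25 Hz bin spacing would suggest. That is the "certain circumstances" case: one dominant, stable partial with no close neighbours inside the window.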

But FFT frequency is not pitch frequency. And for some musical instruments (those that produce slightly inharmonic overtones), neither is the auto-correlation function (or its relatives such as AMDF) frequency. That's because pitch is a psychoacoustic phenomenon.

• two notes: if you want your pitch detection to be "fast", i wouldn't recommend doing it in the frequency domain (unless maybe if you're doing some kinda multi-rate thing with multiple FFTs). the reason why is that you cannot even begin to FFT until you get all of the samples. for an FFT of decent length (to get sufficient resolution at low pitches), you've already waited, say, 0.1 second. pitch (and loudness) are psychoacoustic measures that sometimes correlate well with physical properties like period (and power). for bells, toms, you'll get a pitch, but it might not mean the right thing. – robert bristow-johnson Nov 15 '15 at 1:30
• but i just ran my little matlab script on a recorded tom hit and it sounded to me that the pitch returned would be a plausible note value. – robert bristow-johnson Nov 15 '15 at 1:37