# Extracting precise frequencies from FFT Bins using phase change between frames

I have been looking through this fantastic article: http://blogs.zynaptiq.com/bernsee/pitch-shifting-using-the-ft/

While being fantastic, it is extremely hard and heavy going. This material is really stretching me.

I have extracted the maths from Stefan's code module that calculates the exact frequency for a given bin. But I don't understand the last calculation. Can someone explain to me the mathematical construction at the end?

Before digging into the code, let me set the scene:

• Let's say we set fftFrameSize = 1024, so we are dealing with 512+1 bins

• As an example, Bin's ideal frequency fits a single wave in the frame. At a sample rate of 40KHz, tOneFrame = 1024/40K seconds = 1/40s, so Bin would ideally be collecting a 40Hz signal.

• Setting osamp (overSample) = 4, we progress along our input signal in steps of 256. So the first analysis examines bytes zero through 1023, then 256 through 1279, etc. Note each float gets processed 4 times.

...

``````void calcBins(
long fftFrameSize,
long osamp,
float sampleRate,
float * floats,
BIN * bins
)
{
/* initialize our static arrays */
static float gFFTworksp[2*MAX_FRAME_LENGTH];
static float gLastPhase[MAX_FRAME_LENGTH/2+1];

static long gInit = 0;
if (! gInit)
{
memset(gFFTworksp, 0, 2*MAX_FRAME_LENGTH*sizeof(float));
memset(gLastPhase, 0, (MAX_FRAME_LENGTH/2+1)*sizeof(float));
gInit = 1;
}

/* do windowing and re,im interleave */
for (long k = 0; k < fftFrameSize; k++)
{
double window = -.5*cos(2.*M_PI*(double)k/(double)fftFrameSize)+.5;
gFFTworksp[2*k] = floats[k] * window;
printf("sinValue: %f", gFFTworksp[2*k]);
gFFTworksp[2*k+1] = 0.;
}

/* do transform */
smbFft(gFFTworksp, fftFrameSize, -1);

printf("\n");

/* this is the analysis step */
for (long k = 0; k <= fftFrameSize/2; k++)
{
/* de-interlace FFT buffer */
double real = gFFTworksp[2*k];
double imag = gFFTworksp[2*k+1];

/* compute magnitude and phase */
double magn = 2.*sqrt(real*real + imag*imag);
double phase = atan2(imag,real);

/* compute phase difference */
double phaseDiff = phase - gLastPhase[k];
gLastPhase[k] = phase;

/* subtract expected phase difference */
double binPhaseOffset = M_TWOPI * (double)k / (double)osamp;
double deltaPhase = phaseDiff - binPhaseOffset;

/* map delta phase into [-Pi, Pi) interval */
// better, but obfuscatory...
//    deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);

while (deltaPhase >= M_PI)
deltaPhase -= M_TWOPI;
while (deltaPhase < -M_PI)
deltaPhase += M_TWOPI;
``````

(EDIT:) Now the bit I don't get:

``````        // Get deviation from bin frequency from the +/- Pi interval
// Compute the k-th partials' true frequency

double bin0Freq = (double)sampleRate / (double)fftFrameSize;
bins[k].idealFreq = (double)k * bin0Freq;

double sampleTime = 1. / (double)sampleRate;
double samplesInStep = (double)fftFrameSize / (double)osamp;
double stepTime = sampleTime * samplesInStep;
double deltaTime = stepTime;

// Definition of frequency is rate of change of phase, i.e. f = dϕ/dt
// double deltaPhaseUnit = deltaPhase / M_TWOPI; // range [-.5, .5)
double freqAdjust = (1. / M_TWOPI) * deltaPhase / deltaTime;

// Actual freq <-- WHY ???
}
}
``````

I just can't see it clearly, even though it seems to be staring in the face. Could someone please explain this process from scratch, step by step?

• How one can get `BIN * bins` what does it stand for? Mar 20, 2015 at 21:27

The basic principle is very simple. If a given component exactly matches a bin frequency then its phase will not change from one FT to the next. However if the frequency does not correspond exactly with the bin frequency then there will be a phase change between successive FTs. The frequency delta is just:

``````delta_freq = delta_phase / delta_time
``````

and the refined estimate of the frequency of the component will then be:

``````freq_est = bin_freq + delta_freq
``````
• Sorry to be very stupid, but I still don't understand why this is true. I still feel very un-grounded using this maths.
– P i
Jan 8, 2011 at 12:10
• If the 2 FFTs are offset by an amount different than one period of the sine wave, then there will be a phase change, even if the sine wave frequency is bin centered. Jan 8, 2011 at 14:26
• It also helps to know that one definition of frequency is rate of change of phase, i.e. `f = dϕ/dt`. Jan 8, 2011 at 16:28
• I would hazard someone is jealous of your l33tDSPsk1llz :p well, it isn't me. I am tremendously grateful to both you and HotPaw for giving a fresh perspective. now I can actually understand this - finally!!!
– P i
Jan 8, 2011 at 17:23
• @Ohmu: glad to hear you're making progress - I recommend reading a good introductory DSP book if you're going to be doing more of this kind of stuff - Richard Lyons's book, Understanding Digital Signal Processing, is very good and is a lot more practical than most. Jan 8, 2011 at 20:31

I have implemented this algorithm for Performous myself. When you take another FFT at a time offset, you expect the phase to change according to the offset, i.e. two FFTs taken 256 samples apart should have a phase difference of 256 samples for all frequencies present in the signal (this assumes that the signals themselves are steady, which is a good assumption for short periods like 256 samples).

Now, the actual phase values you get from FFT are not in samples but in phase angle, so they will be different depending on the frequency. In the following code the phaseStep value is the conversion factor needed per bin, i.e. for frequency corresponding to bin x, the phase shift will be x * phaseStep. For bin center frequencies x would be an integer (the bin number) but for actual detected frequencies it may be any real number.

``````const double freqPerBin = SAMPLE_RATE / FFT_N;
const double phaseStep = 2.0 * M_PI * FFT_STEP / FFT_N;
``````

The correction works by assuming that the signal in a bin has the bin center frequency and then calculating the expected phase shift for that. This expected shift is substracted from the actual shift, leaving the error. A remainder (modulo 2 pi) is taken (-pi to pi range) and the final frequency is calculated with bin center + correction.

``````// process phase difference
double delta = phase - m_fftLastPhase[k];
m_fftLastPhase[k] = phase;
delta -= k * phaseStep;  // subtract expected phase difference
delta = remainder(delta, 2.0 * M_PI);  // map delta phase into +/- M_PI interval
delta /= phaseStep;  // calculate diff from bin center frequency
double freq = (k + delta) * freqPerBin;  // calculate the true frequency
``````

Notice that many adjacent bins often end up corrected to the same frequency because the delta correction can be up to 0.5 * FFT_N / FFT_STEP bins either way so the smaller FFT_STEP you use, the further away will corrections be possible (but this increases the processing power needed as well as imprecision due to inaccuracies).

I hope this helps :)

• I now have a few 'essay style' rationales to look at. but I'm not smart enough to formulate the maths myself from these explanations. I am after some explanation that generates the maths line by line. A mathematical proof.
– P i
Feb 6, 2011 at 12:26
• Maybe this will help? sengpielaudio.com/calculator-timedelayphase.htm (the time delay there is in in milliseconds but I suppose you can convert 256 samples into the proper amount of time) Feb 6, 2011 at 12:36

Finally I have figured this out; really I had to derive it from scratch. I knew there would be some simple way to derive it, my (usual) mistake was to attempt to follow other people's logic rather than use my own common sense.

This puzzle takes two keys to unlock it.

...

``````for (int k = 0; k <= fftFrameSize/2; k++)
{
// compute magnitude and phase
bins[k].mag = 2.*sqrt(fftBins[k].real*fftBins[k].real + fftBins[k].imag*fftBins[k].imag);
bins[k].phase = atan2(fftBins[k].imag, fftBins[k].real);

// Compute phase difference Δϕ fo bin[k]
double deltaPhase;
{
double measuredPhaseDiff = bins[k].phase - gLastPhase[k];
gLastPhase[k] = bins[k].phase;

// Subtract expected phase difference <-- FIRST KEY
// Think of a single wave in a 1024 float frame, with osamp = 4
//   if the first sample catches it at phase = 0, the next will
//   catch it at pi/2 ie 1/4 * 2pi
double binPhaseExpectedDiscrepancy = M_TWOPI * (double)k / (double)osamp;
deltaPhase = measuredPhaseDiff - binPhaseExpectedDiscrepancy;

// Wrap delta phase into [-Pi, Pi) interval
deltaPhase -= M_TWOPI * floor(deltaPhase / M_TWOPI + .5);
}

// say sampleRate = 40K samps/sec, fftFrameSize = 1024 samps in FFT giving bin thru bin
// then bin holds one whole wave in the frame, ie 44 waves in 1s ie 44Hz ie sampleRate / fftFrameSize
double bin0Freq = (double)sampleRate / (double)fftFrameSize;
bins[k].idealFreq = (double)k * bin0Freq;

// Consider Δϕ for bin[k] between hops.
// write as 2π / m.
// so after m hops, Δϕ = 2π, ie 1 extra cycle has occurred   <-- SECOND KEY
double m = M_TWOPI / deltaPhase;

// so, m hops should have bin[k].idealFreq * t_mHops cycles.  plus this extra 1.
//
// bin[k].idealFreq * t_mHops + 1 cycles in t_mHops seconds
//   => bins[k].actualFreq = bin[k].idealFreq + 1 / t_mHops
double tFrame = fftFrameSize / sampleRate;
double tHop = tFrame / osamp;
double t_mHops = m * tHop;

bins[k].freq = bins[k].idealFreq + 1. / t_mHops;
}
``````

This is the frequency estimation technique used by phase vocoder methods.

If you look at a single point on a (fixed frequency and fixed amplitude) sine wave in time, the phase will advance with time by an amount proportional to the frequency. Or you can do the converse: if you measure how much the phase of a sinusoid changes over any unit of time, you can calculate the frequency of that sinusoid.

A phase vocoder uses two FFTs to estimate phase with reference to two FFT windows, and the offset of the two FFTs is the distance between the 2 phase measurements in time. From thence, you have your frequency estimate for that FFT bin (an FFT bin being roughly a filter to isolate a sinusoidal component or other sufficiently narrowband signal that fits within that bin).

For this method to work, the spectrum near the FFT bin in use has to be fairly stationary, e.g. not changing in frequency, etc. That's the assumption a phase vocoder requires.

Maybe this will help. Think of the FFT bins as specifying little clocks or rotors, each spinning at the frequency of the bin. For a stable signal, the (theoretical) next position of the rotor can be predicted using the math in the bit you don't get. Against this "should be" (ideal) position, you can compute several useful things: (1) the difference with the phase in a bin of an adjacent frame, which is used by a phase vocoder to better estimate of the bin frequency, or (2) more generally phase deviation, which is a positive indicator of a note onset or some other event in the audio.

Signal frequencies that fall exactly on a bin frequency advance bin phase by integer multiples of 2π. Since the bin phases that correspond to the bin frequencies are multiples of 2π due to the periodic nature of the FFT there is no phase change in this case. The article you mention also explains this.

• That would be true if FFT step were the same as FFT size. However, here steps are made smaller (osamp factor) and then the phase no longer stays the same even for center frequencies. E.g. consider FFT step of merely one sample. For lower frequencies there will be essentially no phase shift at all while for very high frequencies there can be up to PI phase difference. Feb 6, 2011 at 12:31
• I have answered my own question. But if I give the bounty to my answer it will be lost. I was going to give it to Tronic, due to his awesome open source project (Performous), but he has tons of points! So... enjoy ;)
– P i
Feb 8, 2011 at 18:07