I'd like to use STFT for multipitch analysis. I realise detecting the partials existing in the signal is just the beginning. Still I have problem with it.

Let's say I have signal sampled with 'CD' frequency 44100Hz. With window of 1024 samples I get frequency bin resolution of 22500Hz/512=43Hz. This is enough only to discern high piano notes like: C5 = 523.251Hz and C#5 = 554.365.

I used to think 1024 is quite a large window. But maybe it isn't and normally larger windows are used for detecting partials?

Can frequency resolution be increased with some other method than increasing window size, which worsens time resolution? I thought of two methods:


  1. Divide signal into frequency bands with bandpassfilters (for example 0-11.25Hz and 11.25-22.5Hz).
  2. Downsample higher bands so that original high frequencies will be now low frequencies (so do it for second band 11.25-22.5Hz -> 0Hz-22.5Hz) - not sure this is possible.
  3. Concat resulting bins sets with adjusted labels.


  1. Use series of lowpass filters with increasing limit.
  2. Perform FFT on increasing frequency ranges.
  3. For each frequency use best possible resolution (bins from the first FFT in which this frequency was included).
  4. This will cause low frequencies to have better resolution but I think this is ok because for higher notes the frequency difference is grater.

I will be grateful any remarks on this issues.

I also read here: How do window size, sample rate influence FFT pitch estimation? about method of improving peak picking results. I think will try to use it.

  • $\begingroup$ If you know there's only one sine component, you can fit a parabola to the neighboring bins of the peak and interpolate to find the "true" peak. Not sure how this compares to the phase method described by @pichenettes. $\endgroup$ – endolith Oct 3 '12 at 20:15

If you really insist on using FFT (rather than parametric methods, which wouldn't suffer from time/frequency trade-offs), you can fake a much better resolution by using the phase information to recover the instantaneous frequency for each FFT bin. Partials can then be detected by looking for plateaus in the function giving instantaneous frequency as a function of FFT bin index. The common implementation of this technique as described in this paper will "cost" you one extra STFT (instantaneous frequency is recovered by operations on the STFT of the signal, and STFT of the derivative of the signal).

See for example the ifgram function in this Matlab implementation of sinusoidal modeling of audio signals.

Note that this won't help resolving two partials falling into adjacent FFT bins. It will just provide a much more accurate frequency estimate than just converting into a frequency the FFT bin index of a spectral peak.

  • $\begingroup$ What do you mean by parametric methods? Also, was it you that some months ago mentioned an algorithm that was FFT-like but had a frequency octave scale rather than a uniform frequency scale? $\endgroup$ – Jim Clay Oct 3 '12 at 18:33
  • $\begingroup$ Parametric methods are statistical signal analysis methods that presupposes that the signal is generated by a specific process described by a set of parameters, and that computes a least-square estimate of these parameters from the observations. For example if you assume that the signal is a sum of N exponentially damped sinusoids + noise, algorithms like ESPRIT or MUSIC can be used to infer the N complex amplitudes and pulsations. $\endgroup$ – pichenettes Oct 3 '12 at 19:38
  • 2
    $\begingroup$ You are probably referring to the constant-Q transform. The caveat is that it is nowhere as efficient computationally-wise as the FFT; and that inverting this transform is a non-trivial optimization problem. $\endgroup$ – pichenettes Oct 3 '12 at 19:40
  • $\begingroup$ @JimClay: Maybe this should be migrated here? $\endgroup$ – endolith Oct 3 '12 at 20:17
  • 1
    $\begingroup$ To say parametric methods don't suffer from time/frequency trade-offs is misleading. At their core, parametric methods model the system and use the model to extract meaningful data. But the performance is only as good as the model. Assuming the "best" model is chosen (number of poles or number of signal space eigenvectors), the performance of these methods is still very sensitive to data record length. $\endgroup$ – Bryan Oct 4 '12 at 0:10

The term "resolution" has multiple meanings. In general, you can't increase your ability to separate (or "resolve") closely spaced spectral peaks by interpolation using the same window length of data. But you can estimate the frequency of isolated stationary spectral peaks that are well above the noise floor with finer resolution (sometimes much finer resolution) than the FFT bin spacing by various interpolation methods.

Common FFT result interpolation methods for higher resolution estimates include parabolic interpolation, Sinc interpolation, zero-padding the data into a much longer FFT, and phase vocoder methods using (slightly) offset overlapping windows.

An FFT is essentially a bank of bandpass filters, each with a very steep transition but tons of stop-band ripple for a given FIR filter kernel length. As such, these filters do not have great noise rejection of non-periodic-in-window noise. If you suspect this type of interference to be a problem, then a windowed FFT or a custom filterbank may perform better.


After further research invoked by Jim Clay question and pichenettes answer in comments I found that my Method2 is reinvented Bounded Q-transform described for example by Kashima and Mont-Reynaud (I'm not sure I can link to this article, file looks ripped).

Their approach is algorithmically more efficient as they start from the largest frequency range and iteratively downsample it by 2 till they get to the lowest octave.

Benefits of Q-transforms were also explored by Brown for example here. It may be not as efficient as single FFT, but has an advantage of not calculating thick FFT on high frequency bands which don't require this.

Thanks for all answers, comments and links.

  • $\begingroup$ What you're describing sounds very much like a wavelet transform, which seems to be confirmed by this. I realize this is an old post, but future readers may want to look at wavelets as well. Though, as I pointed out in my answer, you can't change the time-frequency uncertainty principle, but knowledge of the data can allow you to cheat a little. $\endgroup$ – orodbhen Aug 15 '17 at 19:38

If you keep a "history" of inputs, and use it to overlap your DFTs, then it would provide more information to extract spectral content from. Of course, that depends on the time-varying nature of your signal. It would be similar in form to a probability distribution function.

This would give you DFTs that are spaced closer in time. However, it would still increase the temporal uncertainty of each DFT, which is constrained by the laws of nature: the exact value of temporal and spectral behavior cannot be simultaneously determined.

If frequency content doesn't vary much within the window, though, then it should be fine.


Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.