The Fourier transform is commonly used for frequency analysis of sounds. However, it has some disadvantages when it comes to analyzing the human perception of sound. For example, its frequency bins are linearly spaced, whereas the human ear responds to frequency roughly logarithmically.
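To make the linear-bin point concrete, here is a minimal sketch that counts how many FFT bin centres land in each successive octave. The function name and the sample rate, FFT size, and starting frequency are illustrative assumptions of mine, not values from any particular system.

```python
def fft_bins_per_octave(sample_rate, n_fft, f_min, n_octaves):
    """Count FFT bin centres that fall in each octave [f, 2f), starting at f_min."""
    bin_width = sample_rate / n_fft  # linear spacing between bin centres, in Hz
    counts = []
    for k in range(n_octaves):
        lo = f_min * 2 ** k
        hi = 2 * lo
        # Bin m is centred at m * bin_width; only non-negative frequencies are counted.
        counts.append(sum(1 for m in range(n_fft // 2 + 1) if lo <= m * bin_width < hi))
    return counts


# Illustrative values only: 8 kHz sampling with an 800-point FFT gives 10 Hz bins.
print(fft_bins_per_octave(8000, 800, 100, 4))  # [10, 20, 40, 80]
```

Each octave gets twice as many bins as the one below it, even though octaves are perceived as roughly equal steps in pitch.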
Wavelet transforms can modify the resolution for different frequency ranges, unlike the Fourier transform. The wavelet transform’s properties allow large temporal supports for lower frequencies while maintaining short temporal widths for higher frequencies.
The Morlet wavelet is said to be closely related to human perception of hearing. It can be applied to music transcription, where it reportedly produces accurate results that are difficult to obtain with Fourier transform techniques. In particular, it can capture short bursts of repeating and alternating notes with a clear start and end time for each note.
The constant-Q transform (closely related to the Morlet wavelet transform) is also well suited to musical data. As the output of the transform is effectively amplitude/phase against log frequency, fewer spectral bins are required to cover a given range effectively, and this proves useful when frequencies span several octaves.
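As a rough illustration of the bin-count argument, the sketch below generates geometrically spaced centre frequencies and compares their number with the number of linearly spaced bins needed to match the resolution at the low end. The helper names, the 55 Hz to 7040 Hz range, and the choice of 12 bins per octave are my own assumptions for the example.

```python
import math


def cqt_center_frequencies(f_min, f_max, bins_per_octave):
    """Geometrically spaced centres f_k = f_min * 2**(k / bins_per_octave) below f_max."""
    n_bins = math.ceil(bins_per_octave * math.log2(f_max / f_min))
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


def linear_bins_for_same_low_end(f_min, f_max, bins_per_octave):
    """Linear bins needed over [f_min, f_max) if each is as narrow as the lowest log-spaced bin."""
    lowest_spacing = f_min * (2 ** (1 / bins_per_octave) - 1)
    return math.ceil((f_max - f_min) / lowest_spacing)


# Assumed example range: seven octaves, 55 Hz up to 7040 Hz, 12 bins per octave.
freqs = cqt_center_frequencies(55.0, 7040.0, 12)
print(len(freqs))  # 84 bins: 7 octaves * 12 bins per octave
print(linear_bins_for_same_low_end(55.0, 7040.0, 12))
```

With these assumed values, the linear layout needs over two thousand bins to resolve semitones at the bottom of the range, whereas the log-spaced layout covers the same span with 84.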
The transform's frequency resolution decreases toward higher frequency bins, which is desirable for auditory applications. This mirrors the human auditory system, in which spectral resolution is better at lower frequencies, whereas temporal resolution improves at higher frequencies.
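The trade-off can be put in numbers with a small sketch. It assumes the usual constant-Q definition, where Q is the ratio of centre frequency to bandwidth and the analysis window spans about Q cycles of the centre frequency. The function name and the 12 bins per octave are my own choices for illustration.

```python
def constant_q_resolution(f_center, bins_per_octave):
    """Return (Q, bandwidth in Hz, window duration in seconds) for one constant-Q bin."""
    # Q = f_k / (f_{k+1} - f_k) for geometrically spaced bins.
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
    bandwidth_hz = f_center / q    # grows with frequency: coarser spectral resolution
    window_seconds = q / f_center  # shrinks with frequency: finer temporal resolution
    return q, bandwidth_hz, window_seconds


for f in (55.0, 440.0, 7040.0):
    q, bw, win = constant_q_resolution(f, 12)
    print(f"{f:7.1f} Hz  Q={q:.2f}  bandwidth={bw:.2f} Hz  window={win * 1000:.2f} ms")
```

Under this assumption, the bandwidth times the window duration is the same for every bin, so any gain in time resolution at high frequencies is paid for in frequency resolution.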
My question is this: Are there other transforms which closely mimic the human auditory system? Has anyone attempted to design a transform that anatomically/neurologically matches the human auditory system as closely as possible?
For example, it is known that human ears have a logarithmic response to sound intensity. It is also known that equal-loudness contours vary not only with intensity, but with the spacing in frequency of spectral components. Sounds containing spectral components in many critical bands are perceived as louder even if the total sound pressure remains constant.
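For the logarithmic intensity response, the standard decibel sound pressure level is the simplest model I know of. The sketch below uses the conventional 20 µPa reference pressure. The function and constant names are mine, and it deliberately ignores the critical-band effects just described.

```python
import math

P_REF = 20e-6  # conventional reference pressure in pascals (20 µPa), i.e. 0 dB SPL


def spl_db(pressure_pa):
    """Sound pressure level in dB relative to P_REF, for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_pa / P_REF)


# Doubling the pressure adds the same number of decibels, whatever the starting level.
for p in (0.02, 0.04, 0.08):
    print(f"{p:.2f} Pa -> {spl_db(p):.2f} dB SPL")
```

Each doubling of pressure adds about 6 dB, so equal ratios map to equal steps. A perceptually faithful transform would presumably need more than this, since loudness also depends on how the energy is spread across critical bands.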
Finally, the human ear has a limited temporal resolution that depends on frequency. Perhaps this could be taken into account as well.