The Fourier transform is commonly used for frequency analysis of sounds. However, it has some disadvantages when it comes to analyzing the human perception of sound. For example, its frequency bins are linearly spaced, whereas the human ear responds to frequency logarithmically.

Wavelet transforms can modify the resolution for different frequency ranges, unlike the Fourier transform. The wavelet transform’s properties allow large temporal supports for lower frequencies while maintaining short temporal widths for higher frequencies.

The Morlet wavelet is closely related to human perception of hearing. It can be applied to music transcription and produces very accurate results that are not possible using Fourier transform techniques. It is capable of capturing short bursts of repeating and alternating music notes with a clear start and end time for each note.

The constant-Q transform (closely related to the Morlet wavelet transform) is also well suited to musical data. As the output of the transform is effectively amplitude/phase against log frequency, fewer spectral bins are required to cover a given range effectively, and this proves useful when frequencies span several octaves.

The transform exhibits a reduction in frequency resolution at higher frequency bins, which is desirable for auditory applications. It mirrors the human auditory system, in which spectral resolution is better at lower frequencies, whereas temporal resolution improves at higher frequencies.
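To make the bin-count argument concrete, here is a minimal sketch of how constant-Q center frequencies could be laid out. The function names, the range A1 to A8 (55 Hz to 7040 Hz) and the choice of 12 bins per octave are my own assumptions for illustration, not part of any particular library.

```python
import math

def constant_q_frequencies(f_min, f_max, bins_per_octave):
    """Geometrically spaced center frequencies from f_min up to f_max (Hz)."""
    n_bins = int(math.floor(bins_per_octave * math.log2(f_max / f_min))) + 1
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def quality_factor(bins_per_octave):
    """Q = center frequency / spacing to the next bin; identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def linear_bin_count(f_min, f_max, resolution_hz):
    """Number of equally spaced bins needed to cover the range at a fixed resolution."""
    return int(math.ceil((f_max - f_min) / resolution_hz)) + 1

# Assumed example: seven octaves at semitone resolution.
freqs = constant_q_frequencies(55.0, 7040.0, 12)
q = quality_factor(12)

# A linear grid that resolves the lowest semitone must use that spacing everywhere.
lowest_spacing = freqs[1] - freqs[0]
n_linear = linear_bin_count(55.0, 7040.0, lowest_spacing)

print(len(freqs), "constant-Q bins versus", n_linear, "linear bins; Q =", round(q, 2))
```

The point of the comparison is that a linear grid fine enough for the lowest octave is heavily over-resolved in the highest one, whereas the geometric grid spends the same number of bins on every octave.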

My question is this: Are there other transforms which closely mimic the human auditory system? Has anyone attempted to design a transform that anatomically/neurologically matches the human auditory system as closely as possible?

For example, it is known that human ears have a logarithmic response to sound intensity. It is also known that equal-loudness contours vary not only with intensity, but with the spacing in frequency of spectral components. Sounds containing spectral components in many critical bands are perceived as louder even if the total sound pressure remains constant.

Finally, the human ear has a frequency-dependent limited temporal resolution. Perhaps this could be taken into account as well.

  • Do you impose any mathematical restrictions on "transform"? – Olli Niemitalo Mar 30 '17 at 17:40
  • Kudos for all the links! – Gilles Mar 30 '17 at 18:58
  • No single transform can adequately mimic a system as complex as the human auditory system. The existing HAS models use complicated signal processing architectures and multiple transforms, each modeling a different aspect of hearing. Maybe you want to consider piece-by-piece modeling. – Fat32 Mar 30 '17 at 21:30

In designing such transformations, one should take into account competing interests:

  • fidelity to the human auditory system (which varies from person to person), including non-linear or even chaotic aspects (tinnitus)
  • simplicity of the mathematical formulation for the analysis part
  • ability to discretize it or implement it efficiently
  • existence of a suitable stable inverse

Two recent designs have caught my ear:

Auditory-motivated Gammatone wavelet transform, Signal Processing, 2014

The ability of the continuous wavelet transform (CWT) to provide good time and frequency localization has made it a popular tool in time–frequency analysis of signals. Wavelets exhibit constant-Q property, which is also possessed by the basilar membrane filters in the peripheral auditory system. The basilar membrane filters or auditory filters are often modeled by a Gammatone function, which provides a good approximation to experimentally determined responses. The filterbank derived from these filters is referred to as a Gammatone filterbank. In general, wavelet analysis can be likened to a filterbank analysis and hence the interesting link between standard wavelet analysis and Gammatone filterbank. However, the Gammatone function does not exactly qualify as a wavelet because its time average is not zero. We show how bona fide wavelets can be constructed out of Gammatone functions. We analyze properties such as admissibility, time-bandwidth product, vanishing moments, which are particularly relevant in the context of wavelets. We also show how the proposed auditory wavelets are produced as the impulse response of a linear, shift-invariant system governed by a linear differential equation with constant coefficients. We propose analog circuit implementations of the proposed CWT. We also show how the Gammatone-derived wavelets can be used for singularity detection and time–frequency analysis of transient signals.
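As a rough numerical illustration of the zero-mean remark in that abstract, the sketch below samples a gammatone impulse response and checks its time average. The Glasberg-Moore ERB formula, the 1.019 bandwidth factor, the order 4 and the 100 Hz center frequency are commonly used values that I am assuming here; they are not taken from the paper. The first difference is only a generic way to obtain a zero-mean function from one that vanishes at both ends, and is not necessarily the construction the authors propose.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, assumed)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone(fc_hz, fs_hz=16000, duration_s=0.2, order=4):
    """Sampled g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), unnormalized."""
    b = 1.019 * erb_bandwidth(fc_hz)  # bandwidth parameter (assumed convention)
    n = int(duration_s * fs_hz)
    samples = []
    for i in range(n):
        t = i / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        samples.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    return samples

def relative_mean(x):
    """|sum(x)| / sum(|x|): zero for a zero-mean signal, at most one otherwise."""
    return abs(math.fsum(x)) / math.fsum(abs(v) for v in x)

g = gammatone(100.0)
# First difference: the sum telescopes to g[-1] - g[0], which is essentially zero.
dg = [later - earlier for earlier, later in zip(g, g[1:])]

print("gammatone relative mean: ", relative_mean(g))
print("difference relative mean:", relative_mean(dg))
```

The gammatone itself has a small but clearly nonzero average, which is why it fails the admissibility condition as it stands, while the differenced version averages to zero up to rounding.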

The ERBlet transform: An auditory-based time-frequency representation with perfect reconstruction, ICASSP 2013

This paper describes a method for obtaining a perceptually motivated and perfectly invertible time-frequency representation of a sound signal. Based on frame theory and the recent non-stationary Gabor transform, a linear representation with resolution evolving across frequency is formulated and implemented as a non-uniform filterbank. To match the human auditory time-frequency resolution, the transform uses Gaussian windows equidistantly spaced on the psychoacoustic “ERB” frequency scale. Additionally, the transform features adaptable resolution and redundancy. Simulations showed that perfect reconstruction can be achieved using fast iterative methods and preconditioning even using one filter per ERB and a very low redundancy (1.08). Comparison with a linear gammatone filterbank showed that the ERBlet approximates well the auditory time-frequency resolution.
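To give a feel for what "equidistantly spaced on the ERB scale" means, here is a small sketch that places one center frequency per ERB. It uses the widely quoted Glasberg-Moore ERB-number formula, which I am assuming; the function names and the 20 Hz to 20 kHz range are hypothetical choices of mine, and the sketch covers only the placement of centers, not the Gaussian windows or the reconstruction described in the paper.

```python
import math

def hz_to_erb_number(f_hz):
    """ERB-number (ERB-rate) scale, Glasberg-Moore formula (assumed)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_spaced_centers(f_min_hz, f_max_hz, filters_per_erb=1.0):
    """Center frequencies equally spaced on the ERB-number scale."""
    e_min = hz_to_erb_number(f_min_hz)
    e_max = hz_to_erb_number(f_max_hz)
    count = int(math.floor((e_max - e_min) * filters_per_erb)) + 1
    return [erb_number_to_hz(e_min + k / filters_per_erb) for k in range(count)]

centers = erb_spaced_centers(20.0, 20000.0)
gaps = [high - low for low, high in zip(centers, centers[1:])]

print(len(centers), "filters between 20 Hz and 20 kHz")
print("first gap (Hz):", round(gaps[0], 1), " last gap (Hz):", round(gaps[-1], 1))
```

The spacing is close to linear at low frequencies and close to logarithmic at high frequencies, which is the main difference from a purely constant-Q layout.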

I should also mention:

An Auditory-Based Transform For Audio Signal Processing, WASPAA 2009

An auditory-based transform is presented in this paper. Through an analysis process, the transform converts time-domain signals into a set of filter bank output. The frequency responses and distributions of the filter bank are similar to those in the basilar membrane of the cochlea. Signal processing can be conducted in the decomposed signal domain. Through a synthesis process, the decomposed signals can be synthesized back to the original signal through a simple computation. Also, fast algorithms for discrete-time signals are presented for both the forward and inverse transforms. The transform has been approved in theory and validated in experiments. An example on noise reduction application is presented. The proposed transform is robust to background and computational noises and is free from pitch harmonics. The derived fast algorithm can also be used to compute continuous wavelet transform.

  • This is exactly what I was looking for. Thank you. – user76284 Mar 31 '17 at 22:06
