Also try using the wavelet transform instead of the short-time FFT (with overlapping windows).
It is easier to configure (fewer parameters, no need for a window function), offers more flexibility (exponentially spaced frequency bands, e.g. for musical scales), and can reach the Gabor-Heisenberg uncertainty limit without artifacts.
The only downside is that you need to know the entire signal in advance, so it can only be used for recordings.
Shameless self-promo of my implementation: https://github.com/Lichtso/CCWT
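To give an idea of what I mean by exponentially spaced frequency bands, here's a rough JS sketch (the names are mine, not CCWT's API). With 12 bins per octave, each bin lines up with a semitone:

    // Log-spaced center frequencies: `binsPerOctave` bands per octave,
    // starting at `fMin` Hz -- 12 per octave lines up with semitones.
    function logSpacedFrequencies(fMin, nBins, binsPerOctave) {
      const freqs = new Float64Array(nBins);
      for (let k = 0; k < nBins; k++) {
        freqs[k] = fMin * Math.pow(2, k / binsPerOctave);
      }
      return freqs;
    }

    // e.g. 8 octaves above 27.5 Hz (A0), one band per semitone:
    const centers = logSpacedFrequencies(27.5, 8 * 12, 12);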
Your repo and tutorial are really cool! But do you have any sort of interactive "out-of-the-box" demo that doesn't require me to write code to call the library (online or downloadable)?
I'm more familiar with Fourier transforms and have limited experience with wavelets. But if each wavelet intrinsically falls off like a Gaussian curve, cutting it off (possibly with a window) at 3-4 sigmas won't change the wavelet substantially. Maybe for some use cases the wavelet will be narrower at high frequencies (short delay) and wider at low frequencies (high delay). I don't know how you'd perform incremental updates of a plot drawn with non-uniform delays, though...
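To make the truncation concrete, here's the kind of thing I mean (a sketch with my own naming, not from any library): a complex Morlet-style wavelet with a Gaussian envelope, cut off at a few sigmas:

    // Complex Morlet-style wavelet sampled at `sampleRate`, centered on
    // `freq` Hz, Gaussian envelope cut off at +/- `nSigmas` sigma.
    // Higher `freq` => smaller sigma => shorter wavelet (the "short delay").
    function truncatedMorlet(freq, sampleRate, cyclesPerSigma, nSigmas) {
      const sigma = cyclesPerSigma / freq;          // envelope width in seconds
      const half = Math.round(nSigmas * sigma * sampleRate);
      const re = new Float64Array(2 * half + 1);
      const im = new Float64Array(2 * half + 1);
      for (let i = 0; i < re.length; i++) {
        const t = (i - half) / sampleRate;
        const env = Math.exp(-0.5 * (t / sigma) ** 2); // ~ e^-8 at 4 sigma
        re[i] = env * Math.cos(2 * Math.PI * freq * t);
        im[i] = env * Math.sin(2 * Math.PI * freq * t);
      }
      return { re, im };
    }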
The signal will continue over the seam between two windows, meaning you will cut the wave in the signal "in half". Mathematically, waves are always infinite, and cutting one actually introduces overtones (higher frequencies) that model the sharp start / end of the base wave. These then result in artifacts regardless of which method is used for the transformation (Fourier or wavelet).
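You can see this numerically with a naive DFT (illustration only, O(n^2)): a sine that doesn't complete a whole number of cycles in the window smears energy across many bins:

    // Naive DFT magnitude of bin k -- O(n) per bin, illustration only.
    function dftMag(signal, k) {
      let re = 0, im = 0;
      const n = signal.length;
      for (let i = 0; i < n; i++) {
        re += signal[i] * Math.cos((2 * Math.PI * k * i) / n);
        im -= signal[i] * Math.sin((2 * Math.PI * k * i) / n);
      }
      return Math.hypot(re, im);
    }

    const n = 256;
    const cut = Array.from({ length: n }, (_, i) =>
      Math.sin((2 * Math.PI * 10.5 * i) / n)); // 10.5 cycles: wave cut "in half"
    // Energy leaks far beyond bins 10/11 -- those are the overtones:
    for (let k = 8; k <= 14; k++) console.log(k, dftMag(cut, k).toFixed(1));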
> The signal will continue over the seam between two windows, meaning you will cut the wave in the signal "in half".
You can window the wavelet, then slide the finite-duration wavelet by a few samples at a time, even if the wavelet is hundreds to thousands of samples long. This is possible in STFT as well (each part of the original signal shows up in many separate FFTs).
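Roughly what I have in mind (untested sketch, using a finite complex wavelet like the truncated one above): correlate it with the signal every `hop` samples, exactly like the hop size of an STFT:

    // Slide a finite complex wavelet over `signal` in steps of `hop`
    // samples; each step yields one magnitude for that time position.
    function slideWavelet(signal, wavelet, hop) {
      const out = [];
      for (let pos = 0; pos + wavelet.re.length <= signal.length; pos += hop) {
        let re = 0, im = 0;
        for (let i = 0; i < wavelet.re.length; i++) {
          re += signal[pos + i] * wavelet.re[i];
          im += signal[pos + i] * wavelet.im[i];
        }
        out.push(Math.hypot(re, im));
      }
      return out;
    }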
Again, I don't know the implementation details of wavelet transforms. Maybe I'll look into your repo when I have time. What's your asymptotic and practical runtime?
Very strange! Did you get the permission prompt from Firefox after it started spinning? If not, you might have denied microphone access for all sites, which is why the prompt wouldn't come up.
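One quick way to check is from the devtools console on the demo page -- a blocked microphone rejects immediately with "NotAllowedError" instead of showing the prompt:

    navigator.mediaDevices.getUserMedia({ audio: true })
      .then((stream) => {
        console.log('microphone OK');
        stream.getTracks().forEach((t) => t.stop()); // release it again
      })
      .catch((err) => console.log(err.name)); // "NotAllowedError" = blocked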
Here is another spectrogram visualizer, but with a twist: the frequency bins are the notes of a piano, so you can use it to tune instruments or your voice.
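The bin placement is just the equal-temperament formula, e.g. for key n of an 88-key piano (A4 = key 49 = 440 Hz):

    // Equal temperament: frequency of piano key n (1..88), A4 = key 49.
    const keyFreq = (n) => 440 * Math.pow(2, (n - 49) / 12);

    keyFreq(49); // 440     (A4)
    keyFreq(40); // ~261.63 (C4, middle C)
    keyFreq(1);  // 27.5    (A0)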
Little plug for something similar I developed a year ago:
Wisteria: https://gistnoesis.github.io/
It does the real-time spectrogram using tensorflow.js on the GPU, and it also runs some transformer neural networks in real time to transcribe the notes into a piano roll.
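For anyone curious, the core of a GPU spectrogram in tensorflow.js fits in a few lines (a minimal sketch, not Wisteria's actual code; `samples` is assumed to be a Float32Array of audio):

    // Minimal tf.js pipeline: 1024-sample frames, hop of 256 samples.
    const signal = tf.tensor1d(samples);            // `samples`: Float32Array
    const stft = tf.signal.stft(signal, 1024, 256); // complex, [frames, 513]
    const mags = tf.abs(stft);                      // magnitude spectrogram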
Looks really cool. Sounds like a similar approach could be used to render audio waveforms. I wonder why a project like peaks.js decided to use server-side waveform generation instead.
I had a look into peaks.js - it looks like it supports both server and client-side waveform generation these days. Server-side generation still makes sense in some cases imo - like if you have a very long audio file such as a podcast that you don't want users to download in full just to display a waveform.
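Client-side generation is basically decode-then-reduce, which is also why it needs the whole file downloaded first. A sketch, assuming you've already fetched the audio into an ArrayBuffer:

    // Decode an audio file and reduce it to min/max pairs, one per pixel.
    async function waveformPeaks(arrayBuffer, width) {
      const audioCtx = new AudioContext();
      const buf = await audioCtx.decodeAudioData(arrayBuffer);
      const data = buf.getChannelData(0);
      const perPixel = Math.floor(data.length / width);
      const peaks = [];
      for (let x = 0; x < width; x++) {
        let min = 1, max = -1;
        for (let i = x * perPixel; i < (x + 1) * perPixel; i++) {
          if (data[i] < min) min = data[i];
          if (data[i] > max) max = data[i];
        }
        peaks.push([min, max]);
      }
      return peaks;
    }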
Hey, cool demo and article! You clearly have more experience with DSP than me haha
I was considering using an AnalyserNode since it's implemented natively by the browser and is therefore a lot faster than an FFT implementation in JavaScript. My biggest issue with AnalyserNode, though, is that there's no way to control the window function or the amount of overlap between windows. While I'm sure you could make a decent spectrogram with an AnalyserNode (as you've done!), I think implementing the FFT yourself lets you do more fine-tuning.
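For comparison, this is about all the control AnalyserNode gives you -- fftSize and smoothing. The window itself is fixed (the Web Audio spec mandates a Blackman window), and there's no overlap setting:

    // Inside an async function -- fftSize is the only real knob here;
    // the (Blackman) window and the framing are out of your hands.
    const audioCtx = new AudioContext();
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const analyser = audioCtx.createAnalyser();
    analyser.fftSize = 2048;
    analyser.smoothingTimeConstant = 0; // no averaging between frames
    audioCtx.createMediaStreamSource(stream).connect(analyser);

    const column = new Float32Array(analyser.frequencyBinCount); // 1024 bins
    analyser.getFloatFrequencyData(column); // magnitudes in dB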
When I get some time I might make Spectro use a Wasm FFT implementation like PulseFFT (https://github.com/AWSM-WASM/PulseFFT) for better performance. At the moment I'm using jsfft (https://github.com/dntj/jsfft) inside a web worker, which definitely isn't as efficient as a native implementation.
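For reference, the worker round-trip looks roughly like this (a sketch: `computeMagnitudes`, `drawColumn`, and `window0` are placeholders standing in for the jsfft call and the drawing code). Transferring the buffers avoids copying each window twice:

    // fft-worker.js -- sketch. `computeMagnitudes` stands in for whatever
    // FFT you call (jsfft today, maybe PulseFFT later).
    self.onmessage = (e) => {
      const samples = e.data; // Float32Array, one window of audio
      const mags = computeMagnitudes(samples);
      self.postMessage(mags, [mags.buffer]); // transfer back, no copy
    };

    // main.js
    const worker = new Worker('fft-worker.js');
    worker.onmessage = (e) => drawColumn(e.data); // e.data: Float32Array
    worker.postMessage(window0, [window0.buffer]); // transfer, zero-copy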
It's not my code btw, I only found the post on the internet. Your points about the restrictions of AnalyserNode make sense. A Wasm solution is indeed the ideal way to solve it if you want full flexibility.