I'm working on a piano tuning program, and part of it requires real-time pitch detection. Here is the scheme I have so far, which works to some degree but could probably use some refinement.
I'm capturing mono, 44.1 kHz, 16-bit PCM audio in chunks of 2^14 samples. I combine the last 4 chunks into a buffer of length 2^16, apply a Hann window to the buffer and run an FFT on it. Then I bucketize the results of the FFT at two resolutions. First, I bucketize into 200 buckets and run the HPS pitch detection algorithm at this granularity. I don't need an exact frequency here, I just want to get close. Then I bucketize into 12000 buckets, which gives me 1 cent resolution from 10 Hz to 10 kHz. Once I know an approximate frequency from the 200-bucket HPS pass, I search that range of the 12000-bucket data for a peak to get a more exact frequency.
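The windowing and bucketing steps described above can be sketched roughly as follows. This is a minimal pure-Python sketch under stated assumptions, not a definitive implementation: it assumes the buckets are spaced logarithmically between 10 Hz and 10 kHz (which is what a per-cent resolution implies) and that FFT bin magnitudes are summed per bucket. The names `hann`, `bucket_index` and `bucketize` are made up for illustration.

```python
import math

SAMPLE_RATE = 44100          # Hz, from the capture settings
FFT_SIZE = 2 ** 16           # four chunks of 2^14 samples
F_MIN, F_MAX = 10.0, 10000.0 # bucketed range in Hz


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def bucket_index(freq_hz, n_buckets):
    """Map a frequency to a log-spaced bucket in [F_MIN, F_MAX), else None."""
    if not (F_MIN <= freq_hz < F_MAX):
        return None
    frac = math.log(freq_hz / F_MIN) / math.log(F_MAX / F_MIN)
    return int(frac * n_buckets)


def bucketize(magnitudes, n_buckets):
    """Sum FFT bin magnitudes (bins 0..FFT_SIZE/2) into log-spaced buckets.

    Summing is an assumption; taking the maximum per bucket is another option.
    """
    buckets = [0.0] * n_buckets
    for k, mag in enumerate(magnitudes):
        idx = bucket_index(k * SAMPLE_RATE / FFT_SIZE, n_buckets)
        if idx is not None:
            buckets[idx] += mag
    return buckets
```

With this layout, `bucketize(mags, 200)` would give the coarse data for HPS and `bucketize(mags, 12000)` the fine data for the peak search, where `mags` holds the magnitudes of the non-negative-frequency FFT bins.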
This seems to work okay for the notes in the middle of the keyboard. For the low notes, the note is misidentified for about 1.5 s, usually as the 2nd or 3rd partial of the real note, and only then identified correctly.
In all of the spectral plots I created to see what is going on, the peaks are wider than I would expect. The width looks roughly the same in the 200-bucket and 12000-bucket cases. I would have expected the peaks to be narrower in the 200-bucket case.
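As a rough sanity check on what peak width to expect, the arithmetic below converts the FFT resolution into cents. It is a back-of-the-envelope sketch that assumes the numbers above (44.1 kHz, 2^16-point FFT) and uses the fact that a Hann window's main lobe is 4 FFT bins wide from null to null; the helper names are made up for illustration.

```python
import math

SAMPLE_RATE = 44100   # Hz
FFT_SIZE = 2 ** 16    # samples per FFT


def bin_spacing_hz():
    """Spacing between adjacent FFT bins in Hz."""
    return SAMPLE_RATE / FFT_SIZE


def width_in_cents(center_hz, width_hz):
    """Express a band of width_hz centred on center_hz as a width in cents."""
    lo = center_hz - width_hz / 2
    hi = center_hz + width_hz / 2
    return 1200 * math.log2(hi / lo)


# Hann main lobe, null to null, is 4 bins wide.
main_lobe_hz = 4 * bin_spacing_hz()

for name, freq in [("A0", 27.5), ("A2", 110.0), ("A4", 440.0), ("A6", 1760.0)]:
    print(f"{name}: main lobe is about {width_in_cents(freq, main_lobe_hz):.1f} cents wide")
```

The bin spacing comes out at roughly 0.67 Hz, so the same window lobe covers more than a semitone at the bottom of the keyboard and only a few cents near the top.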
Signal processing is new to me, so there may be problems here that I wouldn't think to ask about. In terms of specific questions: Are the sample sizes sufficient for this task? Is Hann the right choice of window? Should I also smooth the data before the FFT? How sensitive is HPS to the number of bins? My thinking was that with a lot of bins, inharmonicity might keep the partials from lining up with the fundamental under the HPS algorithm's simple approach of dividing by 2, 3, 4, etc.
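To make the last question concrete, here is a minimal sketch of the "divide by 2, 3, 4" form of HPS. It assumes a linearly spaced magnitude spectrum, so that partial h of bin k is looked up at bin k * h; the function name `hps_peak_bin` and the default of 4 harmonics are assumptions for illustration, not a reference implementation.

```python
def hps_peak_bin(magnitudes, n_harmonics=4):
    """Return the bin index that maximizes the harmonic product spectrum.

    Assumes linearly spaced bins, so harmonic h of bin k sits at bin k * h.
    """
    limit = len(magnitudes) // n_harmonics  # keeps k * h inside the spectrum
    best_bin, best_score = 0, 0.0
    for k in range(1, limit):
        score = 1.0
        for h in range(1, n_harmonics + 1):
            score *= magnitudes[k * h]
        if score > best_score:
            best_bin, best_score = k, score
    return best_bin


# Toy spectrum: the 2nd partial (bin 40) is stronger than the fundamental (bin 20).
spectrum = [0.01] * 200
spectrum[20], spectrum[40], spectrum[60], spectrum[80] = 0.5, 1.0, 0.8, 0.6
print(hps_peak_bin(spectrum))
```

In this toy case the product still favours the fundamental even though the 2nd partial is the tallest peak. The lookup at exactly k * h is the part that worries me: with narrow bins, a partial that is slightly sharp due to inharmonicity would land in a neighbouring bin and drop out of the product.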