An FFT reports one or more spectral frequency peaks (quantized to the FFT bin spacing), which is not the same thing as musical pitch. The perceived pitch frequency can even be completely missing from an FFT spectrum, for example when only the harmonics of a note are present and the fundamental is not.
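To make the missing-pitch case concrete, here is a minimal sketch using only the Python standard library. The sample rate, frame length, harmonic amplitudes, and the helper name `dft_magnitudes` are all assumptions chosen for illustration. The signal contains only harmonics 2, 3, and 4 of a 200 Hz fundamental, so the spectrum has no energy at the frequency a listener would typically call the pitch.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for a short demo)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

fs = 4000.0   # sample rate in Hz (assumed)
n = 400       # frame length, so the bin spacing is fs / n = 10 Hz
f0 = 200.0    # perceived pitch, deliberately absent from the signal

# Only harmonics 2, 3 and 4 of f0 are present (amplitudes are assumptions).
harmonics = {2: 1.0, 3: 0.8, 4: 0.6}
x = [sum(a * math.sin(2 * math.pi * h * f0 * t / fs) for h, a in harmonics.items())
     for t in range(n)]

mags = dft_magnitudes(x)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * fs / n          # strongest spectral peak, at the 2nd harmonic
f0_bin = int(round(f0 * n / fs))     # bin where the fundamental would be

print("strongest FFT peak (Hz):", peak_hz)
print("magnitude at f0 bin:", mags[f0_bin])
```

Picking the largest bin here reports the second harmonic, an octave above the pitch, while the bin at 200 Hz holds only rounding noise.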

Some of the simplest guitar tuners just used low-pass or band-pass filtering and measured the time between zero crossings. The reciprocal of the time between successive same-direction crossings (one full period) gives a frequency estimate.
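A rough sketch of the zero-crossing approach, assuming a one-pole low-pass filter and linear interpolation of the crossing times (neither is specified above, and the function names, cutoff, and test signal are made up for the example):

```python
import math

def one_pole_lowpass(x, fs, cutoff_hz):
    """Simple one-pole low-pass to attenuate harmonics before timing crossings."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def zero_crossing_pitch(x, fs):
    """Estimate frequency from the mean spacing of rising zero crossings."""
    crossings = []
    for i in range(1, len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            # Linear interpolation gives a sub-sample crossing time.
            frac = -x[i - 1] / (x[i] - x[i - 1])
            crossings.append((i - 1 + frac) / fs)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / period

fs = 8000.0  # sample rate in Hz (assumed)
# Test tone: 110 Hz fundamental plus a weaker third harmonic.
signal = [math.sin(2 * math.pi * 110.0 * t / fs)
          + 0.3 * math.sin(2 * math.pi * 330.0 * t / fs)
          for t in range(4000)]

filtered = one_pole_lowpass(signal, fs, cutoff_hz=150.0)
estimate = zero_crossing_pitch(filtered, fs)
print("estimated frequency (Hz):", estimate)
```

This only works when the filtered signal crosses zero once per period in each direction. Strong harmonics or noise that survive the filter add extra crossings and break the estimate.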

Autocorrelation is another common pitch estimation method. Sliding correlation and other self-similarity measures come in many variations: ASDF (average squared difference function), AMDF (average magnitude difference function), non-linear pattern matchers, adaptively checking only a limited range of lags, lag interpolation, windowing and adaptive window selection, various weightings, and using decision theory to select among multiple candidate lag history sequences. One problem with most self-similarity measures is choosing the correct octave, since a lag of two periods (a sub-octave) may show nearly the same similarity as the true period.
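As one possible illustration, the sketch below combines plain autocorrelation with two of the variations mentioned: a limited lag range and parabolic lag interpolation. The function name, the 80 to 400 Hz search range, and the test signal are assumptions, not a reference implementation.

```python
import math

def autocorr_pitch(x, fs, f_min=80.0, f_max=400.0):
    """Pick the lag with the highest autocorrelation inside [fs/f_max, fs/f_min]."""
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    m = len(x) - max_lag  # same number of products for every lag
    r = {lag: sum(x[t] * x[t + lag] for t in range(m))
         for lag in range(min_lag, max_lag + 1)}
    best = max(r, key=r.get)

    # Parabolic interpolation around the peak for a sub-sample lag estimate.
    shift = 0.0
    if min_lag < best < max_lag:
        a, b, c = r[best - 1], r[best], r[best + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            shift = 0.5 * (a - c) / denom
    return fs / (best + shift)

fs = 8000.0  # sample rate in Hz (assumed)
# Test tone: 110 Hz fundamental with second and third harmonics.
signal = [math.sin(2 * math.pi * 110.0 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 220.0 * t / fs)
          + 0.3 * math.sin(2 * math.pi * 330.0 * t / fs)
          for t in range(2048)]

estimate = autocorr_pitch(signal, fs)
print("estimated frequency (Hz):", estimate)
```

Restricting the lag range is what keeps this sketch away from the octave problem: with `f_min=80.0` the two-period lag of a 110 Hz tone is never examined. Widen the range and the lag at twice the period scores nearly as high as the true one.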

Other possibilities include phase-locked loops (PLLs), filtered quadrature demodulators, and filtered Hilbert transforms, among others.

But note that some DSP filtering and demodulation methods are computationally nearly equivalent to computing a single bin of a windowed DFT, which may or may not count as an answer to your question.
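To show what that equivalence looks like, here is a small sketch comparing a quadrature demodulator against a single windowed DFT bin. It assumes the demodulator's low-pass filter is an FIR whose taps are a Hann window, evaluated at one output instant. The function names and test frequencies are invented for the example.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def quadrature_demod(x, fs, f_target, window):
    """Mix with cos and -sin at f_target, then low-pass by a window-weighted sum."""
    i_sum = 0.0
    q_sum = 0.0
    for t, (sample, w) in enumerate(zip(x, window)):
        phase = 2 * math.pi * f_target * t / fs
        i_sum += w * sample * math.cos(phase)
        q_sum -= w * sample * math.sin(phase)
    return i_sum, q_sum

def windowed_dft_bin(x, fs, f_target, window):
    """One bin of a windowed DFT, evaluated at an arbitrary frequency."""
    return sum(w * s * cmath.exp(-2j * math.pi * f_target * t / fs)
               for t, (s, w) in enumerate(zip(x, window)))

fs = 8000.0  # sample rate in Hz (assumed)
n = 512
signal = [math.sin(2 * math.pi * 440.0 * t / fs) for t in range(n)]
window = hann(n)

i_val, q_val = quadrature_demod(signal, fs, 440.0, window)
dft_val = windowed_dft_bin(signal, fs, 440.0, window)
off_val = windowed_dft_bin(signal, fs, 300.0, window)  # a frequency with no signal

print("demodulator output:", complex(i_val, q_val))
print("windowed DFT bin:  ", dft_val)
print("magnitude at 300 Hz:", abs(off_val))
```

The I and Q outputs are just the real and imaginary parts of the DFT bin, so the two functions do the same arithmetic under different names.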