java: frequency / pitch detection for dummies
Original URL: http://stackoverflow.com/questions/11553047/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by brainmurphy1
While there are many questions on this site dealing with the concept of pitch detection, they all deal with this magical FFT, with which I am not familiar. I am trying to build an Android application that needs to implement pitch detection. I have absolutely no understanding of the algorithms that are used to do this.
It can't be that hard, can it? There are around 8 billion guitar tuner apps on the Android market, after all.
Can someone help?
Accepted answer by Jon Lin
A Fast Fourier Transform changes a function from the time domain to the frequency domain. So instead of $f(t)$, where $f$ is the signal you are getting from the microphone and $t$ is the time index of that signal, you get $g(\theta)$, where $g$ is the FFT of $f$ and $\theta$ is the frequency. Once you have $g(\theta)$, you just need to find the $\theta$ with the highest amplitude, meaning the "loudest" frequency. That will be the primary pitch of the sound that you are picking up.
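To make the step from $f(t)$ to $g(\theta)$ concrete, here is a minimal Java sketch (Java only because the question is tagged java). For readability it uses a naive O(N²) discrete Fourier transform instead of an FFT; an FFT computes the same values, just faster. The class name `LoudestBin`, the 8000 Hz sample rate and the 500 Hz test tone are illustrative assumptions, not part of the answer.

```java
/**
 * Sketch only: a naive O(N^2) DFT used to illustrate going from f(t) to
 * g(theta) and picking the loudest bin. A real app would use an FFT.
 */
final class LoudestBin {

    /** Returns |g(k)| for k = 0 .. N/2, i.e. the bins up to the Nyquist frequency. */
    static double[] dftMagnitudes(double[] f) {
        int n = f.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += f[t] * Math.cos(angle);
                im += f[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Index of the largest magnitude, i.e. the "loudest" frequency bin. */
    static int loudestBin(double[] mag) {
        int best = 0;
        for (int k = 1; k < mag.length; k++) {
            if (mag[k] > mag[best]) best = k;
        }
        return best;
    }

    public static void main(String[] args) {
        double sampleRate = 8000;
        int n = 1024;
        double[] f = new double[n];
        // A 500 Hz test tone; 500 * 1024 / 8000 = 64, so it lands exactly on bin 64.
        for (int t = 0; t < n; t++) {
            f[t] = Math.sin(2 * Math.PI * 500 * t / sampleRate);
        }
        int k = loudestBin(dftMagnitudes(f));
        System.out.println("loudest bin " + k + " = " + k * sampleRate / n + " Hz");
    }
}
```

With the values in `main`, the tone sits exactly on bin 64, so it prints `loudest bin 64 = 500.0 Hz`. Bin k corresponds to k * sampleRate / N Hz.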
As for actually implementing the FFT, if you google "fast fourier transform sample code", you'll get a bunch of examples.
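One common variant that such a search turns up is the iterative radix-2 Cooley-Tukey FFT. The sketch below is one such version written for illustration, not code from either answer; the class name `Fft` is hypothetical, and it assumes the input length is a power of two and that real and imaginary parts are kept in separate arrays.

```java
/**
 * Sketch: in-place iterative radix-2 Cooley-Tukey FFT.
 * Assumes re.length == im.length and that the length is a power of two.
 */
final class Fft {

    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n != im.length || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Bit-reversal permutation so the butterflies can run in place.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double tr = re[i]; re[i] = re[j]; re[j] = tr;
                double ti = im[i]; im[i] = im[j]; im[j] = ti;
            }
        }
        // Butterfly passes: merge transforms of size len/2 into size len.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            double wRe = Math.cos(ang), wIm = Math.sin(ang);
            for (int start = 0; start < n; start += len) {
                double curRe = 1, curIm = 0; // twiddle factor w^k
                for (int k = 0; k < len / 2; k++) {
                    int a = start + k, b = a + len / 2;
                    double vRe = re[b] * curRe - im[b] * curIm;
                    double vIm = re[b] * curIm + im[b] * curRe;
                    re[b] = re[a] - vRe;
                    im[b] = im[a] - vIm;
                    re[a] += vRe;
                    im[a] += vIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
    }
}
```

After the call, the magnitude of bin k is `Math.hypot(re[k], im[k])`.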
Answer by Bjorn Roche
The FFT is not really the best way to implement pitch detection or pitch tracking. One issue is that the loudest frequency is not always the fundamental frequency. Another is that the FFT, by itself, requires a fairly large amount of data and processing to obtain the resolution you need to tune an instrument, so it can appear slow to respond (i.e. it has latency). Yet another issue is that the result of an FFT is not necessarily intuitive to work with: you get an array of complex numbers, and you have to know how to interpret them.
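The first problem is easy to reproduce. This hypothetical Java sketch builds a synthetic tone whose 200 Hz fundamental is quieter than its 400 Hz second harmonic (the amplitudes are made up for illustration) and then picks the loudest bin with a naive DFT.

```java
/**
 * Sketch: a synthetic tone with a weak fundamental. The amplitudes are
 * invented for illustration; picking the loudest bin finds the harmonic.
 */
final class HarmonicTrap {

    /** Magnitude of DFT bin k of x (naive, O(N) per bin). */
    static double magnitudeAt(double[] x, int k) {
        double re = 0, im = 0;
        for (int t = 0; t < x.length; t++) {
            double angle = -2 * Math.PI * k * t / x.length;
            re += x[t] * Math.cos(angle);
            im += x[t] * Math.sin(angle);
        }
        return Math.hypot(re, im);
    }

    /** Frequency of the loudest bin between 0 and the Nyquist frequency. */
    static double loudestFrequency(double[] x, double sampleRate) {
        int best = 0;
        double bestMag = -1;
        for (int k = 0; k <= x.length / 2; k++) {
            double m = magnitudeAt(x, k);
            if (m > bestMag) { bestMag = m; best = k; }
        }
        return best * sampleRate / x.length;
    }

    static double[] tone(double sampleRate, int n) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            double s = t / sampleRate;
            x[t] = 0.3 * Math.sin(2 * Math.PI * 200 * s)   // fundamental, weak
                 + 1.0 * Math.sin(2 * Math.PI * 400 * s)   // 2nd harmonic, strongest
                 + 0.5 * Math.sin(2 * Math.PI * 600 * s);  // 3rd harmonic
        }
        return x;
    }

    public static void main(String[] args) {
        double fs = 8000;
        // 800 samples at 8 kHz gives 10 Hz bins, so every partial sits exactly on a bin.
        System.out.println(loudestFrequency(tone(fs, 800), fs) + " Hz");
    }
}
```

It prints `400.0 Hz`, an octave above the 200 Hz fundamental.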
If you really want to use an FFT, here is one approach:
- Low-pass your signal. This will help prevent noise and higher harmonics from creating spurious results. Conceivably, you could skip this step and instead weight your results towards the lower bins of the FFT. For some instruments with strong fundamental frequencies, this might not be necessary.
- Window your signal. Windows should be at least 4096 samples in size. Larger is better up to a point, because it gives you better frequency resolution; if you go too large, it will increase your computation time and latency. The Hann function is a good choice for your window. http://en.wikipedia.org/wiki/Hann_function
- FFT the windowed signal as often as you can. Overlapping windows are fine, too.
- The results of the FFT are complex numbers. Find the magnitude of each complex number using sqrt( real^2 + imag^2 ). The index in the FFT array with the largest magnitude is the index of your peak frequency.
- You may want to average multiple FFTs for more consistent results.
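The recipe above can be sketched in Java roughly as follows. Several details are assumptions, not part of the answer: the low-pass is a simple one-pole filter with a 1 kHz cutoff, frames are taken every `hop` samples, the frame size must be a power of two, and the magnitude spectra are summed rather than divided by the frame count (the peak index is the same either way). The class name `FftPitch` is hypothetical.

```java
/**
 * Sketch of the FFT recipe: low-pass, Hann window, FFT, magnitudes, peak bin,
 * accumulated over overlapping frames. Filter type, cutoff and hop are assumptions.
 */
final class FftPitch {

    /** One-pole low-pass filter (an assumed choice; any low-pass would do). */
    static double[] lowPass(double[] x, double cutoffHz, double sampleRate) {
        double a = 1 - Math.exp(-2 * Math.PI * cutoffHz / sampleRate);
        double[] y = new double[x.length];
        double state = 0;
        for (int i = 0; i < x.length; i++) {
            state += a * (x[i] - state);
            y[i] = state;
        }
        return y;
    }

    /** Hann window: w[i] = 0.5 * (1 - cos(2*pi*i / (N - 1))). */
    static double[] hann(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 * (1 - Math.cos(2 * Math.PI * i / (n - 1)));
        }
        return w;
    }

    /** In-place iterative radix-2 FFT; the length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int s = 0; s < n; s += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = s + k, b = a + len / 2;
                    double vr = re[b] * wr - im[b] * wi;
                    double vi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - vr; im[b] = im[a] - vi;
                    re[a] += vr;        im[a] += vi;
                }
            }
        }
    }

    /**
     * Sums the magnitude spectra of Hann-windowed frames taken every hop
     * samples, then returns the frequency of the largest bin (0 if no full frame).
     */
    static double estimate(double[] signal, double sampleRate, int frameSize, int hop) {
        double[] x = lowPass(signal, 1000, sampleRate); // assumed 1 kHz cutoff
        double[] w = hann(frameSize);
        double[] sum = new double[frameSize / 2 + 1];
        for (int start = 0; start + frameSize <= x.length; start += hop) {
            double[] re = new double[frameSize], im = new double[frameSize];
            for (int i = 0; i < frameSize; i++) re[i] = x[start + i] * w[i];
            fft(re, im);
            // Only bins 0..N/2 are unique for a real-valued input.
            for (int k = 0; k < sum.length; k++) {
                sum[k] += Math.sqrt(re[k] * re[k] + im[k] * im[k]);
            }
        }
        int peak = 0;
        for (int k = 1; k < sum.length; k++) if (sum[k] > sum[peak]) peak = k;
        return peak * sampleRate / frameSize;
    }
}
```

For example, `FftPitch.estimate(samples, 44100, 4096, 2048)` on one second of a 440 Hz sine lands on bin 41, about 441.4 Hz, rather than exactly 440 Hz, because the answer is quantized to whole bins.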
How do you calculate the frequency from the index? Well, let's say you've got a window of size N. After you FFT, you will have N complex numbers; for a real-valued signal only the first N/2 + 1 of them (up to half the sample rate) are meaningful, because the rest mirror them. If your peak is the nth one and your sample rate is 44100, then your peak frequency will be near 44100*n/N. Why near? Because each bin is 44100/N Hz wide, so your error can be up to half a bin, (44100/2)*1/N. For a window size of 4096, this is about 5.4 Hz, which is easily audible at A440. You can improve on that by 1. taking phase into account (I've only described how to take magnitude into account), 2. using larger windows (which will increase latency and processing requirements, as the FFT is an N log N algorithm), or 3. using a better algorithm like YIN http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf
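The bin arithmetic above, as a small Java sketch using the 44100 Hz sample rate and 4096-point window from the text. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Sketch: bin-to-frequency arithmetic for an N-point FFT of a real signal. */
final class BinMath {

    /** Centre frequency of bin k. */
    static double binToHz(int k, double sampleRate, int n) {
        return k * sampleRate / n;
    }

    /** Worst-case error when rounding to the nearest bin: half the bin spacing. */
    static double maxError(double sampleRate, int n) {
        return (sampleRate / 2) / n;
    }

    public static void main(String[] args) {
        double fs = 44100;
        int n = 4096;
        // Locale.ROOT keeps '.' as the decimal separator regardless of device locale.
        System.out.printf(Locale.ROOT, "bin spacing: %.2f Hz%n", fs / n);             // 10.77
        System.out.printf(Locale.ROOT, "max error:   %.2f Hz%n", maxError(fs, n));    // 5.38
        System.out.printf(Locale.ROOT, "bin 41:      %.2f Hz%n", binToHz(41, fs, n)); // 441.43
    }
}
```

A440 falls between bins 40 (about 430.7 Hz) and 41 (about 441.4 Hz), which is why magnitude alone cannot pin it down more precisely.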
You can skip the windowing step and just break the audio into discrete chunks of however many samples you want to analyze. This is equivalent to using a rectangular (square) window, which works, but you may get more noise (spectral leakage) in your results.
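Skipping the window amounts to slicing the audio into frames and using the samples unchanged, i.e. a rectangular window whose weights are all 1. A hypothetical Java helper for that, with a hop-size parameter (an assumption) so the same code also produces overlapping frames:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: fixed-size frames with no window function (a rectangular window). */
final class Frames {

    /** Copies frames of frameSize samples starting every hop samples; a partial tail is dropped. */
    static List<double[]> split(double[] audio, int frameSize, int hop) {
        if (frameSize <= 0 || hop <= 0) {
            throw new IllegalArgumentException("frameSize and hop must be positive");
        }
        List<double[]> frames = new ArrayList<>();
        for (int start = 0; start + frameSize <= audio.length; start += hop) {
            double[] frame = new double[frameSize];
            System.arraycopy(audio, start, frame, 0, frameSize);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        double[] audio = new double[10_000];
        System.out.println(split(audio, 4096, 4096).size()); // 2 non-overlapping frames
        System.out.println(split(audio, 4096, 2048).size()); // 3 frames at 50% overlap
    }
}
```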
BTW: many of those tuner apps license code from third parties, such as z-plane and iZotope.
Update: If you want C source code and a full tutorial for the FFT method, I've written one. The code compiles and runs on Mac OS X, and should be convertible to other platforms pretty easily. It's not designed to be the best, but it is designed to be easy to understand.