Python 使用来自实时麦克风的 pyaudio 检测点击
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4160175/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Detect tap with pyaudio from live mic
提问by a sandwhich
How would I use pyaudio to detect a sudden tapping noise from a live microphone?
我将如何使用 pyaudio 检测来自现场麦克风的突然敲击声?
采纳答案by Russell Borogove
One way I've done it:
我这样做的一种方式:
- read a block of samples at a time, say 0.05 seconds worth
- compute the RMS amplitude of the block (square root of the average of the squares of the individual samples)
- if the block's RMS amplitude is greater than a threshold, it's a "noisy block" else it's a "quiet block"
- a sudden tap would be a quiet block followed by a small number of noisy blocks followed by a quiet block
- if you never get a quiet block, your threshold is too low
- if you never get a noisy block, your threshold is too high
- 一次读取一个样本块,比如 0.05 秒
- 计算块的 RMS 幅度(单个样本的平方平均值的平方根)
- 如果块的 RMS 幅度大于阈值,则为“噪声块”,否则为“安静块”
- 突然点击将是一个安静的街区,然后是少量嘈杂的街区,然后是一个安静的街区
- 如果你从来没有得到一个安静的街区,你的门槛太低了
- 如果你从来没有遇到嘈杂的街区,那么你的门槛就太高了
My application was recording "interesting" noises unattended, so it would record as long as there were noisy blocks. It would multiply the threshold by 1.1 if there was a 15-second noisy period ("covering its ears") and multiply the threshold by 0.9 if there was a 15-minutequiet period ("listening harder"). Your application will have different needs.
我的应用程序在无人看管的情况下录制“有趣”的噪音,因此只要有噪音块,它就会录制。如果有 15 秒的嘈杂期(“捂住耳朵”),它将阈值乘以 1.1,如果有 15分钟的安静期(“听得更努力”),则将阈值乘以 0.9 。您的应用程序会有不同的需求。
Also, just noticed some comments in my code regarding observed RMS values. On the built in mic on a Macbook Pro, with +/- 1.0 normalized audio data range, with input volume set to max, some data points:
另外,刚刚注意到我的代码中关于观察到的 RMS 值的一些注释。在 Macbook Pro 的内置麦克风上,具有 +/- 1.0 归一化音频数据范围,输入音量设置为最大,一些数据点:
- 0.003-0.006 (-50dB to -44dB) an obnoxiously loud central heating fan in my house
- 0.010-0.40 (-40dB to -8dB) typing on the same laptop
- 0.10 (-20dB) snapping fingers softly at 1' distance
- 0.60 (-4.4dB) snapping fingers loudly at 1'
- 0.003-0.006 (-50dB 到 -44dB) 我家的中央暖气风扇噪音很大
- 在同一台笔记本电脑上打字 0.010-0.40(-40dB 到 -8dB)
- 0.10 (-20dB) 手指在 1' 距离处轻轻弹响
- 0.60 (-4.4dB) 在 1' 处响亮地弹响手指
Update: here's a sample to get you started.
更新:这是一个让您入门的示例。
#!/usr/bin/python
# open a microphone in pyAudio and listen for taps
import pyaudio
import struct
import math
INITIAL_TAP_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 2
RATE = 44100
INPUT_BLOCK_TIME = 0.05
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)
# if we get this many noisy blocks in a row, increase the threshold
OVERSENSITIVE = 15.0/INPUT_BLOCK_TIME
# if we get this many quiet blocks in a row, decrease the threshold
UNDERSENSITIVE = 120.0/INPUT_BLOCK_TIME
# if the noise was longer than this many blocks, it's not a 'tap'
MAX_TAP_BLOCKS = 0.15/INPUT_BLOCK_TIME
def get_rms( block ):
# RMS amplitude is defined as the square root of the
# mean over time of the square of the amplitude.
# so we need to convert this string of bytes into
# a string of 16-bit samples...
# we will get one short out for each
# two chars in the string.
count = len(block)/2
format = "%dh"%(count)
shorts = struct.unpack( format, block )
# iterate over the block.
sum_squares = 0.0
for sample in shorts:
# sample is a signed short in +/- 32768.
# normalize it to 1.0
n = sample * SHORT_NORMALIZE
sum_squares += n*n
return math.sqrt( sum_squares / count )
class TapTester(object):
def __init__(self):
self.pa = pyaudio.PyAudio()
self.stream = self.open_mic_stream()
self.tap_threshold = INITIAL_TAP_THRESHOLD
self.noisycount = MAX_TAP_BLOCKS+1
self.quietcount = 0
self.errorcount = 0
def stop(self):
self.stream.close()
def find_input_device(self):
device_index = None
for i in range( self.pa.get_device_count() ):
devinfo = self.pa.get_device_info_by_index(i)
print( "Device %d: %s"%(i,devinfo["name"]) )
for keyword in ["mic","input"]:
if keyword in devinfo["name"].lower():
print( "Found an input: device %d - %s"%(i,devinfo["name"]) )
device_index = i
return device_index
if device_index == None:
print( "No preferred input found; using default input device." )
return device_index
def open_mic_stream( self ):
device_index = self.find_input_device()
stream = self.pa.open( format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
input_device_index = device_index,
frames_per_buffer = INPUT_FRAMES_PER_BLOCK)
return stream
def tapDetected(self):
print("Tap!")
def listen(self):
try:
block = self.stream.read(INPUT_FRAMES_PER_BLOCK)
except IOError as e:
# dammit.
self.errorcount += 1
print( "(%d) Error recording: %s"%(self.errorcount,e) )
self.noisycount = 1
return
amplitude = get_rms( block )
if amplitude > self.tap_threshold:
# noisy block
self.quietcount = 0
self.noisycount += 1
if self.noisycount > OVERSENSITIVE:
# turn down the sensitivity
self.tap_threshold *= 1.1
else:
# quiet block.
if 1 <= self.noisycount <= MAX_TAP_BLOCKS:
self.tapDetected()
self.noisycount = 0
self.quietcount += 1
if self.quietcount > UNDERSENSITIVE:
# turn up the sensitivity
self.tap_threshold *= 0.9
if __name__ == "__main__":
tt = TapTester()
for i in range(1000):
tt.listen()
回答by user1405612
a simplified version of the above code...
上面代码的简化版本......
import pyaudio
import struct
import math
INITIAL_TAP_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 2
RATE = 44100
INPUT_BLOCK_TIME = 0.05
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)
OVERSENSITIVE = 15.0/INPUT_BLOCK_TIME
UNDERSENSITIVE = 120.0/INPUT_BLOCK_TIME # if we get this many quiet blocks in a row, decrease the threshold
MAX_TAP_BLOCKS = 0.15/INPUT_BLOCK_TIME # if the noise was longer than this many blocks, it's not a 'tap'
def get_rms(block):
# RMS amplitude is defined as the square root of the
# mean over time of the square of the amplitude.
# so we need to convert this string of bytes into
# a string of 16-bit samples...
# we will get one short out for each
# two chars in the string.
count = len(block)/2
format = "%dh"%(count)
shorts = struct.unpack( format, block )
# iterate over the block.
sum_squares = 0.0
for sample in shorts:
# sample is a signed short in +/- 32768.
# normalize it to 1.0
n = sample * SHORT_NORMALIZE
sum_squares += n*n
return math.sqrt( sum_squares / count )
pa = pyaudio.PyAudio() #]
#|
stream = pa.open(format = FORMAT, #|
channels = CHANNELS, #|---- You always use this in pyaudio...
rate = RATE, #|
input = True, #|
frames_per_buffer = INPUT_FRAMES_PER_BLOCK) #]
tap_threshold = INITIAL_TAP_THRESHOLD #]
noisycount = MAX_TAP_BLOCKS+1 #|---- Variables for noise detector...
quietcount = 0 #|
errorcount = 0 #]
for i in range(1000):
try: #]
block = stream.read(INPUT_FRAMES_PER_BLOCK) #|
except IOError, e: #|---- just in case there is an error!
errorcount += 1 #|
print( "(%d) Error recording: %s"%(errorcount,e) ) #|
noisycount = 1 #]
amplitude = get_rms(block)
if amplitude > tap_threshold: # if its to loud...
quietcount = 0
noisycount += 1
if noisycount > OVERSENSITIVE:
tap_threshold *= 1.1 # turn down the sensitivity
else: # if its to quiet...
if 1 <= noisycount <= MAX_TAP_BLOCKS:
print 'tap!'
noisycount = 0
quietcount += 1
if quietcount > UNDERSENSITIVE:
tap_threshold *= 0.9 # turn up the sensitivity

