Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19070290/

Date: 2020-08-19 12:43:02  Source: igfitidea

pyaudio - "Listen" until voice is detected and then record to a .wav file

Tags: python, multithreading, audio, pyaudio

Asked by Phorce

I'm having some problems and I cannot seem to get my head around the concept.

What I am trying to do is this:

Have the microphone "listen" for voiced audio (above a particular threshold), then start recording to a .wav file until the person has stopped speaking or the signal is no longer there. For example:

begin:
   listen() -> nothing is being said
   listen() -> nothing is being said
   listen() -> VOICED - _BEGIN RECORDING_
   listen() -> VOICED - _BEGIN RECORDING_
   listen() -> UNVOICED - _END RECORDING_
end

I also want to do this using threading: one thread would constantly "listen" to the input, and another thread would start when there is voiced data. But I cannot for the life of me figure out how I should go about it. Here is my code so far:

import wave
import sys
import threading
from array import array
from sys import byteorder

try:
    import pyaudio
    CHECK_PYLIB = True
except ImportError:
    CHECK_PYLIB = False

class Audio:
    _chunk = 0.0
    _format = 0.0
    _channels = 0.0
    _rate = 0.0
    record_for = 0.0
    stream = None

    p = None

    sample_width = None
    THRESHOLD = 500

    # initial constructor to accept params
    def __init__(self, chunk, format, channels, rate):
        #### set data-types

        self._chunk = chunk
        self.format = pyaudio.paInt16  # the format argument is ignored here
        self.channels = channels
        self.rate = rate

        self.p = pyaudio.PyAudio()

    def open(self):
        # print "opened"
        self.stream = self.p.open(format=pyaudio.paInt16,
                                  channels=2,
                                  rate=44100,
                                  input=True,
                                  frames_per_buffer=1024)
        return True

    def record(self):
        # create a new instance/thread to record the sound
        threading.Thread(target=self.listen).start()

    def is_silence(self, snd_data):
        return max(snd_data) < self.THRESHOLD

    def listen(self):
        r = array('h')

        while True:
            snd_data = array('h', self.stream.read(self._chunk))

            if byteorder == 'big':
                snd_data.byteswap()
            r.extend(snd_data)

        # never reached: the loop above has no exit condition yet
        return self.sample_width, r

I'm guessing that I could record 5-second blocks, and if a block is deemed "voiced", the thread should be started and run until all the voice data has been captured. However, because the loop is currently a plain while True:, it captures all of the audio, and I don't want anything captured until there are voiced commands. For example, given "no voice", "no voice", "voice", "voice", "no voice", "no voice", I just want the "voice" blocks inside the wav file. Does anyone have any suggestions?
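The filtering described above can be pictured without any threading or audio hardware. The following is only a minimal sketch, assuming blocks arrive as array('h') sample buffers; the names is_voiced, voiced_segments and MAX_SILENT_BLOCKS are hypothetical and not part of the original code.

```python
from array import array

THRESHOLD = 500        # same amplitude threshold the question uses
MAX_SILENT_BLOCKS = 2  # hypothetical: quiet blocks tolerated before an utterance ends


def is_voiced(block, threshold=THRESHOLD):
    """Return True if the loudest sample in the block reaches the threshold."""
    return max((abs(s) for s in block), default=0) >= threshold


def voiced_segments(blocks, threshold=THRESHOLD, max_silent=MAX_SILENT_BLOCKS):
    """Yield one array('h') per utterance, skipping the silence around it."""
    segment = None  # None means "listening", an array means "recording"
    pending = []    # quiet blocks seen mid-utterance, kept only if speech resumes
    for block in blocks:
        if is_voiced(block, threshold):
            if segment is None:
                segment = array('h')
            for quiet_block in pending:
                segment.extend(quiet_block)
            pending = []
            segment.extend(block)
        elif segment is not None:
            pending.append(block)
            if len(pending) >= max_silent:
                yield segment
                segment, pending = None, []
    if segment is not None:
        yield segment


# Synthetic stand-ins for microphone blocks: "no voice", "no voice", "voice", ...
quiet = array('h', [0, 10, -10])
loud = array('h', [0, 900, -900])
segments = list(voiced_segments([quiet, quiet, loud, loud, quiet, quiet]))
print(len(segments), len(segments[0]))
```

Each yielded segment holds only the voiced blocks (plus short pauses inside an utterance), so it is what would end up in the wav file.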

Thank you

EDIT:

import wave
import sys
import time
import threading
from array import array
from sys import byteorder
from Queue import Queue, Full

import pyaudio

CHUNK_SIZE = 1024
MIN_VOLUME = 500

BUF_MAX_SIZE = 1024 * 10

process_g = 0

def main():
    stopped = threading.Event()
    q = Queue(maxsize=int(round(BUF_MAX_SIZE / CHUNK_SIZE)))

    listen_t = threading.Thread(target=listen, args=(stopped, q))
    listen_t.start()
    process_g = threading.Thread(target=process, args=(stopped, q))
    process_g.start()

    try:
        while True:
            listen_t.join(0.1)
            process_g.join(0.1)
    except KeyboardInterrupt:
        stopped.set()

    listen_t.join()
    process_g.join()

def process(stopped, q):
    while True:
        if stopped.wait(timeout=0):
            break
        print "I'm processing.."
        time.sleep(300)

def listen(stopped, q):
    stream = pyaudio.PyAudio().open(
        format=pyaudio.paInt16,
        channels=2,
        rate=44100,
        input=True,
        frames_per_buffer=1024
    )

    while True:
        if stopped and stopped.wait(timeout=0):
            break
        try:
            print process_g
            for i in range(0, int(44100 / 1024 * 5)):
                data_chunk = array('h', stream.read(CHUNK_SIZE))
                vol = max(data_chunk)
                if vol >= MIN_VOLUME:
                    print "WORDS.."
                else:
                    print "Nothing.."
        except Full:
            pass

if __name__ == '__main__':
    main()

Now, after every 5 seconds, I need the "process" function to execute and process the data (with a delay such as time.sleep(10) while it does this), and then start the recording back up.

Accepted answer by Florian Braun

Look here:

https://github.com/jeysonmc/python-google-speech-scripts/blob/master/stt_google.py

It even converts WAV to FLAC and sends it to the Google Speech API; just delete the stt_google_wav function if you don't need it ;)

Answer by Erik Kaplun

Having spent some time on it, I've come up with the following code that seems to be doing what you need, except writing to file:

import threading
from array import array
from queue import Queue, Empty, Full  # on Python 2 this module is named Queue

import pyaudio


CHUNK_SIZE = 1024
MIN_VOLUME = 500
# if the recording thread can't consume fast enough, the listener will start discarding
BUF_MAX_SIZE = CHUNK_SIZE * 10


def main():
    stopped = threading.Event()
    q = Queue(maxsize=int(round(BUF_MAX_SIZE / CHUNK_SIZE)))

    listen_t = threading.Thread(target=listen, args=(stopped, q))
    listen_t.start()
    record_t = threading.Thread(target=record, args=(stopped, q))
    record_t.start()

    try:
        # join with a timeout so the main thread stays responsive to Ctrl+C
        while True:
            listen_t.join(0.1)
            record_t.join(0.1)
    except KeyboardInterrupt:
        stopped.set()

    listen_t.join()
    record_t.join()


def record(stopped, q):
    while True:
        if stopped.wait(timeout=0):
            break
        try:
            # time out so a stop request is noticed even when the queue is empty
            chunk = q.get(timeout=0.1)
        except Empty:
            continue
        vol = max(chunk)
        if vol >= MIN_VOLUME:
            # TODO: write to file
            print("O", end=" ", flush=True)
        else:
            print("-", end=" ", flush=True)


def listen(stopped, q):
    stream = pyaudio.PyAudio().open(
        format=pyaudio.paInt16,
        channels=2,
        rate=44100,
        input=True,
        frames_per_buffer=1024,
    )

    while True:
        if stopped.wait(timeout=0):
            break
        try:
            # put_nowait raises Full instead of blocking, so the chunk is dropped
            q.put_nowait(array('h', stream.read(CHUNK_SIZE)))
        except Full:
            pass  # discard


if __name__ == '__main__':
    main()
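
The "TODO: write to file" step above could be filled in with the standard-library wave module. This is only a rough sketch for Python 3, assuming the same stream settings as listen() (2 channels, 44100 Hz, 16-bit samples); write_voiced_chunks and the constant names other than MIN_VOLUME are hypothetical and not part of the answer.

```python
import wave
from array import array

CHANNELS = 2       # assumed to match channels=2 in listen()
RATE = 44100       # assumed to match rate=44100 in listen()
SAMPLE_WIDTH = 2   # bytes per sample for 16-bit audio (pyaudio.paInt16)
MIN_VOLUME = 500


def write_voiced_chunks(path, chunks, min_volume=MIN_VOLUME):
    """Write only the chunks whose peak reaches min_volume.

    `chunks` is any iterable of array('h') buffers, for example the items
    taken off the queue. Returns the number of chunks written.
    """
    written = 0
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(RATE)
        for chunk in chunks:
            # skip empty buffers and anything quieter than the threshold
            if chunk and max(chunk) >= min_volume:
                wav.writeframes(chunk.tobytes())
                written += 1
    return written
```

Inside record(), the same idea would mean opening the wave file once before the loop and calling writeframes for each chunk that passes the volume check, then closing the file after the loop ends.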