Java: open source software for transcribing speech in audio files

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/7613089/

Date: 2020-10-30 20:42:03  Source: igfitidea

Open Source Software For Transcribing Speech in Audio Files

Tags: java, python, speech-recognition, speech-to-text, cmusphinx

Asked by Cerin

Can anyone recommend reliable open source software for transcribing English speech in wav files? The two main programs I've researched are Sphinx and Julius, but I've never been able to get either to work, and the documentation for each on transcribing files is sketchy at best.

I'm developing on 64-bit Ubuntu 10.04, whose repos include sphinx2 and julius, as well as voxforge's julius acoustic model for English. I'm focusing on transcribing files, instead of directly processing sound from a mic, because I've given up on expecting projects like these to work with Ubuntu's sound system. This isn't a knock against Ubuntu, as I can record sound with my mic perfectly using Audacity, but neither system seems able to access my mic, so I'm hoping I can simplify their configuration by just reading from a file.

I first tried Sphinx2, from the Ubuntu package sphinx2-bin. Even though the sample sphinx2-demo seemed to work on transcribing a file, there's virtually no documentation on the configuration, so I'm not sure how I'd customize this to read from an arbitrary wav. The audio file used in the demo is in some undocumented "16k" format, which is indirectly referenced through 2 configuration files. There's a brief blurb describing sphinx2-demo as running sphinx2-batch, but inspecting the script shows it's actually calling sphinx2-continuous. Even worse, the --help docs for each script list about 6 dozen options, and don't mention which are required or optional. Overall, the lack of Sphinx documentation and the poor quality of what does exist are driving me nuts.

I next tried Julius, again from the Ubuntu package, which was surprisingly recent (4.1), considering the version used in Voxforge's quickstart is 3.5. The package seems to include slightly better documentation, and even an example written in Python (/usr/share/doc/julius-voxforge/examples/controlapp). After reading the example's docs, I tried adapting it to read from a file by creating a file filelist.txt containing the text "hello.wav", referring to a file of the same name that contains a recording of someone saying "hello". Placing these in the same directory, I ran:

julius -input file -filelist filelist.txt -C julian.jconf

getting the response:

### read waveform input
Error: adin_file: sampling rate != 16000 (8000)
Error: adin_file: error in parsing wav header at hello.wav
Error: adin_file: failed to read speech data: "hello.wav"
0 files processed

Retrying with absolute filenames for filelist.txt and hello.wav produces the same error.

I also tried the Julius call used in the example, to record directly from a mic:

julius -input mic -C julian.jconf

I called this several times, and the response varied between the error:

Cannot read /dev/dsp

and:

STAT: AD-in thread created
<<< please speak >>>

In the latter case, no matter what I say into the mic, nothing happens. I can't tell if it's still unable to read the mic, or if it's reading something but is simply unable to transcribe the audio.

I'm not sure what to make of this. The errors I'm getting don't leave me with much to go on. Why can't it read a wav? Why can't it read /dev/dsp? Why does it then appear to be able to read /dev/dsp, but not react in any way?

Has anyone else had any success with open source speech recognizers, especially on Linux?

Accepted answer by Nikolay Shmyrev

Why can't it read a wav?

It tells you that the file has the wrong sampling rate (8000 Hz) instead of the requested one (16000 Hz). The sampling rate is very important for speech recognition software.
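
As a rough sketch, the rate that Julius complains about can be checked before decoding with Python's standard `wave` module. The helper names `wav_sample_rate` and `write_silence` are made up for illustration, the expected rate of 16000 Hz is taken from the error message in the question, and the snippet writes a short silent 8 kHz file only so that it has something to inspect.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def write_silence(path, rate, seconds=1):
    """Write a mono 16-bit WAV of silence; used here only as demo input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


EXPECTED_RATE = 16000  # assumption: the rate requested in the Julius error

demo_path = os.path.join(tempfile.mkdtemp(), "hello.wav")
write_silence(demo_path, 8000)
rate = wav_sample_rate(demo_path)
if rate != EXPECTED_RATE:
    print("sampling rate mismatch: file is %d Hz, recognizer expects %d Hz"
          % (rate, EXPECTED_RATE))
```

Pointing `wav_sample_rate` at the real hello.wav would show whether the file needs resampling before it is handed to the recognizer.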

Why can't it read /dev/dsp?

Recent versions of Ubuntu use the pulseaudio framework instead of OSS. The version you are trying uses OSS, so you need to install the oss-compatibility package from your distribution to bring OSS support back.

Alternatively, you can try a newer version of Julius, which has pulseaudio support.
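
A minimal sketch for checking whether the OSS device node is there at all, assuming the default path /dev/dsp from the error message. The function name `oss_device_status` is hypothetical, and a "readable" result only means the node can be opened, not that audio capture actually works.

```python
import os


def oss_device_status(device="/dev/dsp"):
    """Describe whether the OSS device node exists and is readable."""
    if not os.path.exists(device):
        return "missing"  # no OSS device, e.g. a pulseaudio-only system
    if not os.access(device, os.R_OK):
        return "not readable"  # node exists but this user cannot open it
    return "readable"


print("/dev/dsp is", oss_device_status())
```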

Why does it then appear to be able to read /dev/dsp, but not react in any way?

Audio input doesn't work properly.

Has anyone else had any success with open source speech recognizers, especially on Linux?

Sure, check this video as an example of what people do with CMUSphinx:

http://www.youtube.com/watch?v=vfaNLIowSyk

I suggest you revisit the CMUSphinx package, which is a leading open source speech recognition engine. There are loads of documents on the website; you just need to read them. Remember that speech recognition is a complex area where you can get great results, but you also need to invest time in understanding the technology, just like with any other domain.

In short, to transcribe a file with CMUSphinx you need to do the following 3 simple steps:

  1. Take the wav file and resample it to an 8 kHz, 16-bit mono file with sox:
    sox input.wav -r 8000 -c 1 resampled.wav
  2. Install pocketsphinx 0.7:
    apt-get install pocketsphinx
  3. Decode the file:
    pocketsphinx_continuous -samprate 8000 -infile resampled.wav

The result will be printed to standard output. To suppress the logger, redirect stderr to /dev/null:

    pocketsphinx_continuous -samprate 8000 -infile resampled.wav 2> /dev/null
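
The steps above could be scripted roughly as follows. This is a sketch that assumes sox and pocketsphinx_continuous are installed and on the PATH. The function names (`sox_command`, `decode_command`, `transcribe`) are made up for illustration, and only the command-line flags already shown in the answer are used.

```python
import subprocess


def sox_command(src, dst, rate=8000):
    """Build the sox call from step 1: resample to mono at the given rate."""
    return ["sox", src, "-r", str(rate), "-c", "1", dst]


def decode_command(wav_path, rate=8000):
    """Build the pocketsphinx_continuous call from step 3."""
    return ["pocketsphinx_continuous", "-samprate", str(rate),
            "-infile", wav_path]


def transcribe(src, resampled="resampled.wav", rate=8000):
    """Resample src, decode it, and return the recognizer's standard output."""
    subprocess.check_call(sox_command(src, resampled, rate))
    # The decoder logs to stderr, so discard it (same effect as 2> /dev/null).
    result = subprocess.run(
        decode_command(resampled, rate),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


# Show the commands that would be run, without executing them.
print(" ".join(sox_command("input.wav", "resampled.wav")))
print(" ".join(decode_command("resampled.wav")))
```

Calling `transcribe("input.wav")` would then run both tools and return whatever the decoder printed. Building the argument lists separately keeps the commands easy to inspect before anything is executed.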