Java: convert audio to text

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original StackOverflow URL: http://stackoverflow.com/questions/3958342/

Date: 2020-10-30 04:08:54  Source: igfitidea

Convert audio to text

Tags: c#, java, speech-recognition, audio-processing

Asked by Amira Elsayed Ismail

I just want to know whether there are any built-in or external libraries in Java or C# that let me take an audio file, parse it, and extract the text from it.

I need to build an application that does this, but I don't know where to start.

Answered by Ohad Schneider

Answered by bulltorious

Here is a complete example using C# and System.Speech

The code can be divided into 2 main parts:

configuring the SpeechRecognitionEngine object (and its required elements), and handling the SpeechRecognized and SpeechHypothesized events.

Step 1: Configuring the SpeechRecognitionEngine

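// Create the engine and read audio from the default microphone.
_speechRecognitionEngine = new SpeechRecognitionEngine();
_speechRecognitionEngine.SetInputToDefaultAudioDevice();
// A dictation grammar allows free-form speech rather than fixed commands.
_dictationGrammar = new DictationGrammar();
_speechRecognitionEngine.LoadGrammar(_dictationGrammar);
// Keep recognizing until recognition is explicitly stopped.
_speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);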

At this point your object is ready to start transcribing audio from the microphone. You need to handle some events though, in order to actually get access to the results.

Step 2: Handling the SpeechRecognitionEngine Events

// Unsubscribe first so that each handler is attached only once.
_speechRecognitionEngine.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized -= new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);

_speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);

private void SpeechHypothesizing(object sender, SpeechHypothesizedEventArgs e)
{
    // Real-time (interim) results from the engine
    string realTimeResults = e.Result.Text;
}

private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Final answer from the engine
    string finalAnswer = e.Result.Text;
}

That's it. If you want to use a pre-recorded .wav file instead of a microphone, you would use

_speechRecognitionEngine.SetInputToWaveFile(pathToTargetWavFile);

instead of

_speechRecognitionEngine.SetInputToDefaultAudioDevice();

These classes offer many other options that are worth exploring in more detail.

http://ellismis.com/2012/03/17/converting-or-transcribing-audio-to-text-using-c-and-net-system-speech/

Answered by jassuncao

You might check the Microsoft Speech API. I think they provide an SDK that you can use for this purpose.

Answered by Grant Peters

For Java, it seems there is a solution from Sun: javax.speech.recognition

Answered by Ivelin

You can use SoX (the Swiss Army knife of sound processing programs) to convert an audio file to a text file of numeric values corresponding to sound frequency/volume.

I have done it for a previous project but don't know the exact command options.

Here is a link to the project: http://sox.sourceforge.net/Main/HomePage
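
As a rough Java-only analogue of the SoX approach in this answer (it is not SoX, and it is not speech recognition), the sketch below uses the standard javax.sound.sampled API to write one line per sample: the time in seconds, a tab, and the raw 16-bit amplitude. The class name WavToText, the method samplesToText, and the two command-line arguments are hypothetical names chosen for illustration. It assumes a mono WAV file that the JDK can decode to 16-bit PCM, and Java 9 or later for readAllBytes.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

public class WavToText {

    /** Formats 16-bit little-endian signed mono PCM as "seconds<TAB>amplitude" lines. */
    static String samplesToText(byte[] pcm, float sampleRate) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < pcm.length; i += 2) {
            // Low byte comes first; the high byte carries the sign.
            int sample = (short) ((pcm[i] & 0xFF) | (pcm[i + 1] << 8));
            double seconds = (i / 2) / (double) sampleRate;
            out.append(String.format(Locale.ROOT, "%.6f\t%d\n", seconds, sample));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: input WAV path, args[1]: output text path (both hypothetical).
        File input = new File(args[0]);
        try (AudioInputStream source = AudioSystem.getAudioInputStream(input)) {
            float rate = source.getFormat().getSampleRate();
            // Assumption: the JDK can convert the source to 16-bit signed
            // little-endian mono PCM; otherwise the call below throws.
            AudioFormat target = new AudioFormat(rate, 16, 1, true, false);
            try (AudioInputStream pcm = AudioSystem.getAudioInputStream(target, source)) {
                byte[] bytes = pcm.readAllBytes();
                Files.write(Paths.get(args[1]),
                        samplesToText(bytes, rate).getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

After compiling, it could be run as `java WavToText input.wav output.txt` (placeholder file names). If the JDK cannot convert the file to the target format, AudioSystem.getAudioInputStream throws an IllegalArgumentException.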
