Original question: http://stackoverflow.com/questions/227140/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
C# Speech Recognition - Is this what the user said?
Question by RichieACC
I need to write an application that uses a speech recognition engine (either the one built into Vista or a third-party one) that can display a word or phrase and recognize when the user reads it (or an approximation of it). I also need to be able to switch quickly between languages without changing the language of the operating system.
The users will be using the system for very short periods, so the application needs to work without first training the recognition engine on the users' voices.
It would also be fantastic if this could work on Windows XP or lesser versions of Windows Vista.
Optionally, the system should also be able to read information on the screen back to the user in the user's selected language. I can work around this requirement using pre-recorded voice-overs, but the preferred method would be a text-to-speech engine.
Can anyone recommend something for me?
Answer by Jorge Córdoba
If the engine is what you're asking about, then here is what I've found (beware: I'm just listing them; I haven't tried any of them):
You also have the SAPI SDK from Microsoft itself. I've only tried it for text-to-speech, but according to its description:
The SDK also includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).
Answer by itsmatt
The Dragon NaturallySpeaking SDK might be worth looking at. This project looked interesting.
I haven't had a chance to play with either of them, though.
Answer by stephbu
Check out the new Speech class libraries (System.Speech) in .NET 3.0 and 3.5:
http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognizer.aspx
General documentation for speech recognition (SR) and text-to-speech (TTS):
http://msdn.microsoft.com/en-us/library/system.speech.recognition.aspx
http://msdn.microsoft.com/en-us/library/system.speech.synthesis.aspx
Answer by Mark Brackett
Text-to-speech is available through the Speech API. Personally, I'd probably require Vista and use the managed interfaces in System.Speech.Recognition and System.Speech.Synthesis, but if you really need XP support, it should be possible to call into the unmanaged APIs through interop.
Answer by Robert Elwell
Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.
That's why training is so important. We can easily do well by overfitting, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.
That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/
Answer by Ryan Lundy
A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this... with some limitations. Add a reference to System.Speech (it should be in the GAC) to your project. Here's some sample code for a WinForms app:
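using System;
using System.Speech.Recognition;
using System.Windows.Forms;

public partial class Form1 : Form
{
    // Shared recognizer: uses the Windows desktop speech recognition engine.
    SpeechRecognizer rec = new SpeechRecognizer();

    public Form1()
    {
        InitializeComponent();
        rec.SpeechRecognized += rec_SpeechRecognized;
    }

    // Show whichever grammar phrase was recognized.
    void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        lblLetter.Text = e.Result.Text;
    }

    // Must be hooked up to the form's Load event (e.g. in the designer).
    void Form1_Load(object sender, EventArgs e)
    {
        // Grammar: the strings "0" through "100".
        var c = new Choices();
        for (var i = 0; i <= 100; i++)
            c.Add(i.ToString());
        var gb = new GrammarBuilder(c);
        var g = new Grammar(gb);
        rec.LoadGrammar(g);
        rec.Enabled = true;
    }
}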
This recognizes the numbers from 0 to 100 and displays the recognized number on the form. You'll need a form with a label called lblLetter on it.
System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)
It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).
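One way to make "is this what the user said?" less brittle for short, confusable phrases is to check the result's confidence as well as its text. The helper below is a minimal sketch under stated assumptions: the class name `ReadingCheck` and the 0.6 threshold are hypothetical and would need tuning against real users. In a `SpeechRecognized` handler you would pass it `e.Result.Text` and `e.Result.Confidence` (a value from 0.0 to 1.0); speech the engine cannot match to the grammar at all arrives through the separate `SpeechRecognitionRejected` event instead.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of System.Speech): decides whether a recognition
// result should count as the user having read the displayed phrase.
public static class ReadingCheck
{
    // Arbitrary starting threshold; an assumption to be tuned, not a library default.
    public const float DefaultMinConfidence = 0.6f;

    // Lower-case the text and turn punctuation into spaces, so that
    // "Hello, world!" and "hello world" normalize to the same string.
    public static string Normalize(string s)
    {
        if (s == null) return string.Empty;
        var sb = new StringBuilder(s.Length);
        foreach (char ch in s)
            sb.Append(char.IsLetterOrDigit(ch) ? char.ToLowerInvariant(ch) : ' ');
        // A null separator array splits on whitespace; empty entries are dropped.
        string[] words = sb.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    // Accept only if the engine is confident enough AND the words match.
    public static bool IsMatch(string expected, string recognized,
                               float confidence, float minConfidence)
    {
        return confidence >= minConfidence
            && Normalize(expected) == Normalize(recognized);
    }
}
```

For example, the handler might call `ReadingCheck.IsMatch(currentPhrase, e.Result.Text, e.Result.Confidence, ReadingCheck.DefaultMinConfidence)`, where `currentPhrase` (also hypothetical) is whatever the form is displaying.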
As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(
As for text-to-speech, System.Speech.Synthesis does this. It's even easier than speech recognition. I wrote a small program that lets me type some text, hit Enter, and have it read aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
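Since the question also asks for reading on-screen text back in the user's selected language, here is a minimal sketch of the System.Speech.Synthesis side, assuming Windows, .NET 3.0 or later, and a project reference to System.Speech. The `ScreenReader` class and its method names are hypothetical; `GetInstalledVoices`, `SelectVoice`, `SetOutputToWaveFile`, `SetOutputToNull` and `Speak` are the library's own calls. Which languages actually have a voice depends on what is installed on the machine, so the sketch falls back to the default voice when none matches.

```csharp
using System.Globalization;
using System.Speech.Synthesis;

// Sketch only: assumes Windows, .NET 3.0+, and a reference to System.Speech.
// ScreenReader and its methods are hypothetical names.
public static class ScreenReader
{
    // Select the first enabled installed voice for the culture. If none is
    // installed, the synthesizer keeps its default voice and we return false.
    public static bool TrySelectVoice(SpeechSynthesizer synth, CultureInfo culture)
    {
        foreach (InstalledVoice voice in synth.GetInstalledVoices())
        {
            if (voice.Enabled && voice.VoiceInfo.Culture.Equals(culture))
            {
                synth.SelectVoice(voice.VoiceInfo.Name);
                return true;
            }
        }
        return false;
    }

    // Read the text through the speakers; Speak blocks until it finishes.
    public static void Say(string text, CultureInfo culture)
    {
        using (var synth = new SpeechSynthesizer())
        {
            TrySelectVoice(synth, culture);
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak(text);
        }
    }

    // Render the text to a .wav file instead of the speakers.
    public static void SaveToWave(string text, CultureInfo culture, string path)
    {
        using (var synth = new SpeechSynthesizer())
        {
            TrySelectVoice(synth, culture);
            synth.SetOutputToWaveFile(path);
            synth.Speak(text);
            synth.SetOutputToNull(); // release the file handle
        }
    }
}
```

A type-and-press-Enter reader like the one described above is then just a loop that passes each non-empty `Console.ReadLine()` to `ScreenReader.Say(line, CultureInfo.CurrentUICulture)`. `SaveToWave` could also produce the pre-recorded voice-overs the question mentions as a fallback.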
Answer by Philipp Schmid
[Note: I was the development lead for the managed speech recognition API in .NET 3.0]
System.Speech is part of .NET 3.0, so it is available on both Vista and XP. On Vista you have the added benefit of a speech recognition engine pre-installed with the OS. On XP your choices are: use the SAPI 5.1 SDK with a very old engine (which might still work well enough for your command-and-control scenario), or install Office 2003, which installs a newer version of the recognizer. There are also a few SAPI 5-compliant speech recognition engines available.
If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class which allows you to choose the SR engine for the language you need to support. Note that engines are defined by a set of languages they support (they might be using the same binary, only swapping data files to support additional languages).
Comment if you need to know more.
Philipp
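A minimal sketch of the `SpeechRecognitionEngine` approach described in Philipp's answer, assuming Windows, .NET 3.0 or later, a reference to System.Speech, and an installed recognizer for each language you switch to. `RecognizerFactory` and its methods are hypothetical names; `InstalledRecognizers`, the `SpeechRecognitionEngine(RecognizerInfo)` constructor, `GrammarBuilder.Culture` and `SetInputToDefaultAudioDevice` are library members. Setting `GrammarBuilder.Culture` to the recognizer's culture matters when the recognizer's language differs from the thread's current culture; otherwise loading the grammar can fail with a language-mismatch error.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Sketch only: assumes Windows, .NET 3.0+, a reference to System.Speech,
// and an installed recognizer for every language you switch to.
public static class RecognizerFactory
{
    // Returns the installed recognizer for the culture, or null if there is none.
    public static RecognizerInfo FindRecognizer(CultureInfo culture)
    {
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            if (info.Culture.Equals(culture))
                return info;
        }
        return null;
    }

    // Builds an in-process engine for one language that listens on the default
    // microphone for exactly the given phrases (e.g. the word shown on screen).
    public static SpeechRecognitionEngine Create(CultureInfo culture, params string[] phrases)
    {
        RecognizerInfo info = FindRecognizer(culture);
        if (info == null)
            throw new InvalidOperationException("No recognizer installed for " + culture.Name);

        var builder = new GrammarBuilder(new Choices(phrases));
        builder.Culture = info.Culture; // grammar language must match the engine's

        var engine = new SpeechRecognitionEngine(info);
        engine.LoadGrammar(new Grammar(builder));
        engine.SetInputToDefaultAudioDevice();
        return engine;
    }
}
```

To switch languages, `Dispose()` the current engine, call `Create` with the new culture, attach a `SpeechRecognized` handler, and start listening with `RecognizeAsync(RecognizeMode.Multiple)`. Unlike the shared `SpeechRecognizer`, this in-process engine does not depend on the language the desktop recognizer is set to, so the OS language can stay unchanged.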
Answer by dbkk
Try Microsoft Speech Server, which I think is now part of Office Communications Server 2007. It contains SR/TTS engines, a C# API, and tools that integrate with Visual Studio.
Answer by Rob Segal
Before trying this, add a reference to System.Speech to your project.
I found that the code example posted by Kyralessa on Oct 22nd (the WinForms sample earlier on this page) didn't work for me, but a slightly revised version did. When adding strings to the Choices object, use English words rather than digits. It seems the MS speech recognition engine can't recognize numbers written as digits.
I have marked these modifications with comments added to the previous example.
using System;
using System.Speech.Recognition;
using System.Windows.Forms;

public partial class Form1 : Form
{
    SpeechRecognizer rec = new SpeechRecognizer();

    public Form1()
    {
        InitializeComponent();
        rec.SpeechRecognized += rec_SpeechRecognized;
    }

    void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        lblLetter.Text = e.Result.Text;
    }

    // Must be hooked up to the form's Load event (e.g. in the designer).
    void Form1_Load(object sender, EventArgs e)
    {
        var c = new Choices();
        // Doesn't work: you must use English words to add to Choices and
        // populate the grammar.
        //
        //for (var i = 0; i <= 100; i++)
        //    c.Add(i.ToString());
        c.Add("one");
        c.Add("two");
        c.Add("three");
        c.Add("four");
        // etc...
        var gb = new GrammarBuilder(c);
        var g = new Grammar(gb);
        rec.LoadGrammar(g);
        rec.Enabled = true;
    }
}
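Rather than typing every number word by hand where the sample says `// etc...`, the words can be generated. This is a hypothetical helper (`NumberWords` is not part of System.Speech) covering 0 to 100; it uses spaces rather than hyphens ("twenty one") on the assumption that plain words are the safest input for a grammar.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: English words for 0..100, for populating Choices.
public static class NumberWords
{
    static readonly string[] Ones =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };

    // Index = tens digit; entries 0 and 1 are unused (handled by Ones).
    static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    public static string ToWords(int n)
    {
        if (n < 0 || n > 100) throw new ArgumentOutOfRangeException("n");
        if (n == 100) return "one hundred";
        if (n < 20) return Ones[n];
        string tens = Tens[n / 10];
        return n % 10 == 0 ? tens : tens + " " + Ones[n % 10];
    }

    // Words for every number from 'from' to 'to' inclusive.
    public static string[] Range(int from, int to)
    {
        var words = new List<string>();
        for (int i = from; i <= to; i++)
            words.Add(ToWords(i));
        return words.ToArray();
    }
}
```

`Form1_Load` could then use `var c = new Choices(NumberWords.Range(0, 100));`, and the handler can map the recognized words back to a number with `Array.IndexOf(NumberWords.Range(0, 100), e.Result.Text)`, since the array index equals the number.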
Answer by Michael Levy
This is the article from MSDN Magazine that first discussed using the System.Speech APIs on Vista. Some of it is out of date because the API changed between the beta (when the article was written) and the release of Vista, but it is still one of the best resources I've found and gives a good introduction to the System.Speech namespace. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx