C++ API for "Text to Speech" and "Voice to Text"
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/16296447/
Asked by Soldier
I would like to know whether there is a good API for "voice recognition" and "text to speech" in C++. I have gone through Festival, whose output is so realistic that you can't even tell it is a computer talking, and voce as well.
Unfortunately, Festival does not seem to support voice recognition (I mean "voice to text"), and voce is built in Java, which makes it a mess to use from C++ because of JNI.
The API should support both "text to voice" and "voice to text", and it should have a good set of examples, at least beyond the owner's website. It would be perfect if it could also identify a given set of voices, but that is optional, so no worries.
What I am going to do with the API is turn the robot device left, right, etc. when a set of voice commands is given, and also have it speak to me, saying "Good morning", "Good night", etc. These words will be hard-coded in the program.
Please help me find a good C++ voice API for this purpose. If you have access to a tutorial/installation guide, please be kind enough to share it with me as well.
Accepted answer by Cyril Leroux
If you develop on Windows you can use the Microsoft Speech API (SAPI), which allows you to perform voice recognition (ASR) and text-to-speech (TTS).
You can find some examples on this page, and a very basic example of voice recognition in this post.
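A minimal TTS sketch with SAPI might look like the following (Windows-only, error handling kept to a minimum; the voice used is whatever the system default is):

```cpp
// Windows-only sketch using SAPI's ISpVoice COM interface.
// Speaks a greeting through the default system voice.
#include <sapi.h>

int main() {
    if (FAILED(::CoInitialize(NULL)))
        return 1;

    ISpVoice* voice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice, (void**)&voice);
    if (SUCCEEDED(hr)) {
        voice->Speak(L"Good morning", SPF_DEFAULT, NULL);  // blocks until done
        voice->Release();
    }
    ::CoUninitialize();
    return 0;
}
```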
Answered by bobweaver
I found that if I make an audio recording (I used Qt Multimedia for this), it has to be FLAC. Read more here.
I can then upload it to Google and have it send me back some JSON.
I then wrote some C++/Qt to turn this into a QML plugin.
Here is that (alpha) code. Note: make sure that you replace
< YOUR FLAC FILE.flac >
with your real FLAC file.
speechrecognition.cpp
#include <QDebug>
#include <QFile>
#include <QJsonDocument>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>

#include "speechrecognition.h"

const char* SpeechRecognition::kContentType = "audio/x-flac; rate=8000";
const char* SpeechRecognition::kUrl = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=directions&lang=en";

SpeechRecognition::SpeechRecognition(QObject* parent)
    : QObject(parent)
{
    network_ = new QNetworkAccessManager(this);
    connect(network_, SIGNAL(finished(QNetworkReply*)),
            this, SLOT(replyFinished(QNetworkReply*)));
}

void SpeechRecognition::start()
{
    const QUrl url(kUrl);
    QNetworkRequest req(url);
    req.setHeader(QNetworkRequest::ContentTypeHeader, kContentType);
    req.setAttribute(QNetworkRequest::DoNotBufferUploadDataAttribute, false);
    req.setAttribute(QNetworkRequest::CacheLoadControlAttribute,
                     QNetworkRequest::AlwaysNetwork);

    QFile* compressedFile = new QFile("<YOUR FLAC FILE.flac>");
    compressedFile->open(QIODevice::ReadOnly);
    reply_ = network_->post(req, compressedFile);
    // Parent the file to the reply so it is deleted along with it.
    compressedFile->setParent(reply_);
}

void SpeechRecognition::replyFinished(QNetworkReply* reply)
{
    Result result = Result_ErrorNetwork;
    Hypotheses hypotheses;
    if (reply->error() != QNetworkReply::NoError) {
        qDebug() << "ERROR\n" << reply->errorString();
    } else {
        qDebug() << "Running ParseResponse for\n" << reply << result;
        ParseResponse(reply, &result, &hypotheses);
    }
    emit Finished(result, hypotheses);
    reply_->deleteLater();
    reply_ = NULL;
}

void SpeechRecognition::ParseResponse(QIODevice* reply, Result* result,
                                      Hypotheses* hypotheses)
{
    QString response = reply->readAll();
    qDebug() << "The reply" << response;
    QJsonDocument jsonDoc = QJsonDocument::fromJson(response.toUtf8());
    QVariantMap data = jsonDoc.toVariant().toMap();

    const int status = data.value("status", Result_ErrorNetwork).toInt();
    *result = static_cast<Result>(status);
    if (status != Result_Success)
        return;

    QVariantList list = data.value("hypotheses", QVariantList()).toList();
    foreach (const QVariant& variant, list) {
        QVariantMap map = variant.toMap();
        if (!map.contains("utterance") || !map.contains("confidence"))
            continue;
        Hypothesis hypothesis;
        hypothesis.utterance = map.value("utterance", QString()).toString();
        hypothesis.confidence = map.value("confidence", 0.0).toReal();
        *hypotheses << hypothesis;
        qDebug() << "confidence =" << hypothesis.confidence
                 << "\n Your results =" << hypothesis.utterance;
        setResults(hypothesis.utterance);
    }
}

void SpeechRecognition::setResults(const QString& results)
{
    if (m_results == results)
        return;
    m_results = results;
    emit resultsChanged();
}

QString SpeechRecognition::results() const
{
    return m_results;
}
speechrecognition.h
#ifndef SPEECHRECOGNITION_H
#define SPEECHRECOGNITION_H

#include <QList>
#include <QObject>

class QIODevice;
class QNetworkAccessManager;
class QNetworkReply;

class SpeechRecognition : public QObject {
    Q_OBJECT
    Q_PROPERTY(QString results READ results NOTIFY resultsChanged)

public:
    SpeechRecognition(QObject* parent = 0);

    static const char* kUrl;
    static const char* kContentType;

    struct Hypothesis {
        QString utterance;
        qreal confidence;
    };
    typedef QList<Hypothesis> Hypotheses;

    // This enumeration follows the values described here:
    // http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html#speech-input-error
    enum Result {
        Result_Success = 0,
        Result_ErrorAborted,
        Result_ErrorAudio,
        Result_ErrorNetwork,
        Result_NoSpeech,
        Result_NoMatch,
        Result_BadGrammar
    };

    Q_INVOKABLE void start();
    void Cancel();

    QString results() const;
    void setResults(const QString& results);

signals:
    void Finished(Result result, const Hypotheses& hypotheses);
    void resultsChanged();

private slots:
    void replyFinished(QNetworkReply* reply);

private:
    void ParseResponse(QIODevice* reply, Result* result, Hypotheses* hypotheses);

    QNetworkAccessManager* network_;
    QNetworkReply* reply_;
    QByteArray buffered_raw_data_;
    int num_samples_recorded_;
    QString m_results;
};

#endif // SPEECHRECOGNITION_H
Answered by Rod Burns
You could theoretically use Twilio, if you have an internet connection on the robot and are willing to pay for the service. They have libraries and examples for a bunch of different languages and platforms: http://www.twilio.com/docs/libraries
Also, check out this blog post explaining how to build and control an Arduino-based robot using Twilio: http://www.twilio.com/blog/2012/06/build-a-phone-controlled-robot-using-node-js-arduino-rn-xv-wifly-arduinoand-twilio.html