Java: Text-to-Speech Engines Overview

Question

Asked by DiaWorD

I'm searching for a Java text-to-speech (TTS) framework. During my investigation I found several frameworks that are (partially) compatible with JSAPI 1.0, listed on the JSAPI Implementations page, as well as a couple of Java TTS frameworks that do not appear to follow the JSAPI spec (Mary, Say-It-Now). I also noted that no reference implementation of JSAPI currently exists.

Brief tests I've done with FreeTTS (the first one listed on the JSAPI implementations page) show that it fails to read even simple, obvious words (examples: ABC, blackboard). Other tests are in progress.

And here is the question (six questions, actually):

Which of the Java-based TTS frameworks have you used?
Which ones, in your opinion, can read the largest vocabulary?
What about their voice quality?
What about their performance?
Which non-Java frameworks with Java bindings are available?
Which of them would you recommend?

Thank you in advance for your comments and suggestions.

Answer 1

Accepted answer by pfranza

I've actually had pretty good luck with FreeTTS.

Answer 2

Answered by DiaWorD

I've used Mary before and I was very impressed with the quality of the voices. Unfortunately, I haven't used any of the other ones.

Answer 3

Answered by DiaWorD

Thanks a lot, everyone. The trick is in the FreeTTS source. Briefly: when run as java -jar freetts.jar some-more-args-here, it pronounces fewer words than when run by way of bin/Server.jar and bin/Client.jar.

Answer 4

Answered by James Schek

I've used AT&T Natural Voices, which provides JSAPI and MS SAPI hooks. It provides excellent-quality voices, a good "general" speech dictionary, many controls over pronunciation, and multiple languages. It's a little pricey, but it works very well.

I used it to read important sensor telemetry to drivers in a mobile sensor application. We had no complaints about the voice quality. It had about 75% out-of-the-box accuracy with scientific terms and much higher accuracy (maybe 90%+) with normal dialogue. We got it up to about 99%+ accuracy by using markup (most errors were on scientific terms with unusual phoneme combinations).
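The answer does not say which markup was used, so the following is only a sketch under an assumption: it uses the W3C SSML <sub alias="..."> element to tell an engine how to speak a term it would otherwise mispronounce. Whether AT&T Natural Voices accepts exactly this markup is not confirmed here, and the class and method names (SsmlMarkup, substitute, speak) are made up for illustration.

```java
// Hypothetical helper for building SSML strings; not part of any TTS library.
final class SsmlMarkup {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wrap a written term in <sub> so the engine speaks the alias instead.
    static String substitute(String term, String spokenAs) {
        return "<sub alias=\"" + escapeXml(spokenAs) + "\">" + escapeXml(term) + "</sub>";
    }

    // Wrap already-marked-up body text in the SSML 1.0 root element.
    static String speak(String body) {
        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">"
                + body + "</speak>";
    }

    public static void main(String[] args) {
        String body = "Sensor reports rising " + substitute("NaCl", "sodium chloride") + " levels.";
        System.out.println(speak(body));
    }
}
```

The resulting string would then be handed to whatever speak call the engine exposes, in place of plain text.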

It was a bit hard on the processor (we were running on a Pentium-III equivalent machine and it was pushing 50%-75% peak CPU). This uses a native speech engine (Windows, Linux, and Mac compatible) with a Java interface.

There's a huge variety of voices and languages...

Answer 5

Answered by Cliff

I used FreeTTS but had a major problem getting the MBROLA voices to run on my MacBook Pro. I did get MBROLA voices to run on Windows (painfully) and Linux. I've had no luck loading any other voice packages on FreeTTS, which is a shame because the supplied voices are horrible IMO. Outside of that, I had a little success with Cloudgarden as well, but that only runs on Windows AFAIK. I'd be interested to hear others' successes and failures with voice engines, as this type of work is particularly challenging. I'm also toying a bit with Sphinx4. I just pulled down JVXML (which appears to be based on Sphinx4) last night but could not get it to run for some strange reason.

Answer 6

Answered by i30817

I've contributed to Mary. I feel it has potential if someone smarter than me separated the HMM voices out of the core (those voices don't need large data sets and sound OK). I'm also trying to add an event system to FreeTTS that sends an event when it says a word. I've had success, but it is currently broken on Linux (probably because of a timer bug).

Answer 7

Answered by nvrandow

Google Translate has an undocumented TTS API: https://translate.google.com/translate_tts?ie=utf-8&tl=en&q=Hello%20World
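For anyone who wants to build that URL from Java, here is a minimal sketch using only the JDK (Java 10 or later for this URLEncoder overload). The query parameters are copied from the URL above; the class and method names are hypothetical. Since the endpoint is not an official API, it may change, rate-limit, or reject requests at any time.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only the endpoint and query parameters come from the answer.
final class TranslateTtsUrl {

    private static final String ENDPOINT = "https://translate.google.com/translate_tts";

    // Build the request URL for the given text and language code (e.g. "en").
    static String build(String text, String languageCode) {
        // URLEncoder applies form encoding, where a space becomes '+';
        // swap it for %20 to match the URL shown in the answer.
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return ENDPOINT + "?ie=utf-8&tl=" + languageCode + "&q=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(build("Hello World", "en"));
    }
}
```

This only constructs the URL; fetching and playing the returned audio is left out because the response format is not described in the answer.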

Answer 8

Answered by Sergey Ponomarev

Actually, there isn't much choice:

Festival, the oldest. Written in C++ but has Java bindings.
eSpeak, quick and simple, used by Google Translate.
MBROLA

Pure Java:

FreeTTS, whose code was ported from Festival; it was then open-sourced, and development stopped.
MaryTTS - more powerful and looks production-ready.

There are also proprietary programs, such as:

Acapela
Nuance Vocalizer

If your software is Windows only, you can use Microsoft Speech API.

Answer 9

Answered by susan097

I am fairly comfortable with MaryTTS. It supports multiple languages and has a clear, easy-to-understand voice.

To convert speech to text, the better option is sphinx4-5prealpha. I give it a thumbs-up because its recognizer and grammar are adjustable, flexible, and modifiable.
