Java random text generator

Disclaimer: this page is translated from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2398577/

Date: 2020-10-29 20:57:48  Source: igfitidea

Random text generator

java

Question by verbatim64x

What is the best way to generate a random string composed of letters, up to 8 million characters long, that will be used to test string-searching algorithms? Is Math.random still OK in terms of randomness, i.e. will the characters be spread reliably from a statistical point of view? Any comment is appreciated; correct me if I'm wrong with my ideas.

Answer by Adamski

It depends entirely on the purpose of generating this string. If you're generating strings to test the performance of a search algorithm, you may want to generate "English-like" text whose distribution of words is similar to that of a typical document.
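
The simplest way to get a realistic word distribution is to draw each word independently, in proportion to how often it occurs in some sample text (a unigram model, simpler than the Markov chain below). A minimal sketch, assuming you already have word counts in a Map; the class name WordFrequencySampler and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper: draws words independently, weighted by their counts (a unigram model). */
public class WordFrequencySampler {
    private final List<String> words = new ArrayList<>();
    private final int[] cumulative; // cumulative[i] = sum of the counts of words 0..i
    private final Random random;

    public WordFrequencySampler(Map<String, Integer> counts, Random random) {
        if (counts.isEmpty()) {
            throw new IllegalArgumentException("need at least one word");
        }
        this.random = random;
        this.cumulative = new int[counts.size()];
        int total = 0;
        int i = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() <= 0) {
                throw new IllegalArgumentException("counts must be positive: " + e.getKey());
            }
            total = Math.addExact(total, e.getValue()); // fail loudly on int overflow
            words.add(e.getKey());
            cumulative[i++] = total;
        }
    }

    /** Returns one word, chosen with probability count / total. */
    public String nextWord() {
        int r = random.nextInt(cumulative[cumulative.length - 1]); // 0 <= r < total
        // Binary search for the first cumulative total strictly greater than r.
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return words.get(lo);
    }

    /** Builds space-separated text that is at least minChars characters long. */
    public String generate(int minChars) {
        StringBuilder sb = new StringBuilder(minChars + 32);
        while (sb.length() < minChars) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(nextWord());
        }
        return sb.toString();
    }
}
```

The word counts would normally come from a real document; the toy map in a quick test (say "the" -> 3, "cat" -> 1) should yield roughly three times as many "the" as "cat".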

One way to achieve this is to build a Markov chain in which each state emits a word (e.g. "The") and then transitions to a new state with a certain probability (e.g. "The" -> "first"). You could build the Markov chain automatically from a large body of sample text, such as the Brown Corpus.
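
A minimal sketch of such a chain, assuming a first-order, word-level model (each state is just the previous word) and whitespace tokenization; the class MarkovTextGenerator is hypothetical. It records every observed successor of each word and then walks the chain by picking successors at random:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical first-order, word-level Markov chain text generator. */
public class MarkovTextGenerator {
    // For each word, every word that followed it in the sample text.
    // Duplicates are kept, so a uniform pick from the list reproduces
    // the observed transition probabilities.
    private final Map<String, List<String>> transitions = new HashMap<>();
    private final List<String> allWords = new ArrayList<>();
    private final Random random;

    public MarkovTextGenerator(String sampleText, Random random) {
        if (sampleText.trim().isEmpty()) {
            throw new IllegalArgumentException("sample text is empty");
        }
        this.random = random;
        String[] words = sampleText.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            allWords.add(words[i]);
            if (i + 1 < words.length) {
                transitions.computeIfAbsent(words[i], k -> new ArrayList<>()).add(words[i + 1]);
            }
        }
    }

    /** Walks the chain until the text is at least minChars characters long. */
    public String generate(int minChars) {
        StringBuilder sb = new StringBuilder(minChars + 32);
        String current = randomWord();
        while (sb.length() < minChars) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(current);
            List<String> next = transitions.get(current);
            // A word seen only at the very end has no successors: restart at a random word.
            current = (next == null) ? randomWord() : next.get(random.nextInt(next.size()));
        }
        return sb.toString();
    }

    private String randomWord() {
        return allWords.get(random.nextInt(allWords.size()));
    }
}
```

With a toy input such as "the cat sat on the mat", every adjacent word pair in the output is one that occurred in the input (apart from restarts after "mat"). Feeding it a large plain-text corpus instead should give output whose word-pair statistics resemble that corpus.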

Or, even simpler, you could test your algorithm on an existing corpus (such as the Brown Corpus) rather than generating any samples yourself.

Answer by Joey

Sure, why not? 8 MiB isn't that much, actually. Even bad PRNGs have periods of at least a few billion, and Java's java.util.Random (which Math.random uses internally) is a 48-bit LCG with a period of 2^48. So yes, it should be fine.
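
A minimal sketch of this approach, assuming lowercase a-z is the alphabet wanted; the class RandomLetters and its methods are hypothetical. It uses java.util.Random directly (the same generator Math.random() delegates to) and computes a Pearson chi-square statistic so you can check the spread of letters yourself. For 26 letters the statistic has 25 degrees of freedom, so a uniform source should typically give values around 25, and values above about 37.7 should occur only about 5% of the time.

```java
import java.util.Random;

/** Hypothetical helper: uniformly random lowercase letters plus a quick uniformity check. */
public class RandomLetters {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    /** Returns `length` letters, each drawn uniformly and independently from ALPHABET. */
    public static String generate(int length, Random random) {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            // nextInt(bound) returns a uniformly distributed int in [0, bound).
            chars[i] = ALPHABET.charAt(random.nextInt(ALPHABET.length()));
        }
        return new String(chars);
    }

    /** Counts occurrences of each letter; assumes s contains only a-z. */
    public static int[] letterCounts(String s) {
        int[] counts = new int[ALPHABET.length()];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i) - 'a']++;
        }
        return counts;
    }

    /** Pearson chi-square statistic of the counts against a uniform distribution. */
    public static double chiSquare(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double expected = (double) total / counts.length;
        double sum = 0.0;
        for (int c : counts) {
            double diff = c - expected;
            sum += diff * diff / expected;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A fixed seed makes the test input reproducible; Math.random() cannot be seeded.
        String text = generate(8_000_000, new Random(42));
        System.out.println("chi-square (25 degrees of freedom): " + chiSquare(letterCounts(text)));
    }
}
```

Indexing the alphabet with (int) (Math.random() * 26) would work as well; passing an explicit seeded Random is a design choice so that the same 8 million characters can be regenerated for every search algorithm you compare.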

Answer by dlopezgonzalez

This class from the Apache Commons Lang library does the job:

org.apache.commons.lang.RandomStringUtils

You can use its random method:

String s = org.apache.commons.lang.RandomStringUtils.random(5, true, false); // 5 characters: letters only, no digits