How do I generate random sample data in my Oracle database?

Disclaimer: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow, original question at http://stackoverflow.com/questions/6189275/

Date: 2020-09-18 23:50:22, source: igfitidea

Tags: java, sql, oracle, dataset

Asked by Justin Kredible

Does anyone know of a tool that can inspect a specified schema and generate random data based on the tables and columns of that schema?

Accepted answer by Gary Myers

Another alternative is the Swingbench Data Generator.

The SAMPLE clause is also useful here, for example when generating order lines for a random combination of orders and products.
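
In Oracle, a query such as `SELECT * FROM orders SAMPLE (5)` returns a random subset of roughly 5 percent of the rows. The Python sketch only mimics that row-level sampling outside the database to show the idea of building order lines from a random subset of all (order, product) combinations. The function names and the 1 to 10 quantity range are hypothetical, and this is not how Swingbench or Oracle implement sampling.

```python
import itertools
import random


def sample_rows(rows, percent, rng):
    """Keep each row independently with probability percent/100.

    A rough analogue of Oracle's row-level SAMPLE (percent) clause:
    the result size is only approximately percent of the input.
    """
    p = percent / 100.0
    return [r for r in rows if rng.random() < p]


def generate_order_lines(order_ids, product_ids, percent, seed=0):
    """Hypothetical helper: random (order, product, quantity) order lines."""
    rng = random.Random(seed)
    # Every possible (order, product) pair, like a cross join of the two tables.
    pairs = list(itertools.product(order_ids, product_ids))
    chosen = sample_rows(pairs, percent, rng)
    # Attach a random quantity (hypothetical range 1..10) to each sampled pair.
    return [(o, p, rng.randint(1, 10)) for o, p in chosen]


if __name__ == "__main__":
    lines = generate_order_lines(range(1, 101), range(1, 51), percent=5, seed=42)
    print(len(lines), lines[:3])
```

Because each pair is kept independently, the number of order lines varies from run to run (unless the seed is fixed), just as a SAMPLE query returns a slightly different row count each time.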

Answer by APC

This is an interesting question. It is easy enough to generate random values: a simple loop over the data dictionary with calls to DBMS_RANDOM would do the trick.
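
A minimal sketch of that loop, assuming the column metadata has already been fetched from `USER_TAB_COLUMNS` (column name, data type, data length). For each column, the Python code emits an Oracle expression built from `DBMS_RANDOM.VALUE` and `DBMS_RANDOM.STRING`, then wraps everything in an `INSERT ... SELECT ... FROM dual CONNECT BY LEVEL <= n` statement. The value ranges, the fixed 2020 start date, and the helper names are assumptions. Precision, scale, and constraints are deliberately ignored, so this does not handle the foreign-key or realism issues discussed next.

```python
from typing import List, Tuple

# (column_name, data_type, data_length) rows, in the shape returned by:
#   SELECT column_name, data_type, data_length
#   FROM user_tab_columns WHERE table_name = :t ORDER BY column_id
Column = Tuple[str, str, int]


def random_expr(data_type: str, data_length: int) -> str:
    """Return an Oracle SQL expression producing a random value for one column."""
    if data_type == "NUMBER":
        # Assumption: ignores DATA_PRECISION/DATA_SCALE; fine for unconstrained NUMBER.
        return "TRUNC(DBMS_RANDOM.VALUE(0, 10000))"
    if data_type in ("VARCHAR2", "CHAR"):
        # Random upper-case string of 1..data_length characters.
        return f"DBMS_RANDOM.STRING('U', TRUNC(DBMS_RANDOM.VALUE(1, {data_length + 1})))"
    if data_type == "DATE":
        # Hypothetical range: a random day in the 365 days from 2020-01-01.
        return "DATE '2020-01-01' + TRUNC(DBMS_RANDOM.VALUE(0, 365))"
    raise ValueError(f"unsupported data type: {data_type}")


def random_insert_sql(table: str, columns: List[Column], rows: int) -> str:
    """Build one INSERT ... SELECT that generates `rows` random rows via CONNECT BY."""
    col_list = ", ".join(name for name, _, _ in columns)
    exprs = ",\n       ".join(random_expr(t, n) for _, t, n in columns)
    return (
        f"INSERT INTO {table} ({col_list})\n"
        f"SELECT {exprs}\n"
        f"FROM dual\n"
        f"CONNECT BY LEVEL <= {rows}"
    )


if __name__ == "__main__":
    cols = [("ID", "NUMBER", 22), ("NAME", "VARCHAR2", 30), ("CREATED", "DATE", 7)]
    print(random_insert_sql("CUSTOMERS", cols, 1000))
```

Generating the SQL text, rather than fetching and inserting rows one at a time, keeps the bulk insert inside the database.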

Except for two things.

The first, as @FrustratedWithForms points out, is the complication of foreign key constraints. Let's throw lookup values (reference data) into the mix too.
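
One common way to handle both points is to load tables in dependency order, parents before children, and have each foreign-key column pick a value that already exists in its parent table. Lookup tables, meanwhile, get fixed reference values instead of random ones. The sketch assumes the foreign-key edges were obtained from `USER_CONSTRAINTS` (the query in the comment is one possible way). The table names, the `FK_EDGES`/`LOOKUPS` data, and the convention of naming each FK column after its parent table are all hypothetical. It needs Python 3.9+ for `graphlib`.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical FK edges (child_table, parent_table), e.g. from:
#   SELECT c.table_name, p.table_name
#   FROM user_constraints c
#   JOIN user_constraints p ON p.constraint_name = c.r_constraint_name
#   WHERE c.constraint_type = 'R'
FK_EDGES = [("ORDERS", "CUSTOMERS"), ("ORDERS", "ORDER_STATUS"), ("ORDER_LINES", "ORDERS")]

# Reference (lookup) data is loaded as fixed values, not generated randomly.
LOOKUPS = {"ORDER_STATUS": ["NEW", "SHIPPED", "CANCELLED"]}


def load_order(tables, fk_edges):
    """Parents before children: a topological sort of the FK graph."""
    graph = {t: set() for t in tables}
    for child, parent in fk_edges:
        graph[child].add(parent)
    return list(TopologicalSorter(graph).static_order())


def generate(rows_per_table, seed=0):
    """Return {table: [row dicts]} where every FK value exists in its parent."""
    rng = random.Random(seed)
    tables = {t for edge in FK_EDGES for t in edge}
    data = {}
    for table in load_order(tables, FK_EDGES):
        if table in LOOKUPS:
            data[table] = [{"pk": code} for code in LOOKUPS[table]]
            continue
        parents = sorted(p for c, p in FK_EDGES if c == table)
        data[table] = [
            # Each FK column (named after its parent table here) picks an existing parent key.
            {"pk": pk, **{p: rng.choice(data[p])["pk"] for p in parents}}
            for pk in range(1, rows_per_table + 1)
        ]
    return data


if __name__ == "__main__":
    for table, rows in generate(3, seed=1).items():
        print(table, rows)
```

A cycle in the foreign-key graph makes `static_order()` raise `graphlib.CycleError`. Such schemas need deferred constraints or a second update pass, which this sketch does not attempt.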

The second is that random data isn't very realistic. The main reason for using random data is the need for large volumes of data, probably for performance testing. But real datasets aren't random: they contain skews and clumps, variable string lengths, and of course patterns (especially where dates are concerned).
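
If you do end up generating data, you can at least imitate some of these properties instead of drawing uniformly. The sketch below is only an illustration with made-up parameters, not a model of any real dataset. It uses a Zipf-like 1/k weighting, so a few customers place most orders, a weekday-heavy date distribution, and log-normal note lengths, which yields many short strings and a long tail.

```python
import random
from datetime import date, timedelta


def skewed_orders(n_orders, n_customers, start, days, seed=0):
    """Hypothetical generator of (order_id, customer_id, order_date, note) rows."""
    rng = random.Random(seed)
    customers = list(range(1, n_customers + 1))
    # Skew: customer k is chosen with weight 1/k (Zipf-like), so a few customers dominate.
    cust_weights = [1.0 / k for k in customers]
    # Date pattern: weekdays are five times as likely as weekend days.
    all_days = [start + timedelta(days=d) for d in range(days)]
    day_weights = [5.0 if d.weekday() < 5 else 1.0 for d in all_days]

    custs = rng.choices(customers, weights=cust_weights, k=n_orders)
    dates = rng.choices(all_days, weights=day_weights, k=n_orders)
    rows = []
    for order_id, (cust, day) in enumerate(zip(custs, dates), start=1):
        # Variable string length: log-normal, mostly short with a long tail, capped at 200.
        note_len = min(200, int(rng.lognormvariate(2.5, 0.8)))
        rows.append((order_id, cust, day, "x" * note_len))
    return rows


if __name__ == "__main__":
    for row in skewed_orders(5, 100, date(2021, 1, 4), 364, seed=7):
        print(row)
```

This still only approximates real data, which is why getting a real dataset, as suggested next, is usually better.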

So, rather than trying to generate random data, I suggest you try to get a real dataset. Ideally your user/customer will be able to provide one, preferably anonymized. Otherwise, try taking something that is already in the public domain and massaging it to fit your specific requirements. The Info Chimps are the top bananas when it comes to these matters. Check them out.

Answer by FrustratedWithFormsDesigner

Allround Automation's PL/SQL Developer has a data generator tool. But be warned: it's a bit flaky. It seems to work fine on a single-table basis but gets tripped up when there are dependencies between tables.

I admit that eventually I just started writing my own SQL scripts to generate data. That turned out to be much more stable.

Answer by a_horse_with_no_name

Have a look at Databene Benerator.

The initial setup is a bit complicated, but the tool is quite powerful.

Answer by Ian Carpenter

This one is a bit of a wild card, but I thought I would mention it.

If you have data in a production environment that you can't use because it may contain sensitive information, Oracle have a product called "Oracle Data Masking" that will replace the sensitive information with realistic values.
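
To show the general idea of masking (this is not how Oracle Data Masking works internally, and none of its API is used), here is a minimal sketch of deterministic masking. Each sensitive value is run through an HMAC with a secret key, and the digest picks a realistic-looking replacement. The same input therefore always maps to the same output, so joins on a masked column still line up, while phone numbers keep their length and punctuation. The name pools and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pools of realistic replacement values.
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dana", "Emeka", "Fatima", "Goran", "Hana"]
LAST_NAMES = ["Smith", "Okafor", "Nguyen", "Garcia", "Kowalski", "Tanaka", "Singh", "Moreau"]


def _digest(secret: bytes, value: str) -> bytes:
    """Keyed hash, so masked values can't be reversed without the secret."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name(secret: bytes, full_name: str) -> str:
    """Same input -> same fake name (different inputs may collide)."""
    d = _digest(secret, full_name)
    return f"{FIRST_NAMES[d[0] % len(FIRST_NAMES)]} {LAST_NAMES[d[1] % len(LAST_NAMES)]}"


def mask_phone(secret: bytes, phone: str) -> str:
    """Replace every digit but keep punctuation and length, so the format stays realistic."""
    d = _digest(secret, phone)
    out, i = [], 0
    for ch in phone:
        if ch.isdigit():
            out.append(str(d[i % len(d)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    key = b"hypothetical-secret"
    print(mask_name(key, "John Doe"), mask_phone(key, "555-123-4567"))
```

Because the mapping depends on the key, keep the key out of the test environment. Otherwise anyone with a list of candidate names could recompute the mapping.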

I don't know what this product costs, but if you want more information, it can be found here.