Python: creating random binary files

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14275975/

Date: 2020-08-18 10:57:19  Source: igfitidea

Creating random binary files

Tags: python, random

Asked by gardarh

I'm trying to use python to create a random binary file. This is what I've got already:

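import random
import struct

# filename and size_kb are assumed to be defined elsewhere
with open(filename, 'wb') as f:
    for i in range(size_kb):
        for ii in range(1024 // 4):
            # pack one random unsigned 32-bit integer (4 bytes) per write
            f.write(struct.pack("=I", random.randint(0, 2**32 - 1)))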

But it's terribly slow (0.82 seconds for size_kb=1024 on my machine, which has a 3.9GHz CPU and an SSD). A big bottleneck seems to be the random int generation (replacing the randint() call with a constant 0 reduces the running time from 0.82s to 0.14s).

Now I know there are more efficient ways of creating random data files (namely dd if=/dev/urandom) but I'm trying to figure this out for the sake of curiosity... is there an obvious way to improve this?

Accepted answer by Jon Clements

IMHO - the following is completely redundant:

f.write(struct.pack("=I", random.randint(0, 2**32 - 1)))

There's absolutely no need to use struct.pack, just do something like:

import os

with open('output_file', 'wb') as fout:
    fout.write(os.urandom(1024)) # 1024 is a byte count; use size_kb * 1024 instead if that is not unreasonably large
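
If the requested size is too large to hold in memory comfortably, one possible variation is to write the os.urandom() output in chunks. This is only a sketch: the function name write_random_file and the 1 MiB chunk_size default are assumptions made for illustration, not part of the original answer.

```python
import os

def write_random_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of os.urandom() data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, 'wb') as fout:
        while remaining > 0:
            n = min(chunk_size, remaining)
            fout.write(os.urandom(n))
            remaining -= n

size_kb = 1024  # same meaning as size_kb in the question
write_random_file('output_file', size_kb * 1024)
```

Only one chunk is held in memory at a time, so peak memory use stays near chunk_size regardless of the file size.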

Then, if you need to re-use the file for reading integers, use struct.unpack when reading it back.
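
A minimal sketch of reading the integers back, assuming the file length is a multiple of 4 bytes and reusing the "=I" format from the question. The 16-byte size is chosen only to keep the example small.

```python
import os
import struct

# Write 16 random bytes, i.e. four unsigned 32-bit integers.
with open('output_file', 'wb') as fout:
    fout.write(os.urandom(16))

with open('output_file', 'rb') as fin:
    data = fin.read()

# "=I": native byte order, standard size, unsigned 32-bit integer.
# iter_unpack yields one 1-tuple per 4-byte group.
values = [v for (v,) in struct.iter_unpack("=I", data)]
print(values)
```

struct.iter_unpack requires the buffer length to be an exact multiple of the format size, so a file with a trailing partial integer would need to be truncated or padded first.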

(my use case is generating a file for a unit test so I just need a file that isn't identical with other generated files).

Another option is to just write a UUID4 to the file, but since I don't know the exact use case, I'm not sure that's viable.
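
The UUID4 idea could look roughly like this. The helper name write_uuid_file and the file names are hypothetical. A UUID4 is 16 bytes long and carries 122 random bits, which should be plenty to make test files differ from each other, though the file is of course tiny.

```python
import uuid

def write_uuid_file(path):
    """Write one random UUID4 (16 bytes) to path and return it."""
    token = uuid.uuid4()
    with open(path, 'wb') as fout:
        fout.write(token.bytes)
    return token

first = write_uuid_file('test_file_1')
second = write_uuid_file('test_file_2')
```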

Answer by tvuillemin

The python code you should write completely depends on the way you intend to use the random binary file. If you just need a "rather good" randomness for multiple purposes, then the code of Jon Clements is probably the best.

However, on Linux OS at least, os.urandom relies on /dev/urandom, which is described in the Linux Kernel (drivers/char/random.c) as follows:

The /dev/urandom device [...] will return as many bytes as are requested. As more and more random bytes are requested without giving time for the entropy pool to recharge, this will result in random numbers that are merely cryptographically strong. For many applications, however, this is acceptable.

So the question is, is this acceptable for your application? If you prefer a more secure RNG, you could read bytes from /dev/random instead. The main drawback of this device is that it can block indefinitely if the Linux kernel is not able to gather enough entropy. There are also other cryptographically secure RNGs like EGD.
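
Reading from the device directly might be sketched as follows. The function name read_random_bytes is an assumption for illustration, and the code presumes a system where the device path exists (such as Linux). The file is opened unbuffered so that no more bytes than requested are pulled from the device.

```python
def read_random_bytes(n, device='/dev/random'):
    """Read exactly n bytes from a random device; may block on /dev/random."""
    chunks = []
    remaining = n
    # buffering=0: raw reads, which may return fewer bytes than requested
    with open(device, 'rb', buffering=0) as dev:
        while remaining > 0:
            chunk = dev.read(remaining)
            if not chunk:
                raise EOFError('device returned no data')
            chunks.append(chunk)
            remaining -= len(chunk)
    return b''.join(chunks)
```

A call such as read_random_bytes(32) would then return 32 bytes, waiting as long as the kernel makes it wait. Passing device='/dev/urandom' gives the non-blocking behaviour described above.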

Alternatively, if your main concern is execution speed and you just need some "light randomness" for a Monte-Carlo method (i.e. unpredictability doesn't matter, uniform distribution does), you could consider generating your random binary file once and using it many times, at least for development.
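
One way to sketch the generate-once, reuse-many-times idea is shown below. The helper name get_random_file and the file name random_cache.bin are hypothetical. It uses Python's random module (Mersenne Twister), which is uniform but predictable, so it only suits the case where unpredictability doesn't matter.

```python
import os
import random

def get_random_file(path, size_bytes, seed=None):
    """Create path with size_bytes of pseudo-random data unless it already exists."""
    if not os.path.exists(path):
        rng = random.Random(seed)
        # One big random integer converted to bytes: uniform, but not for security use
        data = rng.getrandbits(8 * size_bytes).to_bytes(size_bytes, 'little')
        with open(path, 'wb') as fout:
            fout.write(data)
    return path

path = get_random_file('random_cache.bin', 1024 * 1024)
```

Later calls with the same path return immediately and leave the existing file untouched, so the generation cost is paid only once.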
