Original URL: http://stackoverflow.com/questions/34687516/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to read binary files as hex in Python?
Question by Per Persson
I want to read a file with data, coded in hex format:
01ff0aa121221aff110120...etc
The files contain >100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).
I tried the following code (and other similar):
filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
    a=f.read(1)
    b=a.encode("hex")
    c.append(b)
f.close()
This gives each byte as a separate string ("aa", "01", "f1", etc.), which is perfect for me!
This works fine up to (in this case) byte no. 905, which happens to be "1a". I also tried the ord() function, which also stopped at the same byte.
Is there a simple solution?
Answer by ShadowRanger
The simple solution is binascii:
import binascii
# Open in binary mode (so you don't read two-byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())
This just gets you a str of the hex values, but it does it much faster than what you're trying to do. If you really want a bunch of length-2 strings of the hex for each byte, you can convert the result easily:
hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))
which will produce the list of length-2 strs corresponding to the hex encoding of each byte. To avoid temporary copies of hexdata, you can use a similar but slightly less intuitive approach that avoids slicing by using the same iterator twice with zip:
hexlist = map(''.join, zip(*[iter(hexdata)]*2))
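The snippets above target Python 2, where hexlify() returns a str. On Python 3 it returns bytes, and iterating over bytes yields integers, so ''.join on the pairs would raise TypeError. A minimal Python 3 sketch (the demo file and its contents are made up for illustration) decodes to str first and then applies both pairing tricks:

```python
import binascii
import os
import tempfile

# Throwaway demo file; 0x1a is the byte that stopped the question's loop.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0x01, 0xff, 0x0a, 0x1a, 0x21]))
    demo_path = tmp.name

with open(demo_path, 'rb') as f:
    # On Python 3, hexlify() returns bytes, so decode it to get a str.
    hexdata = binascii.hexlify(f.read()).decode('ascii')
os.remove(demo_path)

# Pairing via slices; list() is needed because map() is lazy on Python 3.
hexlist = list(map(''.join, zip(hexdata[::2], hexdata[1::2])))

# Pairing via one shared iterator: [iter(hexdata)] * 2 holds the SAME iterator twice,
# so zip() takes two consecutive characters per tuple without slicing the string.
hexlist2 = list(map(''.join, zip(*[iter(hexdata)] * 2)))

print(hexdata)   # 01ff0a1a21
print(hexlist)   # ['01', 'ff', '0a', '1a', '21']
assert hexlist == hexlist2
```

Both pairings assume an even-length string, which hexlify() always produces; with zip, a stray odd trailing character would be silently dropped.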
Update:
For people on Python 3.5 and higher, bytes objects gained a .hex() method, so no module is required to convert from raw binary data to ASCII hex. The block of code at the top can be simplified to just:
with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
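A quick in-memory check of bytes.hex(), plus two related standard-library features the answer does not mention: the optional separator argument (added in Python 3.8) and bytes.fromhex() for the reverse direction. The sample bytes are arbitrary:

```python
data = bytes([0x01, 0xff, 0x0a, 0xa1])

hexdata = data.hex()      # '01ff0aa1' (Python 3.5+)
spaced = data.hex(' ')    # '01 ff 0a a1' (sep argument requires Python 3.8+)

# bytes.fromhex() reverses the conversion, turning the hex text back into raw bytes.
assert bytes.fromhex(hexdata) == data
print(hexdata, spaced)
```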
Answer by Dmitry Rubanovich
If the file is encoded in hex format, shouldn't each byte be represented by 2 characters? So
c=[]
with open('data.geno','rb') as f:
    b = f.read(2)
    while b:
        c.append(b.decode('hex'))
        b=f.read(2)
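b.decode('hex') only works on Python 2; Python 3 rejects 'hex' in bytes.decode() (that codec is reachable only through codecs.decode()). If the file really does contain hex digits as text (the asker's later answer suggests it is actually binary), a Python 3 sketch could decode the whole file at once with binascii.unhexlify(). The helper read_hex_text and the sample file contents are hypothetical:

```python
import binascii
import os
import tempfile


def read_hex_text(path):
    """Decode a file of ASCII hex digits (e.g. '01ff0a...') into raw bytes.

    Assumes the file holds only hex digit pairs, optionally followed by
    trailing whitespace such as a final newline.
    """
    with open(path, 'rb') as f:
        # unhexlify() turns each pair of hex digits into one byte: b'01ff' -> b'\x01\xff'
        return binascii.unhexlify(f.read().strip())


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'01ff0aa1\n')
    demo_path = tmp.name

raw = read_hex_text(demo_path)
os.remove(demo_path)
print(list(raw))  # [1, 255, 10, 161]
```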
Answer by Per Persson
Thanks for all interesting answers!
The simple solution that worked immediately was to change "r" to "rb", so:
f=open('data.geno','r')  # doesn't work
f=open('data.geno','rb') # works fine
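For comparison, the question's per-byte loop can be written for Python 3 in binary mode. A likely reason the text-mode version stopped at 0x1a, assuming the asker was on Windows, is that the C runtime's text mode treats that byte (Ctrl+Z) as end-of-file. This sketch uses a hypothetical byte_hex_list helper and made-up sample bytes:

```python
import os
import tempfile


def byte_hex_list(path):
    """Return one two-character lowercase hex string per byte of the file."""
    with open(path, 'rb') as f:   # binary mode: 0x1a and newlines are ordinary bytes
        data = f.read()
    # Iterating over bytes on Python 3 yields ints, which format() renders as 2 hex digits.
    return [format(b, '02x') for b in data]


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0xaa, 0x01, 0xf1, 0x1a, 0x0d, 0x0a]))
    demo_path = tmp.name

c = byte_hex_list(demo_path)
os.remove(demo_path)
print(c)  # ['aa', '01', 'f1', '1a', '0d', '0a']
```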
The data in this case actually uses only two bits per value, so one byte holds four values, each one of the binary codes 00, 01, 10, 11.
Yours!
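Since each byte packs four 2-bit values, they can be unpacked with shifts and a mask. This is a sketch under the assumption that the first value sits in the two highest bits; the answer does not state the bit order, and no mapping from codes to DNA bases is given, so none is applied. unpack_2bit is a hypothetical name:

```python
def unpack_2bit(data):
    """Split every byte into four 2-bit codes (0-3), most significant pair first.

    The high-bits-first order is an assumption; check it against the file format.
    """
    codes = []
    for byte in data:               # iterating over bytes yields ints on Python 3
        for shift in (6, 4, 2, 0):
            codes.append((byte >> shift) & 0b11)
    return codes


print(unpack_2bit(bytes([0b00011011, 0xff])))  # [0, 1, 2, 3, 3, 3, 3, 3]
```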
Answer by D-slr8
Just an additional note to these: when reading the file in a loop, make sure to add a break, or the loop will just keep going.
def HexView(path):
    with open(path, 'rb') as in_file:
        while True:
            hexdata = in_file.read(16).hex()  # I like to read 16 bytes in, then start a new line.
            if len(hexdata) == 0:  # break the loop once no more binary data is read
                break
            print(hexdata.upper())  # I also like it all in caps.
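A variation on the same loop, sketched under the assumption of Python 3.8+ (for bytes.hex(' ')): iter() with a sentinel stops automatically when read() returns b'', so the explicit break is no longer needed, and each line is prefixed with its byte offset. hex_lines and the offset layout are illustrative choices, not part of the answer:

```python
import functools
import os
import tempfile


def hex_lines(path, width=16):
    """Yield one uppercase hex line per `width` bytes, prefixed with the byte offset."""
    with open(path, 'rb') as in_file:
        offset = 0
        # iter(callable, sentinel) keeps calling read(width) until it returns b'' (EOF).
        for chunk in iter(functools.partial(in_file.read, width), b''):
            yield f"{offset:08X}  {chunk.hex(' ').upper()}"
            offset += len(chunk)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(20)))
    demo_path = tmp.name

lines = list(hex_lines(demo_path))  # list() consumes the generator, closing the file
os.remove(demo_path)
for line in lines:
    print(line)
```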