How to read a binary file as hex in Python?

Question

Asked by Per Persson

I want to read a file with data, coded in hex format:

01ff0aa121221aff110120...etc

The files contain more than 100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).

I tried the following code (and similar variants):

filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
  a=f.read(1)
  b=a.encode("hex")
  c.append(b)
f.close()

This gives each byte separately ("aa", "01", "f1", etc.), which is perfect for me!

This works fine up to (in this case) byte number 905, which happens to be "1a". I also tried the ord() function, which stopped at the same byte.

Is there perhaps a simple solution?

Answer 1

Answered by ShadowRanger

The simple solution is binascii:

import binascii

# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())

This just gets you a str of the hex values (on Python 3, hexlify returns bytes instead), but it does it much faster than what you're trying to do. If you really want a bunch of length-2 strings of the hex for each byte, you can convert the result easily:

# Written for Python 2; on Python 3, hexdata must be a str and map() should be wrapped in list()
hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))

This produces the list of length-2 strs corresponding to the hex encoding of each byte. To avoid temporary copies of hexdata, you can use a similar but slightly less intuitive approach that avoids slicing by passing the same iterator to zip twice:

hexlist = map(''.join, zip(*[iter(hexdata)]*2))
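On Python 3, binascii.hexlify returns bytes and map is lazy, so the one-liners above need small adjustments. A minimal sketch, assuming Python 3; the helper name hex_pairs is made up for illustration:

```python
import binascii

def hex_pairs(data):
    """Return one two-character hex string per byte of `data` (hypothetical helper)."""
    # hexlify() returns bytes on Python 3, so decode to str before joining characters.
    hexdata = binascii.hexlify(data).decode('ascii')
    # The same iterator is passed to zip twice, so it yields consecutive character pairs.
    it = iter(hexdata)
    return [''.join(pair) for pair in zip(it, it)]

pairs = hex_pairs(b'\x01\xff\x0a\x1a')
print(pairs)  # ['01', 'ff', '0a', '1a']
```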

Update:

For people on Python 3.5 and higher, bytes objects gained a .hex() method, so no module is required to convert from raw binary data to ASCII hex. The block of code at the top can be simplified to just:

with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
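If the per-byte list is the goal, iterating over a bytes object on Python 3 yields integers that can be formatted directly. A sketch, assuming Python 3.6+ for f-strings; bytes_to_hex_list is an illustrative name, and the demo writes a throwaway temporary file rather than reading data.geno:

```python
import os
import tempfile

def bytes_to_hex_list(path):
    """Read `path` in binary mode and return a list like ['01', 'ff', ...]."""
    with open(path, 'rb') as f:
        data = f.read()
    # Iterating over bytes gives ints 0..255; format each as two lowercase hex digits.
    return [f'{byte:02x}' for byte in data]

# Demo on a temporary file that includes the 0x1a byte from the question.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0x01, 0xff, 0x1a, 0x21]))
try:
    result = bytes_to_hex_list(tmp.name)
finally:
    os.remove(tmp.name)
print(result)  # ['01', 'ff', '1a', '21']
```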

Answer 2

Answered by Dmitry Rubanovich

If the file is encoded in hex format, shouldn't each byte be represented by 2 characters? So:

import binascii

c=[]
with open('data.geno','rb') as f:
    b = f.read(2)
    while b:
        c.append(binascii.unhexlify(b))  # b.decode('hex') only works on Python 2
        b=f.read(2)

Answer 3

Answered by Per Persson

Thanks for all the interesting answers!

The simple solution, which worked immediately, was to change "r" to "rb":

f=open('data.geno','r')  # doesn't work
f=open('data.geno','rb')  # works fine

The code in this case is actually only two binary bits, so one byte contains four data values, each of them binary 00, 01, 10, or 11.
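Unpacking four 2-bit values from each byte could look like the sketch below. The bit order (most significant pair first) is an assumption, since the post does not say how the values are packed; unpack_2bit is a hypothetical name.

```python
def unpack_2bit(data):
    """Split each byte into four 2-bit values (0..3), highest bits first (assumed order)."""
    values = []
    for byte in data:
        # Shift the wanted pair down to the lowest two bits, then mask the rest off.
        for shift in (6, 4, 2, 0):
            values.append((byte >> shift) & 0b11)
    return values

print(unpack_2bit(b'\x1b'))  # 0x1b is 00 01 10 11 in binary, so this prints [0, 1, 2, 3]
```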

Yours!

Answer 4

Answered by D-slr8

Just an additional note to these: make sure to add a break to the loop around your .read() of the file, or it will just keep going.

def HexView(path):                               # path of the file to dump
    with open(path, 'rb') as in_file:
        while True:
            hexdata = in_file.read(16).hex()     # I like to read 16 bytes in then new line it.
            if len(hexdata) == 0:                # breaks loop once no more binary data is read
                break
            print(hexdata.upper())               # I also like it all in caps.
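The same loop can be written without an explicit break by using the two-argument form of iter(), which stops once read() returns an empty bytes object. A sketch assuming Python 3.5+; hex_lines is an illustrative name, and the demo uses an in-memory io.BytesIO instead of a real file:

```python
import io
from functools import partial

def hex_lines(stream, width=16):
    """Yield uppercase hex strings for successive `width`-byte chunks of a binary stream."""
    # iter(callable, sentinel) keeps calling stream.read(width) until it returns b''.
    for chunk in iter(partial(stream.read, width), b''):
        yield chunk.hex().upper()

demo = io.BytesIO(bytes(range(20)))
lines = list(hex_lines(demo))
for line in lines:
    print(line)
```

With a real file, the stream would be the object returned by open(path, 'rb').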
