Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/30124255/

Time: 2020-08-19 07:59:25  Source: igfitidea

Read a binary file using Numpy fromfile and a given offset

Tags: python, arrays, numpy, binary

Asked by scls

I have a binary file that contains position records of a plane. Each record looks like:

0x00: Time, float32
0x04: X, float32 // X axis position
0x08: Y, float32 // Y axis position
0x0C: Elevation, float32
0x10: float32*4 = Quaternion (x,y,z axis and w scalar)
0x20: Distance, float32 (unused)

So each record is 36 bytes long (nine float32 values, with the last one at offset 0x20).

I would like to get these records as a NumPy array.

At offset 1859 there is an unsigned 32-bit integer (4 bytes) that gives the number of elements in the array: 12019 in my case.

I don't care (for now) about the header data (before offset 1859).

The array only starts at offset 1863 (= 1859 + 4).
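
The header layout described above can be sketched with the standard struct module. This is only a minimal sketch: the byte order is an assumption (little-endian, "<I"), since the question does not state it, and read_record_count and the synthetic demo file are hypothetical names used for illustration.

```python
import os
import struct
import tempfile

COUNT_OFFSET = 1859              # position of the uint32 record count (from the question)
DATA_OFFSET = COUNT_OFFSET + 4   # the first record starts right after the count


def read_record_count(path, byte_order="<"):
    """Return the uint32 stored at COUNT_OFFSET.

    byte_order is an assumption: "<" (little-endian) unless the format says otherwise.
    """
    with open(path, "rb") as f:
        f.seek(COUNT_OFFSET)
        (count,) = struct.unpack(byte_order + "I", f.read(4))
    return count


# Demo on a synthetic file: 1859 dummy header bytes followed by the count.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * COUNT_OFFSET + struct.pack("<I", 12019))
    demo_path = tmp.name

record_count = read_record_count(demo_path)
os.remove(demo_path)
print(record_count, DATA_OFFSET)
```

If the count read this way looks implausible for the real file, the byte order assumption is the first thing to revisit.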

I defined my own NumPy dtype like this:

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])
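
A quick way to check a structured dtype against the record layout is to look at its itemsize and field offsets. A small sketch, using an equivalent definition with explicit "<f4" fields (the little-endian byte order is an assumption, not something the question states):

```python
import numpy as np

# Same nine fields as above; "<f4" is float32 with an assumed little-endian byte order.
field_names = ["time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist"]
record_dtype = np.dtype([(name, "<f4") for name in field_names])

print(record_dtype.itemsize)           # bytes per record
print(record_dtype.fields["dist"][1])  # byte offset of the last field
```

With nine float32 fields this gives 36 bytes per record, with dist at byte offset 32 (0x20), matching the layout listed in the question.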

And I'm reading the file using fromfile:

a_bytes = np.fromfile(filename, dtype=dtype)

But I don't see any parameter I can pass to fromfile to specify an offset.
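
For reference, NumPy 1.17 and later accept an offset argument (in bytes) in np.fromfile for binary files. A minimal sketch, assuming such a NumPy version; the file here is synthetic and the "<f4" byte order is an assumption:

```python
import os
import tempfile

import numpy as np

HEADER_SIZE = 1863   # the records start here (from the question)

record_dtype = np.dtype([(name, "<f4") for name in
                         ("time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist")])

# Build a synthetic file: a dummy header followed by three records.
records = np.zeros(3, dtype=record_dtype)
records["time"] = [0.0, 0.5, 1.0]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * HEADER_SIZE)
    tmp.write(records.tobytes())
    demo_path = tmp.name

# offset is counted in bytes from the start of the file; count is in items of dtype.
loaded = np.fromfile(demo_path, dtype=record_dtype, count=3, offset=HEADER_SIZE)
os.remove(demo_path)
print(loaded["time"])
```

Passing a file name plus offset avoids opening the file by hand; on older NumPy versions the offset argument is not available.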

Accepted answer by reptilicus

You can open the file with Python's built-in open, seek past the header, and then pass the file object to fromfile. Something like this:

import numpy as np
import os

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

with open("myfile", "rb") as f:
    # Skip the header: the records start at byte offset 1863
    f.seek(1863, os.SEEK_SET)
    data = np.fromfile(f, dtype=dtype)

print(data)
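
Because fromfile reads from the file object's current position, the same approach can also pick up the record count stored just before the array and pass it as count, so that any bytes after the array are ignored. A sketch under assumptions: little-endian data, and a synthetic file standing in for the real one.

```python
import os
import tempfile

import numpy as np

COUNT_OFFSET = 1859  # uint32 record count, as described in the question

record_dtype = np.dtype([(name, "<f4") for name in
                         ("time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist")])

# Synthetic file: header, count, two records, then unrelated trailing bytes.
records = np.zeros(2, dtype=record_dtype)
records["Alt"] = [100.0, 101.5]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * COUNT_OFFSET)
    tmp.write(np.array([2], dtype="<u4").tobytes())
    tmp.write(records.tobytes())
    tmp.write(b"trailing")
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    f.seek(COUNT_OFFSET, os.SEEK_SET)
    # Reading the count leaves the file positioned at the first record.
    n_records = int(np.fromfile(f, dtype="<u4", count=1)[0])
    data = np.fromfile(f, dtype=record_dtype, count=n_records)

os.remove(demo_path)
print(n_records, data["Alt"])
```

Without count, fromfile would try to interpret everything up to the end of the file as records.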

Answer by Eugene Veselov

I faced a similar problem, but none of the answers above satisfied me. I needed to implement something like a virtual table with a very large number of binary records, potentially occupying more memory than I can afford in one NumPy array. So my question was how to read and write a small set of integers from/to a binary file: a subset of the file into a subset of a NumPy array.

This is a solution that worked for me:

import numpy as np
recordLen = 10 # number of int64's per record
recordSize = recordLen * 8 # size of a record in bytes
memArray = np.zeros(recordLen, dtype=np.int64) # a buffer for 1 record

# Create a binary file and open it for write+read
with open('BinaryFile.dat', 'w+b') as file:
    # Write the array into the file as record number recordNo
    recordNo = 200 # the index of the target record in the file
    file.seek(recordSize * recordNo)
    raw = memArray.tobytes() # "raw" avoids shadowing the built-in bytes
    file.write(raw)

    # Read record number recordNo from the file back into memArray
    file.seek(recordSize * recordNo)
    raw = file.read(recordSize)
    # frombuffer returns a read-only view; copy() makes memArray mutable
    memArray = np.frombuffer(raw, dtype=np.int64).copy()
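
As an alternative to manual seek/read/write (not what this answer uses), np.memmap maps the file as an array so that single records can be read or written by index without loading the whole file. A minimal sketch with a synthetic file; the table size and the temporary path are illustrative assumptions:

```python
import os
import tempfile

import numpy as np

record_len = 10    # int64 values per record, as in the answer
n_records = 300    # hypothetical table size for this demo

demo_path = os.path.join(tempfile.mkdtemp(), "BinaryFile.dat")

# mode="w+" creates (or overwrites) a zero-filled file of the required size.
table = np.memmap(demo_path, dtype=np.int64, mode="w+", shape=(n_records, record_len))
table[200] = np.arange(record_len)   # write record 200 only
table.flush()                        # push the change to disk
del table                            # release the mapping

# Reopen read-only and copy a single record into ordinary memory.
table = np.memmap(demo_path, dtype=np.int64, mode="r", shape=(n_records, record_len))
record = np.array(table[200])
del table
print(record)
```

Only the pages that are actually touched are loaded, which may suit the "virtual table" use case, though the file has to be sized up front.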