Python: reading a binary file with NumPy fromfile and a given offset

Question

Asked by scls

I have a binary file which contains records of the position of a plane. Each record looks like:

0x00: Time, float32
0x04: X, float32 // X axis position
0x08: Y, float32 // Y axis position
0x0C: Elevation, float32
0x10: float32*4 = Quaternion (x,y,z axis and w scalar)
0x20: Distance, float32 (unused)

So each record is 36 bytes long (nine float32 fields, occupying 0x00 through 0x23).

I would like to get a Numpy array.

At offset 1859 there is an unsigned 32-bit integer (4 bytes) which indicates the number of elements in the array (12019 in my case).

I don't care (for now) about the header data (before offset 1859).

The array only starts at offset 1863 (=1859+4).

I defined my own NumPy dtype like this:

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])
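
The record size implied by a dtype like this can be checked directly, which is a useful sanity test against the layout listed earlier. The following is a minimal sketch; the name `record_dtype` is only illustrative and is not part of the original question.

```python
import numpy as np

# Same field layout as the dtype above: nine float32 values per record.
record_dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

print(record_dtype.itemsize)           # 36 (9 fields * 4 bytes)
print(record_dtype.fields["dist"][1])  # 32, i.e. byte offset 0x20
```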

And I'm reading the file using fromfile:

a_bytes = np.fromfile(filename, dtype=dtype)

But I don't see any parameter of fromfile that lets me pass an offset.

Answer 1

Accepted answer by reptilicus

You can open the file with Python's built-in open(), seek past the header, and then pass the file object to fromfile. Something like this:

import numpy as np
import os

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

# Open in binary mode, skip the 1863-byte header, then read the records.
with open("myfile", "rb") as f:
    f.seek(1863, os.SEEK_SET)
    data = np.fromfile(f, dtype=dtype)

print(data)
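
Note that fromfile reads to the end of the file unless a count is given. As a possible variation, newer NumPy versions (1.17 and later) accept an `offset` argument in `np.fromfile`, so the record count stored at offset 1859 can be read first and then used to limit the read. The sketch below assumes little-endian data and uses a small synthetic file; `read_records`, `HEADER_SIZE`, and the demo file are illustrative names, not part of the original answer.

```python
import os
import tempfile

import numpy as np

HEADER_SIZE = 1859  # offset of the uint32 record count, from the question

# Explicit little-endian float32 fields; the byte order is an assumption.
FIELDS = ("time", "PosX", "PosY", "Alt", "Qx", "Qy", "Qz", "Qw", "dist")
record_dtype = np.dtype([(name, "<f4") for name in FIELDS])


def read_records(path):
    """Read the uint32 count at HEADER_SIZE, then exactly that many records."""
    count = int(np.fromfile(path, dtype="<u4", count=1, offset=HEADER_SIZE)[0])
    return np.fromfile(path, dtype=record_dtype, count=count,
                       offset=HEADER_SIZE + 4)


# Build a small synthetic file with the same layout to try it out.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
expected = np.zeros(3, dtype=record_dtype)
expected["time"] = [0.0, 0.5, 1.0]
with open(path, "wb") as f:
    f.write(b"\x00" * HEADER_SIZE)                             # dummy header
    f.write(np.array([len(expected)], dtype="<u4").tobytes())  # record count
    f.write(expected.tobytes())                                # the records
    f.write(b"trailing bytes that should be ignored")

data = read_records(path)
print(data.shape)  # (3,)
```

Passing `count` explicitly means any trailing bytes after the array are not misinterpreted as extra records.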

Answer 2

Answer by Eugene Veselov

I faced a similar problem, but none of the answers above satisfied me. I needed to implement something like a virtual table with a very large number of binary records, potentially occupying more memory than I can afford in a single NumPy array. So my question was how to read and write a small set of integers from/to a binary file: a subset of the file into a subset of a NumPy array.

This is a solution that worked for me:

import numpy as np
recordLen = 10 # number of int64's per record
recordSize = recordLen * 8 # size of a record in bytes
memArray = np.zeros(recordLen, dtype=np.int64) # a buffer for 1 record

# Create a binary file and open it for write+read
with open('BinaryFile.dat', 'w+b') as file:
    # Writing the array into the file as record recordNo:
    recordNo = 200 # the index of a target record in the file
    file.seek(recordSize * recordNo)
    raw = memArray.tobytes()  # avoid shadowing the built-in name `bytes`
    file.write(raw)

    # Reading a record recordNo from file into the memArray
    file.seek(recordSize * recordNo)
    raw = file.read(recordSize)
    memArray = np.frombuffer(raw, dtype=np.int64).copy()
    # Note copy() added to make the memArray mutable
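
For the same "table larger than memory" use case, `np.memmap` is another option worth considering. This is a different technique from the seek/read approach above, shown here only as a sketch: the table size `n_records` and the temporary file path are made up for the demo.

```python
import os
import tempfile

import numpy as np

record_len = 10    # int64 values per record, as in the answer above
n_records = 1000   # hypothetical table size for this demo

path = os.path.join(tempfile.mkdtemp(), "table.dat")

# mode="w+" creates (or overwrites) the file with room for the whole table.
table = np.memmap(path, dtype=np.int64, mode="w+",
                  shape=(n_records, record_len))
table[200] = np.arange(record_len)  # write record 200 in place
table.flush()                       # push the change to disk
del table                           # drop the mapping

# Reopen read-only; indexing pulls in only the parts that are accessed.
table = np.memmap(path, dtype=np.int64, mode="r",
                  shape=(n_records, record_len))
record = np.array(table[200])       # copy one record into a regular array
print(record)                       # [0 1 2 3 4 5 6 7 8 9]
```

With a memory map the record indexing is handled by NumPy, so there is no manual `seek` arithmetic, at the cost of fixing the table shape when the file is opened.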
