python Python二进制数据读取

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1591920/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 22:38:48  来源:igfitidea点击:

Python binary data reading

pythonbinary-data

提问by Hyman

A urllib2 request receives binary response as below:

一个 urllib2 请求接收二进制响应如下:

00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF

Its structure is:

它的结构是:

DATA, TYPE, DESCRIPTION

00 00 00 01, 4 bytes, Symbol Count =1

00 04, 2 bytes, Symbol Length = 4

41 4D 54 44, 6 bytes, Symbol = AMTD

00, 1 byte, Error code = 0 (OK)

00 00 00 02, 4 bytes, Bar Count =  2

FIRST BAR

41 97 33 33, 4 bytes, Close = 18.90

41 99 5C 29, 4 bytes, High = 19.17

41 90 3D 71, 4 bytes, Low = 18.03

41 91 D7 0A, 4 bytes, Open = 18.23

47 0F C6 14, 4 bytes, Volume = 3,680,608

00 00 01 16 6A E0 68 80, 8 bytes, Timestamp = November 23,2007

SECOND BAR

41 93 B4 05, 4 bytes, Close = 18.4629

41 97 1E B8, 4 bytes, High = 18.89

41 90 7A E1, 4 bytes, Low = 18.06

41 96 8F 57, 4 bytes, Open = 18.82

46 E6 2E 80, 4 bytes, Volume = 2,946,325

00 00 01 16 7A 53 7C 80, 8 bytes, Timestamp = November 26,2007

TERMINATOR

FF FF, 2 bytes,

How to read binary data like this?

如何读取这样的二进制数据?

Thanks in advance.

提前致谢。

Update:

更新:

I tried struct module on first 6 bytes with following code:

我使用以下代码在前 6 个字节上尝试了 struct 模块:

struct.unpack('ih', response.read(6))

(16777216, 1024)

(16777216, 1024)

But it should output (1, 4). I take a look at the manual but have no clue what was wrong.

但它应该输出 (1, 4)。我看了一下手册,但不知道出了什么问题。

回答by Alex Martelli

So here's my best shot at interpreting the data you're giving...:

所以这是我解释你提供的数据的最好方法......:

import datetime
import struct

class Printable(object):
  specials = ()
  def __str__(self):
    resultlines = []
    for pair in self.__dict__.items():
      if pair[0] in self.specials: continue
      resultlines.append('%10s %s' % pair)
    return '\n'.join(resultlines)

head_fmt = '>IH6sBH'
head_struct = struct.Struct(head_fmt)
class Header(Printable):
  specials = ('bars',)
  def __init__(self, symbol_count, symbol_length,
               symbol, error_code, bar_count):
    self.__dict__.update(locals())
    self.bars = []
    del self.self

bar_fmt = '>5fQ'
bar_struct = struct.Struct(bar_fmt)
class Bar(Printable):
  specials = ('header',)
  def __init__(self, header, close, high, low,
               open, volume, timestamp):
    self.__dict__.update(locals())
    self.header.bars.append(self)
    del self.self
    self.timestamp /= 1000.0
    self.timestamp = datetime.date.fromtimestamp(self.timestamp)

def showdata(data):
  terminator = '\xff' * 2
  assert data[-2:] == terminator
  head_data = head_struct.unpack(data[:head_struct.size])
  try:
    assert head_data[4] * bar_struct.size + head_struct.size == \
           len(data) - len(terminator)
  except AssertionError:
    print 'data length is %d' % len(data)
    print 'head struct size is %d' % head_struct.size
    print 'bar struct size is %d' % bar_struct.size
    print 'number of bars is %d' % head_data[4]
    print 'head data:', head_data
    print 'terminator:', terminator
    print 'so, something is wrong, since',
    print head_data[4] * bar_struct.size + head_struct.size, '!=',
    print len(data) - len(terminator)
    raise

  head = Header(*head_data)
  for i in range(head.bar_count):
    bar_substr = data[head_struct.size + i * bar_struct.size:
                      head_struct.size + (i+1) * bar_struct.size]
    bar_data = bar_struct.unpack(bar_substr)
    Bar(head, *bar_data)
  assert len(head.bars) == head.bar_count
  print head
  for i, x in enumerate(head.bars):
    print 'Bar #%s' % i
    print x

datas = '''
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
'''

data = ''.join(chr(int(x, 16)) for x in datas.split())
showdata(data)

this emits:

这发出:

symbol_count 1
 bar_count 2
    symbol AMTD
error_code 0
symbol_length 4
Bar #0
    volume 36806.078125
 timestamp 2007-11-22
      high 19.1700000763
       low 18.0300006866
     close 18.8999996185
      open 18.2299995422
Bar #1
    volume 29463.25
 timestamp 2007-11-25
      high 18.8899993896
       low 18.0599994659
     close 18.4629001617
      open 18.8199901581

...which seems to be pretty close to what you want, net of some output formatting details. Hope this helps!-)

...这似乎与您想要的非常接近,除去一些输出格式的细节。希望这可以帮助!-)

回答by jfs

>>> data
'\x00\x00\x00\x01\x00\x04AMTD\x00\x00\x00\x00\x02A\x9733A\x99\)A\x90=qA\x91\xd7\nG\x0f\xc6\x14\x00\x00\x01\x16j\xe0h\x80A\x93\xb4\x05A\x97\x1e\xb8A\x90z\xe1A\x96\x8fWF\xe6.\x80\x00\x00\x01\x16zS|\x80\xff\xff'
>>> from struct import unpack, calcsize
>>> scount, slength = unpack("!IH", data[:6])
>>> assert scount == 1
>>> symbol, error_code = unpack("!%dsb" % slength, data[6:6+slength+1])
>>> assert error_code == 0
>>> symbol
'AMTD'
>>> bar_count = unpack("!I", data[6+slength+1:6+slength+1+4])
>>> bar_count
(2,)
>>> bar_format = "!5fQ"                                                         
>>> from collections import namedtuple
>>> Bar = namedtuple("Bar", "Close High Low Open Volume Timestamp")             
>>> b = Bar(*unpack(bar_format, data[6+slength+1+4:6+slength+1+4+calcsize(bar_format)]))
>>> b
Bar(Close=18.899999618530273, High=19.170000076293945, Low=18.030000686645508, Open=18.229999542236328, Volume=36806.078125, Timestamp=1195794000000L)
>>> import time
>>> time.ctime(b.Timestamp//1000)
'Fri Nov 23 08:00:00 2007'
>>> int(b.Volume*100 + 0.5)
3680608

回答by mhawke

>>> struct.unpack('ih', response.read(6))
(16777216, 1024)
>>> struct.unpack('ih', response.read(6))
(16777216, 1024)

You are unpacking big-endian data on a little-endian machine. Try this instead:

您正在小端机器上解包大端数据。试试这个:

>>> struct.unpack('!IH', response.read(6))
(1L, 4)

This tells unpack to consider the data in network-order (big-endian). Also, the values of counts and lengths can not be negative, so you should should use the unsigned variants in your format string.

这告诉解包以网络顺序(大端)考虑数据。此外,计数和长度的值不能为负,因此您应该在格式字符串中使用无符号变体。

回答by monkut

Take a look at the struct.unpackin the struct module.

看一下struct 模块中的struct.unpack

回答by monkut

Use pack/unpack functions from "struct" package. More info here http://docs.python.org/library/struct.html

使用“struct”包中的打包/解包函数。更多信息在这里http://docs.python.org/library/struct.html

Bye!

再见!

回答by Andrey Vlasovskikh

As it was already mentioned, structis the module you need to use.

正如已经提到的,struct是您需要使用的模块。

Please read its documentation to learn about byte ordering, etc.

请阅读其文档以了解字节顺序等。

In your example you need to do the following (as your data is big-endian and unsigned):

在您的示例中,您需要执行以下操作(因为您的数据是 big-endian 且未签名):

>>> import struct
>>> x = '\x00\x00\x00\x01\x00\x04'
>>> struct.unpack('>IH', x)
(1, 4)