Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18970830/

Date: 2020-08-19 12:28:26  Source: igfitidea

How to find null byte in a string in Python?

Tags: python, string, list, null, byte

Asked by user2806298

I'm having an issue parsing data after reading a file. I'm reading a binary file and need to create a list of attributes from its contents. All of the data in the file is terminated with a null byte, so what I'm trying to do is find every instance of a null-byte-terminated attribute.

Essentially taking a string like

Health\x00experience\x00charactername\x00

and storing it in a list.

The real issue is that I need to keep the null bytes intact; I just need to be able to find each instance of a null byte and store the data that precedes it.

Accepted answer by kalhartt

While it boils down to using split('\x00'), a convenience wrapper might be nice.

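def readlines(f, bufsize):
    """Yield each NUL-terminated record from binary file f, terminator included."""
    buf = b""  # bytes, because the file is opened in 'rb' mode
    while True:
        data = f.read(bufsize)
        if not data:
            break
        buf += data
        lines = buf.split(b'\x00')
        buf = lines.pop()  # incomplete tail, carried over to the next read
        for line in lines:
            yield line + b'\x00'
    if buf:
        # Leftover data that had no terminator in the file; yielded unchanged
        yield buf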

Then you can do something like:

with open('myfile', 'rb') as f:
    mylist = [item for item in readlines(f, 524288)]

This has the added benefit of not needing to load the entire contents into memory before splitting the text.
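
To see the chunked approach work without a real file, here is a minimal sketch that feeds an in-memory io.BytesIO through a similar generator. The name iter_nul_terminated, the sample data, and the deliberately tiny bufsize are assumptions made for illustration; they are not part of the original answer.

```python
import io


def iter_nul_terminated(f, bufsize=4096):
    """Yield each NUL-terminated record from binary file f, keeping the terminator."""
    buf = b""
    while True:
        chunk = f.read(bufsize)
        if not chunk:
            break
        buf += chunk
        # Everything before the last separator is complete; the rest is a partial record.
        *complete, buf = buf.split(b"\x00")
        for record in complete:
            yield record + b"\x00"
    if buf:
        yield buf  # trailing data that had no terminator


# Hypothetical sample data standing in for the binary file.
sample = io.BytesIO(b"Health\x00experience\x00charactername\x00")

# A tiny buffer forces records to span several reads.
records = list(iter_nul_terminated(sample, bufsize=4))
print(records)
```

Records that straddle a chunk boundary are reassembled because the partial tail is carried over into the next read.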

Answer by abarnert

Python doesn't treat NUL bytes as anything special; they're no different from spaces or commas. So, split() works fine:

>>> my_string = "Health\x00experience\x00charactername\x00"
>>> my_string.split('\x00')
['Health', 'experience', 'charactername', '']

Note that split is treating \x00 as a separator, not a terminator, so we get an extra empty string at the end. If that's a problem, you can just slice it off:

>>> my_string.split('\x00')[:-1]
['Health', 'experience', 'charactername']
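
Since the question asks to keep the null bytes, and split() discards them, the terminator has to be put back or matched explicitly. The following is a rough sketch of two ways to do that; it assumes the data is a bytes object (as it would be when read from a file opened in 'rb' mode under Python 3), and the variable names are illustrative only.

```python
import re

# Hypothetical sample data, as bytes.
data = b"Health\x00experience\x00charactername\x00"

# Option 1: split, drop the trailing empty piece, and re-append the terminator.
with_terminators = [field + b"\x00" for field in data.split(b"\x00")[:-1]]

# Option 2: match each run of non-NUL bytes together with the NUL that ends it.
via_regex = re.findall(rb"[^\x00]*\x00", data)

print(with_terminators)
print(via_regex)
```

Option 1 assumes the data always ends with a null byte; option 2 simply ignores any trailing bytes that have no terminator.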

Answer by Tim Peters

Split on null bytes; .split() returns a list:

>>> print("Health\x00experience\x00charactername\x00".split("\x00"))
['Health', 'experience', 'charactername', '']

If you know the data always ends with a null byte, you can slice the list to chop off the last empty string (like result_list[:-1]).

Answer by kenorb

To check whether a string contains a NUL byte, simply use the in operator, for example:

if b'\x00' in data:

To find its position, use find(), which returns the lowest index in the string where the substring is found, or -1 if it is not found. The optional start and end arguments restrict the search to a slice of the string and are interpreted as in slice notation.
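
As a minimal sketch of how find() and its start argument could be combined to locate every null byte, the loop below restarts each search just past the previous hit. The helper name nul_positions and the sample data are assumptions for illustration.

```python
def nul_positions(data):
    """Return the index of every NUL byte in data (a bytes object)."""
    positions = []
    start = 0
    while True:
        index = data.find(b"\x00", start)  # search from `start` onward
        if index == -1:
            break
        positions.append(index)
        start = index + 1  # resume just past the byte we found
    return positions


# Hypothetical sample data, as bytes.
data = b"Health\x00experience\x00charactername\x00"
positions = nul_positions(data)

# Slice out each field together with its terminating NUL byte.
fields = []
start = 0
for end in positions:
    fields.append(data[start:end + 1])
    start = end + 1

print(positions)
print(fields)
```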
