Python: split a byte string into lines

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13857856/

Time: 2020-08-18 09:45:54  Source: igfitidea

split byte string into lines

python python-3.x

Asked by Flavius

How can I split a byte string into a list of lines?

In Python 2 I had:

rest = "some\nlines"
for line in rest.split("\n"):
    print line

The code above is simplified for the sake of brevity, but now, after some regex processing, I have a byte array in rest and I need to iterate over the lines.

Accepted answer by Janus Troelsen

There is no reason to convert to string. Just give split bytes parameters. Split strings with strings, bytes with bytes.

Python 3.2.3 (default, Oct 19 2012, 19:53:57) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = b'asdf\nasdf'
>>> a.split(b'\n')
[b'asdf', b'asdf']
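As a minimal sketch of how the loop from the question might look in Python 3 when staying in bytes: the sample value of rest below is an assumption chosen to include a Windows-style line ending. Note that split(b'\n') leaves a trailing b'\r' on such lines, while bytes.splitlines() also recognises '\r\n' and '\r'.

```python
# Hypothetical sample data with mixed line endings (not from the question).
rest = b"some\r\nlines\nmore"

# Splitting on a bytes separator keeps every piece as bytes.
for line in rest.split(b"\n"):
    print(line)

# split(b"\n") does not strip the carriage return of "\r\n" endings.
pieces = rest.split(b"\n")

# bytes.splitlines() handles "\n", "\r\n" and "\r" alike.
lines = rest.splitlines()
```

If the data is known to use only '\n', split(b'\n') is enough; splitlines() is the more forgiving choice when the line endings are not under your control.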

Answer by warvariuc

Decode the bytes into unicode (str) and then use str.split:

Python 3.2.3 (default, Oct 19 2012, 19:53:16) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = b'asdf\nasdf'
>>> a.split('\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type str doesn't support the buffer API
>>> a = a.decode()
>>> a.split('\n')
['asdf', 'asdf']
>>> 

You can also split by b'\n', but I guess you have to work with strings, not bytes, anyway. So convert all your input data to str as soon as possible, work only with unicode in your code, and convert it back to bytes as late as possible, when it is needed for output.
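The "decode early, encode late" advice can be sketched roughly as follows. The function name process, the upper-casing step, and the UTF-8 default are all assumptions made for illustration; none of them come from the answer itself.

```python
def process(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: upper-case every line of a bytes payload."""
    # Input boundary: bytes -> str as early as possible.
    text = raw.decode(encoding)

    # Inside the program, work only with str.
    lines = text.split("\n")
    result = "\n".join(line.upper() for line in lines)

    # Output boundary: str -> bytes as late as possible.
    return result.encode(encoding)


output = process(b"some\nlines")
```

This keeps the encoding decision in one place, at the edges of the program, instead of scattering bytes/str conversions through the line-handling code.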

Answer by namit

Try this:

rest = b"some\nlines"
rest = rest.decode("utf-8")

Then you can do rest.split("\n").
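One caveat with decoding first: decode("utf-8") raises UnicodeDecodeError if the bytes are not valid UTF-8. A possible way to cope, assuming it is acceptable to lose the undecodable bytes, is the errors parameter of bytes.decode. The sample input below is hypothetical.

```python
# Hypothetical input: Latin-1 encoded text, which is not valid UTF-8.
raw = b"caf\xe9\nlines"

try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    # Fall back to replacing undecodable bytes with U+FFFD.
    text = raw.decode("utf-8", errors="replace")

lines = text.split("\n")
```

If the real encoding is known, passing it to decode is preferable to replacing characters.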