Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16071461/

Time: 2020-08-18 21:42:26  Source: igfitidea

Best way to replace \x00 in python lists?

python regex replace

Asked by user2292661

I have a list of values from a parsed PE file that include \x00 null bytes at the end of each section name. I want to remove the \x00 bytes from each string without removing all the "x"s from the file. I have tried .replace and re.sub, but without much success.

Using Python 2.6.6

Example.

import re

List = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

count = 0
while count < len(List):
    test = re.sub('\\x00', '', str(List[count]))
    print test
    count += 1

>>>test  (removes x, but I want to keep it) #changed from tet to test
>>>data
>>>rsrc

I want to get the following output

text data rsrc

Any ideas on the best way of going about this?

Accepted answer by jamylak

>>> L = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]
>>> [[x[0]] for x in L]
[['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]
>>> [[x[0].replace('\x00', '')] for x in L]
[['.text'], ['.data'], ['.rsrc']]

Or to modify the list in place instead of creating a new one:

for x in L:
    x[0] = x[0].replace('\x00', '')

Answer by thkang

from itertools import chain

List = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]    
new_list = [x.replace("\x00", "") for x in chain(*List)]
#['.text', '.data', '.rsrc']

Answer by Chris Doggett

Try a unicode pattern, like this:

re.sub(u'\x00', '', s)

It should give the following results:

import re

l = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]
for x in l:
    for s in x:  # iterate over the strings inside each inner list
        print re.sub(u'\x00', '', s)

.text
.data
.rsrc

Or, using list comprehensions:

[[re.sub(u'\x00', '', s) for s in x] for x in l]

Actually, it should work without the 'u' in front of the string. Just remove the extra backslashes and use this as your regex pattern:

'\x00'
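
As a rough check of the point above, here is a minimal sketch in Python 3 syntax (an assumption; the question uses Python 2.6.6). It suggests that the spelling of the pattern is probably not what goes wrong, and that one likely culprit is calling str() on the inner list before substituting. The variable names are illustrative only.

```python
import re

section = ['.text\x00\x00\x00']

# Three spellings of the pattern; each one matches the NUL character.
patterns = ['\x00', '\\x00', r'\x00']
cleaned = [re.sub(p, '', section[0]) for p in patterns]
print(cleaned)  # ['.text', '.text', '.text']

# str() of a list returns its repr, in which each NUL is spelled out as
# the four characters backslash, x, 0, 0, so there is no NUL left to match.
as_text = str(section)
print(as_text)            # ['.text\x00\x00\x00']
print('\x00' in as_text)  # False
unchanged = re.sub('\x00', '', as_text)
print(unchanged == as_text)  # True
```

Applying the substitution to the string itself (section[0]) rather than to str(section) avoids the problem.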

Answer by Luka Rahne

lst = (i[0].rstrip('\x00') for i in List)
for j in lst: 
   print j,
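
Note that rstrip only removes trailing characters, whereas replace removes every occurrence. For PE section names that are padded with NUL bytes at the end, both give the same result. The sketch below (Python 3 syntax, with a hypothetical value containing an embedded NUL) shows where they would differ.

```python
name = '.te\x00xt\x00\x00'  # hypothetical value with an embedded NUL

trailing_only = name.rstrip('\x00')    # strips NULs from the right end only
everywhere = name.replace('\x00', '')  # removes every NUL

print(repr(trailing_only))  # '.te\x00xt'
print(repr(everywhere))     # '.text'
```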

Answer by martineau

What you really want to do is replace '\x00' characters in strings in a list.

Towards that goal, people often overlook the fact that in Python 2 the non-Unicode string translate() method will also optionally (or only) delete 8-bit characters, as illustrated below. (It doesn't accept this argument in Python 3 because strings are Unicode objects by default.)

Your List data structure seems a little odd, since it's a list of one-element lists consisting of just single strings. Regardless, in the code below I've renamed it sections, since capitalized words should only be used for the names of classes according to PEP 8 -- Style Guide for Python Code.

sections = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

for section in sections:
    test = section[0].translate(None, '\x00')
    print test

Output:

.text
.data
.rsrc
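
For readers on Python 3, where str.translate() takes only a table, a rough equivalent might look like the sketch below. It is not part of the original answer: str.maketrans with three arguments builds a deletion table, and bytes objects still accept the two-argument form. The names delete_nul and raw are illustrative.

```python
sections = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

# str.maketrans('', '', chars) builds a table that deletes every character in chars.
delete_nul = str.maketrans('', '', '\x00')
cleaned = [section[0].translate(delete_nul) for section in sections]
print(cleaned)  # ['.text', '.data', '.rsrc']

# bytes objects keep the two-argument form: translate(table, delete).
raw = b'.text\x00\x00\x00'
stripped = raw.translate(None, b'\x00')
print(stripped)  # b'.text'
```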

Answer by Atri Basu

I think a better way to take care of this particular problem is to use the following function:

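import string

for item in List:
    # Each item is a one-element list, so filter the string it holds;
    # str(item) would filter the list's repr instead.
    # In Python 2, filter() applied to a str returns a str.
    item[0] = filter(lambda x: x in string.printable, item[0])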

This will get rid of not just \x00 but any other non-printable characters that are appended to your string.
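
In Python 3, filter() returns an iterator rather than a string, so the same idea needs a join. The following is a minimal sketch under that assumption, using made-up sample values with different non-printable suffixes. Keep in mind that string.printable covers ASCII only, so this approach would also drop any non-ASCII characters.

```python
import string

# Hypothetical sample data with various non-printable trailing characters.
sections = [['.text\x00\x00\x00'], ['.data\x01\x02'], ['.rsrc\x7f']]

printable = set(string.printable)  # set membership is faster than scanning a str
cleaned = [''.join(ch for ch in section[0] if ch in printable)
           for section in sections]
print(cleaned)  # ['.text', '.data', '.rsrc']
```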
