Test if a Python string is printable
Original question: http://stackoverflow.com/questions/3636928/
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page) and keep the same license.
Asked by BCS
I have some code that pulls data from a COM port, and I want to make sure that what I got really is a printable string (i.e. ASCII, maybe UTF-8) before printing it. Is there a function for doing this? The first half dozen places I looked didn't have anything that looks like what I want. (string has printable, but I didn't see anything, there or in the string methods, to check whether every char in one string is in another.)
Note: control characters are not printable for my purposes.
Edit: I was (and still am) looking for a single function, not a roll-your-own solution.
What I ended up with is:
all(ord(c) < 127 and c in string.printable for c in input_str)
Answer by Alex Martelli
try/except seems the best way:
def isprintable(s, codec='utf8'):
    try: s.decode(codec)
    except UnicodeDecodeError: return False
    else: return True
I would not rely on string.printable, which might deem "non-printable" control characters that can commonly be "printed" for terminal control purposes (e.g., in "colorization" ANSI escape sequences, if your terminal is ANSI-compliant). But that, of course, depends on your exact purposes for wanting to check this!-)
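For Python 3, where data read from a port usually arrives as a bytes object, the same try/except idea might look like the sketch below. The name is_decodable is made up for illustration. Note that a clean decode only tells you the bytes are valid in the codec; control characters such as BEL still pass.

```python
def is_decodable(data, codec="utf-8"):
    """Return True if the bytes object decodes cleanly with the given codec.

    Hypothetical helper: it checks validity of the encoding only, not
    whether the decoded characters are visible on screen.
    """
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


print(is_decodable(b"Hello"))     # True
print(is_decodable(b"\xff\xfe"))  # False: 0xff can never appear in UTF-8
print(is_decodable(b"\x07"))      # True: BEL is valid UTF-8, though it is a control character
```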
Answer by Dave Webb
As you've said, the string module has printable, so it's just a case of checking whether all the characters in your string are in printable:
>>> hello = 'Hello World!'
>>> bell = chr(7)
>>> import string
>>> all(c in string.printable for c in hello)
True
>>> all(c in string.printable for c in bell)
False
You could convert both strings to sets, so that each set contains every character of its string exactly once, and check whether the set created from your string is a subset of the printable characters:
>>> printset = set(string.printable)
>>> helloset = set(hello)
>>> bellset = set(bell)
>>> helloset
set(['!', ' ', 'e', 'd', 'H', 'l', 'o', 'r', 'W'])
>>> helloset.issubset(printset)
True
>>> set(bell).issubset(printset)
False
So, in summary, you would probably want to do this:
import string
printset = set(string.printable)
isprintable = set(yourstring).issubset(printset)
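One thing to keep in mind is that string.printable also contains the whitespace characters \t, \n, \r, \x0b and \x0c, which the question counts as control characters. A possible stricter variant, assuming that only the plain space should survive from the whitespace group, is sketched here; STRICT_PRINTABLE and is_strictly_printable are names invented for this example.

```python
import string

# Assumption: "printable" means ASCII letters, digits, punctuation and the
# plain space. string.whitespace (" \t\n\r\x0b\x0c") is removed and the
# space is added back explicitly.
STRICT_PRINTABLE = (frozenset(string.printable) - frozenset(string.whitespace)) | {" "}


def is_strictly_printable(text):
    """Return True if every character of text is in STRICT_PRINTABLE."""
    return set(text) <= STRICT_PRINTABLE


print(is_strictly_printable("Hello World!"))  # True
print(is_strictly_printable("tab\there"))     # False
print(is_strictly_printable(chr(7)))          # False
```

Building the set once at module level avoids recreating it on every call, which matters if the check runs for each chunk read from the port.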
Answer by JohnMudd
>>> # Printable
>>> s = 'test'
>>> len(s)+2 == len(repr(s))
True
>>> # Unprintable
>>> s = 'test\x00'
>>> len(s)+2 == len(repr(s))
False
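The check above works because repr adds two quote characters and expands anything it escapes. A small sketch (Python 3, with the hypothetical name repr_length_check) shows the side effect: characters that repr escapes for other reasons, such as a backslash or a tab, are reported as unprintable too.

```python
def repr_length_check(s):
    """Return True if repr(s) adds nothing to s except the two quotes."""
    return len(s) + 2 == len(repr(s))


print(repr_length_check("test"))      # True
print(repr_length_check("test\x00"))  # False
print(repr_length_check("a\\b"))      # False, although a backslash is visible
print(repr_length_check("tab\t"))     # False, repr writes the tab as \t
```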
Answer by zvone
This Python 3 string contains all kinds of special characters:
s = 'abcd\x65\x66 äüöë\xf1 \u00a0\u00a1\u00a2 漢字 \a\b\r\t\n\v\\ \231\x9a \u2640\u2642\uffff'
If you try to show it in the console (or use repr), Python does a pretty good job of escaping all non-printable characters in that string:
>>> s
'abcdef äüöëñ \xa0¡¢ 漢字 \x07\x08\r\t\n\x0b\\ \x99\x9a ♀♂\uffff'
It is smart enough to recognise e.g. horizontal tab (\t) as printable, but vertical tab (\v) as not printable (it shows up as \x0b rather than \v).
Every other non-printable character also shows up as either \xNN or \uNNNN in the repr. Therefore, we can use that as the test:
def is_printable(s):
    # The backslashes must be doubled: "'\x" on its own is an invalid escape.
    return not any(repr(ch).startswith("'\\x") or repr(ch).startswith("'\\u") for ch in s)
There may be some borderline characters, for example non-breaking white space (\xa0) is treated as non-printable here. Maybe it shouldn't be, but those special ones could then be hard-coded.
P.S.
You could do this to extract only printable characters from a string:
>>> ''.join(ch for ch in s if is_printable(ch))
'abcdef äüöëñ ¡¢ 漢字 \r\t\n\\  ♀♂'
Answer by gatkin
The category function from the unicodedata module might suit your needs. For instance, you can use this to check whether there are any control characters in a string while still allowing non-ASCII characters.
>>> import unicodedata
>>> def has_control_chars(s):
...     return any(unicodedata.category(c) == 'Cc' for c in s)
>>> has_control_chars('Hello 世界')
False
>>> has_control_chars('Hello \x1f 世界')
True
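If rejecting only 'Cc' turns out to be too narrow, one possible extension is to reject every general category that starts with "C". The sketch below assumes that format characters, surrogates, private-use and unassigned code points should also count as unprintable; is_printable_unicode is a name invented for this example.

```python
import unicodedata


def is_printable_unicode(s):
    """Return True if no character of s is in a Unicode "Other" category.

    Categories starting with "C" are Cc (control), Cf (format),
    Cs (surrogate), Co (private use) and Cn (unassigned).
    """
    return not any(unicodedata.category(c).startswith("C") for c in s)


print(is_printable_unicode("Hello 世界"))       # True
print(is_printable_unicode("Hello \x1f 世界"))  # False: U+001F is Cc
print(is_printable_unicode("zero\u200bwidth"))  # False: U+200B is Cf
```

Unlike the repr-based test, this keeps the non-breaking space (category Zs) as printable, which may or may not be what you want.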
Answer by Jerrychayan
Mine is a solution to get rid of any known set of characters. It might help.
non_printable_chars = set("\n\t\r ")  # Space included intentionally
# True if the string contains at least one character outside the set above
is_printable = lambda string: bool(set(string) - non_printable_chars)
...
...
if is_printable(string):
    print("""do something""")
...
Answer by Peter Glen
import string

ctrlchar = "\n\r| "
# ------------------------------------------------------------------------
# This will let you control what you deem 'printable'
# Clean enough to display any binary
def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False
Answer by Peter Glen
# Here is the full routine to display an arbitrary binary string
# Written for Python 2; // keeps the division integral on Python 3 as well
import string

ctrlchar = "\n\r| "
# ------------------------------------------------------------------------
def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False
# ------------------------------------------------------------------------
# Return a hex dump formatted string
def hexdump(strx, llen = 16):
    lenx = len(strx)
    outx = ""
    # Full lines of 16 characters: hex column, then text column
    for aa in range(lenx // 16):
        outx += " "
        for bb in range(16):
            outx += "%02x " % ord(strx[aa * 16 + bb])
        outx += " | "
        for cc in range(16):
            chh = strx[aa * 16 + cc]
            if isprint(chh):
                outx += "%c" % chh
            else:
                outx += "."
        outx += " | \n"
    # Print remainder on last line
    remn = lenx % 16 ; divi = lenx // 16
    if remn:
        outx += " "
        for dd in range(remn):
            outx += "%02x " % ord(strx[divi * 16 + dd])
        outx += " " * ((16 - remn) * 3)
        outx += " | "
        for cc in range(remn):
            chh = strx[divi * 16 + cc]
            if isprint(chh):
                outx += "%c" % chh
            else:
                outx += "."
        outx += " " * ((16 - remn))
        outx += " | \n"
    return(outx)
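For data that arrives as a Python 3 bytes object, a more compact take on the same idea might look like the following. This is only a sketch, not the routine above: hexdump_bytes is a made-up name, and it treats the whole range 0x20 to 0x7e (including space and "|") as displayable, whereas the routine above replaces those two with a dot.

```python
def hexdump_bytes(data, width=16):
    """Return a hex dump of a bytes object, one line per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Iterating over bytes yields integers in Python 3, so no ord() is needed.
        hex_part = " ".join("%02x" % b for b in chunk)
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        # Pad both columns so that a short last line still lines up.
        lines.append("%-*s | %-*s |" % (width * 3 - 1, hex_part, width, text_part))
    return "\n".join(lines)


print(hexdump_bytes(b"Hello\x00World\x07 and some more bytes"))
```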
Answer by thakis
In Python 3, strings have an isprintable() method:
>>> 'a, '.isprintable()
True
For Python 2.7, see Dave Webb's answer.
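A few edge cases of str.isprintable() are worth knowing, and for bytes read from a port it can be combined with a decode step. The sketch below assumes UTF-8 input; bytes_are_printable is a hypothetical helper name.

```python
print("Hello 世界".isprintable())  # True
print("line\n".isprintable())      # False: newline is a control character
print("".isprintable())            # True: the empty string counts as printable
print("\xa0".isprintable())        # False: separators other than the ASCII space are rejected


def bytes_are_printable(data, codec="utf-8"):
    """Decode data and apply str.isprintable(); undecodable input is rejected."""
    try:
        text = data.decode(codec)
    except UnicodeDecodeError:
        return False
    return text.isprintable()


print(bytes_are_printable(b"Hello"))  # True
print(bytes_are_printable(b"\x07"))   # False
```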
Answer by yunqimg
In the ASCII table, the characters in [\x20-\x7e] are printable.
Use a regular expression to check whether the string contains any character outside that range.
If it does not, the string is printable.
>>> import re
>>> # Printable
>>> print(re.search(r'[^\x20-\x7e]', 'test'))
None
>>> # Unprintable
>>> re.search(r'[^\x20-\x7e]', 'test\x00') is not None
True
True
>>> # Optional pattern that also accepts ASCII whitespace (\t through \r)
>>> pattern = r'[^\t-\r\x20-\x7e]'
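Wrapped into a reusable function, the two patterns above could be used roughly like this. It is a sketch only; is_ascii_printable and its allow_whitespace parameter are names made up for this example.

```python
import re

# Compiled once so that repeated checks do not re-parse the pattern.
_NON_PRINTABLE = re.compile(r"[^\x20-\x7e]")
_NON_PRINTABLE_OR_WS = re.compile(r"[^\t-\r\x20-\x7e]")


def is_ascii_printable(text, allow_whitespace=False):
    """Return True if text contains only printable ASCII characters.

    With allow_whitespace=True, the characters \\t through \\r (0x09 to 0x0d)
    are accepted as well.
    """
    pattern = _NON_PRINTABLE_OR_WS if allow_whitespace else _NON_PRINTABLE
    return pattern.search(text) is None


print(is_ascii_printable("test"))                          # True
print(is_ascii_printable("test\x00"))                      # False
print(is_ascii_printable("a\tb"))                          # False
print(is_ascii_printable("a\tb", allow_whitespace=True))   # True
```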

