Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3636928/

Time: 2020-08-18 12:01:47  Source: igfitidea

Test if a Python string is printable

python string

Asked by BCS

I have some code that pulls data from a COM port, and I want to make sure that what I got really is a printable string (i.e. ASCII, maybe UTF-8) before printing it. Is there a function for doing this? The first half dozen places I looked didn't have anything that looks like what I want. (The string module has printable, but I didn't see anything there, or in the string methods, to check whether every character in one string is in another.)

Note: control characters are not printable for my purposes.

Edit: I was/am looking for a single function, not a roll-your-own solution.

What I ended up with is:

all(ord(c) < 127 and c in string.printable for c in input_str)

Answered by Alex Martelli

try/except seems the best way:

def isprintable(s, codec='utf8'):
    # s must be a byte string (str in Python 2, bytes in Python 3)
    try:
        s.decode(codec)
    except UnicodeDecodeError:
        return False
    else:
        return True

I would not rely on string.printable, which might deem "non-printable" control characters that can commonly be "printed" for terminal control purposes (e.g., in "colorization" ANSI escape sequences, if your terminal is ANSI-compliant). But that, of course, depends on your exact purposes for wanting to check this!-)
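For Python 3, where only bytes objects have decode(), the same idea might look like the sketch below. This is not the answer's own code; the name decodes_as is made up for illustration. Note that a successful decode only shows that the bytes are valid in the chosen codec: a control character such as BEL (0x07) still decodes without error, so this test alone would not satisfy the "no control characters" requirement in the question.

```python
def decodes_as(data, codec="utf-8"):
    """Return True if the bytes object `data` is valid in `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as(b"Hello"))      # valid UTF-8
print(decodes_as(b"\xff\xfe"))   # 0xff can never start a UTF-8 sequence
print(decodes_as(b"\x07"))       # BEL is a control character but valid UTF-8
```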

Answered by Dave Webb

As you've said, the string module has printable, so it's just a case of checking whether all the characters in your string are in printable:

>>> hello = 'Hello World!'
>>> bell = chr(7)
>>> import string
>>> all(c in string.printable for c in hello)
True
>>> all(c in string.printable for c in bell)
False

You could convert both strings to sets, so that each set contains every character of its string once, and check whether the set created from your string is a subset of the printable characters:

>>> printset = set(string.printable)
>>> helloset = set(hello)
>>> bellset = set(bell)
>>> helloset
set(['!', ' ', 'e', 'd', 'H', 'l', 'o', 'r', 'W'])
>>> helloset.issubset(printset)
True
>>> set(bell).issubset(printset)
False

So, in summary, you would probably want to do this:

import string
printset = set(string.printable)
isprintable = set(yourstring).issubset(printset)
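One thing to watch: string.printable also contains the whitespace characters tab, newline, carriage return, vertical tab and form feed, which the question would count as control characters. A possible variation on the set approach, assuming those five should be rejected while the plain space stays allowed, is sketched here. The names STRICT_PRINTABLE and is_strictly_printable are hypothetical, not part of the string module.

```python
import string

# Assumption: reject \t, \n, \r, \x0b and \x0c, but keep the space character.
STRICT_PRINTABLE = frozenset(string.printable) - frozenset("\t\n\r\x0b\x0c")


def is_strictly_printable(text):
    """True if every character of `text` is printable ASCII (no control chars)."""
    return STRICT_PRINTABLE.issuperset(text)


print(is_strictly_printable("Hello World!"))
print(is_strictly_printable(chr(7)))
print(is_strictly_printable("a\tb"))
```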

Answered by JohnMudd

>>> # Printable
>>> s = 'test'
>>> len(s)+2 == len(repr(s))
True

>>> # Unprintable
>>> s = 'test\x00'
>>> len(s)+2 == len(repr(s))
False
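The trick above works because repr() adds exactly two quote characters when nothing in the string needs escaping. A small sketch of it as a function (the name repr_is_unescaped is made up here) also shows a caveat: a backslash, or a string containing both kinds of quote, is escaped by repr() as well, so such strings are reported as "unprintable" even though every character is visible.

```python
def repr_is_unescaped(s):
    """True if repr(s) is just s wrapped in two quote characters."""
    return len(repr(s)) == len(s) + 2


print(repr_is_unescaped("test"))          # nothing to escape
print(repr_is_unescaped("test\x00"))      # NUL is escaped as \x00
print(repr_is_unescaped("back\\slash"))   # the backslash is doubled by repr()
```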

Answered by zvone

This Python 3 string contains all kinds of special characters:

s = 'abcd\x65\x66 äüöë\xf1 \u00a0\u00a1\u00a2 漢字 \a\b\r\t\n\v\\ \231\x9a \u2640\u2642\uffff'

If you try to show it in the console (or use repr), it does a pretty good job of escaping all non-printable characters in that string:

>>> s
'abcdef äüöëñ \xa0¡¢ 漢字 \x07\x08\r\t\n\x0b\\ \x99\x9a ♀♂\uffff'

It is smart enough to recognise e.g. horizontal tab (\t) as printable, but vertical tab (\v) as not printable (it shows up as \x0b rather than \v).

Every other non-printable character also shows up as either \xNN or \uNNNN in the repr. Therefore, we can use that as the test:

def is_printable(s):
    # repr() escapes non-printable characters as \xNN or \uNNNN
    return not any(repr(ch).startswith("'\\x") or repr(ch).startswith("'\\u") for ch in s)

There may be some borderline characters, for example non-breaking white space (\xa0) is treated as non-printable here. Maybe it shouldn't be, but those special ones could then be hard-coded.
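Hard-coding such borderline characters could be done with a small allow-list. The sketch below is one way to do it, not part of the original answer: ALWAYS_PRINTABLE and is_printable_with_exceptions are hypothetical names, and treating \xa0 as printable is an assumption. It also checks for the \UNNNNNNNN form that repr() uses for escaped characters above U+FFFF.

```python
# Assumption: the non-breaking space should count as printable.
ALWAYS_PRINTABLE = frozenset("\xa0")


def _is_escaped(ch):
    """True if repr() shows this character as a \\x, \\u or \\U escape."""
    return repr(ch).startswith(("'\\x", "'\\u", "'\\U"))


def is_printable_with_exceptions(s, allowed=ALWAYS_PRINTABLE):
    """Like the repr-based test, but never rejects characters in `allowed`."""
    return not any(_is_escaped(ch) and ch not in allowed for ch in s)


print(is_printable_with_exceptions("a\xa0b"))
print(is_printable_with_exceptions("a\x07b"))
```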

P.S.

You could do this to extract only printable characters from a string:

>>> ''.join(ch for ch in s if is_printable(ch))
'abcdef äüöëñ ¡¢ 漢字 \r\t\n\\  ♀♂'

Answered by gatkin

The category function from the unicodedata module might suit your needs. For instance, you can use it to check whether there are any control characters in a string while still allowing non-ASCII characters.

>>> import unicodedata

>>> def has_control_chars(s):
...     return any(unicodedata.category(c) == 'Cc' for c in s)

>>> has_control_chars('Hello 世界')
False

>>> has_control_chars('Hello \x1f 世界')
True
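The category check can be widened if format characters and the like should be rejected too. In the Unicode general categories, everything starting with "C" is in the "Other" group: Cc (control), Cf (format), Cs (surrogate), Co (private use) and Cn (unassigned). A possible sketch, assuming all of those count as non-printable, follows; the function name has_only_visible_categories is made up for this example.

```python
import unicodedata


def has_only_visible_categories(s):
    """True if no character of `s` is in a Unicode 'C*' (Other) category."""
    # Assumption: Cc, Cf, Cs, Co and Cn are all treated as non-printable.
    return not any(unicodedata.category(c).startswith("C") for c in s)


print(has_only_visible_categories("Hello 世界"))
print(has_only_visible_categories("Hello \x1f 世界"))   # U+001F is Cc
print(has_only_visible_categories("zero\u200bwidth"))   # U+200B is Cf
```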

Answered by Jerrychayan

Mine is a solution to get rid of any known set of characters. It might help.

non_printable_chars = set("\n\t\r ")     # Space included intentionally
# True if the string contains at least one character outside non_printable_chars
is_printable = lambda string: bool(set(string) - non_printable_chars)
...
...
if is_printable(string):
    print("""do something""")

...

Answered by Peter Glen

import string

ctrlchar = "\n\r| "

# ------------------------------------------------------------------------
# This will let you control what you deem 'printable'
# Clean enough to display any binary 

def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False

Answered by Peter Glen

# Here is the full routine to display an arbitrary binary string
# Python 2

import string

ctrlchar = "\n\r| "

# ------------------------------------------------------------------------

def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False


# ------------------------------------------------------------------------
# Return a hex dump formatted string

def hexdump(strx, llen = 16):
    lenx = len(strx)
    outx = ""
    for aa in range(lenx // 16):
        outx += " "
        for bb in range(16):
            outx += "%02x " % ord(strx[aa * 16 + bb])
        outx += " | "     
        for cc in range(16):
            chh = strx[aa * 16 + cc]
            if isprint(chh):
                outx += "%c" % chh
            else:
                outx += "."
        outx += " | \n"

    # Print remainder on last line
    remn = lenx % 16
    divi = lenx // 16
    if remn:
        outx += " "
        for dd in range(remn):
            outx += "%02x " % ord(strx[divi * 16 + dd])
        outx += " " * ((16 - remn) * 3) 
        outx += " | "     
        for cc in range(remn):
            chh = strx[divi * 16 + cc]
            if isprint(chh):
                outx += "%c" % chh
            else:
                outx += "."
        outx += " " * ((16 - remn)) 
        outx += " | \n"


    return outx
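For Python 3, where binary data is a bytes object and indexing it yields integers, a shorter variant of the routine above might look like this. It is only a sketch under the same rules as isprint (space and "|" are shown as "."), and the exact column layout is an assumption rather than a reproduction of the original output.

```python
def hexdump(data, width=16):
    """Format a bytes object as hex columns plus a printable-text column."""
    hex_width = width * 3 - 1          # "xx " per byte, minus the last space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Printable here means 0x21..0x7e except '|', mirroring isprint() above.
        text_part = "".join(
            chr(b) if 0x20 < b < 0x7f and b != ord("|") else "."
            for b in chunk
        )
        lines.append(f" {hex_part:<{hex_width}}  | {text_part:<{width}} |")
    return "\n".join(lines)


print(hexdump(b"Hello, world!\x00\x01\x02 and some more bytes"))
```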

Answered by thakis

In Python 3, strings have an isprintable() method:

>>> 'a, '.isprintable()
True

For Python 2.7, see Dave Webb's answer.
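A few more cases may help show what isprintable() accepts: it returns True for the empty string and for non-ASCII letters, and False for tab, newline and other control characters. If the input must also be ASCII, as in the question, it could be combined with str.isascii(), which needs Python 3.7 or later. The helper name is_printable_ascii below is made up for this sketch.

```python
def is_printable_ascii(text):
    """True if `text` is ASCII only and contains no non-printable characters."""
    # str.isascii() requires Python 3.7+.
    return text.isascii() and text.isprintable()


for sample in ["a, ", "tab\there", "bell\x07", "caf\u00e9", ""]:
    print(repr(sample), sample.isprintable(), is_printable_ascii(sample))
```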

Answered by yunqimg

In the ASCII table, the characters [\x20-\x7e] are printable.
Use a regular expression to check whether the string contains any character outside this range.
If it does not, the string is printable.

>>> import re

>>> # Printable
>>> print(re.search(r'[^\x20-\x7e]', 'test'))
None

>>> # Unprintable
>>> re.search(r'[^\x20-\x7e]', 'test\x00') is not None
True

>>> # Optional pattern that also allows the whitespace characters \t through \r
>>> pattern = r'[^\t-\r\x20-\x7e]'
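Wrapped into a function, the regular-expression test could look like the sketch below. The function name and the allow_whitespace flag are made up here; the two character classes are the ones given in the answer, used positively with re.fullmatch instead of searching for a character outside the class.

```python
import re

# Character classes from the answer, compiled once.
_PRINTABLE = re.compile(r'[\x20-\x7e]*')
_PRINTABLE_OR_WHITESPACE = re.compile(r'[\t-\r\x20-\x7e]*')


def is_ascii_printable(text, allow_whitespace=False):
    """True if `text` consists only of printable ASCII characters."""
    pattern = _PRINTABLE_OR_WHITESPACE if allow_whitespace else _PRINTABLE
    return pattern.fullmatch(text) is not None


print(is_ascii_printable("test"))
print(is_ascii_printable("test\x00"))
print(is_ascii_printable("line\n", allow_whitespace=True))
```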