Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/12700893/

Date: 2020-08-18 11:38:19  Source: igfitidea

How to check if a string is a valid python identifier? including keyword check?

Tags: python, keyword, identifier, reserved

Asked by Paul Molodowitch

Does anyone know if there is a builtin Python method that checks whether something is a valid Python variable name, INCLUDING a check against reserved keywords? (So, for example, something like 'in' or 'for' would fail.)

Failing that, does anyone know where I can get a list of reserved keywords (i.e., dynamically, from within Python, as opposed to copying and pasting something from the online docs)? Or is there another good way of writing your own check?

Surprisingly, testing by wrapping a setattr call in try/except doesn't work, because something like this:

setattr(myObj, 'My Sweet Name!', 23)

...actually works! (...and can even be retrieved with getattr!)
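
A minimal sketch of that behaviour, assuming Python 3 and a hypothetical empty class named Namespace (it is not part of the question): setattr appears to store any string key in the instance's __dict__ without checking that it is a valid identifier, which is why it cannot serve as a validity test.

```python
class Namespace:
    """Hypothetical empty class, used only for this demonstration."""


obj = Namespace()
name = 'My Sweet Name!'

# setattr does not validate the name; it simply stores the key.
setattr(obj, name, 23)

value = getattr(obj, name)   # retrievable through getattr
stored = name in vars(obj)   # and present in the instance __dict__
print(value, stored)
```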

Accepted answer by asmeurer

The keyword module contains the list of all reserved keywords:

>>> import keyword
>>> keyword.iskeyword("in")
True
>>> keyword.kwlist
['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']

Note that this list will be different depending on what major version of Python you are using, as the list of keywords changes (especially between Python 2 and Python 3).
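
To see what the running interpreter considers a keyword, rather than relying on a pasted list, something like the following sketch may help. It assumes Python 3; the getattr fallback is there because keyword.softkwlist only exists in newer Python 3 releases.

```python
import keyword
import sys

# The keyword list belongs to the interpreter that is running this code.
version = sys.version_info[:2]
hard_keywords = list(keyword.kwlist)

# Soft keywords are only exposed by newer Python 3 releases,
# so fall back to an empty list when the attribute is missing.
soft_keywords = list(getattr(keyword, "softkwlist", []))

print(version, len(hard_keywords), soft_keywords)
print(keyword.iskeyword("in"), keyword.iskeyword("print"))
```

On Python 3, print is a builtin function rather than a keyword, so the last call reports it as a non-keyword, unlike the Python 2 list shown above.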

If you also want all builtin names, use __builtins__:

>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__build_class__', '__debug__', '__doc__', '__import__', '__name__', '__package__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']

And note that some of these (like copyright) are not really that big of a deal to override.
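
One way to combine the two checks is sketched below. It assumes Python 3, where the builtins module is importable by that name; the helper classify_name is a hypothetical name chosen for illustration.

```python
import builtins
import keyword


def classify_name(name):
    """Return 'keyword', 'builtin', or 'free' for an identifier-like string."""
    if keyword.iskeyword(name):
        return "keyword"
    # Shadowing a builtin is legal, so this is a warning rather than an error.
    if hasattr(builtins, name):
        return "builtin"
    return "free"


for candidate in ("for", "list", "my_value"):
    print(candidate, classify_name(candidate))
```

This only classifies the name; it does not check that the string is a syntactically valid identifier.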

One more caveat: in Python 2, True, False, and None are not considered keywords. However, assigning to None is a SyntaxError. Assigning to True or False is allowed, though not recommended (the same goes for any other builtin). In Python 3, they are keywords, so this is not an issue.
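
The Python 3 side of this can be checked with a short sketch. It assumes Python 3 and only compiles the assignment, without executing it; the file name "<demo>" is an arbitrary placeholder.

```python
import keyword

# In Python 3 the three constants are part of the keyword list.
constants_are_keywords = all(
    keyword.iskeyword(name) for name in ("True", "False", "None")
)

# Compiling (not running) an assignment to None is rejected by the parser.
try:
    compile("None = 1", "<demo>", "exec")
    assignment_rejected = False
except SyntaxError:
    assignment_rejected = True

print(constants_are_keywords, assignment_rejected)
```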

Answer by Joran Beasley

The list of Python keywords is short, so you can just check the syntax with a simple regex and test membership in a relatively small list of keywords:

import keyword #thanks asmeurer
import re
my_var = "$testBadVar"
print re.match("[_A-Za-z][_a-zA-Z0-9]*",my_var) and not keyword.iskeyword(my_var)

A shorter but more dangerous alternative would be:

my_bad_var = "%#ASD"
try:
    # Dangerous: exec runs whatever the string contains.
    exec("{0}=1".format(my_bad_var))
except SyntaxError:  # this may not be the right error
    print("Invalid variable name!")

And lastly, a slightly safer variant:

my_bad_var = "%#ASD"

try:
    # Compile first, so a syntax error is caught before anything runs.
    cc = compile("{0}=1".format(my_bad_var), "asd", "single")
    eval(cc)
    print("VALID")
except SyntaxError:  # maybe a different error
    print("INVALID!")
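
One limitation of the compile-based check is worth sketching: any valid assignment target compiles, not just a bare name. The helper below is a hypothetical illustration, assuming Python 3. It only compiles and never executes the string.

```python
def compiles_as_assignment(name):
    """Return True if '<name>=1' compiles; this is weaker than an identifier check."""
    try:
        compile("{0}=1".format(name), "<check>", "single")
    except SyntaxError:
        return False
    return True


# A plain name compiles, but so do attribute and subscript targets,
# which are not identifiers.
for candidate in ("foo", "x.y", "x[0]", "%#ASD"):
    print(repr(candidate), compiles_as_assignment(candidate))
```

So a True result here means "valid assignment target", which is a superset of "valid identifier".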

Answer by Roeland Huys

John: as a slight improvement, I added a $ to the regex; otherwise the test does not detect spaces:

import keyword 
import re
my_var = "$testBadVar"
print re.match("[_A-Za-z][_a-zA-Z0-9]*$",my_var) and not keyword.iskeyword(my_var)
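
A small caveat with the $ anchor: in Python's re module, $ also matches just before a trailing newline, so "foo\n" would still pass. The sketch below assumes Python 3.4 or later (for re.fullmatch) and covers ASCII identifiers only; is_ascii_identifier is a hypothetical name.

```python
import keyword
import re

_IDENT_RE = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")


def is_ascii_identifier(name):
    """ASCII-only identifier check that also rejects keywords."""
    # fullmatch requires the whole string to match, newline included.
    return _IDENT_RE.fullmatch(name) is not None and not keyword.iskeyword(name)


# The $-anchored pattern still accepts a trailing newline.
dollar_accepts_newline = (
    re.match(r"[_A-Za-z][_a-zA-Z0-9]*$", "foo\n") is not None
)
print(dollar_accepts_newline, is_ascii_identifier("foo\n"))
```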

Answer by toriningen

Python 3

Python 3 now has 'foo'.isidentifier(), so that seems to be the best solution for recent Python versions (thanks to fellow runciter@freenode for the suggestion). However, somewhat counter-intuitively, it does not check against the list of keywords, so a combination of both must be used:

import keyword

def isidentifier(ident: str) -> bool:
    """Determines if string is valid Python identifier."""

    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    if not ident.isidentifier():
        return False

    if keyword.iskeyword(ident):
        return False

    return True
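
A short usage sketch of the same combination, assuming Python 3. The sample strings are arbitrary choices for illustration.

```python
import keyword

candidates = ["foo", "for", "naïve", "1abc", "foo bar"]

# str.isidentifier alone accepts keywords such as 'for',
# so keyword.iskeyword is needed as a second filter.
raw = {s: s.isidentifier() for s in candidates}
checked = {s: s.isidentifier() and not keyword.iskeyword(s) for s in candidates}

for s in candidates:
    print(repr(s), raw[s], checked[s])
```

Note that non-ASCII letters are accepted, since Python 3 identifiers may contain Unicode characters.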

Python 2

For Python 2, the easiest way to check whether a given string is a valid Python identifier is to let Python parse it itself.

There are two possible approaches. The fastest is to use ast and check that the AST of a single expression has the desired shape:

import ast

def isidentifier(ident):
    """Determines, if string is valid Python identifier."""

    # Smoke test — if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Resulting AST of simple identifier is <Module [<Expr <Name "foo">>]>
    try:
        root = ast.parse(ident)
    except SyntaxError:
        return False

    if not isinstance(root, ast.Module):
        return False

    if len(root.body) != 1:
        return False

    if not isinstance(root.body[0], ast.Expr):
        return False

    if not isinstance(root.body[0].value, ast.Name):
        return False

    if root.body[0].value.id != ident:
        return False

    return True

Another is to let the tokenize module split the identifier into a stream of tokens and check that it contains only our name:

import keyword
import tokenize

def isidentifier(ident):
    """Determines if string is valid Python identifier."""

    # Smoke test - if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Quick test - if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False

    readline = lambda g=(lambda: (yield ident))(): next(g)
    tokens = list(tokenize.generate_tokens(readline))

    # You should get exactly 2 tokens
    if len(tokens) != 2:
        return False

    # First is NAME, identifier.
    if tokens[0][0] != tokenize.NAME:
        return False

    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[0][1]:
        return False

    # Second is ENDMARKER, ending stream
    if tokens[1][0] != tokenize.ENDMARKER:
        return False

    return True

The same function, but compatible with Python 3, looks like this:

import keyword
import tokenize

def isidentifier_py3(ident):
    """Determines if string is valid Python identifier."""

    # Smoke test — if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Quick test — if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False

    readline = lambda g=(lambda: (yield ident.encode('utf-8-sig')))(): next(g)
    tokens = list(tokenize.tokenize(readline))

    # You should get exactly 3 tokens
    if len(tokens) != 3:
        return False

    # If using Python 3, first one is ENCODING, it's always utf-8 because 
    # we explicitly passed in UTF-8 BOM with ident.
    if tokens[0].type != tokenize.ENCODING:
        return False

    # Second is NAME, identifier.
    if tokens[1].type != tokenize.NAME:
        return False

    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[1].string:
        return False

    # Third is ENDMARKER, ending stream
    if tokens[2].type != tokenize.ENDMARKER:
        return False

    return True
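
The fixed token count is fragile. As far as I know, newer Python 3 releases emit an extra NEWLINE token when the input lacks a trailing newline, so the count of 3 may not hold there. The sketch below is a hypothetical variant, assuming Python 3, that ignores structural tokens instead of counting them and feeds the input through io.BytesIO rather than a generator-backed lambda.

```python
import io
import keyword
import tokenize

# Token types that carry no content of their own.
_STRUCTURAL = {
    tokenize.ENCODING,
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.ENDMARKER,
}


def isidentifier_tokens(ident):
    """Tokenize-based check that does not depend on the exact token count."""
    if keyword.iskeyword(ident):
        return False
    readline = io.BytesIO(ident.encode("utf-8")).readline
    try:
        tokens = list(tokenize.tokenize(readline))
    except (tokenize.TokenError, SyntaxError):
        return False
    content = [tok for tok in tokens if tok.type not in _STRUCTURAL]
    # Exactly one NAME token, spanning the whole input string.
    return (
        len(content) == 1
        and content[0].type == tokenize.NAME
        and content[0].string == ident
    )


print(isidentifier_tokens("foo"), isidentifier_tokens("foo bar"))
```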

However, be aware of bugs in the Python 3 tokenize implementation that reject some completely valid identifiers like ??, ? and 贈?. ast works fine, though. Generally, I'd advise against using the tokenize-based implementation for actual checks.

Also, some may consider heavy machinery like an AST parser to be a tad overkill. This simple implementation is self-contained and guaranteed to work on any Python 2:

import keyword
import string

def isidentifier(ident):
    """Determines if string is valid Python identifier."""

    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    if not ident:
        return False

    if keyword.iskeyword(ident):
        return False

    first = '_' + string.lowercase + string.uppercase
    if ident[0] not in first:
        return False

    other = first + string.digits
    for ch in ident[1:]:
        if ch not in other:
            return False

    return True
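
Note that string.lowercase and string.uppercase are documented as locale-dependent in Python 2 and do not exist in Python 3. A hedged variant using string.ascii_letters, which is available in both, might look like this (is_simple_identifier is a hypothetical name, and the check is ASCII-only by design):

```python
import keyword
import string

# Characters allowed in the first position and in the remaining positions.
_FIRST = frozenset("_" + string.ascii_letters)
_REST = _FIRST | frozenset(string.digits)


def is_simple_identifier(ident):
    """ASCII-only identifier check without regexes or parsers."""
    if not ident or keyword.iskeyword(ident):
        return False
    return ident[0] in _FIRST and all(ch in _REST for ch in ident[1:])


print(is_simple_identifier("foo1_23"), is_simple_identifier("1234abc"))
```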

Here are a few tests to check that these all work:

assert(isidentifier('foo'))
assert(isidentifier('foo1_23'))
assert(not isidentifier('pass'))    # syntactically correct keyword
assert(not isidentifier('foo '))    # trailing whitespace
assert(not isidentifier(' foo'))    # leading whitespace
assert(not isidentifier('1234'))    # number
assert(not isidentifier('1234abc')) # number and letters
assert(not isidentifier(''))      # Unicode not from allowed range
assert(not isidentifier(''))        # empty string
assert(not isidentifier('   '))     # whitespace only
assert(not isidentifier('foo bar')) # several tokens
assert(not isidentifier('no-dashed-names-for-you')) # no such thing in Python

# Unicode identifiers are only allowed in Python 3:
assert(isidentifier('??')) # Unicode $Other_ID_Start and $Other_ID_Continue

Performance

All measurements were conducted on my machine (MBPr Mid 2014) on the same randomly generated test set of 1 500 000 elements: 1 000 000 valid and 500 000 invalid. YMMV.

== Python 3:
method | calls/sec | faster
---------------------------
token  |    48 286 |  1.00x
ast    |   175 530 |  3.64x
native | 1 924 680 | 39.86x

== Python 2:
method | calls/sec | faster
---------------------------
token  |    83 994 |  1.00x
ast    |   208 206 |  2.48x
simple | 1 066 461 | 12.70x
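
To get comparable numbers on your own machine, a rough timeit sketch follows. It assumes Python 3 and measures only the native approach; the sample generator, its alphabet, and the sample size are my own arbitrary choices, not the test set used above.

```python
import keyword
import random
import string
import timeit


def native(ident):
    """str.isidentifier combined with the keyword check."""
    return ident.isidentifier() and not keyword.iskeyword(ident)


# Hypothetical sample set: random 8-character strings, some of them invalid
# because they contain a space or a dash, or start with a digit.
rng = random.Random(0)
alphabet = string.ascii_letters + string.digits + "_ -"
samples = ["".join(rng.choice(alphabet) for _ in range(8)) for _ in range(10000)]


def run():
    for s in samples:
        native(s)


rounds = 10
seconds = min(timeit.repeat(run, number=rounds, repeat=3))
calls_per_sec = len(samples) * rounds / seconds
print("native: {0:.0f} calls/sec".format(calls_per_sec))
```

Swapping native for another implementation in run gives a like-for-like comparison on the same samples.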