Original URL: http://stackoverflow.com/questions/899276/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Python, how to parse strings to look like sys.argv
Asked by Gregg Lind
I would like to parse a string like this:
-o 1 --long "Some long string"
into this:
["-o", "1", "--long", 'Some long string']
or similar.
This is different from both getopt and optparse, which start with input that has already been parsed into sys.argv form (like the output I have above). Is there a standard way to do this? Basically, this is "splitting" while keeping quoted strings together.
My best function so far:
import csv

def split_quote(string, quotechar='"'):
    '''
    >>> split_quote('--blah "Some argument" here')
    ['--blah', 'Some argument', 'here']
    >>> split_quote("--blah 'Some argument' here", quotechar="'")
    ['--blah', 'Some argument', 'here']
    '''
    s = csv.StringIO(string)
    C = csv.reader(s, delimiter=" ", quotechar=quotechar)
    return list(C)[0]
Answered by Jacob Gabrielson
Answered by Craig McQueen
Before I was aware of shlex.split, I made the following:
import sys

_WORD_DIVIDERS = set((' ', '\t', '\r', '\n'))

_QUOTE_CHARS_DICT = {
    '\\': '\\',
    ' ': ' ',
    '"': '"',
    'r': '\r',
    'n': '\n',
    't': '\t',
}

def _raise_type_error():
    raise TypeError("Bytes must be decoded to Unicode first")

def parse_to_argv_gen(instring):
    is_in_quotes = False
    instring_iter = iter(instring)
    join_string = instring[0:0]

    c_list = []
    c = ' '
    while True:
        # Skip whitespace
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if c not in _WORD_DIVIDERS:
                    break
                c = next(instring_iter)
        except StopIteration:
            break
        # Read word
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if not is_in_quotes and c in _WORD_DIVIDERS:
                    break
                if c == '"':
                    is_in_quotes = not is_in_quotes
                    c = None
                elif c == '\\':
                    c = next(instring_iter)
                    c = _QUOTE_CHARS_DICT.get(c)
                if c is not None:
                    c_list.append(c)
                c = next(instring_iter)
            yield join_string.join(c_list)
            c_list = []
        except StopIteration:
            yield join_string.join(c_list)
            break

def parse_to_argv(instring):
    return list(parse_to_argv_gen(instring))
This works with Python 2.x and 3.x. On Python 2.x, it works directly with byte strings and Unicode strings. On Python 3.x, it only accepts [Unicode] strings, not bytes objects.
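On Python 3.x, data that arrives as bytes (for example from a socket or a file opened in binary mode) therefore has to be decoded before it is split. A minimal sketch of that step, assuming the data is UTF-8 encoded and using the standard library's shlex.split as the splitter; the variable names are illustrative only, and the same decoding step would apply before calling parse_to_argv:

```python
import shlex

# Hypothetical input; the UTF-8 encoding is an assumption about the data source.
raw = b'-o 1 --long "Some long string"'

# Decode to str first, then split the resulting text.
text = raw.decode("utf-8")
argv = shlex.split(text)
print(argv)
```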
This doesn't behave exactly the same as shell argv splitting: it also allows quoting of CR, LF and TAB characters as \r, \n and \t, converting them to real CR, LF, TAB (shlex.split doesn't do that). So writing my own function was useful for my needs. I guess shlex.split is better if you just want plain shell-style argv splitting. I'm sharing this code in case it's useful as a baseline for doing something slightly different.
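For comparison, here is a short sketch of the plain shlex.split approach, applied to the string from the question, followed by the escape-sequence case described above. The "--msg" option and the sample strings are made up for illustration; only shlex.split itself comes from the standard library.

```python
import shlex

# The string from the question, split with shell-style rules.
args = shlex.split('-o 1 --long "Some long string"')
print(args)

# Inside double quotes, shlex keeps the backslash in front of characters
# such as "n", so the token holds two characters (backslash, n) rather
# than a real newline.
escaped = shlex.split(r'--msg "line1\nline2"')
print(escaped)
```

The first call gives the list asked for in the question. The second shows why a custom splitter can still be useful if \r, \n and \t should become real control characters.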

