Note: this page is a bilingual (Chinese/English) copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/899276/

Time: 2020-11-03 21:05:13  Source: igfitidea

Python, how to parse strings to look like sys.argv

python parsing argv

Asked by Gregg Lind

I would like to parse a string like this:

-o 1  --long "Some long string"  

into this:

["-o", "1", "--long", 'Some long string']

or similar.

This is different from both getopt and optparse, which start with input that has already been split into sys.argv form (like the output I have above). Is there a standard way to do this? Basically, this is "splitting" while keeping quoted strings together.

My best function so far:

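import csv
import io


def split_quote(string, quotechar='"'):
    '''
    Split a string on single spaces, keeping quoted substrings together.

    >>> split_quote('--blah "Some argument" here')
    ['--blah', 'Some argument', 'here']

    >>> split_quote("--blah 'Some argument' here", quotechar="'")
    ['--blah', 'Some argument', 'here']
    '''
    # StringIO belongs to io (Python 3); csv.StringIO only worked because
    # the csv module happens to import it internally.
    s = io.StringIO(string)
    reader = csv.reader(s, delimiter=" ", quotechar=quotechar)
    # The input is a single line, so only the first row is of interest.
    return list(reader)[0]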

Answered by Jacob Gabrielson

I believe you want the shlex module.

>>> import shlex
>>> shlex.split('-o 1 --long "Some long string"')
['-o', '1', '--long', 'Some long string']
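
As a rough sketch of how this fits the original goal, the list returned by shlex.split can be handed to an argument parser as if it were sys.argv[1:]. The parser below (its "demo" name and its -o and --long options) is hypothetical and only mirrors the example string from the question. The csv call is included for comparison, on the assumption that the input contains doubled spaces as in the question.

```python
import argparse
import csv
import io
import shlex

# Same string as in the question, including the doubled and trailing spaces.
line = '-o 1  --long "Some long string"  '

# csv treats every single space as a delimiter, so runs of spaces
# produce empty fields in the result.
csv_tokens = next(csv.reader(io.StringIO(line), delimiter=" ", quotechar='"'))

# shlex.split collapses runs of whitespace, as a POSIX shell would.
argv = shlex.split(line)

# Hypothetical parser, defined only for this illustration.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-o", type=int)
parser.add_argument("--long")
args = parser.parse_args(argv)

print(csv_tokens)          # contains empty strings where spaces repeat
print(argv)                # ['-o', '1', '--long', 'Some long string']
print(args.o, args.long)   # 1 Some long string
```

Passing the list explicitly to parse_args means sys.argv itself is never touched, which keeps this kind of parsing easy to test.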

Answered by Craig McQueen

Before I was aware of shlex.split, I made the following:

import sys

_WORD_DIVIDERS = set((' ', '\t', '\r', '\n'))

# Maps the character following a backslash to the character it stands for.
_QUOTE_CHARS_DICT = {
    '\\':   '\\',
    ' ':    ' ',
    '"':    '"',
    'r':    '\r',
    'n':    '\n',
    't':    '\t',
}

def _raise_type_error():
    raise TypeError("Bytes must be decoded to Unicode first")

def parse_to_argv_gen(instring):
    is_in_quotes = False
    instring_iter = iter(instring)
    # Empty string of the same type as the input (str or unicode on Python 2)
    join_string = instring[0:0]

    c_list = []
    c = ' '
    while True:
        # Skip whitespace
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if c not in _WORD_DIVIDERS:
                    break
                c = next(instring_iter)
        except StopIteration:
            break
        # Read word
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if not is_in_quotes and c in _WORD_DIVIDERS:
                    break
                if c == '"':
                    # Toggle quoting; the quote character itself is dropped
                    is_in_quotes = not is_in_quotes
                    c = None
                elif c == '\\':
                    # Backslash escape; unknown escapes map to None and are dropped
                    c = next(instring_iter)
                    c = _QUOTE_CHARS_DICT.get(c)
                if c is not None:
                    c_list.append(c)
                c = next(instring_iter)
            yield join_string.join(c_list)
            c_list = []
        except StopIteration:
            # End of input in the middle of a word: emit what has been collected
            yield join_string.join(c_list)
            break

def parse_to_argv(instring):
    return list(parse_to_argv_gen(instring))

This works with Python 2.x and 3.x. On Python 2.x, it works directly with byte strings and Unicode strings. On Python 3.x, it only accepts [Unicode] strings, not bytes objects.

This doesn't behave exactly the same as shell argv splitting: it also allows CR, LF and TAB characters to be written as \r, \n and \t, converting them to real CR, LF and TAB characters (shlex.split doesn't do that). So writing my own function was useful for my needs. I guess shlex.split is better if you just want plain shell-style argv splitting. I'm sharing this code in case it's useful as a baseline for doing something slightly different.
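
To make that difference concrete, here is a small sketch of what shlex.split (in its default POSIX mode) does with a \t sequence. The input strings are made up for illustration.

```python
import shlex

# Inside double quotes, shlex keeps the backslash before "t" literally:
# the token contains a backslash followed by "t", not a real TAB.
quoted = shlex.split(r'--sep "a\tb"')

# Outside quotes, the backslash simply escapes the next character and
# then disappears, so the "t" is kept as an ordinary letter.
unquoted = shlex.split(r'--sep a\tb')

print(quoted)    # ['--sep', 'a\\tb']
print(unquoted)  # ['--sep', 'atb']
```

In neither case does a real TAB character appear in the output, which is the gap the hand-written parser above was meant to fill.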