Python split string on quotes

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16603310/

Time: 2020-08-18 23:08:45  Source: igfitidea

Tags: python, python-2.7

Asked by user2377057

I'm a Python learner. If I have lines of text in a file that look like this

"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Can I split the lines around the inverted commas? The only constant would be their position in the file relative to the data lines themselves. The data lines could range from 10 to 100+ characters (they'll be nested network folders). I cannot see how to use anything other than those markers to split on, but my lack of Python knowledge is making this difficult. I've tried

optfile=line.split("")

and other variations, but I keep getting ValueError: empty separator. I can see why it's saying that; I just don't know how to change it. Any help is, as always, very much appreciated.

Many thanks

Accepted answer by Thomas Jung

Finding all regular expression matches will do it:

import re
input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

re.findall('".+?"', input)  # or '"[^"]+"'

This will return the list of file names (each still wrapped in its double quotes):

["Y:\DATA\00001\SERVER\DATA.TXT", "V:\DATA2\00002\SERVER2\DATA2.TXT"]

To get the file name without quotes use:

[f[1:-1] for f in re.findall('".+?"', input)]

or use re.finditer:

[f.group(1) for f in re.finditer('"(.+?)"', input)]
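A minimal sketch of the regular-expression approach, assuming the line is held in a raw string so the backslashes stay literal. The names `line`, `quoted` and `paths` are illustrative and do not come from the answer.

```python
import re

# Raw string: the backslashes in the Windows paths are kept literally.
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# Non-greedy match of everything between a pair of double quotes;
# the quotes themselves are part of each match.
quoted = re.findall(r'".+?"', line)

# With a single capture group, findall returns only the group,
# i.e. the text without the surrounding quotes.
paths = re.findall(r'"(.+?)"', line)

print(quoted)
print(paths)
```

The capture-group form does in one step what slicing with `[1:-1]` does after the match.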

Answer by Thomas Jung

You must escape the ":

input.split("\"")

results in

['\n',
 'Y:\DATA\x0001\SERVER\DATA.TXT',
 ' ',
 'V:\DATA2\x0002\SERVER2\DATA2.TXT',
 '\n']

To drop the resulting empty lines:

[line for line in [line.strip() for line in input.split("\"")] if line]

results in

['Y:\DATA\x0001\SERVER\DATA.TXT', 'V:\DATA2\x0002\SERVER2\DATA2.TXT']
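The `\x0001` that shows up in some of the printed results on this page suggests the test string was typed as an ordinary (non-raw) literal, where `\000` is read as an octal escape for a NUL character. That is an inference from the output, not something the answers state. A small sketch of the difference, using a shortened, purely illustrative string:

```python
# In a normal literal, "\000" is an octal escape for the NUL character,
# so "\00001" becomes NUL followed by the two characters "01".
plain = "A\00001"

# In a raw literal every backslash is kept exactly as written.
raw = r"A\00001"

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))
```

Lines read from a file are not subject to literal escaping, so this only matters when a path is pasted into source code.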

Answer by HennyH

I think what you want is to extract the filepaths, which are separated by spaces. That is, you want to split the line about items contained within quotations. I.e. with a line

"FILE PATH" "FILE PATH 2"

You want

["FILE PATH","FILE PATH 2"]

In which case:

import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")',line))

With file.txt:

"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Outputs:

>>> 
['"Y:\DATA\00001\SERVER\DATA MINER.TXT"', '"V:\DATA2\00002\SERVER2\DATA2.TXT"']
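Splitting only on whitespace that sits between two double quotes, using `re.split` with a lookbehind and a lookahead, can be sketched as follows. Using `\s+` instead of a single `\s`, and stripping the quotes afterwards, are assumptions added here for illustration.

```python
import re

line = r'"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# Split only where whitespace has a quote on both sides, so the space
# inside "DATA MINER.TXT" is left alone. \s+ also tolerates several
# spaces or a tab between the quoted items.
parts = re.split(r'(?<=")\s+(?=")', line.strip())

# Remove the surrounding quotes from each piece.
paths = [p.strip('"') for p in parts]
print(paths)
```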

Answer by Redsplinter

This was my solution. It parses most sane input exactly the same as if it was passed into the command line directly.

import re
def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'

    return [re.sub(r'"{1,2}',reduce_,z.strip()) for z in re.findall(rex,input_)]

Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.

Edit: Got OCD about the stupid way that the command line handles crappy quoting and wrote the below:

import re
# trial: the input string to parse (not defined in the answer itself)
tokens = list()
reading = False
qc = 0
lq = 0
begin = 0
for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        else:
            if re.match(r'"', char):
                qc = qc + 1
                lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else: 
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial) ])
tokens = [re.sub(r'"{1,2}',lambda y:'' if y.group(0) == '"' else '"', z) for z in tokens]

Answer by Jon Clements

I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:

import shlex

with open('somefile') as fin:
    for line in fin:
        print(shlex.split(line))

Would give:

['Y:\DATA\00001\SERVER\DATA.TXT', 'V:\DATA2\00002\SERVER2\DATA2.TXT']
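A minimal sketch of `shlex.split` applied to the question's sample line, held in a raw string so the backslashes are literal. The variable names are illustrative.

```python
import shlex

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# Default (POSIX) mode: inside double quotes a backslash is only special
# before another backslash or a double quote, so these paths come through
# unchanged and only the surrounding quotes are removed.
paths = shlex.split(line)
print(paths)
```

This relies on the paths being double-quoted. In the default mode a backslash outside quotes is treated as an escape character, and a doubled backslash inside quotes (as at the start of a UNC path) collapses to one, so such input would need different handling.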

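Answer by McKelvin

No regex, no split, just use csv.reader

import csv

sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'

def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print(l)

main()

The output is

['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']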
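The `csv.reader` approach with a space delimiter can also be tried on the question's own line. A minimal sketch, with the sample held in a raw string and illustrative variable names:

```python
import csv

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# csv.reader accepts any iterable of lines, so a one-element list is enough.
# No escapechar is set, so backslashes are treated as ordinary characters.
rows = list(csv.reader([line], delimiter=' ', quotechar='"'))
paths = rows[0]
print(paths)
```

Each input line yields one row, and the quoted fields arrive with their quotes already removed.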

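Answer by Kashif Siddiqui

The shlex module can help you.

import shlex

my_string = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)

This will spit out

['Y:\DATA\x0001\SERVER\DATA.TXT', 'V:\DATA2\x0002\SERVER2\DATA2.TXT']

Reference: https://docs.python.org/2/library/shlex.html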

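Answer by D'Arcy

I know this got answered a million years ago, but this works too:

input = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]

Should output it as a list containing:

['Y:\DATA\x0001\SERVER\DATA.TXT', 'V:\DATA2\x0002\SERVER2\DATA2.TXT']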
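The `replace('" "', '"')` trick depends on exactly one space separating the quoted items. A short sketch of what happens otherwise; the helper name `split_one_space` and the shortened paths are made up for illustration.

```python
def split_one_space(line):
    # Fuse each '" "' separator into a single quote, split on quotes,
    # then drop the empty first and last pieces.
    return line.replace('" "', '"').split('"')[1:-1]

single = r'"Y:\DATA.TXT" "V:\DATA2.TXT"'
double = r'"Y:\DATA.TXT"  "V:\DATA2.TXT"'  # two spaces between the items

print(split_one_space(single))
print(split_one_space(double))
```

With two spaces the separator is not fused, so a whitespace-only element ends up in the result.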

Answer by OldSteve

My question "Python - Error Caused by Space in argv Arument" was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but with no example, so I changed things to:-

repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)

and in SWCore to:-

sys.argv = args

The shlex module worked but I prefer this.

Answer by Frank from Frankfurt

The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.

[s for s in line.split('"') if s.strip() != '']

There is no need to use regular expressions, an escape character, some module or assume a certain number of whitespace characters between the paths.

Test:

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\DATA\00001\SERVER\DATA.TXT', 'V:\DATA2\00002\SERVER2\DATA2.TXT']
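Splitting at every double quote and discarding blank pieces can be wrapped in a small helper and tried on a line with irregular spacing and a trailing newline. The name `split_on_quotes` and the sample paths are illustrative only.

```python
def split_on_quotes(line):
    # Split at every double quote, then drop pieces that are empty or
    # contain only whitespace (gaps between items, the trailing newline).
    return [s for s in line.split('"') if s.strip() != '']

# Spaces and a tab between the items, a space inside the second path,
# and a newline at the end, as a line read from a file might have.
messy = '"Y:\\DATA.TXT" \t  "V:\\DATA 2.TXT"\n'
print(split_on_quotes(messy))
```

One consequence of the filter is that an empty quoted item (`""`) is silently dropped as well, which may or may not be wanted.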