Python: splitting a string on tabs in a file

Warning: this content is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/17038426/

Time: 2020-08-19 00:16:41  Source: igfitidea

splitting a string based on tab in the file

python string split

Asked by hjelpmig

I have a file that contains values separated by tabs ("\t"). I am trying to create a list and store all the values from the file in it, but I run into a problem. Here is my code:


line = "abc\tdef\tghi"
values = line.split("\t")

It works fine as long as there is exactly one tab between values. But if there is more than one tab in a row, the extra tabs show up in values as well (as empty strings). In my case, the extra tab is mostly after the last value in the file.
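To make the problem concrete: splitting on a single "\t" does not copy the tab itself into the result; instead, every extra tab produces an empty string. A small sketch with a made-up sample line that has a doubled tab and a trailing tab:

```python
# Made-up sample line: a doubled tab between "def" and "ghi", plus a trailing tab.
line = "abc\tdef\t\tghi\t"

# str.split with an explicit separator yields an empty string between
# each pair of adjacent tabs, and one more after the trailing tab.
values = line.split("\t")
print(values)  # ['abc', 'def', '', 'ghi', '']
```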


Accepted answer by Ashwini Chaudhary

You can use a regex here:


>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> re.split(r'\t+', strs)
['foo', 'bar', 'spam']

Update:


You can use str.rstrip to get rid of the trailing '\t' characters and then apply the regex.


>>> yas = "yas\t\tbs\tcda\t\t"
>>> re.split(r'\t+', yas.rstrip('\t'))
['yas', 'bs', 'cda']
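Since the question is about a file, note that each line read from a file still ends with a newline, so rstrip('\t') on its own stops at the '\n' and leaves the trailing tabs in place. A minimal sketch of applying the update line by line, using io.StringIO as a stand-in for the real file (the sample contents are made up):

```python
import io
import re

# Stand-in for a real file opened in text mode; the contents are made up.
fake_file = io.StringIO("yas\t\tbs\tcda\t\t\nfoo\tbar\n")

rows = []
for line in fake_file:
    # Drop the newline first; otherwise rstrip("\t") would stop at "\n".
    cleaned = line.rstrip("\n").rstrip("\t")
    # Treat any run of tabs as a single separator.
    rows.append(re.split(r"\t+", cleaned))

print(rows)  # [['yas', 'bs', 'cda'], ['foo', 'bar']]
```

Leading tabs would still produce an empty first field with this approach; strip them as well (for example with strip instead of rstrip) if they can occur.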

Answer by DimmuR

You can use regexp to do this:


>>> import re
>>> patt = re.compile(r"[^\t]+")
>>> s = "a\t\tbcde\t\tef"
>>> patt.findall(s)
['a', 'bcde', 'ef']
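Unlike re.split, findall with a negated character class only returns the runs of non-tab characters, so leading and trailing tabs need no special handling. A quick check on a made-up line that has tabs at both ends:

```python
import re

# One or more non-tab characters; runs of tabs are simply skipped over.
patt = re.compile(r"[^\t]+")

# Made-up sample: leading, doubled and trailing tabs.
s = "\ta\t\tbcde\t\tef\t\t"
fields = patt.findall(s)
print(fields)  # ['a', 'bcde', 'ef']
```

For lines read from a file, the trailing '\n' would end up attached to the last field with this pattern, so strip it first (or add \n to the excluded set, as in [^\t\n]+).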

Answer by CornSmith

Split on tab, but then remove all blank matches.


text = "hi\tthere\t\t\tmy main man"
# Compare by value (!=), not identity (is not), to filter out empty strings.
print([splits for splits in text.split("\t") if splits != ""])

Outputs:


['hi', 'there', 'my main man']
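The same filtering can also be written with the built-in filter, passing None so that only truthy (non-empty) strings are kept. This is an equivalent alternative, not part of the original answer:

```python
text = "hi\tthere\t\t\tmy main man"

# filter(None, ...) keeps only truthy items, which drops the empty
# strings that adjacent tabs produce; same result as the comprehension.
parts = list(filter(None, text.split("\t")))
print(parts)  # ['hi', 'there', 'my main man']
```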

Answer by Sylvain Leroux

Python supports CSV files in the eponymous csv module. The name is somewhat misleading, since it supports much more than just comma-separated values (any single-character delimiter, such as a tab, works).


If you need to go beyond basic word splitting, for example because you have to deal with quoted values, you should take a look at it.
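A minimal sketch of reading tab-separated data with csv.reader and delimiter="\t", using io.StringIO in place of a real file (the sample contents are made up). It shows a quoted value containing a tab being kept intact, which plain split or the regex approaches would break apart:

```python
import csv
import io

# Made-up stand-in for a tab-separated file. The second field of the first
# row is quoted because it contains a tab itself.
data = io.StringIO('abc\t"d\te f"\tghi\nfoo\t\tbar\n')

# For a real file, open it with open(path, newline="") as the csv docs recommend.
reader = csv.reader(data, delimiter="\t")

rows = []
for row in reader:
    # csv keeps an empty field for consecutive tabs; drop those if,
    # as in the question, they carry no meaning.
    rows.append([field for field in row if field])

print(rows)  # [['abc', 'd\te f', 'ghi'], ['foo', 'bar']]
```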


Answer by Sylvain Leroux

Another regex-based solution:


>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> r = re.compile(r'([^\t]*)\t*')
>>> r.findall(strs)[:-1]
['foo', 'bar', 'spam']
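The [:-1] slice is needed because the pattern ([^\t]*)\t* can also match an empty string, so findall reports one last empty match at the end of the input. Printing the unsliced result makes this visible:

```python
import re

r = re.compile(r"([^\t]*)\t*")
strs = "foo\tbar\t\tspam"

# Both parts of the pattern can match nothing, so after consuming "spam"
# findall also returns an empty match at the end of the string.
raw = r.findall(strs)
print(raw)  # ['foo', 'bar', 'spam', '']
```

Note that a leading tab would add an empty string at the front that the slice does not remove.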