Disclaimer: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must also follow CC BY-SA: cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/456084/

Time: 2020-11-03 20:10:02  Source: igfitidea

How do I do what strtok() does in C, in Python?

python

Question by Tall Jeff

I am learning Python and trying to figure out an efficient way to tokenize a string of numbers separated by commas into a list. Well formed cases work as I expect, but less well formed cases not so much.

If I have this:

A = '1,2,3,4'
B = [int(x) for x in A.split(',')]

B results in [1, 2, 3, 4]

which is what I expect, but if the string is something more like

A = '1,,2,3,4,'

If I use the same list comprehension for B as above, I get an exception. I think I understand why (because some of the "x" string values are not integers), but I'm thinking there should still be an elegant way to parse this, so that tokenizing the string A works more directly, the way strtok(A, ",\n\t") would when called iteratively in C.
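
The offending values are specifically the empty strings that split(',') produces between adjacent commas and after a trailing comma; int('') raises ValueError. A short demonstration using the string from the question:

```python
A = '1,,2,3,4,'

pieces = A.split(',')
print(pieces)  # ['1', '', '2', '3', '4', '']

# int() rejects the empty strings produced by ',,' and by the trailing ','
failed = False
try:
    B = [int(x) for x in pieces]
except ValueError as exc:
    failed = True
    print('ValueError:', exc)
```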

To be clear about what I am asking: I am looking for an elegant/efficient/typical way in Python to have all of the following example strings:

A='1,,2,3,\n,4,\n'
A='1,2,3,4'
A=',1,2,3,4,\t\n'
A='\n\t,1,2,3,,4\n'

return the same list:

B=[1,2,3,4]

via some sort of compact expression.

Answer by Dave Ray

How about this:

A = '1, 2,,3,4  '
B = [int(x) for x in A.split(',') if x.strip()]

x.strip() trims whitespace from the string, which makes it empty if the string is all whitespace. An empty string is false in a boolean context, so it is filtered out by the if part of the list comprehension.
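
A quick way to check this expression against all four strings from the question (the loop is only an illustrative harness). It works because int() itself tolerates surrounding whitespace such as '4\n', so only blank pieces need filtering:

```python
cases = ['1,,2,3,\n,4,\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']

# Blank pieces are dropped; int() strips whitespace from the rest itself
results = [[int(x) for x in A.split(',') if x.strip()] for A in cases]
for r in results:
    print(r)  # [1, 2, 3, 4] each time
```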

Answer by Nick

Generally, I try to avoid regular expressions, but if you want to split on a bunch of different things, they work. Try this:

import re
A = '1,,2,3,\n,4,\n'
result = [int(x) for x in filter(None, re.split('[,\n\t]', A))]
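
To see why the filter(None, ...) step is needed, here is what the split produces before filtering. The second pattern, [,\s]+, is an alternative not taken from the answer: the + treats a run of commas and whitespace as one separator, much like strtok does, but a leading or trailing separator still yields an empty string at the ends, so the filter remains necessary:

```python
import re

A = '1,,2,3,\n,4,\n'

# One split per delimiter character: adjacent delimiters give empty strings
pieces = re.split(r'[,\n\t]', A)

# '+' collapses a run of commas/whitespace into a single separator
collapsed = re.split(r'[,\s]+', A)

result = [int(x) for x in filter(None, collapsed)]
print(pieces)     # ['1', '', '2', '3', '', '', '4', '', '']
print(collapsed)  # ['1', '2', '3', '4', '']
print(result)     # [1, 2, 3, 4]
```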

Answer by Alec Thomas

Mmm, functional goodness (with a bit of generator expression thrown in):

a = "1,2,,3,4,"
print map(int, filter(None, (i.strip() for i in a.split(','))))

For full functional joy:

a = "1,2,,3,4,"
print(list(map(int, filter(None, map(str.strip, a.split(','))))))

Answer by user1683793

For the sake of completeness, I will answer this seven-year-old question. The C program that uses strtok:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char myLine[] = "This is;a-line,with pieces";
    char *p;
    for (p = strtok(myLine, " ;-,"); p != NULL; p = strtok(NULL, " ;-,"))
    {
        printf("piece=%s\n", p);
    }
    return 0;
}

can be accomplished in Python with re.split as:

import re
myLine = "This is;a-line,with pieces"
# raw string, so the '\-' escape reaches the regex engine unchanged
for p in re.split(r"[ ;\-,]", myLine):
    print("piece=" + p)

Answer by joeforker

Why accept inferior substitutes that cannot segfault your interpreter? With ctypes you can just call the real thing! :-)

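# strtok in Python, by calling the C library's own strtok through ctypes
from ctypes import c_char_p, cdll, create_string_buffer

try:
    libc = cdll.LoadLibrary('libc.so.6')    # glibc on Linux
except OSError:
    libc = cdll.LoadLibrary('msvcrt.dll')   # Microsoft C runtime on Windows

libc.strtok.restype = c_char_p
# strtok writes NUL bytes into its input, so it needs a mutable buffer
dat = create_string_buffer(b"1,,2,3,4")
sep = b",\n\t"
result = [libc.strtok(dat, sep)] + list(iter(lambda: libc.strtok(None, sep), None))
print(result)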
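
The real strtok needs a writable buffer because it overwrites each delimiter that ends a token with a NUL byte. A sketch showing this, assuming a POSIX system: ctypes.util.find_library('c') locates the C library there, and if it returns None, CDLL(None) falls back to the symbols already loaded into the Python process, which include libc:

```python
import ctypes
import ctypes.util

# POSIX assumption: load the C library (CDLL(None) also exposes libc symbols)
libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.strtok.restype = ctypes.c_char_p

buf = ctypes.create_string_buffer(b"1,,2")
print(buf.raw)                    # b'1,,2\x00'
first = libc.strtok(buf, b",")    # b'1'
print(buf.raw)                    # b'1\x00,2\x00': the first ',' became NUL
second = libc.strtok(None, b",")  # b'2'
third = libc.strtok(None, b",")   # None: no tokens left
```

Passing an immutable Python bytes object as the first argument instead would let strtok write into memory Python assumes never changes, which is exactly the kind of crash the answer jokes about.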

Answer by runeh

This will work, and never raise an exception, if all the numbers are ints. The isdigit() call is false if there's a decimal point in the string.

>>> nums = ['1,,2,3,\n,4\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
>>> for n in nums:
...     [int(i.strip()) for i in n.split(',') if i.strip().isdigit()]
... 
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
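
Besides a decimal point, str.isdigit() is also false for a leading minus sign, so with this filter such values are silently dropped rather than reported. A small illustration with a hypothetical input string:

```python
sample = '1,-2,3.5,4'  # hypothetical input containing a negative and a float

print([s.isdigit() for s in sample.split(',')])  # [True, False, False, True]

# '-2' and '3.5' fail isdigit() and disappear without any error
kept = [int(s) for s in sample.split(',') if s.strip().isdigit()]
print(kept)  # [1, 4]
```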

Answer by Algorias

How about this?

>>> a = "1,2,,3,4,"
>>> map(int,filter(None,a.split(",")))
[1, 2, 3, 4]

filter(None, ...) removes all false values (here, empty strings), and the remaining strings are then mapped to int.
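
Note that filter(None, ...) only removes strings that are completely empty. A whitespace-only piece such as '\n' is truthy, so it reaches int(), which raises ValueError. Several of the question's inputs contain such pieces, for example:

```python
a = '1,,2,3,\n,4,\n'

pieces = list(filter(None, a.split(',')))
print(pieces)  # ['1', '2', '3', '\n', '4', '\n']

try:
    list(map(int, pieces))
except ValueError as exc:
    print('ValueError:', exc)

# Stripping first turns whitespace-only pieces into '', which filter then drops
fixed = list(map(int, filter(None, map(str.strip, a.split(',')))))
print(fixed)  # [1, 2, 3, 4]
```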

EDIT: I just tested this against the versions posted above, and it seems to be significantly faster: about 15% faster than the strip() version and more than twice as fast as the isdigit() one.
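
The comparison can be re-run with timeit. The sketch below uses an arbitrary input string and repeat count; the function names are hypothetical, and the relative numbers will vary with Python version and machine:

```python
import timeit

a = "1,2,,3,4," * 100  # arbitrary test input: 400 numbers, with empty pieces


def with_filter():
    return list(map(int, filter(None, a.split(","))))


def with_strip():
    return [int(x) for x in a.split(",") if x.strip()]


def with_isdigit():
    return [int(x) for x in a.split(",") if x.strip().isdigit()]


# timeit.timeit accepts a callable as its statement
for fn in (with_filter, with_strip, with_isdigit):
    print(fn.__name__, timeit.timeit(fn, number=1000))
```

All three functions return the same list for this input, so only speed differs here.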

Answer by Josh Smeaton

Why not just wrap the conversion in a try/except block that catches anything that isn't an integer?
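
A minimal sketch of that suggestion, using a hypothetical helper named parse_ints: each piece is passed to int(), and pieces it rejects are skipped:

```python
def parse_ints(text, sep=','):
    """Hypothetical helper: keep pieces that int() accepts, skip the rest."""
    result = []
    for piece in text.split(sep):
        try:
            result.append(int(piece))  # int() ignores surrounding whitespace
        except ValueError:
            pass  # empty, whitespace-only, or non-integer piece
    return result


print(parse_ints('\n\t,1,2,3,,4\n'))  # [1, 2, 3, 4]
```

Unlike the isdigit() filter, this accepts negative numbers such as '-2'. Like it, a value such as '3.5' is skipped silently.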

Answer by Aneesh K Thampi

I desperately needed a strtok equivalent in Python, so I wrote a simple one of my own:

def strtok(val, delim):
    """Split val on every character in delim, dropping blank tokens."""
    token_list = [val]
    for key in delim:
        # Re-split every token produced so far on the next delimiter character
        new_list = []
        for token in token_list:
            new_list.extend(x for x in token.split(key) if x.strip())
        token_list = new_list
    return token_list

Answer by Simon Groenewolt

I'd guess regular expressions are the way to go: http://docs.python.org/library/re.html
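
One way to apply regular expressions here (a sketch, not taken from any of the answers) is to extract the numbers with re.findall instead of splitting on separators, so empty pieces never appear at all. The pattern -?\d+ assumes optionally negative integers:

```python
import re

cases = ['1,,2,3,\n,4,\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']

# findall returns every match of the pattern; the separators are simply ignored
results = [[int(x) for x in re.findall(r'-?\d+', A)] for A in cases]
for B in results:
    print(B)  # [1, 2, 3, 4] each time
```

The trade-off is that this does not validate the separators: '1;2' would also yield [1, 2].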
