Python: Splitting a string where it switches between numeric and alphabetic characters
Original question: http://stackoverflow.com/questions/13673781/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Chris
I am parsing some data where the standard format is something like "10 pizzas". Sometimes data is input incorrectly, and we might end up with "5pizzas" instead of "5 pizzas". In this scenario, I want to parse out the number of pizzas.
The naïve way of doing this would be to check character by character, building up a string until we reach a non-digit and then casting that string as an integer.
num_pizzas = ""
for character in data_input:
    if character.isdigit():
        num_pizzas += character
    else:
        break
num_pizzas = int(num_pizzas)
This is pretty clunky, though. Is there an easier way to split a string where it switches from numeric digits to alphabetic characters?
Accepted answer by Gareth Latty
You ask for a way to split a string on digits, but in your example what you actually want is just the leading number. This is done easily with itertools.takewhile():
>>> import itertools
>>> int("".join(itertools.takewhile(str.isdigit, "10pizzas")))
10
This makes a lot of sense - what we are doing is taking characters from the string while they are digits. This has the advantage that processing stops as soon as we reach the first non-digit character.
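The one-liner above raises ValueError when the input has no leading digits, because int("") is invalid. A minimal sketch of a wrapper that handles that case follows; the name leading_number, the default parameter, and the restriction to ASCII digits are assumptions made for illustration, not part of the original answer.

```python
from itertools import takewhile

def leading_number(text, default=None):
    """Return the integer formed by the leading digits of text, or default if there are none."""
    # Assumption: only ASCII digits count. str.isdigit() alone also accepts
    # characters such as "²", which int() rejects.
    digits = "".join(takewhile(lambda ch: ch.isascii() and ch.isdigit(), text))
    return int(digits) if digits else default

print(leading_number("10pizzas"))  # 10
print(leading_number("pizzas"))    # None
```

takewhile() is lazy, so the scan still stops at the first non-digit character. Note that str.isascii() requires Python 3.7 or later.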
If you need the later data too, then what you are looking for is itertools.groupby() combined with a simple list comprehension:
>>> ["".join(x) for _, x in itertools.groupby("dfsd98sd8f68as7df56", key=str.isdigit)]
['dfsd', '98', 'sd', '8', 'f', '68', 'as', '7', 'df', '56']
If you then want to make one giant number:
>>> int("".join("".join(x) for is_number, x in itertools.groupby("dfsd98sd8f68as7df56", key=str.isdigit) if is_number))
98868756
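Applied to the pizza input from the question, the groupby() approach might be packaged as below. This is only a sketch: split_runs and parse_tokens are hypothetical helper names, and converting digit runs with int() assumes the digits are ones int() accepts (plain ASCII digits are safe).

```python
from itertools import groupby

def split_runs(text):
    """Split text into alternating runs of digits and non-digits."""
    return ["".join(run) for _, run in groupby(text, key=str.isdigit)]

def parse_tokens(text):
    """Like split_runs, but convert digit runs to int and strip whitespace from the rest."""
    tokens = []
    for is_number, run in groupby(text, key=str.isdigit):
        chunk = "".join(run)
        if is_number:
            tokens.append(int(chunk))
        elif chunk.strip():
            # Skip runs that are only whitespace, e.g. the space in "10 pizzas".
            tokens.append(chunk.strip())
    return tokens

print(split_runs("5pizzas"))      # ['5', 'pizzas']
print(parse_tokens("5pizzas"))    # [5, 'pizzas']
print(parse_tokens("10 pizzas"))  # [10, 'pizzas']
```

With this, the correctly and incorrectly spaced inputs produce the same tokens.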
Answer by cnicutar
How about a regex?
import re

reg = re.compile(r'(?P<numbers>\d*)(?P<rest>.*)')
result = reg.search(data_input)  # data_input is the string to parse, e.g. "5pizzas"
if result:
    numbers = result.group('numbers')
    rest = result.group('rest')
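Because the pattern above uses \d*, it also matches when there are no digits at all, so the result check never fails and numbers may be an empty string. A hedged variant that requires at least one leading digit and tolerates optional whitespace is sketched here; QUANTITY_RE and parse_quantity are names invented for this example.

```python
import re

# Leading digits, optional whitespace, then everything else on the line.
QUANTITY_RE = re.compile(r'(?P<numbers>\d+)\s*(?P<rest>.*)')

def parse_quantity(text):
    """Return (count, item) for input like "5pizzas" or "5 pizzas", else None."""
    result = QUANTITY_RE.match(text)
    if result is None:
        return None
    return int(result.group('numbers')), result.group('rest')

print(parse_quantity("5pizzas"))    # (5, 'pizzas')
print(parse_quantity("10 pizzas"))  # (10, 'pizzas')
print(parse_quantity("pizzas"))     # None
```

Using match() anchors the pattern at the start of the string, so input that does not begin with a number is reported as None instead of producing an empty count.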
Answer by Mark Byers
To split the string at digits you can use re.split with the regular expression \d+, wrapped in a capturing group so that the digits themselves are kept in the result:
>>> import re
>>> def my_split(s):
...     # list() is needed on Python 3, where filter() returns a lazy iterator
...     return list(filter(None, re.split(r'(\d+)', s)))
...
>>> my_split('5pizzas')
['5', 'pizzas']
>>> my_split('foo123bar')
['foo', '123', 'bar']
To find the first number use re.search:
>>> re.search(r'\d+', '5pizzas').group()
'5'
>>> re.search(r'\d+', 'foo123bar').group()
'123'
If you know the number must be at the start of the string, you can use re.match instead of re.search. If you want to find all the numbers and discard the rest, you can use re.findall.
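For completeness, a short sketch of the two alternatives just mentioned; the variable names are only illustrative.

```python
import re

at_start = re.match(r'\d+', '5pizzas')        # matches: the string starts with digits
not_at_start = re.match(r'\d+', 'foo123bar')  # None: re.match only looks at the start
all_numbers = re.findall(r'\d+', 'dfsd98sd8f68as7df56')

print(at_start.group())  # 5
print(not_at_start)      # None
print(all_numbers)       # ['98', '8', '68', '7', '56']
```

Since re.match returns None when the string does not start with a number, check the result before calling .group() on it.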
Answer by Patrick Artner
This answer was added as a possible way to solve "How to split a string into a list by digits?", which was linked as a duplicate of this question.
You can do the splitting yourself:
- use a temporary list to accumulate characters that are not digits
- if you find a digit, add the temporary list (''.join()-ed) to the result list (only if not empty) and do not forget to clear the temporary list
- repeat until all characters are processed, and if the temporary list still has content, add it
text = "Ka12Tu12La"
splitted = []   # our result
tmp = []        # our temporary character collector
for c in text:
    if not c.isdigit():
        tmp.append(c)                  # not a digit, add it
    elif tmp:                          # c is a digit, if tmp filled, add it
        splitted.append(''.join(tmp))
        tmp = []
if tmp:
    splitted.append(''.join(tmp))
print(splitted)
Output:
['Ka', 'Tu', 'La']
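The loop above discards the digits. If the digit runs are wanted as well, as in the original question, the same accumulate-and-flush idea can be wrapped in a function. This is a sketch rather than the answer author's code; split_digits and its keep_digits flag are hypothetical names.

```python
def split_digits(text, keep_digits=False):
    """Split text into runs of non-digits; optionally keep the digit runs as well."""
    result = []
    tmp = []             # characters of the current run
    tmp_is_digit = None  # whether the current run consists of digits
    for c in text:
        if tmp and c.isdigit() != tmp_is_digit:
            # The character class changed: flush the finished run.
            if keep_digits or not tmp_is_digit:
                result.append(''.join(tmp))
            tmp = []
        tmp.append(c)
        tmp_is_digit = c.isdigit()
    # Flush whatever is left after the last character.
    if tmp and (keep_digits or not tmp_is_digit):
        result.append(''.join(tmp))
    return result

print(split_digits("Ka12Tu12La"))                    # ['Ka', 'Tu', 'La']
print(split_digits("Ka12Tu12La", keep_digits=True))  # ['Ka', '12', 'Tu', '12', 'La']
```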

