Python: split a string at the nth occurrence of a given character

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow CC BY-SA as well, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/17060039/

Time: 2020-08-19 00:20:01  Source: igfitidea

python string split

Question by cherrun

Is there a Pythonic way to split a string after the nth occurrence of a given delimiter?

Given a string:

'20_231_myString_234'

It should be split at the second occurrence of the delimiter '_' into:

['20_231', 'myString_234']

Or is counting, splitting and joining the only way to accomplish this?

Accepted answer by jamylak

>>> text = '20_231_myString_234'
>>> n = 2
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')

This seems like the most readable way; the alternative is a regex.
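
As a minimal sketch, the split-and-join idea above can be wrapped in a reusable function. The name split_nth is hypothetical, not from the answer. If the string contains fewer than n delimiters, the second part simply comes back empty rather than raising an error.

```python
def split_nth(text, sep, n):
    """Split text into two parts at the nth occurrence of sep (hypothetical helper)."""
    groups = text.split(sep)
    # Rejoin the first n pieces and the remaining pieces separately
    return sep.join(groups[:n]), sep.join(groups[n:])


print(split_nth('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
print(split_nth('20_231', '_', 2))  # ('20_231', '')
```

Because it only uses str.split and str.join, it also works for multi-character delimiters such as '--'.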

Answer by perreal

Use re with a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*), where n is a variable:

import re

n = 2
s = '20_231_myString_234'
# (n-1) repetitions of "non-'_' run + '_'", then the text up to the nth '_'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n-1), s)
if m:
    print(m.groups())

Or wrap it in a nice function:

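import re

def nthofchar(s, c, n):
    # Escape the delimiter so characters such as '.' or '|' are matched literally
    e = re.escape(c)
    regex = r'^((?:[^%s]*%s){%d}[^%s]*)%s(.*)' % (e, e, n - 1, e, e)
    m = re.match(regex, s)
    # Return an empty tuple when there are fewer than n delimiters
    return m.groups() if m else ()

s = '20_231_myString_234'
print(nthofchar(s, '_', 2))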
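
To make the pattern easier to read, here is a sketch of the same regex fixed to n = 2 and the delimiter '_', written with re.VERBOSE so each piece can carry its own comment. The constant name NTH_UNDERSCORE is illustrative only.

```python
import re

# Same pattern as above for n = 2 (so the repeat count is n - 1 = 1) and delimiter '_'
NTH_UNDERSCORE = re.compile(r"""
    ^
    (                 # group 1: everything before the 2nd '_'
      (?:[^_]*_){1}   # n - 1 runs of non-'_' characters, each followed by '_'
      [^_]*           # then one more run of non-'_' characters
    )
    _                 # the 2nd '_' itself (not captured)
    (.*)              # group 2: everything after it
""", re.VERBOSE)

m = NTH_UNDERSCORE.match('20_231_myString_234')
if m:
    print(m.groups())  # ('20_231', 'myString_234')
```

Since [^_]* can never consume a '_', each repetition is forced to stop at the next delimiter, which is what makes the count exact.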

Or without regexes, using iterative find:

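def nth_split(s, delim, n):
    # p: position of the most recent delimiter found; c: how many found so far
    p, c = -1, 0
    while c < n:
        # str.index raises ValueError if there are fewer than n delimiters
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + len(delim):]

s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)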

Answer by Michał Fita

It depends on what the pattern for this split is. If, for example, the first two elements are always numbers, you can build a regular expression and use the re module, which can split your string as well.
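
For instance, assuming (as this answer supposes) that the first two fields are always digits, a pattern can anchor on that shape directly instead of counting delimiters. This is only a sketch under that assumption; it will not match strings whose leading fields contain non-digits.

```python
import re

s = '20_231_myString_234'
# Assumption: the string starts with two digit-only fields separated by '_'
m = re.match(r'(\d+_\d+)_(.*)', s)
if m:
    head, tail = m.groups()
    print(head, tail)  # 20_231 myString_234
```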

Answer by pypat

I like this solution because it works without any real regex and can easily be adapted to another "nth" or delimiter.

import re

string = "20_231_myString_234"
occur = 2  # the occurrence of '_' at which to split

# Start index of every '_' in the string
indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]

print(part1, ' ', part2)
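
A sketch generalizing this approach to any delimiter string: re.escape makes characters like '.' or '|' match literally, and the delimiter's length is used when slicing. The function name split_at_occurrence and the choice to raise ValueError when there are too few delimiters are assumptions, not part of the original answer.

```python
import re


def split_at_occurrence(string, delim, occur):
    """Split string around the occur-th (1-based) match of delim (hypothetical helper)."""
    # re.escape so the delimiter is treated as plain text, not a regex
    indices = [m.start() for m in re.finditer(re.escape(delim), string)]
    if occur < 1 or occur > len(indices):
        raise ValueError('cannot split at occurrence %d of %r' % (occur, delim))
    i = indices[occur - 1]
    return string[:i], string[i + len(delim):]


print(split_at_occurrence('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
print(split_at_occurrence('a.b.c.d', '.', 3))  # ('a.b.c', 'd')
```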

Answer by Nullify

>>> import re
>>> s = '20_231_myString_234'

>>> occurrence = [m.start() for m in re.finditer('_', s)]  # positions of every '_'
>>> occurrence
[2, 6, 15]
>>> result = [s[:occurrence[1]], s[occurrence[1]+1:]]  # [s[:6], s[7:]]
>>> result
['20_231', 'myString_234']

Answer by AllBlackt

I had a larger string to split at every nth occurrence of a character, and ended up with the following code:

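# Split every 6 spaces
n = 6
sep = ' '
err_str = 'a b c d e f g h i j k l m n'  # hypothetical sample input
n_split_groups = []

groups = err_str.split(sep)
while groups:
    # Rejoin the next n pieces with the separator, then drop them
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]

print(n_split_groups)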
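
The same grouping can also be written as a list comprehension that steps through the pieces n at a time, which avoids re-slicing the remaining list on every loop iteration. A minimal sketch; the helper name chunk_join and the sample input are hypothetical.

```python
def chunk_join(text, sep, n):
    """Join every n consecutive sep-separated pieces of text (hypothetical helper)."""
    groups = text.split(sep)
    # range(0, len(groups), n) yields the start index of each chunk
    return [sep.join(groups[i:i + n]) for i in range(0, len(groups), n)]


print(chunk_join('a b c d e f g h', ' ', 3))  # ['a b c', 'd e f', 'g h']
```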

Thanks @perreal!

Answer by Yuval

I thought I would contribute my two cents. The second parameter to split() (maxsplit) limits how many splits are performed, so everything after the nth delimiter stays in one piece:

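def split_at(s, delim, n):
    r = s.split(delim, n)[n]  # the untouched rest after the nth delimiter
    # Cut the rest and the delimiter in front of it off the end of s
    return s[:-len(r)-len(delim)], r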
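
To see why this works: with maxsplit=n, str.split stops after n splits, so element n of the result is the remainder of the string. As written, split_at raises IndexError when the delimiter occurs fewer than n times. A hedged variant that returns None in that case instead (the name split_at_or_none is hypothetical):

```python
print('20_231_myString_234'.split('_', 2))  # ['20', '231', 'myString_234']


def split_at_or_none(s, delim, n):
    """Like split_at, but return None when delim occurs fewer than n times (hypothetical)."""
    parts = s.split(delim, n)  # at most n splits, so parts[n] is the untouched rest
    if len(parts) <= n:
        return None
    rest = parts[n]
    # Everything before the nth delimiter: drop the rest and the delimiter itself
    return s[:len(s) - len(rest) - len(delim)], rest


print(split_at_or_none('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
print(split_at_or_none('a_b', '_', 2))  # None
```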

On my machine, @perreal's two good answers (iterative find and regular expressions) actually measured 1.4 and 1.6 times slower, respectively, than this method.

It's worth noting that it can become even quicker if you don't need the leading part. Then the code becomes:

def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]

Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.

I put up my testing script online. You are welcome to review and comment.
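
The testing script itself is not reproduced here. As a hedged sketch of how such a comparison might be set up, the following uses the standard timeit module to time the split-and-join approach against the maxsplit approach. The function names, input string and repetition count are illustrative assumptions, and the resulting ratios will vary by machine and Python version.

```python
import timeit

S = '20_231_myString_234'


def split_join(s, delim, n):
    # Accepted answer: split everywhere, then rejoin both halves
    groups = s.split(delim)
    return delim.join(groups[:n]), delim.join(groups[n:])


def split_maxsplit(s, delim, n):
    # maxsplit variant: only n splits, then slice the head off the original string
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r


# Both approaches must agree before it makes sense to time them
assert split_join(S, '_', 2) == split_maxsplit(S, '_', 2)

for fn in (split_join, split_maxsplit):
    t = timeit.timeit(lambda: fn(S, '_', 2), number=100_000)
    print('%-15s %.3f s' % (fn.__name__, t))
```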