Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/4622472/

Time: 2020-08-18 16:33:59  Source: igfitidea

Python finding substring between certain characters using regex and replace()

Tags: python, regex, string, replace

Asked by jCuga

Suppose I have a string with lots of random stuff in it like the following:

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"

And I'm interested in obtaining the substring sitting between 'Value=' and '&', which in this example would be 'five'.

I can use a regex like the following:

 >>> import re
 >>> match = re.search(r'Value=?([^&>]+)', strJunk)
 >>> print(match.group(0))
 Value=five
 >>> print(match.group(1))
 five

How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)

I am also going to have to make a substitution in this string, such as the following:

 val1 = match.group(1)
 strJunk.replace(val1, "six", 1)    

Which yields:

 'asdf2adsf29Value=six&lakl23ljk43asdldl'

Considering that I plan on performing the above two tasks (finding the string between 'Value=' and '&', as well as replacing that value) over and over, I was wondering if there are any other more efficient ways of looking for the substring and replacing it in the original string. I'm fine sticking with what I've got but I just want to make sure that I'm not taking up more time than I have to be if better methods are out there.

Accepted answer by David German

Named groups make it easier to get the group contents afterwards. Compiling your regex once and then reusing the compiled object saves work on every call: module-level functions such as re.search must look the pattern up in the re module's internal cache each time, and recompile it if it is no longer cached. You can use positive lookbehind and lookahead assertions so that the regex matches only the value itself, which makes it suitable for the substitution you want to do.

>>> value_regex = re.compile("(?<=Value=)(?P<value>.*?)(?=&)")
>>> match = value_regex.search(strJunk)
>>> match.group('value')
'five'
>>> value_regex.sub("six", strJunk)
'asdf2adsf29Value=six&lakl23ljk43asdldl'
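
For repeated use, the same idea can be wrapped in two small helpers. This is only a sketch in Python 3: the names VALUE_RE, get_value and set_value are made up for illustration, and it assumes the value is always followed by an '&'.

```python
import re

# Compile once and reuse. The lookbehind/lookahead keep "Value=" and "&"
# out of the match, so sub() replaces only the value itself.
VALUE_RE = re.compile(r"(?<=Value=)(?P<value>.*?)(?=&)")


def get_value(text):
    """Return the text between 'Value=' and '&', or None if it is absent."""
    match = VALUE_RE.search(text)
    return match.group("value") if match else None


def set_value(text, new_value):
    """Replace only the first matched value with new_value."""
    # A function replacement inserts new_value literally, so backslashes
    # in it are not interpreted as group references.
    return VALUE_RE.sub(lambda m: new_value, text, count=1)


str_junk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
print(get_value(str_junk))         # five
print(set_value(str_junk, "six"))  # asdf2adsf29Value=six&lakl23ljk43asdldl
```

Passing count=1 mirrors the strJunk.replace(val1, "six", 1) call from the question; drop it if every occurrence should be replaced.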

Answer by Mahmoud Abdelkader

I'm not exactly sure if you're parsing URLs, in which case you should definitely be using the urlparse module (urllib.parse in Python 3).
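
If the data really is a URL, a rough Python 3 sketch with urllib.parse could look like the following. The URL below is hypothetical (the junk string in the question is not a well-formed URL), so treat this as an illustration of the approach rather than a drop-in solution.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical URL, not taken from the question.
url = "http://example.com/page?id=29&Value=five&lang=en"

parts = urlsplit(url)
params = parse_qsl(parts.query)  # list of (key, value) pairs, in order

# Read the value.
value = dict(params)["Value"]

# Replace the value and rebuild the URL, preserving parameter order.
new_params = [(k, "six" if k == "Value" else v) for k, v in params]
new_url = urlunsplit(parts._replace(query=urlencode(new_params)))

print(value)    # five
print(new_url)  # http://example.com/page?id=29&Value=six&lang=en
```

Unlike a hand-written regex, this also handles percent-encoding and parameters that appear in a different order.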

However, given that this is not your question, the ability to split on multiple fields using regular expressions is extremely fast in Python, so you should be able to do what you want as follows:

import re

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = re.split(r'[&=]', strJunk)
split_result[1] = 'six'
print("{0}={1}&{2}".format(*split_result))

Hope this helps!

EDIT:

If you are going to split many times, you can use re.compile() to compile the regular expression once. So you'll have:

import re
rx_split_on_delimiters = re.compile(r'[&=]')  # store this somewhere

strJunk ="asdf2adsf29Value=five&lakl23ljk43asdldl"
split_result = rx_split_on_delimiters.split(strJunk)
split_result[1] = 'six'
print("{0}={1}&{2}".format(*split_result))

Answer by Vikas

How come match.group(0) is the whole thing 'Value=five' and group(1) is just 'five'? And is there a way for me to just get 'five' as the only result? (This question stems from me only having a tenuous grasp of regex)

I think a lookbehind assertion can help you here.

>>> match = re.search(r'(?<=Value=)([^&>]+)', strJunk)
>>> match.group(0)
'five'

However, a lookbehind assertion only accepts a fixed-width pattern, so the optional '=' from your original regex is not allowed:

>>> match = re.search(r'(?<=Value=?)([^&>]+)', strJunk)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern

I can't think of a way to do this without regex. Your way of doing this should be faster than a lookbehind assertion.
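
To make the group(0) versus group(1) point concrete: group(0) is always the whole match, while group(1) is only what the first pair of parentheses captured. The Python 3 sketch below keeps the question's original pattern (with its optional '=') and uses the position of group 1 to splice in the replacement. The variable names are illustrative only.

```python
import re

# Same pattern as in the question; the optional "=" is fine here because
# it sits outside any lookbehind.
pattern = re.compile(r"Value=?([^&>]+)")

str_junk = "asdf2adsf29Value=five&lakl23ljk43asdldl"
match = pattern.search(str_junk)

print(match.group(0))  # Value=five  (the entire match)
print(match.group(1))  # five        (only the parenthesised group)

# Splice the replacement in at the exact position of group 1.
start, end = match.span(1)
result = str_junk[:start] + "six" + str_junk[end:]
print(result)          # asdf2adsf29Value=six&lakl23ljk43asdldl
```

Splicing by position also avoids a pitfall of strJunk.replace(val1, "six", 1), which replaces the first occurrence of 'five' anywhere in the string, even if it appears before 'Value='.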