Original question: http://stackoverflow.com/questions/14596884/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Remove text between () and [] in python
Asked by Tic
I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and brackets, but I cannot figure out how.
The list is similar to this:
x = "This is a sentence. (once a day) [twice a day]"
This list isn't what I'm working with but is very similar and a lot shorter.
Thanks for the help.
Accepted answer by mbowden
This should work for parens. Regular expressions 'consume' the text they have matched, so it won't work for nested parens.
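import re

mystring = "This is a sentence. (once a day) [twice a day]"  # sample input
# Capture the text inside each (...) pair; non-greedy, so nesting is not handled.
regex = re.compile(r".*?\((.*?)\)")
result = re.findall(regex, mystring)  # ['once a day']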
Or this would find one set of parens; simply loop to find more:
start = mystring.find('(')
end = mystring.find(')')
if start != -1 and end != -1:
    result = mystring[start+1:end]
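The "simply loop to find more" step can be sketched as below. This is a minimal illustration, not part of the original answer: the helper name `remove_parenthesized` and its parameters are hypothetical. It assumes the delimiters are not nested, and it removes the spans instead of collecting them, since that is what the question asks for.

```python
def remove_parenthesized(text, open_ch="(", close_ch=")"):
    """Drop every open_ch...close_ch span from text (no nesting support)."""
    pieces = []
    pos = 0
    while True:
        start = text.find(open_ch, pos)
        if start == -1:
            break  # no more openers
        end = text.find(close_ch, start + 1)
        if end == -1:
            break  # unmatched opener: keep the rest unchanged
        pieces.append(text[pos:start])  # keep the text before the opener
        pos = end + 1                   # resume after the closer
    pieces.append(text[pos:])
    return "".join(pieces)


mystring = "This is a sentence. (once a day) [twice a day]"
no_parens = remove_parenthesized(mystring)
no_brackets = remove_parenthesized(no_parens, "[", "]")
print(repr(no_brackets))
```

Running the helper once per delimiter pair keeps each pass simple. The spaces that surrounded the removed spans are left in place.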
Answer by pradyunsg
Run this script; it works even with nested brackets.
It uses basic logical tests.
def a(test_str):
    ret = ''
    skip1c = 0  # depth of currently open [ ] brackets
    skip2c = 0  # depth of currently open ( ) parentheses
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            # outside all brackets: keep the character
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))
Just in case you don't run it, here's the output:
>>>
ewq This is a sentence.
'ewq This is a sentence.  '
Answer by jvallver
You can use the re.sub function.
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'
If you want to remove the [] and the () you can use this code:
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '
Important: this code will not work with nested symbols.
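If nesting does matter, one regex-based workaround is to strip the innermost pairs repeatedly until the string stops changing. The following is a sketch under stated assumptions, not part of the original answer: the names `INNERMOST` and `remove_nested` are hypothetical, and brackets are assumed to be balanced and properly nested (anything unbalanced or interleaved, such as "([)]", is left in place).

```python
import re

# Matches a (...) or [...] pair that contains no further brackets of either kind.
INNERMOST = re.compile(r"\([^()\[\]]*\)|\[[^()\[\]]*\]")


def remove_nested(text):
    """Repeatedly delete innermost bracket pairs until nothing changes."""
    previous = None
    while previous != text:
        previous = text
        text = INNERMOST.sub("", text)
    return text


x = "This is a sentence. (once (or twice) a day) [twice [or more] a day]"
print(repr(remove_nested(x)))
```

Each pass peels off one level of nesting, so the loop runs once per nesting level plus a final pass that confirms nothing is left to remove.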
Answer by jfs
Here's a solution similar to @pradyunsg's answer (it works with arbitrary nested brackets):
def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2)  # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b:  # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close  # `+1`: open, `-1`: close
                if count[kind] < 0:  # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else:  # character is not a [balanced] bracket
            if not any(count):  # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '

