Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14596884/

Date: 2020-08-18 11:56:56  Source: igfitidea

Remove text between () and [] in python

python, python-2.7

Asked by Tic

I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and brackets, but I cannot figure out how.

The string is similar to this:

x = "This is a sentence. (once a day) [twice a day]"

This isn't the actual string I'm working with, but it is very similar and a lot shorter.

Thanks for the help.

Accepted answer by mbowden

This should work for parentheses. Regular expressions 'consume' the text they have matched, so it won't work for nested parentheses. Note that this extracts the text inside the parentheses rather than removing it.

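import re
mystring = "This is a sentence. (once a day) [twice a day]"  # example input from the question
regex = re.compile(r".*?\((.*?)\)")  # raw string avoids invalid-escape warnings
result = regex.findall(mystring)  # text inside each (...) group, here ['once a day']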
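To see the 'consuming' limitation described above, here is a small sketch that runs the same pattern on a made-up nested string. The lazy `(.*?)` stops at the first `)`, so the outer group is cut short, and the leftover ` d)` has no `(` left to start another match.

```python
import re

# Same pattern as above, written as a raw string
regex = re.compile(r".*?\((.*?)\)")

nested = "a (b (c) d) e"  # hypothetical nested input, for illustration only
# The lazy group ends at the first ')', so "(c" is swallowed into the outer match
# and " d)" is never reported as a separate group.
print(regex.findall(nested))  # ['b (c']
```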

Or, this finds one set of parentheses; simply loop to find more:

start = mystring.find('(')
end = mystring.find(')', start + 1)  # look for ')' only after the '('
if start != -1 and end != -1:
    result = mystring[start + 1:end]  # text between the parentheses
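The "simply loop to find more" suggestion could look like the sketch below. The helper name `find_all_parens` is hypothetical; it keeps the `str.find` approach and restarts each search just after the previous closing parenthesis. Like the regex version, it does not handle nesting.

```python
def find_all_parens(mystring):
    """Return the text inside each non-nested (...) group, in order."""
    results = []
    pos = 0
    while True:
        start = mystring.find('(', pos)
        if start == -1:
            break  # no more opening parentheses
        end = mystring.find(')', start + 1)
        if end == -1:
            break  # unmatched '(' near the end of the string
        results.append(mystring[start + 1:end])
        pos = end + 1  # continue searching after this group
    return results


print(find_all_parens("(once a day) and (twice a day)"))
# ['once a day', 'twice a day']
```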

Answer by pradyunsg

Run this script; it works even with nested brackets. It uses only basic logical tests:

def a(test_str):
    ret = ''
    skip1c = 0  # current depth inside [...]
    skip2c = 0  # current depth inside (...)
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            ret += i  # keep characters that are outside all brackets
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))

Just in case you don't run it, here's the output:

>>> 
ewq This is a sentence.  
'ewq This is a sentence.  ' 

Answer by jvallver

You can use the re.sub function:

>>> import re 
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub("([\(\[]).*?([\)\]])", "\g<1>\g<2>", x)
'This is a sentence. () []'

If you want to remove the [] and () themselves as well, you can use this code:

>>> import re 
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub("[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '
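The result above keeps the spaces that surrounded the removed groups, so it ends in two spaces. If that matters, one option (an addition, not part of the original answer) is to collapse whitespace afterwards with `str.split()` and `" ".join(...)`:

```python
import re

x = "This is a sentence. (once a day) [twice a day]"
# Remove non-nested (...) and [...] groups, as in the answer above
stripped = re.sub(r"[\(\[].*?[\)\]]", "", x)
# Collapse runs of whitespace into single spaces and trim both ends
cleaned = " ".join(stripped.split())
print(repr(cleaned))  # 'This is a sentence.'
```

Note that `split()` with no arguments also drops newlines and tabs, which may or may not be wanted for a long multi-line text.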

Important: this code will not work with nested brackets.
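One common workaround for nesting that still uses only regular expressions is to delete the innermost groups first and repeat until nothing changes. In the sketch below (the function name `remove_nested_brackets` is hypothetical), the middle character class excludes all brackets, so each pass only matches a group that contains no other brackets. Like the pattern above, it does not check that `(` is closed by `)` rather than `]`.

```python
import re

# An innermost group: an opening bracket, any non-bracket characters,
# then a closing bracket. Mixed pairs such as "(...]" also match.
INNERMOST = re.compile(r"[(\[][^()\[\]]*[)\]]")


def remove_nested_brackets(text):
    """Repeatedly delete innermost (...) / [...] groups until none remain."""
    while True:
        text, n = INNERMOST.subn("", text)
        if n == 0:  # nothing left to delete
            return text


print(repr(remove_nested_brackets("a (b [c] d) e [f (g)]")))
# 'a  e '
```

Each pass is linear in the string length, but deeply nested input needs one pass per nesting level; the counter-based answers below do the job in a single pass.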

Answer by jfs

Here's a solution similar to @pradyunsg's answer (it works with arbitrarily nested brackets):

def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '