Remove text between () and [] in Python

Question

Asked by Tic

I have a very long string of text with () and [] in it. I'm trying to remove the characters between the parentheses and brackets, but I cannot figure out how.

The string is similar to this:

x = "This is a sentence. (once a day) [twice a day]"

This isn't the string I'm working with, but it is very similar and a lot shorter.

Thanks for the help.

Answer 1

Accepted answer by mbowden

This should work for parens. A regular expression 'consumes' the text it has matched, so it won't work for nested parens.

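import re

mystring = "This is a sentence. (once a day) [twice a day]"  # example input from the question
# Lazily capture the text inside each pair of parentheses.
regex = re.compile(r".*?\((.*?)\)")
result = regex.findall(mystring)  # ['once a day']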

Or this would find one set of parens; simply loop to find more.

start = mystring.find('(')
end = mystring.find(')', start + 1)  # look for the close only after the open
if start != -1 and end != -1:
    result = mystring[start + 1:end]
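
The "simply loop to find more" idea could be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: the function name `find_all_parens` is hypothetical, and it assumes the parentheses are not nested.

```python
def find_all_parens(text):
    """Collect the text inside each non-nested (...) pair, left to right."""
    results = []
    pos = 0
    while True:
        start = text.find('(', pos)
        if start == -1:
            break  # no more opening parens
        end = text.find(')', start + 1)
        if end == -1:
            break  # opening paren without a matching close
        results.append(text[start + 1:end])
        pos = end + 1  # continue searching after this pair
    return results


mystring = "This is a sentence. (once a day) and (twice a day)"
print(find_all_parens(mystring))  # ['once a day', 'twice a day']
```

Passing the previous end position as the second argument of `str.find` is what lets each iteration pick up where the last one stopped.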

Answer 2

Answer by pradyunsg

Run this script; it works even with nested brackets.
It uses basic logical tests.

def a(test_str):
    ret = ''
    skip1c = 0  # nesting depth of open [ brackets
    skip2c = 0  # nesting depth of open ( parentheses
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            # keep characters that are outside every bracket
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))

Just in case you don't run it, here's the output:

>>> 
ewq This is a sentence.  
'ewq This is a sentence.  '
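
The output above keeps the spaces that surrounded the removed groups, so doubled and trailing spaces remain. If that matters, one possible cleanup (not part of the original answer; the helper name `collapse_spaces` is hypothetical) is to collapse runs of whitespace afterwards:

```python
def collapse_spaces(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops empty strings, so joining with ' ' normalizes the spacing.
    return ' '.join(text.split())


cleaned = collapse_spaces('ewq This is a sentence.  ')
print(repr(cleaned))  # 'ewq This is a sentence.'
```

Note that this also turns newlines and tabs into single spaces, which may not be wanted for multi-line text.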

Answer 3

Answer by jvallver

You can use the re.sub function.

>>> import re 
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'

If you also want to remove the [] and the () themselves, you can use this code:

>>> import re 
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '

Important: This code will not work with nested symbols, because the lazy match stops at the first closing symbol it finds.
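
To illustrate the nesting limitation, and one possible regex-based workaround: the sketch below strips only innermost pairs and repeats until nothing changes. It is not part of the answer above; the names `INNERMOST` and `remove_nested` are hypothetical, and it assumes the brackets are balanced and properly nested.

```python
import re

# Matches an innermost pair: (...) or [...] with no brackets of any kind inside.
INNERMOST = re.compile(r"\([^()\[\]]*\)|\[[^()\[\]]*\]")


def remove_nested(text):
    """Repeatedly delete innermost bracket pairs until none remain."""
    while True:
        new_text, count = INNERMOST.subn("", text)
        if count == 0:
            return new_text
        text = new_text


x = "keep (outer (inner) text) this [a [b] c]"
# The single-pass pattern stops at the first closer and leaves debris behind:
print(repr(re.sub(r"[\(\[].*?[\)\]]", "", x)))
# Stripping innermost pairs repeatedly removes the nested groups completely:
print(repr(remove_nested(x)))
```

Each pass peels off one nesting level, so the loop runs once per level of depth plus a final pass that finds nothing to remove.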

Answer 4

Answer by jfs

Here's a solution similar to @pradyunsg's answer (it works with arbitrarily nested brackets):

def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '
