Python - Filtering data with regular expressions

Question

Asked by Chris Bunch

Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:

>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"

What would the equivalent function be in Python?

Answer 1

Answered by SilentGhost

import re
re.sub(pattern, '', s)

Docs: see re.sub in the documentation for Python's re module.
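
As a minimal sketch, here is that call applied to the string from the question. The names key and newkey are borrowed from the Ruby snippet, and the pattern is copied from it unchanged; treat this as an illustration rather than the answerer's own code:

```python
import re

key = "cd baz ; ls -l"

# Replace every character that is neither a word character nor a digit
# with the empty string, mirroring Ruby's key.gsub(/[^\w\d]/, "").
newkey = re.sub(r"[^\w\d]", "", key)

print(newkey)  # cdbazlsl
```

The second argument is the replacement text, so passing an empty string deletes every match.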

Answer 2

Answered by Alex Martelli

The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for

a simple way to remove all characters from a given string that fail to match

For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:

>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'
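
To make the difference between the two readings concrete, here is a small sketch that applies both operations to the same input. The helper names keep_matches and remove_matches are hypothetical, introduced only for illustration:

```python
import re

def keep_matches(pattern, text):
    """Keep only the parts of text that match pattern; drop everything else."""
    return "".join(re.findall(pattern, text))

def remove_matches(pattern, text):
    """Drop the parts of text that match pattern; keep everything else."""
    return re.sub(pattern, "", text)

s = "123foo7bah45xx9za678"
print(keep_matches(r"\d{2,}", s))    # 12345678
print(remove_matches(r"\d{2,}", s))  # foo7bahxx9za
```

Note that re.findall returns the captured groups rather than the whole match when the pattern contains capturing groups, so this sketch assumes a pattern without them.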

Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too ;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands). I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)

Answer 3

Answered by hughdbrown

re.subn() is your friend:

>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'

It returns a tuple of the new string and the number of substitutions made. Take the first element if you only want the resulting string. Or just call re.sub(), as SilentGhost notes. (Which is to say, his answer is more exact.)
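
If the count is useful too, the tuple can be unpacked in one step. A minimal sketch, where the names newkey and count are chosen here for illustration:

```python
import re

key = "cd baz ; ls -l"

# re.subn returns (new_string, number_of_substitutions_made).
newkey, count = re.subn(r"\W", "", key)

print(newkey)  # cdbazlsl
print(count)   # 6
```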

Answer 4

Answered by Jochen Ritzel

import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]"  # which is the same as \W btw
pat = re.compile(regex)
new = pat.sub('', old)
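
The remark about \W can be checked directly. The sketch below also shows one point that may matter when porting patterns: in Python 3, \w is Unicode-aware by default for str patterns, and the re.ASCII flag restricts it to [a-zA-Z0-9_]. The sample string text is an assumption chosen for illustration and does not come from the question:

```python
import re

old = "cd baz ; ls -l"

# \d is already covered by \w, so [^\w\d] and \W remove the same characters.
assert re.sub(r"[^\w\d]", "", old) == re.sub(r"\W", "", old) == "cdbazlsl"

# "caf\u00e9 au lait" contains U+00E9 (e with acute accent).
text = "caf\u00e9 au lait"

# Default (Unicode) matching keeps the accented letter as a word character.
unicode_result = re.sub(r"\W", "", text)

# With re.ASCII, the accented letter counts as a non-word character.
ascii_result = re.sub(r"\W", "", text, flags=re.ASCII)

print(unicode_result)
print(ascii_result)  # cafaulait
```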

Answer 5

Answered by Alexander

Maybe the shortest way:

In [32]: import re
   ....: pattern = r'[-0-9.]'
   ....: price_str = "￥-607.6B"
   ....: ''.join(re.findall(pattern, price_str))
Out[32]: '-607.6'
