Python - Use a Regex to Filter Data

Disclaimer: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/1284789/

Time: 2020-11-03 21:51:10  Source: igfitidea

python regex

Asked by Chris Bunch

Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:

>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"

What would the equivalent function be in Python?

Answered by SilentGhost

import re
re.sub(pattern, '', s)

See the documentation for re.sub in Python's re module.
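
As a minimal sketch of how this maps onto the Ruby example in the question: the names key and newkey are borrowed from the question, and the pattern is simply the one used there, so treat this as an illustration rather than the exact call SilentGhost had in mind.

```python
import re

key = "cd baz ; ls -l"
# Replace every character matched by the pattern with the empty string.
# [^\w\d] matches anything that is not a word character; it is equivalent
# to \W, because \d is already a subset of \w.
newkey = re.sub(r"[^\w\d]", "", key)
print(newkey)  # cdbazlsl
```

The argument order is pattern, replacement, string, which differs from Ruby's key.gsub(pattern, replacement).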

Answered by Alex Martelli

The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for

a simple way to remove all characters from a given string that fail to match

For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:

>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'

Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands). I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)
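
To put the two readings of the question side by side, here is a small sketch. The helper names remove_matches and keep_matches are hypothetical, introduced only for illustration. It uses re.finditer rather than re.findall because findall returns the captured groups instead of the whole match when the pattern contains groups.

```python
import re

def remove_matches(pattern, text):
    """Drop every substring that matches pattern (the re.sub approach)."""
    return re.sub(pattern, "", text)

def keep_matches(pattern, text):
    """Keep only the substrings that match pattern, joined together."""
    # group(0) is always the whole match, even if pattern has capture groups.
    return "".join(m.group(0) for m in re.finditer(pattern, text))

s = "123foo7bah45xx9za678"
print(keep_matches(r"\d{2,}", s))    # 12345678
print(remove_matches(r"\d{2,}", s))  # foo7bahxx9za
```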

Answered by hughdbrown

re.subn() is your friend:

>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'

It returns a tuple of the new string and the number of substitutions made. Take the first element if you only want the resulting string. Or just call re.sub(), as SilentGhost notes. (Which is to say, his answer is more exact.)
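
If the substitution count is useful, the tuple can be unpacked directly. A short sketch, where the names cleaned and count are chosen only for illustration:

```python
import re

key = "cd baz ; ls -l"
# re.subn returns (new_string, number_of_substitutions_made).
cleaned, count = re.subn(r"\W", "", key)
print(cleaned, count)  # cdbazlsl 6
```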

Answered by Jochen Ritzel

import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]"  # which is the same as \W btw, since \d is a subset of \w
pat = re.compile(regex)
new = pat.sub('', old)

Answered by Alexander

Maybe the shortest way:

In [32]: import re
   ....: pattern='[-0-9.]'
   ....: price_str="¥-607.6B"
   ....: ''.join(re.findall(pattern,price_str))
Out[32]: '-607.6'