Python: convert command line arguments to a regular expression
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17830198/
Convert command line arguments to regular expression
Asked by dbrg77
Say, for example, I want to know whether the pattern "\section" is in the text "abcd\sectiondefghi". Of course, I can do this:
import re
motif = r"\section"
txt = r"abcd\sectiondefghi"
pattern = re.compile(motif)
print(pattern.findall(txt))
That will give me what I want. However, each time I want to find a new pattern in a new text, I have to change the code, which is painful. Therefore, I want to write something more flexible, like this (test.py):
import re
import sys
motif = sys.argv[1]
txt = sys.argv[2]
pattern = re.compile(motif)
print(pattern.findall(txt))
Then, I want to run it in terminal like this:
python test.py \section abcd\sectiondefghi
However, that will not work (I hate to use \\\\section).
So, is there any way of converting my user input (either from terminal or from a file) to python raw string? Or is there a better way of doing the regular expression pattern compilation from user input?
Thank you very much.
Accepted answer by Martijn Pieters
Use re.escape() to make sure input text is treated as literal text in a regular expression:
pattern = re.compile(re.escape(motif))
Demo:
>>> import re
>>> motif = r"\section"
>>> txt = r"abcd\sectiondefghi"
>>> pattern = re.compile(re.escape(motif))
>>> print(pattern.findall(txt))
['\\section']
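re.escape() escapes all non-alphanumeric characters by adding a backslash in front of each one (since Python 3.7 it escapes only characters that can have a special meaning in a regular expression, so the exclamation mark in the second example would be left as-is):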
>>> re.escape(motif)
'\\\\section'
>>> re.escape('\n [hello world!]')
'\\\n\\ \\[hello\\ world\\!\\]'
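Putting this together with the test.py script from the question, a minimal sketch might look like the following. It assumes Python 3, and the helper name find_literal is made up for illustration; it is not part of the original answer.

```python
import re
import sys


def find_literal(motif, txt):
    """Return every literal occurrence of motif in txt."""
    # re.escape() neutralises regex metacharacters, so "\s" is read as a
    # backslash followed by "s" instead of "any whitespace character".
    pattern = re.compile(re.escape(motif))
    return pattern.findall(txt)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Quote the arguments so the shell passes the backslash through:
    #   python test.py '\section' 'abcd\sectiondefghi'
    print(find_literal(sys.argv[1], sys.argv[2]))
```

Without the re.escape() call, the pattern \section is read as "one whitespace character followed by ection", which is why the unescaped version does not find the literal text.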
Answer by Inbar Rose
One way to do this is using an argument parser, like optparse or argparse.
Your code would look something like this:
import re
from optparse import OptionParser

# Declare the options: target string, pattern, and which re function to call
parser = OptionParser()
parser.add_option("-s", "--string", dest="string",
                  help="The string to parse")
parser.add_option("-r", "--regexp", dest="regexp",
                  help="The regular expression")
parser.add_option("-a", "--action", dest="action", default='findall',
                  help="The action to perform with the regexp")
(options, args) = parser.parse_args()

# Look up the requested re function by name and apply it to the escaped pattern
print(getattr(re, options.action)(re.escape(options.regexp), options.string))
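An example of me using it (for this output the pattern has to reach re unescaped; with the re.escape() call in place, (\S+) is matched literally and the result is []):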
> code.py -s "this is a string" -r "this is a (\S+)"
['string']
Using your example:
> code.py -s "abcd\sectiondefghi" -r "\section"
['\\section']
# remember, this is a Python list containing a string, so the extra \ is okay.
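The answer also mentions argparse, so here is a rough equivalent using it, assuming Python 3. The --literal flag and the function names build_parser and run are invented for this sketch; the flag lets the caller choose between treating the pattern as a regular expression and as literal text.

```python
import argparse
import re
import sys


def build_parser():
    parser = argparse.ArgumentParser(description="Search a string for a pattern.")
    parser.add_argument("-s", "--string", required=True,
                        help="The string to search")
    parser.add_argument("-r", "--regexp", required=True,
                        help="The pattern to look for")
    # Hypothetical flag: escape the pattern so it is matched as literal text.
    parser.add_argument("-l", "--literal", action="store_true",
                        help="Treat the pattern as literal text")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    pattern = re.escape(args.regexp) if args.literal else args.regexp
    return re.findall(pattern, args.string)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(run(sys.argv[1:]))
```

Called as run(["-s", "this is a string", "-r", r"this is a (\S+)"]) the pattern is used as a regular expression; adding "-l" makes the same pattern match only the literal characters.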
Answer by Fredrik
So just to be clear, is the thing you search for ("\section" in your example) supposed to be a regular expression or a literal string? If the latter, the re module isn't really the right tool for the task; given a search string needle and a target string haystack, you can do:
# is it in there
needle in haystack
# how many copies are there
n = haystack.count(needle)
# where is it
ix = haystack.find(needle)
all of which are more efficient than the regexp-based version.
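As a runnable illustration of those three string operations, here is a small sketch using the strings from the question. The function name literal_search is made up for this example.

```python
def literal_search(needle, haystack):
    """Answer the three questions above with plain string methods."""
    return {
        "found": needle in haystack,       # is it in there
        "count": haystack.count(needle),   # how many copies are there
        "index": haystack.find(needle),    # where is it (-1 if absent)
    }


print(literal_search(r"\section", r"abcd\sectiondefghi"))
```

No escaping is needed here because nothing interprets the backslash as a metacharacter.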
re.escape is still useful if you need to insert a literal fragment into a larger regexp at runtime, but if you end up doing re.compile(re.escape(needle)), there are usually better tools for the task.
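To illustrate the case where re.escape does earn its place, the sketch below embeds a user-supplied literal inside a larger pattern. The function name find_command_args and the brace-argument pattern are assumptions chosen for this example, not something from the original answer.

```python
import re


def find_command_args(command, text):
    """Find the {...} argument that follows each literal command name."""
    # Only the user-supplied fragment is escaped; the rest stays a regex.
    pattern = re.compile(re.escape(command) + r"\{([^}]*)\}")
    return pattern.findall(text)


print(find_command_args("\\section", "\\section{Intro} text \\section{Methods}"))
# prints ['Intro', 'Methods']
```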
EDIT: I'm beginning to suspect that the real issue here is the shell's escaping rules, which have nothing to do with Python or raw strings. That is, if you type:
python test.py \section abcd\sectiondefghi
into a Unix-style shell, the "\section" part is converted to "section" by the shell, before Python sees it. The simplest way to fix that is to tell the shell to skip unescaping, which you can do by putting the argument inside single quotes:
python test.py '\section' 'abcd\sectiondefghi'
Compare and contrast:
$ python -c "import sys; print(','.join(sys.argv))" test.py \section abcd\sectiondefghi
-c,test.py,section,abcdsectiondefghi
$ python -c "import sys; print(','.join(sys.argv))" test.py '\section' 'abcd\sectiondefghi'
-c,test.py,\section,abcd\sectiondefghi
(explicitly using print on a joined string here to avoid repr adding even more confusion...)
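The same effect can be reproduced without leaving Python. The sketch below uses the standard library's shlex.split, which follows POSIX-style quoting rules; it only approximates what a real shell does, and the function name shell_words is made up for this example.

```python
import shlex

unquoted = r"python test.py \section abcd\sectiondefghi"
quoted = r"python test.py '\section' 'abcd\sectiondefghi'"


def shell_words(command_line):
    """Split a command line the way a POSIX-style shell would (approximately)."""
    return shlex.split(command_line)


# An unquoted backslash escapes the next character and is then dropped.
print(",".join(shell_words(unquoted)))
# Inside single quotes the backslash is kept as-is.
print(",".join(shell_words(quoted)))
```

The first line printed has lost both backslashes, while the second keeps them, matching the two terminal runs above.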