Linux: How to use regular expressions in wget for rejecting files?
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute the content to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/11231736/
Asked by Hakim
I am trying to download the contents of a website using the wget tool. I used the -R option to reject some file types, but there are some other files that I don't want to download. These files are named as follows and don't have any extension:
string-ID
For example:
newsbrief-02
How can I tell wget not to download these files (the files whose names start with a specified string)?
Accepted answer by Igor Chubin
You cannot specify a regular expression with the wget -R option, but you can specify a wildcard pattern (like a file name pattern in a shell).
The answer looks like:
$ wget -R 'newsbrief-*' ...
You can also use ? and character classes [].
For more information, see info wget.
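To preview which file names a wildcard pattern would catch without running wget, a shell case statement can serve as a rough stand-in, since it understands the same *, ? and [] wildcards. This is only a sketch: would_reject is a hypothetical helper, the file names other than newsbrief-02 are invented, and wget's own matching may differ in edge cases.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if NAME matches the shell-style PATTERN.
# It only approximates how wget -R treats a wildcard pattern.
would_reject() {
    pattern=$1
    name=$2
    # $pattern is deliberately unquoted so that *, ? and [] act as wildcards.
    case $name in
        $pattern) return 0 ;;
        *)        return 1 ;;
    esac
}

# Sample names are made up for illustration.
for name in newsbrief-02 newsbrief-17 index.html; do
    if would_reject 'newsbrief-*' "$name"; then
        echo "reject: $name"
    else
        echo "keep:   $name"
    fi
done
```

The same helper can be tried with patterns such as 'newsbrief-0?' or 'newsbrief-[0-9][0-9]' to see how ? and [] narrow the match.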
Answer by Skippy le Grand Gourou
Since (apparently) v1.14, wget accepts regular expressions: --reject-regex and --accept-regex (with --regex-type posix by default; it can be set to pcre if wget was compiled with libpcre support).
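For the file names in the question, a regex-based rejection might look like the sketch below. It assumes that wget searches for the expression anywhere in the complete URL and that the default posix type behaves like POSIX extended regular expressions, so grep -E is used as a local stand-in to preview the effect. The pattern, the keep_urls helper, and the sample URLs are all hypothetical.

```shell
#!/bin/sh
# Hypothetical pattern: any URL whose last path component starts with "newsbrief-".
REJECT_RE='/newsbrief-[^/]*$'

# Preview locally: print only the URLs that the pattern would NOT reject.
# grep -E uses POSIX extended regular expressions; -v inverts the match.
keep_urls() {
    grep -E -v -- "$REJECT_RE"
}

# Sample URLs are made up for illustration.
printf '%s\n' \
    'http://example.com/news/newsbrief-02' \
    'http://example.com/news/index.html' \
    'http://example.com/about' | keep_urls

# The corresponding download (commented out: it needs network access and wget >= 1.14):
# wget -r --reject-regex "$REJECT_RE" http://example.com/
```

Checking the pattern against a list of known URLs first is cheaper than discovering a mistake after a long recursive download.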
Beware that it seems you can use --reject-regex only once per wget call. That is, to match several expressions you have to combine them with | in a single regex:
wget --reject-regex 'expr1|expr2|…' http://example.com
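When the list of expressions grows, building the combined regex by hand gets error-prone. A small helper can join them, as in this sketch; join_regex is a hypothetical function, and the two sample expressions are illustrative only. Each expression is wrapped in parentheses so that a | inside one of them does not leak into its neighbours.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with "|" into one alternation.
join_regex() {
    joined=''
    for expr in "$@"; do
        # Wrap each expression in parentheses so that "|" inside it stays local.
        joined="${joined:+$joined|}($expr)"
    done
    printf '%s\n' "$joined"
}

# Sample expressions are made up for illustration.
combined=$(join_regex '/newsbrief-[^/]*$' '\.pdf$')
printf '%s\n' "$combined"

# The corresponding download (commented out: it needs network access):
# wget -r --reject-regex "$combined" http://example.com/
```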