Linux: How to use regular expressions in wget to reject files?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11231736/

Time: 2020-08-06 07:06:12  Source: igfitidea

How to use regular expressions in wget for rejecting files?

Tags: regex, linux, wget, download

Asked by Hakim

I am trying to download the contents of a website using the wget tool. I used the -R option to reject some file types, but there are some other files that I don't want to download. These files are named as follows and don't have any extension.

string-ID

For example:

newsbrief-02

How can I tell wget not to download these files (the files whose names start with a specified string)?

Accepted answer by Igor Chubin

You cannot specify a regular expression in the wget -R option, but you can specify a template (like a file name pattern in a shell).

The answer looks like:


$ wget -R 'newsbrief-*' ...

You can also use ? and character classes [].

For more information see info wget.
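
Before starting a long download, it can help to check which names a template would catch. The following is only a rough sketch: matches_glob is a hypothetical helper (not part of wget), the file names are made up, and it relies on the shell's own pattern matching, which is assumed here to behave like the matching wget applies to -R patterns.

```shell
#!/bin/sh
# Dry run: report which file names a wget -R style template would reject.
# matches_glob is a hypothetical helper for illustration, not a wget feature.
matches_glob() {
    # $1 = pattern, $2 = file name.
    # Leaving $1 unquoted makes "case" treat it as a shell pattern.
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Made-up file names standing in for what the site might contain.
for name in newsbrief-02 newsbrief-17 index.html sports-02; do
    if matches_glob 'newsbrief-*' "$name"; then
        echo "reject $name"
    else
        echo "keep   $name"
    fi
done
```

The same helper can be used to try ? and [] forms, for instance 'newsbrief-0?' or 'newsbrief-[0-9][0-9]', before passing the pattern to wget -R in single quotes so the shell does not expand it.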


Answer by Skippy le Grand Gourou

Since (apparently) v1.14, wget accepts regular expressions: --reject-regex and --accept-regex (with --regex-type posix by default; it can be set to pcre if wget was compiled with libpcre support).

Beware that it seems you can use --reject-regex only once per wget call. That is, you have to use | in a single regex if you want to match several expressions:

wget --reject-regex 'expr1|expr2|…' http://example.com
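
As an illustration of combining expressions with |, the sketch below filters a list of URLs the way a single --reject-regex might. It rests on several assumptions: the URLs and the two sub-expressions are invented, filter_urls is a hypothetical helper, and grep -E is used as a stand-in for wget's posix regex type, so edge cases may differ. Note that wget matches these regexes against the complete URL rather than just the file name.

```shell
#!/bin/sh
# Dry run: show which URLs one combined regex would leave untouched.
# The two alternatives below are made-up examples joined with "|".
reject_regex='/newsbrief-[0-9]+$|/archive/'

filter_urls() {
    # Reads URLs on stdin and prints only those NOT matching the regex in $1.
    # grep -E stands in for wget's matcher here (an assumption).
    grep -Ev -- "$1"
}

# Made-up URLs standing in for links found during a crawl.
printf '%s\n' \
    'http://example.com/newsbrief-02' \
    'http://example.com/archive/2012/index.html' \
    'http://example.com/about.html' |
    filter_urls "$reject_regex"

# The same expression would then be passed to wget in one option, e.g.:
# wget -r --reject-regex "$reject_regex" http://example.com
```

Keeping the expression in one variable makes it easy to test it against sample URLs first and then hand the identical string to wget.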