apache: wget -k converts files differently on Windows and Linux

Question

Asked by cherouvim

I've got GNU Wget 1.10.2 for Windows and Linux, and the -k option behaves differently on the two.

-k, --convert-links make links in downloaded HTML point to local files.

On Windows it produces:

www.example.com/index.html
www.example.com/index.html@page=about
www.example.com/index.html@page=contact
www.example.com/index.html@page=sitemap

and on Linux it produces:

www.example.com/index.html
www.example.com/index.html?page=about
www.example.com/index.html?page=contact
www.example.com/index.html?page=sitemap

This is problematic on Linux because when I serve the mirror through Apache, it does not distinguish between the 4 generated pages: the part after the question mark (?) is treated as the query string for the file.

Any ideas on how I can control this?

thanks

Answer 1

Answered by Can Berk Güder

You can't use a question mark (?) in a filename on NTFS or FAT32. This is why wget uses the at symbol (@) instead.

On Linux, most filesystems forbid only the slash (/) and the NUL byte in file names, so wget keeps the question mark (since it's part of the URI).

You can force either behaviour by using --restrict-file-names=unix or --restrict-file-names=windows.
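As a rough sketch of how this could look when mirroring on Linux: the function name mirror_with_windows_names is made up for illustration, and --mirror is only one plausible choice of download options, since the question does not say which ones were used.

```shell
# Hypothetical helper (name is an assumption, not part of wget):
# mirror a site on Linux while naming local files the way wget does on Windows,
# so the "?" before a query string becomes "@" in the saved file names.
mirror_with_windows_names() {
    wget --mirror --convert-links --restrict-file-names=windows "$1"
}
```

Calling mirror_with_windows_names http://www.example.com/ should then give names such as index.html@page=about, matching the Windows listing in the question.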

From the wget documentation:

When mode is set to "unix", Wget escapes the character '/' and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like OS'es.
When mode is set to "windows", Wget escapes the characters '\', '|', '/', ':', '?', '"', '*', '<', '>', and the control characters in the ranges 0–31 and 128–159. In addition to this, Wget in Windows mode uses '+' instead of ':' to separate host and port in local file names, and uses '@' instead of '?' to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as 'www.xemacs.org:4300/search.pl?input=blah' in Unix mode would be saved as 'www.xemacs.org+4300/search.pl@input=blah' in Windows mode. This mode is the default on Windows.
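To make the two separator substitutions from the documentation concrete, here is a small illustration. It is not wget's actual escaping logic: the function name to_windows_mode_name is invented, and it assumes the input contains only a host:port colon and a single query-string question mark (wget percent-escapes the other restricted characters, which this sketch ignores).

```shell
# Hypothetical illustration of the two separator swaps described above:
#   ':' (host/port separator)  -> '+'
#   '?' (query separator)      -> '@'
# Assumes no other ':' or '?' characters occur in the name.
to_windows_mode_name() {
    printf '%s\n' "$1" | tr ':?' '+@'
}

windows_name=$(to_windows_mode_name 'www.xemacs.org:4300/search.pl?input=blah')
printf '%s\n' "$windows_name"
```

For the documentation's example this prints www.xemacs.org+4300/search.pl@input=blah.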

Answer 2

Answered by bobince

"This is problematic on Linux because when I serve the mirror through Apache, it does not distinguish between the 4 generated pages: the part after the question mark (?) is treated as the query string for the file."

To include a question mark in a URL path part, you can escape it:

www.example.com/index.html%3Fpage=about

--convert-links should be doing this for you, I'd think; it may be a bug if not.
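For a single link, the escaping described in this answer can be sketched as follows. The function name escape_question_marks is made up for the example, and it handles only the question mark, not general percent-encoding.

```shell
# Hypothetical helper: percent-encode every "?" in a link so that a web
# server treats it as part of the path instead of the start of a query string.
# In a basic regular expression "?" is an ordinary character, so no escaping is needed.
escape_question_marks() {
    printf '%s\n' "$1" | sed 's/?/%3F/g'
}

escaped=$(escape_question_marks 'www.example.com/index.html?page=about')
printf '%s\n' "$escaped"
```

This prints www.example.com/index.html%3Fpage=about, the form shown above.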

Answer 3

Answered by ax.

See --restrict-file-names=windows.

Answer 4

Answered by Phaiax

"This is problematic on Linux because when I serve the mirror through Apache, it does not distinguish between the 4 generated pages: the part after the question mark (?) is treated as the query string for the file."

If it is already too late and the mirror has been downloaded, this sed command helped me:

find . -type f -name "*html*" -exec sed -i -r 's/(src|href)=(["\x27])([^"\x27?]*)\?/\1=\2\3%3F/g' {} +

It replaces the ? in href= and src= attribute values with %3F. (\x27 is the single quote.)
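To try the idea safely before touching a real mirror, a throwaway test along these lines may help. It assumes GNU sed (for -i, -r and the \x27 escape), uses a form of the substitution with explicit back-references (\1, \2, \3), and the sample file content is invented for the demonstration.

```shell
# Work in a temporary directory so no real mirror is modified.
demo_dir=$(mktemp -d)
printf '%s\n' '<a href="index.html?page=about">About</a>' > "$demo_dir/index.html"

# Rewrite the first "?" inside each src/href value to %3F.
# \1 = attribute name, \2 = opening quote, \3 = the URL text before the "?".
find "$demo_dir" -type f -name "*html*" -exec sed -i -r 's/(src|href)=(["\x27])([^"\x27?]*)\?/\1=\2\3%3F/g' {} +

rewritten=$(cat "$demo_dir/index.html")
printf '%s\n' "$rewritten"
rm -r "$demo_dir"
```

The sample link comes out as href="index.html%3Fpage=about". Note that this pattern also rewrites links to external sites that carry a query string, so it is worth checking the result on a copy first.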
