linux curl 保存为utf-8

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/10172327/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-06 05:48:43  来源:igfitidea点击:

linux curl save as utf-8

linuxcurl

提问by flyclassic

Trying to use linux curl to download an xml file from an url.

尝试使用 linux curl 从 url 下载 xml 文件。

Pretty sure that the xml is encoded in UTF-8,

很确定 xml 是用 UTF-8 编码的,

suspecting curl -o doesnt save as UTF-8.

怀疑 curl -o 没有保存为 UTF-8。

Is there anyway to force save to UTF-8 with curl ?

有没有用 curl 强制保存到 UTF-8 ?



Thanks for the suggestion, what i found out:

感谢您的建议,我发现了什么:

Because the xml feed is dynamic, not all the time it contain any utf-8 characters. Sometimes it doesnt have utf-8 character in the whole content at all even though it is set as utf-8 in the xml encoding and header content type: charset=utf-8. When it contain a utf-8 character at least, it will be save as utf-8.

因为 xml 提要是动态的,所以它并不总是包含任何 utf-8 字符。有时它在整个内容中根本没有 utf-8 字符,即使它在 xml 编码和标题内容类型中设置为 utf-8:charset=utf-8。当它至少包含一个 utf-8 字符时,它将被保存为 utf-8。

When this happen, curl doesn't download as utf-8, which makes sense as there are no utf-8 chars, why is there a need to store as utf-8.

发生这种情况时,curl 不会下载为 utf-8,这是有道理的,因为没有 utf-8 字符,为什么需要存储为 utf-8。

This is damn tricky, some validator has to valid against utf-8 hence i still need a solution to force it to utf8 because by default all my xml shld be in utf8-encoding.

这太棘手了,一些验证器必须对 utf-8 有效,因此我仍然需要一个解决方案来强制它使用 utf8,因为默认情况下我所有的 xml 都采用 utf8 编码。

tried the suggested by using iconv f iso8859-1 utf-8 doesnt work for this case as i am suspecting it is not in iso8859-1 either.

尝试使用 iconv f iso8859-1 utf-8 建议的方法不适用于这种情况,因为我怀疑它也不在 iso8859-1 中。

Still need a better solution.

仍然需要更好的解决方案。

回答by user1202136

curl does not do any conversion of the file it downloads. If the HTTP server serves you the XML in another encoding (e.g., ISO8859-1) that his how curl will save it to disk too.

curl 不会对其下载的文件进行任何转换。如果 HTTP 服务器以另一种编码(例如,ISO8859-1)为您提供 XML,那么他的 curl 也会将其保存到磁盘。

To workaround your problem, you can use "iconv" as follows:

要解决您的问题,您可以使用“iconv”,如下所示:

curl URL | iconv -f iso8859-1 -t utf-8 > output.xml

Hope this help.

希望这有帮助。

回答by Andrew Khoury

Have you tried adding the Accept-Charset header? I had a similar issue downloading a file which was downloading with the wrong encoding. When I set the Accept-Charset header it works:

您是否尝试过添加 Accept-Charset 标头?我在下载一个使用错误编码下载的文件时遇到了类似的问题。当我设置 Accept-Charset 标头时,它可以工作:

curl -H "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" URL | iconv -f iso8859-1 -t utf-8 > output.xml