linux curl 保存为utf-8
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/10172327/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
linux curl save as utf-8
提问by flyclassic
Trying to use linux curl to download an xml file from an url.
尝试使用 linux curl 从 url 下载 xml 文件。
Pretty sure that the xml is encoded in UTF-8,
很确定 xml 是用 UTF-8 编码的,
suspecting curl -o doesnt save as UTF-8.
怀疑 curl -o 没有保存为 UTF-8。
Is there anyway to force save to UTF-8 with curl ?
有没有用 curl 强制保存到 UTF-8 ?
Thanks for the suggestion, what i found out:
感谢您的建议,我发现了什么:
Because the xml feed is dynamic, not all the time it contain any utf-8 characters. Sometimes it doesnt have utf-8 character in the whole content at all even though it is set as utf-8 in the xml encoding and header content type: charset=utf-8. When it contain a utf-8 character at least, it will be save as utf-8.
因为 xml 提要是动态的,所以它并不总是包含任何 utf-8 字符。有时它在整个内容中根本没有 utf-8 字符,即使它在 xml 编码和标题内容类型中设置为 utf-8:charset=utf-8。当它至少包含一个 utf-8 字符时,它将被保存为 utf-8。
When this happen, curl doesn't download as utf-8, which makes sense as there are no utf-8 chars, why is there a need to store as utf-8.
发生这种情况时,curl 不会下载为 utf-8,这是有道理的,因为没有 utf-8 字符,为什么需要存储为 utf-8。
This is damn tricky, some validator has to valid against utf-8 hence i still need a solution to force it to utf8 because by default all my xml shld be in utf8-encoding.
这太棘手了,一些验证器必须对 utf-8 有效,因此我仍然需要一个解决方案来强制它使用 utf8,因为默认情况下我所有的 xml 都采用 utf8 编码。
tried the suggested by using iconv f iso8859-1 utf-8 doesnt work for this case as i am suspecting it is not in iso8859-1 either.
尝试使用 iconv f iso8859-1 utf-8 建议的方法不适用于这种情况,因为我怀疑它也不在 iso8859-1 中。
Still need a better solution.
仍然需要更好的解决方案。
回答by user1202136
curl does not do any conversion of the file it downloads. If the HTTP server serves you the XML in another encoding (e.g., ISO8859-1) that his how curl will save it to disk too.
curl 不会对其下载的文件进行任何转换。如果 HTTP 服务器以另一种编码(例如,ISO8859-1)为您提供 XML,那么他的 curl 也会将其保存到磁盘。
To workaround your problem, you can use "iconv" as follows:
要解决您的问题,您可以使用“iconv”,如下所示:
curl URL | iconv -f iso8859-1 -t utf-8 > output.xml
Hope this help.
希望这有帮助。
回答by Andrew Khoury
Have you tried adding the Accept-Charset header? I had a similar issue downloading a file which was downloading with the wrong encoding. When I set the Accept-Charset header it works:
您是否尝试过添加 Accept-Charset 标头?我在下载一个使用错误编码下载的文件时遇到了类似的问题。当我设置 Accept-Charset 标头时,它可以工作:
curl -H "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" URL | iconv -f iso8859-1 -t utf-8 > output.xml