How to convert a string to UTF-8 in Ruby

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17022394/

Time: 2020-09-06 06:01:30  Source: igfitidea

How to convert a string to UTF8 in Ruby

Tags: ruby, file, encoding, utf-8, dump

Asked by ciembor

I'm writing a crawler that uses Hpricot. It downloads a list of strings from a web page, and I then try to write them to a file. Something is wrong with the encoding:

"\xC3" from ASCII-8BIT to UTF-8

I have items which are rendered on a webpage and printed this way:

DÃ©veloppement

str.encoding returns UTF-8, so force_encoding('UTF-8') doesn't help. How can I convert this to readable UTF-8?
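
For readers who want to reproduce the error message quoted above, here is a minimal sketch. It assumes the bytes arrive tagged as ASCII-8BIT (binary), which is only a guess about how the crawler's strings were produced; the sample bytes and variable names are hypothetical.

```ruby
# Hypothetical input: the UTF-8 bytes of "Développement", tagged as binary.
raw = "D\xC3\xA9veloppement".b

# Converting binary data to UTF-8 fails, because Ruby has no mapping
# for a bare byte such as 0xC3.
error_message = begin
  raw.encode('UTF-8')
  nil
rescue Encoding::UndefinedConversionError => e
  e.message
end
puts error_message

# If the bytes really are UTF-8, relabeling them (no conversion) is enough.
relabeled = raw.dup.force_encoding('UTF-8')
puts relabeled
puts relabeled.valid_encoding?
```

Relabeling only helps when the bytes are already correct UTF-8. A string that prints as DÃ©veloppement while reporting UTF-8 has been converted once too often, which is a different problem.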

Answered by Stefan

Your string seems to have been encoded the wrong way round:

"DÃ©veloppement".encode("iso-8859-1").force_encoding("utf-8")
#=> "Développement"
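
To see why this works, here is a minimal sketch that first recreates the damage and then undoes it. It assumes the original UTF-8 bytes were at some point misread as ISO-8859-1 and converted to UTF-8 a second time; the variable names are only illustrative.

```ruby
original = "D\u00E9veloppement"   # "Développement"; é is the bytes C3 A9 in UTF-8

# Simulate the damage: treat the UTF-8 bytes as ISO-8859-1, then convert to UTF-8.
# Each of the two bytes of é becomes its own character (Ã and ©).
garbled = original.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts garbled    # DÃ©veloppement

# Undo it: encoding back to ISO-8859-1 recovers the original bytes,
# and force_encoding relabels those bytes as the UTF-8 they always were.
repaired = garbled.encode('ISO-8859-1').force_encoding('UTF-8')
puts repaired   # Développement
```

The order matters: encode changes the bytes, while force_encoding only changes the label attached to them.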

Answered by knut

Seems your string thinks it is UTF-8, but in reality, it is something else, probably ISO-8859-1.

Define (force) the correct encoding first, then convert it to UTF-8.

In your example:

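# This text is already mis-decoded: encode back to ISO-8859-1 to recover the bytes, then relabel them as UTF-8.
puts "DÃ©veloppement".encode('iso-8859-1').force_encoding('utf-8') #-> Développement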

An alternative is:

puts "\xC3".force_encoding('iso-8859-1').encode('utf-8') #-> Ã

If the Ã makes no sense, then try another encoding.
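
Trying other encodings can be done in a loop. The sketch below is one way to do it; decode_candidates is a hypothetical helper (not a Ruby built-in), and both the candidate list and the sample bytes are assumptions for illustration.

```ruby
# Hypothetical helper: reinterpret the raw bytes under each candidate encoding
# and convert the result to UTF-8, so the candidates can be compared by eye.
def decode_candidates(bytes, candidates = %w[UTF-8 ISO-8859-1 Windows-1252])
  candidates.each_with_object({}) do |name, results|
    relabeled = bytes.dup.force_encoding(name)
    next unless relabeled.valid_encoding?   # skip labels these bytes cannot carry
    begin
      results[name] = relabeled.encode('UTF-8')
    rescue EncodingError
      next                                  # skip candidates that cannot be converted
    end
  end
end

# Sample input: "Développement" as a single-byte (Latin-1 style) byte string.
results = decode_candidates("D\xE9veloppement".b)
results.each { |name, text| puts "#{name}: #{text}" }
```

A candidate that converts without error is not necessarily the right one. Single-byte encodings accept almost any input, so the output still has to be checked by a human.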

Answered by kaleb4eg

"ruby 1.9: invalid byte sequence in UTF-8" describes another good approach that needs less code:

file_contents.encode!('UTF-16', 'UTF-8')
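
As written, that call converts the string to UTF-16 and raises if it meets bytes that are not valid UTF-8. A minimal sketch of a full round trip follows, assuming the goal is to drop the invalid bytes and end up with UTF-8 again. The invalid: and replace: options and the sample data are assumptions, not part of the line above.

```ruby
# Hypothetical sample: valid UTF-8 plus one stray byte (\xC3 with no continuation byte).
file_contents = "D\xC3\xA9veloppement \xC3 fin".dup.force_encoding('UTF-8')
puts file_contents.valid_encoding?   # false

# Round trip through UTF-16, dropping whatever cannot be decoded as UTF-8.
round_tripped = file_contents
                .encode('UTF-16', 'UTF-8', invalid: :replace, replace: '')
                .encode('UTF-8', 'UTF-16')
puts round_tripped.valid_encoding?

# On Ruby 2.1 or newer, String#scrub does the same cleanup without the detour.
scrubbed = file_contents.scrub('')
puts scrubbed
```

Note that this removes broken bytes. It does not repair text that was decoded with the wrong encoding, as in the DÃ©veloppement case from the question.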