Ruby-on-rails: force strings to UTF-8 from any encoding
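Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow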
Original URL: http://stackoverflow.com/questions/12947910/
Question by Hayk Saakian
In my Rails app I'm working with RSS feeds from all around the world, and some feeds have links that are not in UTF-8. The original feed links are out of my control, and in order to use them in other parts of the app, they need to be in UTF-8.
How can I detect encoding and convert to UTF-8?
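Ruby's core String API does not guess an encoding for you. A common heuristic, sketched here only as an illustration, is to keep the bytes if they already form valid UTF-8 and otherwise reinterpret them in a fallback encoding. The helper name `normalize_to_utf8` and the choice of Windows-1252 as the fallback are assumptions, not part of the question; this is a guess, not reliable detection.

```ruby
# Hypothetical helper: guess-and-convert for feed text of unknown origin.
# Assumption: input that is not valid UTF-8 is most likely Windows-1252.
def normalize_to_utf8(raw, fallback = "Windows-1252")
  candidate = raw.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding? # bytes were already UTF-8

  # Otherwise reinterpret the bytes as the fallback encoding and transcode,
  # replacing anything the fallback cannot map with U+FFFD.
  raw.dup.force_encoding(fallback)
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: "\uFFFD")
end

p normalize_to_utf8("caf\xC3\xA9".b) == "café" # true: bytes were already UTF-8
p normalize_to_utf8("caf\xE9".b) == "café"     # true: 0xE9 is é in Windows-1252
```

Short runs of Latin-1 bytes can occasionally look like valid UTF-8, so this can misfire on some inputs.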
Answer by kwarrick
Ruby 1.9
"Forcing" an encoding is easy, but it won't convert the characters; it only changes the encoding the bytes are labeled with:
str = str.force_encoding('UTF-8')
str.encoding.name # => 'UTF-8'
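To see the difference, the sketch below takes the Latin-1 bytes for "café" (é is the single byte 0xE9 in ISO-8859-1) and compares relabeling with transcoding. The sample string is an assumption chosen for illustration.

```ruby
# force_encoding only relabels the bytes; encode actually transcodes them.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1") # é is the byte 0xE9 in Latin-1

relabeled = latin1.dup.force_encoding("UTF-8")
puts relabeled.valid_encoding? # false: a lone 0xE9 is not valid UTF-8
puts relabeled.bytesize        # 4: the bytes are unchanged

converted = latin1.encode("UTF-8")
puts converted.valid_encoding? # true
puts converted.bytesize        # 5: é is now the two bytes 0xC3 0xA9
puts converted == "café"       # true
```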
If you want to perform a conversion, use encode:
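begin
  str = str.encode("UTF-8")
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # the string has bytes that can't be converted from its current encoding
end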
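Wrapped in a method, the same idea might look like the sketch below. The helper name `to_utf8` and the choice to return `nil` on failure are assumptions for illustration. The extra `valid_encoding?` check covers strings that are already tagged UTF-8 but contain invalid bytes, which `encode("UTF-8")` hands back with their bytes unchanged instead of raising.

```ruby
# Hypothetical helper: returns a UTF-8 copy, or nil if the text can't be converted.
def to_utf8(str)
  utf8 = str.encode("UTF-8")
  # A string already tagged UTF-8 comes back from encode("UTF-8") with its
  # bytes unchanged, so check validity instead of trusting the lack of an error.
  utf8.valid_encoding? ? utf8 : nil
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  nil
end

p to_utf8("caf\xE9".dup.force_encoding("ISO-8859-1")) == "café" # true
p to_utf8("caf\xE9".b)                                          # nil: binary bytes have no mapping
p to_utf8("caf\xE9".dup.force_encoding("UTF-8"))                # nil: not valid UTF-8
```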
I would definitely read the following post for more information:
http://graysoftinc.com/character-encodings/ruby-19s-string
Answer by John Pollard
This ensures the result is encoded as UTF-8 and never raises an error, because it replaces any invalid or undefined character with an empty string (that is, such characters are silently dropped).
No matter what, this leaves you with a valid UTF-8 string:
# returns a new UTF-8 string; invalid or undefined characters are dropped
str = str.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
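The sketch below shows what "replaced with an empty string" means in practice. The sample bytes are an assumption: text tagged as binary (ASCII-8BIT), which is how raw bytes often arrive before any encoding is applied. Because the conversion starts from the string's current tag, non-ASCII bytes are dropped rather than repaired when that tag is wrong. The last line uses `String#scrub` (Ruby 2.1+), a different method, for strings already tagged UTF-8.

```ruby
# Bytes as they might arrive from a feed: tagged ASCII-8BIT (binary), not UTF-8.
raw = "caf\xE9 \xFF".b

# Non-ASCII bytes have no defined mapping from binary, so :undef drops them.
dropped = raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
p dropped                 # "caf "
p dropped.valid_encoding? # true

# A visible replacement makes the loss easier to spot.
p raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?") # "caf? ?"

# If the real source encoding is known (assumed Latin-1 here), relabel first.
puts raw.dup.force_encoding("ISO-8859-1").encode("UTF-8") # café ÿ

# For a string already tagged UTF-8, String#scrub removes invalid bytes.
p "caf\xE9".scrub("") # "caf"
```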
Answer by Dipak Panchal
require 'iconv'
i = Iconv.new('UTF-8', 'LATIN1')   # converter from Latin-1 to UTF-8
a_with_hat = i.iconv("\xc2")       # 0xC2 is Â in Latin-1; result is UTF-8 "\xC3\x82"
Summary: the iconv gem does all the work of converting encodings. Make sure it's installed with:
gem install iconv
You also need to know which encoding your string is currently in, because Ruby 1.8 treats strings as arrays of bytes with no intrinsic encoding. For example, say your string is in Latin-1 and you want to convert it to UTF-8:
require 'iconv'
string_in_utf8_encoding = Iconv.conv("UTF-8", "LATIN1", string_in_latin1_encoding)
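On Ruby 1.9 and later, where strings do carry an encoding, the same conversion can be done without iconv by passing the source encoding as the second argument to `String#encode`. A minimal sketch, assuming the input bytes are Latin-1 (here the `"\xc2"` byte from the first example):

```ruby
# Bytes assumed to be Latin-1; 0xC2 is Â.
string_in_latin1_encoding = "\xC2".b

# encode(dst, src) treats the bytes as src and transcodes them to dst,
# regardless of the encoding the string is currently tagged with.
string_in_utf8_encoding = string_in_latin1_encoding.encode("UTF-8", "ISO-8859-1")

p string_in_utf8_encoding.bytes    # [195, 130]
p string_in_utf8_encoding.encoding # #<Encoding:UTF-8>
```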

