Ruby on Rails: force strings to UTF-8 from any encoding

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/12947910/

Time: 2020-09-02 20:53:39  Source: igfitidea

Force strings to UTF-8 from any encoding

Tags: ruby-on-rails, ruby, utf-8, character-encoding

Asked by Hayk Saakian

In my Rails app I'm working with RSS feeds from all around the world, and some feeds have links that are not in UTF-8. The original feed links are out of my control, and to use them in other parts of the app they need to be in UTF-8.

How can I detect encoding and convert to UTF-8?

Answered by kwarrick

Ruby 1.9

"Forcing" an encoding is easy; however, it won't convert the characters. It only changes the encoding label attached to the same bytes:

str = str.force_encoding('UTF-8')

str.encoding.name # => 'UTF-8'
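
To make the difference concrete, here is a minimal sketch. The input bytes and the `encoding_report` helper are made up for illustration. It shows that `force_encoding` leaves the bytes untouched, and that `valid_encoding?` tells you whether the new label actually fits them:

```ruby
# Hypothetical helper: report how Ruby currently sees a string.
def encoding_report(str)
  { encoding: str.encoding.name, valid: str.valid_encoding?, bytes: str.bytes }
end

# Made-up input: the byte 0xE9 is "é" in ISO-8859-1 (Latin-1).
raw = "caf\xE9".b

# force_encoding only changes the label; the bytes stay the same,
# and 0xE9 on its own is not a valid UTF-8 sequence.
relabeled = raw.dup.force_encoding('UTF-8')
puts encoding_report(relabeled)

# Label the bytes as what they really are, then convert with encode.
converted = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts encoding_report(converted)
```

If the report says the string is not valid after relabeling, the bytes were probably written in some other encoding and need a real conversion.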

If you want to perform a conversion, use encode:

begin
  str.encode("UTF-8")
rescue Encoding::UndefinedConversionError
  # ...
end
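
Ruby does not guess a string's real encoding for you, so "detection" usually comes down to a heuristic. The following is a rough sketch, not a definitive implementation: `to_utf8` is a hypothetical helper, and the ISO-8859-1 fallback is purely an assumption about what the feeds contain.

```ruby
# Hypothetical helper, not part of Ruby or Rails: returns a UTF-8 copy of str.
# Assumption: anything that is not valid UTF-8 is in `fallback` (ISO-8859-1 here).
def to_utf8(str, fallback = 'ISO-8859-1')
  candidate = str.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # Relabel with the assumed source encoding, then really convert.
  str.dup.force_encoding(fallback).encode('UTF-8')
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # Last resort: keep what is valid UTF-8 and drop the rest (needs Ruby 2.1+).
  str.dup.force_encoding('UTF-8').scrub('')
end

link = to_utf8("caf\xE9".b)   # made-up Latin-1 bytes
puts link.encoding
```

The fallback encoding is a guess; if the feed declares its charset in an HTTP header or XML declaration, passing that value as `fallback` would likely be more reliable.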

I would definitely read the following post for more information:
http://graysoftinc.com/character-encodings/ruby-19s-string

Answered by John Pollard

This ensures that, no matter what, you end up with a valid UTF-8 string. It won't raise a conversion error, because it replaces any invalid or undefined character with an empty string:

str.encode(Encoding.find('UTF-8'), invalid: :replace, undef: :replace, replace: '')
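
Keep in mind that `encode` converts from whatever encoding the string is currently labeled with, and that replacing with an empty string throws data away. The sketch below wraps the idea in a hypothetical `lossy_utf8` helper. Using `scrub` (Ruby 2.1+) for strings already labeled UTF-8 is my own addition, not part of the answer above.

```ruby
# Hypothetical helper: always returns valid UTF-8, silently dropping
# anything that cannot be converted (so data can be lost).
def lossy_utf8(str)
  if str.encoding == Encoding::UTF_8
    # Already labeled UTF-8: just strip malformed byte sequences (Ruby 2.1+).
    str.scrub('')
  else
    str.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
  end
end

binary = "caf\xE9".b                          # made-up bytes labeled ASCII-8BIT
mislabeled = binary.dup.force_encoding('UTF-8')
latin1 = binary.dup.force_encoding('ISO-8859-1')

puts lossy_utf8(binary)       # the 0xE9 byte has no UTF-8 mapping and is dropped
puts lossy_utf8(mislabeled)   # the malformed byte is scrubbed away
puts lossy_utf8(latin1)       # correctly labeled input converts without loss
```

Only the correctly labeled string keeps its accented character, which is why knowing (or guessing) the source encoding first is preferable to blanket replacement.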

Answered by Dipak Panchal

Iconv

require 'iconv'
i = Iconv.new('UTF-8','LATIN1')
a_with_hat = i.iconv("\xc2")

Summary: the iconv gem does all the work of converting encodings. Make sure it's installed with:

gem install iconv

Now, you need to know what encoding your string is currently in, because Ruby 1.8 treats a String as an array of bytes with no intrinsic encoding. For example, say your string is in Latin-1 and you want to convert it to UTF-8:

require 'iconv'

string_in_utf8_encoding = Iconv.conv("UTF8", "LATIN1", string_in_latin1_encoding)
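
On Ruby 1.9 and later the same conversion can be done without the iconv gem, since `String#encode` accepts both a destination and a source encoding. A minimal sketch, where `convert_encoding` is a hypothetical stand-in that mirrors the argument order of `Iconv.conv`:

```ruby
# Hypothetical stand-in for Iconv.conv(to, from, str) using only core String#encode.
def convert_encoding(to, from, str)
  # The two-argument form interprets the bytes as `from`,
  # whatever encoding label the string currently carries.
  str.encode(to, from)
end

string_in_latin1_encoding = "caf\xE9".b   # made-up Latin-1 bytes
string_in_utf8_encoding =
  convert_encoding('UTF-8', 'ISO-8859-1', string_in_latin1_encoding)
puts string_in_utf8_encoding.encoding
```

As with Iconv, this still relies on you knowing the source encoding; it only removes the external dependency.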