Ruby Encoding::UndefinedConversionError

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/13003287/

Date: 2020-09-06 05:28:04  Source: igfitidea

Encoding::UndefinedConversionError

ruby encoding sinatra sequel

Asked by martriay

I keep getting an Encoding::UndefinedConversionError - "\xC2" from ASCII-8BIT to UTF-8 every time I try to convert a hash into a JSON string. I tried [.encode | .force_encoding](["UTF-8" | "ASCII-8BIT"]), chaining .encode with .force_encoding and the reverse, and switching the parameters, but nothing seemed to work, so I caught the error like this:

begin
  menu.to_json
rescue Encoding::UndefinedConversionError
  puts $!.error_char.dump
  p $!.error_char.encoding
end

Here menu is a Sequel dataset.to_hash with content from a MySQL DB (utf8_general_ci collation), and the rescue block printed this:

"\xC2"

#<Encoding:ASCII-8BIT>

The encoding never changes, no matter which .encode/.force_encoding combination I use. I've even tried to replace the string with .gsub!(/\\\xC2/), without luck.

Any ideas?

Answered by martriay

menu.to_s.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

This worked perfectly. I had to replace a few extra characters afterwards, but there are no more errors.

Answered by knut

What do you expect for "\xC2"? Probably an Â?

With ASCII-8BIT you have binary data, and Ruby can't decide what the bytes should represent.

You must first set the encoding with force_encoding.
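
To illustrate the difference between relabelling and transcoding, here is a minimal sketch. The sample string is made up, and it assumes the bytes coming from the database are really UTF-8 that merely arrived labelled as binary, which may or may not match your data.

```ruby
# Bytes for "25°C" in UTF-8, but labelled as binary (ASCII-8BIT),
# similar to what a database driver might hand back. Sample data is made up.
raw = "25\xC2\xB0C".b

# encode transcodes from the current label, so it fails on "\xC2":
# binary data has no defined mapping to UTF-8 for bytes above 0x7F.
error_message = nil
begin
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error_message = e.message
end

# force_encoding changes only the label; the bytes stay the same.
# dup avoids mutating the original string.
relabelled = raw.dup.force_encoding("UTF-8")

puts error_message
puts relabelled.valid_encoding?
puts relabelled
```

If valid_encoding? comes back false after relabelling, the bytes are probably in some other encoding, which is what the loop further down helps to guess.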

You may try the following code:

Encoding.list.each{|enc|
  begin
    print "%-10s\t" % [enc]
    print "\t\xC2".force_encoding(enc)
    print "\t\xC2".force_encoding(enc).encode('utf-8')
  rescue => err
    print "\t#{err}"
  end
  print "\n"
}

The results are the possible values of your "\xC2" in the different encodings.

The output may depend on your output format, but I think you can make a good guess as to which encoding you have.

Once you have determined the encoding you need (probably cp1252), you can use:

menu.force_encoding('cp1252').to_json

See also Kashyap's comment.
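
Note that Hash does not define force_encoding, so in practice the relabelling has to be applied to each string inside the hash. A rough sketch of one way to do that follows; deep_to_utf8 is a hypothetical helper (not part of Ruby, Sequel, or this answer), and the sample menu is made up.

```ruby
require "json"

# Hypothetical helper: returns a copy of obj in which every binary-labelled
# String is relabelled as `from` and then transcoded to UTF-8.
def deep_to_utf8(obj, from = "UTF-8")
  case obj
  when String
    if obj.encoding == Encoding::BINARY
      obj.dup.force_encoding(from).encode(Encoding::UTF_8)
    else
      obj
    end
  when Hash
    obj.to_h { |k, v| [deep_to_utf8(k, from), deep_to_utf8(v, from)] }
  when Array
    obj.map { |v| deep_to_utf8(v, from) }
  else
    obj
  end
end

# Made-up sample standing in for the dataset hash: UTF-8 bytes labelled binary.
menu = { 1 => { "name".b => "Caf\xC3\xA9".b, "tags" => ["25\xC2\xB0C".b] } }

json = deep_to_utf8(menu).to_json
puts json
```

If the bytes turn out to be cp1252 rather than UTF-8, the same helper could be called as deep_to_utf8(menu, "Windows-1252").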

Answered by Ponny

If you don't care about losing the strange characters, you can blow them away:

str.force_encoding("ASCII-8BIT").encode('UTF-8', undef: :replace, replace: '')
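
As a quick illustration of what gets lost, here is the same call applied to a made-up sample string. Every byte above 0x7F is dropped, including bytes that would have formed valid UTF-8 characters.

```ruby
# Made-up sample: UTF-8 bytes for "Café 25°C", labelled as binary.
str = "Caf\xC3\xA9 25\xC2\xB0C".b

# Bytes 0x80..0xFF have no defined conversion from binary to UTF-8,
# so each of them is replaced with the empty string.
stripped = str.dup.force_encoding("ASCII-8BIT").encode("UTF-8", undef: :replace, replace: "")

puts stripped
puts stripped.encoding
```

So this approach trades correctness of accented characters and symbols for the guarantee that the conversion never raises.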

Answered by gvo

Your self-accepted solution doesn't work: it raises no errors, but the result is NOT JSON.
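
A small sketch of why, using a made-up hash: Hash#to_s produces Ruby's inspect notation rather than JSON, so a JSON parser rejects it.

```ruby
require "json"

menu = { "name" => "Caf\u00e9" } # made-up sample data

as_string = menu.to_s    # Ruby inspect notation, with => between key and value
as_json   = menu.to_json # actual JSON, with : between key and value

# Check whether the to_s output can be parsed as JSON.
parses = begin
  JSON.parse(as_string)
  true
rescue JSON::ParserError
  false
end

puts as_string
puts as_json
puts parses
```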

I solved the problem using the oj gem; it now works fine. It is also faster than the standard JSON library.

Writing:

   menu_json = Oj.dump menu

Reading:

   menu2 = Oj.load menu_json

See https://github.com/ohler55/oj for more details. I hope it helps.

Answered by user3430535

The :fallback option can be useful if you know which characters you want to replace:

"Text 🙂".encode("ASCII", "UTF-8", fallback: {"🙂" => ":)"})
#=> Text :)

From the docs:

Sets the replacement string by the given object for undefined character. The object should be a Hash, a Proc, a Method, or an object which has [] method. Its key is an undefined character encoded in the source encoding of current transcoder. Its value can be any encoding until it can be converted into the destination encoding of the transcoder.
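
Since the docs also allow a Proc, here is a small sketch of that variant. The to_codepoint lambda and the sample text are made up for illustration; the lambda is called once for each character that has no equivalent in the target encoding.

```ruby
# Hypothetical fallback: render each unconvertible character as its code point.
to_codepoint = ->(char) { format("<U+%04X>", char.ord) }

# "é" (U+00E9) and the snowman (U+2603) do not exist in US-ASCII,
# so both are passed to the fallback.
result = "caf\u00e9 \u2603".encode("US-ASCII", fallback: to_codepoint)

puts result
```

A Proc is handy when the set of problem characters is not known in advance, whereas a Hash only covers the keys it lists.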
