ruby `encode': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)

Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/23309669/

Tags: ruby, encoding, utf-8

Asked by

Hannibal episodes in tvdb have weird characters in them.

For example:

Œuf

So ruby spits out:

./manifesto.rb:19:in `encode': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from ./manifesto.rb:19:in `to_json'
    from ./manifesto.rb:19:in `<main>'

Line 19 is:

puts @tree.to_json

Is there a way to deal with these non utf characters? I'd rather not replace them, but convert them? Or ignore them? I don't know, any help appreciated.

The weird part is that the script works fine via cron. Running it manually produces the error.

Accepted answer by Малъ Скрылевъ

It seems you should use another encoding for the object. You should set the proper codepage on the variable @tree, for instance iso-8859-1 instead of ascii-8bit, by calling @tree.force_encoding('ISO-8859-1'), because ASCII-8BIT is used just for binary files.
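
As a rough illustration of the re-tagging idea: force_encoding is a String method, so if @tree is a nested Hash or Array (an assumption; the question does not show how it is built), each string inside it has to be re-tagged individually. The helper name retag_strings and the sample data are hypothetical. The sketch tags binary strings as UTF-8 when their bytes are valid UTF-8, and otherwise falls back to treating them as ISO-8859-1, which is only a guess about the data source.

```ruby
require 'json'

# Hypothetical helper (not from the original answer): walks a nested
# Hash/Array and returns a copy whose ASCII-8BIT strings are UTF-8.
def retag_strings(obj)
  case obj
  when String
    return obj unless obj.encoding == Encoding::ASCII_8BIT

    utf8 = obj.dup.force_encoding(Encoding::UTF_8)
    # Valid UTF-8 bytes only need a new tag. Otherwise assume ISO-8859-1
    # (an assumption about where the data came from) and transcode.
    if utf8.valid_encoding?
      utf8
    else
      obj.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
    end
  when Array
    obj.map { |value| retag_strings(value) }
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[retag_strings(key)] = retag_strings(value)
    end
  else
    obj
  end
end

# Sample data: the UTF-8 bytes of "Œuf", tagged as binary.
tree = { 'episodes' => ["\xC5\x92uf".b] }
puts JSON.generate(retag_strings(tree))
```

Calling to_json on the re-tagged copy instead of on the original tree should then avoid the conversion from ASCII-8BIT.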

To find the current external encoding for ruby, issue:

Encoding.default_external
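
To compare the cron and interactive environments, it can help to print the encoding defaults next to the locale variables Ruby derives them from. This is a minimal sketch; the method name encoding_report is made up for illustration.

```ruby
# Hypothetical helper: collects the encoding defaults and the locale
# environment variables that usually determine them.
def encoding_report
  {
    default_external: Encoding.default_external.name,
    default_internal: Encoding.default_internal&.name, # nil unless set explicitly
    lang: ENV['LANG'],
    lc_all: ENV['LC_ALL']
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Running this once from cron and once from a shell shows whether the two environments disagree.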

If sudo solves the problem, the problem was in the default codepage (encoding), so to resolve it you have to set the proper default codepage (encoding), by either:

  1. In ruby, to change the encoding to utf-8 or another proper one, do as follows:

    Encoding.default_external = Encoding::UTF_8
    
  2. In bash, grep the current valid setup:

    $ sudo env|grep UTF-8
    LC_ALL=ru_RU.UTF-8
    LANG=ru_RU.UTF-8
    

    Then set them properly in .bashrc in a similar way, though not necessarily with the ru_RU language, for example:

    export LC_ALL=ru_RU.UTF-8
    export LANG=ru_RU.UTF-8
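
The effect of option 1 can be sketched as follows, assuming a throwaway temp file; the helper name encoding_when_read is hypothetical. It shows that strings read from a file are tagged with whatever Encoding.default_external is at that moment, which is one way the same script can behave differently under different locales.

```ruby
require 'tempfile'

# Hypothetical demo: returns the name of the encoding a file's contents
# are tagged with when read under the given default external encoding.
def encoding_when_read(default_external)
  previous = Encoding.default_external
  Encoding.default_external = default_external
  Tempfile.create('encoding-demo') do |file|
    file.binmode
    file.write("\xC5\x92uf".b) # UTF-8 bytes of "Œuf"
    file.flush
    File.read(file.path).encoding.name
  end
ensure
  # Restore the process-wide default so the demo has no lasting effect.
  Encoding.default_external = previous
end

puts encoding_when_read(Encoding::UTF_8)
puts encoding_when_read(Encoding::US_ASCII)
```

Setting the default once at the top of the script, as in option 1, makes the result independent of the shell's locale.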
    

Answer by unplugandplay

File.open(yml_file, 'w') should be changed to File.open(yml_file, 'wb')
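
A small sketch of why binary mode helps, under the assumption that the data being written is tagged ASCII-8BIT. The helper name try_write and the file name are hypothetical. In text mode with an external encoding in effect, Ruby may try to transcode the string on write and fail; in 'wb' mode the bytes are written unchanged.

```ruby
require 'tmpdir'

# Hypothetical helper: writes data with the given mode and reports
# either :ok or the class of the conversion error that was raised.
def try_write(path, data, mode)
  File.open(path, mode) { |file| file.write(data) }
  :ok
rescue Encoding::UndefinedConversionError => e
  e.class
end

bytes = "\xC5\x92uf".b # UTF-8 bytes of "Œuf", tagged as binary

Dir.mktmpdir do |dir|
  path = File.join(dir, 'out.yml')
  # Text mode with an explicit external encoding: transcoding is attempted.
  puts try_write(path, bytes, 'w:UTF-8').inspect
  # Binary mode: no transcoding, the bytes go to disk as they are.
  puts try_write(path, bytes, 'wb').inspect
  puts File.binread(path) == bytes
end
```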

Answer by Samuel

I just suffered through a number of hours trying to fix a similar problem. I'd checked my locales, database encoding, everything I could think of and was still getting ASCII-8BIT encoded data from the database.

Well, it turns out that if you store text in a binary field, it will automatically be returned as ASCII-8BIT encoded text. That makes sense, but it can (obviously) cause problems in your application.

It can be fixed by changing the column type back to :text in your migrations.

Answer by Игорь Хлебников

I had the same problems when saving to the database. I'll offer one thing that I use (perhaps, this will help someone).

If you know that your text sometimes contains strange characters, you can encode it in some other format before saving, and then decode it again after it is returned from the database.

example:

string = "Œuf"

Before saving, we encode the string:

text_to_save = CGI.escape(string)

(the character "Œ" is encoded as "%C5%92" and the other characters remain the same)

=> "%C5%92uf"

Load from the database and decode:

CGI.unescape("%C5%92uf")

=> "Œuf"
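
Put together as one runnable script, the round trip looks roughly like this. The method name round_trip is hypothetical, and the database step is left out; the point is only that the escaped form is plain ASCII and decodes back to the original text.

```ruby
require 'cgi'

# Hypothetical helper: percent-encodes the text as it would be stored,
# then decodes it as it would be after loading from the database.
def round_trip(text)
  escaped = CGI.escape(text)       # ASCII-only representation
  restored = CGI.unescape(escaped) # decoded back to UTF-8 text
  [escaped, restored]
end

escaped, restored = round_trip("\u0152uf") # "\u0152" is the character Œ
puts escaped
puts restored
```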
