Original question: http://stackoverflow.com/questions/24036821/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Ruby 2.0.0 String#Match ArgumentError: invalid byte sequence in UTF-8
Asked by Tom Rossi
I see this a lot and haven't figured out a graceful solution. If user input contains invalid byte sequences, I need to be able to have it not raise an exception. For example:
# @raw_response comes from user and contains invalid UTF-8
# for example: @raw_response = "\xBF"
regex.match(@raw_response)
ArgumentError: invalid byte sequence in UTF-8
Numerous similar questions have been asked, and the usual answer appears to be encoding or force-encoding the string. Neither of these works for me, however:
regex.match(@raw_response.force_encoding("UTF-8"))
ArgumentError: invalid byte sequence in UTF-8
or
regex.match(@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?"))
ArgumentError: invalid byte sequence in UTF-8
Is this a bug with Ruby 2.0.0 or am I missing something?
What is strange is that the string appears to be encoded correctly, but match continues to raise an exception:
@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?").encoding
=> #<Encoding:UTF-8>
Answered by matt
In Ruby 2.0 the encode method is a no-op when encoding a string to its current encoding:
Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
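A related point is that force_encoding only changes the encoding tag on a string and never converts or removes bytes, so it cannot repair invalid input either. The following is a minimal sketch of that behavior; the sample string and variable names are made up for illustration.

```ruby
# Bytes as they might arrive from user input: "\xBF" alone is not valid UTF-8.
raw = "abc\xBFdef".b                 # .b returns a copy tagged ASCII-8BIT (binary)

# force_encoding changes the encoding tag only; the bytes are untouched.
relabeled = raw.dup.force_encoding("UTF-8")

same_bytes   = relabeled.bytes == raw.bytes   # nothing was converted
still_broken = !relabeled.valid_encoding?     # the stray byte is still there

puts "encoding:     #{relabeled.encoding}"
puts "same bytes:   #{same_bytes}"
puts "still broken: #{still_broken}"
```

Checking valid_encoding? first is a cheap way to find out whether a string needs cleaning before it is passed to a regex.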
This changed in 2.1, which also added the scrub method as an easier way to do this.
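As a rough illustration of the scrub approach on Ruby 2.1 or later, the sketch below cleans a string before matching. The sample input and the pattern are hypothetical and stand in for @raw_response and regex from the question.

```ruby
raw   = "abc\xBFdef"        # UTF-8 tagged string containing one invalid byte
regex = /def/               # hypothetical pattern, for illustration only

# scrub returns a copy in which each invalid byte sequence is replaced.
# With no argument, the replacement for UTF-8 strings is U+FFFD.
clean = raw.scrub("?")
match = regex.match(clean)

puts clean       # abc?def
puts match[0]    # def
```

There is also a scrub! variant that modifies the receiver in place, which may be preferable when the original bytes are not needed afterwards.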
If you are unable to upgrade to 2.1, you'll have to encode into a different encoding and back in order to remove invalid bytes, something like:
if !s.valid_encoding?
  s = s.encode("UTF-16be", :invalid=>:replace, :replace=>"?").encode('UTF-8')
end
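One way to package the round trip above is a small helper that prefers scrub when the running Ruby provides it and falls back to the UTF-16 round trip otherwise. This is only a sketch: the helper names round_trip_utf8 and sanitize_utf8, the sample input, and the pattern are all hypothetical, and the input is assumed to be tagged as UTF-8 already.

```ruby
# Round trip from the answer above: transcoding to a different encoding
# forces Ruby to inspect every byte, so invalid ones get replaced.
def round_trip_utf8(str, replacement = "?")
  str.encode("UTF-16BE", :invalid => :replace, :replace => replacement)
     .encode("UTF-8")
end

# Return a valid UTF-8 copy of str (assumed to be tagged UTF-8).
def sanitize_utf8(str, replacement = "?")
  return str if str.valid_encoding?
  if str.respond_to?(:scrub)
    str.scrub(replacement)            # Ruby 2.1 and later
  else
    round_trip_utf8(str, replacement) # Ruby 2.0 fallback
  end
end

raw   = "id=42\xBF"                   # made-up user input with a stray byte
safe  = sanitize_utf8(raw)
match = /id=(\d+)/.match(safe)

puts safe        # id=42?
puts match[1]    # 42
```

Keeping the valid_encoding? check at the top avoids copying strings that are already clean.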
Answered by Yogh
Since you're using Rails and not just Ruby you can also use tidy_bytes. This works with Ruby 2.0 and also will probably give you back sensible data instead of just replacement characters.

