Ruby 2.0.0 String#match ArgumentError: invalid byte sequence in UTF-8

Question

Asked by Tom Rossi

I see this a lot and haven't found a graceful solution. If user input contains invalid byte sequences, I need the match not to raise an exception. For example:

# @raw_response comes from user and contains invalid UTF-8
# for example: @raw_response = "\xBF"  
regex.match(@raw_response)
ArgumentError: invalid byte sequence in UTF-8

Numerous similar questions have been asked, and the usual answer appears to be encoding or force-encoding the string. Neither works for me, however:

regex.match(@raw_response.force_encoding("UTF-8"))
ArgumentError: invalid byte sequence in UTF-8

or

regex.match(@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?"))
ArgumentError: invalid byte sequence in UTF-8

Is this a bug with Ruby 2.0.0 or am I missing something?

What is strange is that it appears to be encoding correctly, but match continues to raise an exception:

@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?").encoding
 => #<Encoding:UTF-8>

Answer 1

Answered by matt

In Ruby 2.0 the encode method is a no-op when encoding a string to its current encoding:

Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
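This also explains why the string reports UTF-8 as its encoding while match still fails: the encoding is only a label, and relabeling does not touch the bytes. A minimal sketch, assuming a one-byte input like the one in the question (the variable names are illustrative only):

```ruby
# Hypothetical input: a lone continuation byte, which is never valid UTF-8.
s = "\xBF".force_encoding("UTF-8")

label_is_utf8   = (s.encoding == Encoding::UTF_8) # the label says UTF-8
bytes_are_valid = s.valid_encoding?               # the bytes are checked separately

puts "encoding: #{s.encoding}, valid: #{bytes_are_valid}"
```

Checking valid_encoding? rather than encoding is what tells you whether the string is safe to hand to a regex.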

This changed in 2.1, which also added the scrub method as an easier way to do this.
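As a rough illustration of the scrub approach, assuming Ruby 2.1 or later (the input string and the pattern below are made up for the example and are not from the question):

```ruby
# Assumes Ruby 2.1+, where String#scrub is available.
raw   = "abc\xBFdef" # hypothetical user input containing an invalid UTF-8 byte
regex = /def/        # hypothetical pattern

cleaned = raw.scrub("?")       # replace each invalid byte sequence with "?"
match   = regex.match(cleaned) # the cleaned string is valid UTF-8

puts cleaned
puts match[0]
```

Called without an argument, scrub substitutes the Unicode replacement character U+FFFD instead of "?".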

If you are unable to upgrade to 2.1, you'll have to encode into a different encoding and back in order to remove invalid bytes, something like:

# Round-trip through UTF-16BE so that encode really transcodes and replaces invalid bytes.
unless s.valid_encoding?
  s = s.encode("UTF-16BE", :invalid => :replace, :replace => "?").encode("UTF-8")
end
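If the same code has to run on both 2.0 and newer Rubies, the two approaches can be wrapped together. The helper below is a sketch only: sanitize_utf8 is a hypothetical name, not part of Ruby or Rails, and it assumes the input string is already tagged as UTF-8.

```ruby
# Hypothetical helper (not from the original answer): returns a string with
# invalid UTF-8 byte sequences replaced by +replacement+.
def sanitize_utf8(str, replacement = "?")
  return str if str.valid_encoding?

  if str.respond_to?(:scrub)
    str.scrub(replacement) # Ruby 2.1+
  else
    # Ruby 2.0: transcode to another encoding and back to force replacement
    str.encode("UTF-16BE", :invalid => :replace, :replace => replacement).encode("UTF-8")
  end
end

puts sanitize_utf8("abc\xBFdef")
```

Valid strings are returned untouched, so the helper can be applied to all user input before matching.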

Answer 2

Answered by Yogh

Since you're using Rails and not just Ruby you can also use tidy_bytes. This works with Ruby 2.0 and also will probably give you back sensible data instead of just replacement characters.
