Ruby Invalid Byte Sequence in UTF-8
Original question: http://stackoverflow.com/questions/9607554/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by redgem
I have the following code, which gives me an invalid byte sequence error pointing to the scan method in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when the (.*) between the h1 tag and the closing > is not there.
#!/usr/bin/env ruby
class NewsParser
  def initialize
    Dir.glob("./**/index.htm") do |file|
      @file = IO.read file
      parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
      self.write(parsed)
    end
  end
  def write output
    @contents = output
    open('output.txt', 'a') do |f|
      f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n"
    end
  end
end
p = NewsParser.new
Edit: Here is the error message:
news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)
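For readers who want to see the failure in isolation, here is a minimal sketch of one way this error can arise. It assumes the files are ISO-8859-1, as the fix in this thread suggests. The sample text and the helper names mislabeled_sample and scan_error are made up for illustration and are not part of the original script.

```ruby
# Hypothetical sample: "café" as ISO-8859-1 bytes ("é" is the single byte 0xE9),
# labeled as UTF-8 without any conversion. This imitates reading a Latin-1
# file while Ruby assumes the content is UTF-8.
def mislabeled_sample
  "caf\xE9".b.force_encoding(Encoding::UTF_8)
end

# Run a regex scan and return the ArgumentError message if one is raised,
# or nil if the scan succeeds.
def scan_error(text)
  text.scan(/f(.*)/)
  nil
rescue ArgumentError => e
  e.message
end

sample = mislabeled_sample
puts sample.valid_encoding?      # reports whether the bytes form valid UTF-8
puts scan_error(sample).inspect  # shows the error message from scan, if any
```

Calling valid_encoding? on the string before scanning it is a cheap way to confirm whether the file contents, rather than the regex, are the source of the problem.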
SOLVED: The combination of using:
@file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil)
and
# encoding: UTF-8
solved the issue.
Thanks!
Answered by redgem
The combination of using @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and the #encoding: UTF-8 comment solved the issue.
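The fix above can be sketched as a small helper, assuming the input files really are ISO-8859-1. The helper name read_latin1_as_utf8 and the throwaway sample file are hypothetical, and the sketch leaves out the replace: nil option for brevity.

```ruby
require "tempfile"

# Read a file whose bytes are assumed to be ISO-8859-1 and return a UTF-8 string.
def read_latin1_as_utf8(path)
  raw = File.binread(path)                  # raw bytes, tagged as binary
  raw.force_encoding(Encoding::ISO_8859_1)  # relabel the bytes; nothing is converted yet
  raw.encode(Encoding::UTF_8)               # transcode the characters to UTF-8
end

# Usage with a temporary file holding "<h1>café</h1>" in ISO-8859-1.
Tempfile.create("index") do |f|
  f.binmode
  f.write("<h1>caf\xE9</h1>".b)
  f.flush

  text = read_latin1_as_utf8(f.path)
  puts text.valid_encoding?
  p text.scan(/<h1(.*)>(.*?)<\/h1>/im)
end
```

Note that force_encoding only changes the label on the bytes, while encode does the actual conversion. If the files are in some other encoding, the constant passed to force_encoding would need to change accordingly.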

