Ruby invalid byte sequence in UTF-8

Question

Asked by redgem

I have the following code, which gives me an invalid byte sequence error pointing to the scan method in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when the (.*) between the h1 tag and the closing > is not there.

#!/usr/bin/env ruby

class NewsParser

  def initialize
      Dir.glob("./**/index.htm") do |file|
        @file = IO.read file 
        parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
        self.write(parsed)
      end
  end

  def write output
    @contents = output
    open('output.txt', 'a') do |f| 
      f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n" 
    end
  end

end

p = NewsParser.new

Edit: Here is the error message:

news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)
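
A minimal reproduction of this error, assuming (as the accepted fix suggests) that the page bytes are ISO-8859-1 (Latin-1) while Ruby labels the string read from disk as UTF-8. The string below is a made-up stand-in for the file contents, not the asker's actual page:

```ruby
# Made-up stand-in for the file contents: 0xE9 is "é" in ISO-8859-1,
# but here the bytes are labelled UTF-8, where 0xE9 alone is not valid.
page = "caf\xE9".dup.force_encoding("UTF-8")

puts page.valid_encoding?   # false

begin
  # Any regexp match against a string with a broken byte sequence raises.
  page.scan(/caf(.*)/)
rescue ArgumentError => e
  puts e.message            # invalid byte sequence in UTF-8
end
```

String#valid_encoding? is a cheap way to check a file's contents before running the regex over them.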

SOLVED: The combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and the #encoding: UTF-8 magic comment solved the issue.
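
A short sketch of what the two calls in that fix do, on a made-up Latin-1 fragment. It assumes the file really is ISO-8859-1; if it were mostly valid UTF-8 with a few stray bytes, relabelling it as Latin-1 would garble the valid multibyte characters. The sketch leaves out replace: nil, since the :replace option only matters together with invalid: :replace or undef: :replace, and every byte is defined in ISO-8859-1:

```ruby
# Made-up stand-in for a Latin-1 page: "\xE9" is "é" in ISO-8859-1.
# .b returns a binary (ASCII-8BIT) copy, i.e. raw bytes with no text encoding.
raw = "<h1 class=\"t\">Caf\xE9</h1>".b

# force_encoding only relabels the bytes as ISO-8859-1 (nothing is converted);
# encode then transcodes them into real UTF-8 (0xE9 becomes 0xC3 0xA9).
html = raw.force_encoding("ISO-8859-1").encode("UTF-8")

puts html.valid_encoding?   # true
puts html.encoding          # UTF-8

# The question's <h1> pattern now scans without raising.
headlines = html.scan(/<h1(.*)>(.*?)<\/h1>/im)
```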

Thanks!

Answer 1

Answered by redgem

The combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and #encoding: UTF-8 solved the issue.
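
A minimal sketch of the fix folded back into the question's parsing step, assuming every index.htm is ISO-8859-1. read_latin1 and extract are hypothetical helper names, not part of the original code. Note that UTF-8 has been Ruby's default source encoding since Ruby 2.0, so the magic comment mainly mattered on Ruby 1.9:

```ruby
# encoding: UTF-8
# Magic comment: only takes effect on the first line (or the second, after a
# shebang) and sets the encoding of this file's literals.

# Hypothetical helper: read a file whose bytes are assumed to be ISO-8859-1
# and return its contents transcoded to UTF-8.
def read_latin1(path)
  File.binread(path).force_encoding("ISO-8859-1").encode("UTF-8")
end

# Hypothetical helper: the question's regex, applied to an already-UTF-8 string.
def extract(html)
  html.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
end
```

Inside initialize, @file = read_latin1(file) would take the place of the plain IO.read call, and the scan then runs on valid UTF-8.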

Related questions

Ruby case-insensitive array #include?

Ruby rvm install not working: "RVM is not a function"

How to run a Ruby program on Windows 7?

Unable to use dot syntax for a Ruby hash