Ruby/Rails CSV parsing: invalid byte sequence in UTF-8

Question

Asked by rogeliog

I am trying to parse a CSV file generated from an Excel spreadsheet.

Here is my code:

require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)

But I get this error:

ArgumentError: invalid byte sequence in UTF-8

I think the error occurs because Excel encodes the file as ISO 8859-1 (Latin-1) rather than UTF-8.

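The diagnosis can be checked directly in Ruby. A minimal sketch, assuming the offending character is é (the single byte 0xE9 in ISO-8859-1; the sample string is hypothetical): those bytes are not valid UTF-8, but they become a valid string once Ruby is told they are Latin-1 and transcodes them.

```ruby
# Hypothetical sample: "café" as stored in ISO-8859-1 (é is the single byte 0xE9).
latin1_bytes = "caf\xE9".b

# Interpreted as UTF-8, the lone 0xE9 byte is an invalid byte sequence.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?   # false

# Interpreted as ISO-8859-1, the same bytes are valid and can be transcoded to UTF-8.
as_latin1 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)
fixed = as_latin1.encode(Encoding::UTF_8)
puts fixed.valid_encoding?     # true
puts fixed.bytesize            # 5 (é becomes the two bytes 0xC3 0xA9)
```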

Can someone help me with a workaround for this issue, please?

Thanks in advance.

Answer 1

Answered by Linuxios

You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:

file = File.open("input_file", "r:ISO-8859-1")

The second argument tells Ruby to open the file read-only with the ISO-8859-1 encoding.

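A mode string can also name a second encoding after the source one: "r:ISO-8859-1:UTF-8" reads Latin-1 bytes and transcodes them to UTF-8 as they are read, so the parsed fields are ordinary UTF-8 strings. A minimal, self-contained sketch, assuming a small hypothetical Latin-1 file written to a temporary path:

```ruby
require "csv"
require "tempfile"

# Write a hypothetical Latin-1 file: "café,1" where é is the single byte 0xE9.
rows = Tempfile.create(["latin1", ".csv"]) do |tmp|
  tmp.binmode
  tmp.write("caf\xE9,1\n".b)
  tmp.flush

  # "r:ISO-8859-1" alone would return Latin-1 strings;
  # the trailing ":UTF-8" transcodes them to UTF-8 while reading.
  File.open(tmp.path, "r:ISO-8859-1:UTF-8") { |f| CSV.parse(f.read) }
end

puts rows.inspect
puts rows[0][0].encoding   # UTF-8
```

Transcoding on read means the rest of the program only ever sees UTF-8, which avoids Encoding::CompatibilityError when the fields are later combined with UTF-8 literals.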

Answer 2

Answered by Sudhir Vishwakarma

Specify the encoding with the encoding option:

CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # process each row (a CSV::Row) here
end
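Here the encoding string 'iso-8859-1:utf-8' means "the file is Latin-1; give me UTF-8", and it is applied when CSV opens the file. A self-contained sketch, assuming a hypothetical two-column export with a header row; with headers: true each row is a CSV::Row whose fields can be looked up by column name:

```ruby
require "csv"
require "tempfile"

# Hypothetical export in ISO-8859-1: é is byte 0xE9, á is byte 0xE1.
names = Tempfile.create(["export", ".csv"]) do |tmp|
  tmp.binmode
  tmp.write("name,city\nJos\xE9,M\xE1laga\n".b)
  tmp.flush

  collected = []
  # "iso-8859-1:utf-8": the file is Latin-1, and each field is transcoded to UTF-8.
  CSV.foreach(tmp.path, headers: true, encoding: "iso-8859-1:utf-8") do |row|
    collected << row["name"]   # row is a CSV::Row; fields are looked up by header
  end
  collected
end

puts names.length   # 1
```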

Answer 3

Answered by kixorz

You can supply source encoding straight in the file mode parameter:

CSV.foreach("file.csv", "r:windows-1250") do |row|
  # your code here
end
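Note that "r:windows-1250" only declares the source encoding: the fields come back tagged as Windows-1250 rather than converted to UTF-8, so they still need String#encode (or a mode such as "r:windows-1250:utf-8") before being combined with non-ASCII UTF-8 strings. A sketch, assuming a hypothetical one-line file containing the place name Košice (š is the single byte 0x9A in Windows-1250) and a csv gem version whose CSV.foreach accepts a positional mode argument, as the one bundled with recent Ruby does:

```ruby
require "csv"
require "tempfile"

# Hypothetical sample file: "Košice,1" in Windows-1250, where š is the single byte 0x9A.
rows = Tempfile.create(["cp1250", ".csv"]) do |tmp|
  tmp.binmode
  tmp.write("Ko\x9Aice,1\n".b)
  tmp.flush

  collected = []
  # The second positional argument is the file mode; "r:windows-1250"
  # declares the source encoding, so fields come back tagged as Windows-1250.
  CSV.foreach(tmp.path, "r:windows-1250") { |row| collected << row }
  collected
end

city = rows[0][0]
puts city.encoding              # the field is still Windows-1250
utf8_city = city.encode("UTF-8")  # explicit conversion before mixing with UTF-8 text
```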

Answer 4

Answered by Eliza A

Save the file as UTF-8. If for some reason you need to save it in a different encoding, specify that encoding when you read the file.

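If you cannot re-export from Excel, the file can also be converted to UTF-8 once in Ruby, so later code never has to know the original encoding. A minimal sketch, assuming the source is ISO-8859-1; the helper name convert_to_utf8 and the temporary files are hypothetical:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: re-save a file that is in +source_encoding+ as UTF-8.
def convert_to_utf8(src_path, dest_path, source_encoding: "ISO-8859-1")
  text = File.read(src_path, encoding: source_encoding)  # string tagged with the source encoding
  File.binwrite(dest_path, text.encode(Encoding::UTF_8)) # write the transcoded bytes unchanged
end

converted_bytes, rows = Tempfile.create(["in", ".csv"]) do |src|
  src.binmode
  src.write("caf\xE9,1\n".b)   # "café,1" in Latin-1
  src.flush

  Tempfile.create(["out", ".csv"]) do |dest|
    convert_to_utf8(src.path, dest.path)
    # The converted file can now be read as plain UTF-8.
    [File.binread(dest.path), CSV.read(dest.path, encoding: "UTF-8")]
  end
end
```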

Answer 5

Answered by Gagan Gami

Add a second argument, "r:ISO-8859-1", as in File.open("input_file", "r:ISO-8859-1").

Answer 6

Answered by user3787971

I had the same problem, and at first I simply used Google Spreadsheets and downloaded the file as a CSV. That was the easiest solution.

Then I came across this gem:

https://github.com/singlebrook/utf8-cleaner

Now I don't need to worry about this issue at all. Hope this helps!

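Invalid UTF-8 bytes can also be removed without an extra gem using Ruby's built-in String#scrub (available since Ruby 2.1). This is a stdlib technique swapped in for illustration, not the utf8-cleaner gem's API. It is lossy: the invalid byte is replaced rather than decoded, so declaring the correct source encoding is preferable when you know it. A minimal sketch with hypothetical input:

```ruby
require "csv"

# Hypothetical raw input: Latin-1 bytes wrongly labelled as UTF-8.
raw = "caf\xE9,1\n".b.force_encoding(Encoding::UTF_8)

# scrub replaces each invalid byte sequence (here the lone 0xE9) with "?".
cleaned = raw.scrub("?")
rows = CSV.parse(cleaned)
puts rows.inspect   # [["caf?", "1"]]
```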

Answer 7

Answered by ToTenMilan

If you have only one file (or a few), so you don't need to detect the encoding of arbitrary input files automatically, and the file's contents are readable as plain text (txt, csv, etc.) with fields separated by, e.g., semicolons, you can manually create a new file with a .csv extension, paste the contents into it, and then parse it as usual.

Keep in mind that this is a workaround, but when you only need to parse a single large Excel file on Linux, converted to some flavour of CSV, it saves the time you would otherwise spend experimenting with all those fancy encodings.

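When the pasted contents are separated by semicolons, as in the example above, "parse as usual" needs the col_sep option, because CSV's default separator is a comma and each line would otherwise come back as a single field. A minimal sketch with hypothetical data:

```ruby
require "csv"

# Hypothetical semicolon-separated content pasted into a new .csv file.
text = "name;qty\napple;3\npear;5\n"

# With the default col_sep (","), each line is parsed as one field.
default_rows = CSV.parse(text)
puts default_rows.first.inspect   # ["name;qty"]

# Telling CSV the actual separator splits the fields correctly.
rows = CSV.parse(text, col_sep: ";")
puts rows.inspect                 # [["name", "qty"], ["apple", "3"], ["pear", "5"]]
```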
