Ruby/Rails CSV parsing, invalid byte sequence in UTF-8
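Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow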
Original URL: http://stackoverflow.com/questions/8380113/
Question by rogeliog
I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error
ArgumentError: invalid byte sequence in UTF-8
I think the error occurs because Excel encodes the file in ISO 8859-1 (Latin-1) and not in UTF-8.
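The asker's hypothesis can be checked with a minimal sketch. The sample bytes are hypothetical, standing in for an Excel export: in ISO-8859-1, "é" is the single byte 0xE9, which is not valid UTF-8 on its own, so parsing fails when Ruby assumes UTF-8. The exact exception depends on the Ruby/csv version (older versions raise ArgumentError, newer csv gems raise CSV::MalformedCSVError). The same bytes labelled as ISO-8859-1 are valid and transcode to UTF-8 cleanly.

```ruby
require "csv"

# Hypothetical sample: "café,1" as an ISO-8859-1 (Latin-1) export would store it.
# In Latin-1, "é" is the single byte 0xE9.
latin1_bytes = "caf\xE9,1\n".b

# Reading those bytes as UTF-8 reproduces the problem: 0xE9 would have to
# start a multi-byte UTF-8 sequence, but it is followed by ",".
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # prints false

parse_error =
  begin
    CSV.parse(as_utf8)
    nil
  rescue ArgumentError, CSV::MalformedCSVError => e
    # Older Ruby raises ArgumentError; newer csv gems raise MalformedCSVError.
    e
  end
puts "#{parse_error.class}: #{parse_error.message}" if parse_error

# Labelling the same bytes as ISO-8859-1 makes them valid, and they can be
# transcoded to UTF-8 without loss.
as_latin1 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)
puts as_latin1.valid_encoding?         # prints true
puts as_latin1.encode(Encoding::UTF_8) # prints café,1
```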
Can someone please help me with a workaround for this issue?
Thanks in advance.
Answer by Linuxios
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file = File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open the file read-only with the encoding ISO-8859-1.
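A self-contained sketch of this answer. It writes a small hypothetical Latin-1 file to a Tempfile (standing in for "input_file"), opens it with the "r:ISO-8859-1" mode, and parses it. The parsed fields keep the ISO-8859-1 encoding; the variant mode "r:ISO-8859-1:UTF-8" (external:internal encoding) additionally transcodes them to UTF-8 while reading, which is usually more convenient when the rest of the app expects UTF-8.

```ruby
require "csv"
require "tempfile"

# Hypothetical stand-in for "input_file": a Latin-1 CSV where "é" is byte 0xE9.
tmp = Tempfile.new(["input_file", ".csv"])
tmp.binmode
tmp.write("name,city\nJos\xE9,Montr\xE9al\n".b)
tmp.close

# "r:ISO-8859-1": read-only, external encoding ISO-8859-1.
rows = File.open(tmp.path, "r:ISO-8859-1") { |file| CSV.parse(file) }
# The parsed fields keep the ISO-8859-1 encoding of the source.
p rows[1].map(&:encoding)

# "r:ISO-8859-1:UTF-8": same source encoding, but transcode to UTF-8 on read.
utf8_rows = File.open(tmp.path, "r:ISO-8859-1:UTF-8") { |file| CSV.parse(file) }
p utf8_rows[1] # expected: ["José", "Montréal"]

tmp.unlink
```

The block form of File.open also closes the file once parsing is done, which the one-line version above does not.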
Answer by Sudhir Vishwakarma
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # process each row here
end
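A runnable version of this answer, using a hypothetical Latin-1 Tempfile in place of file.path. The encoding string "iso-8859-1:utf-8" names the external encoding (what is on disk) and the internal encoding (what Ruby should hand back), so every field, including the headers, arrives as UTF-8.

```ruby
require "csv"
require "tempfile"

# Hypothetical Latin-1 file with a header row; 0xE9 is "é" in ISO-8859-1.
tmp = Tempfile.new(["export", ".csv"])
tmp.binmode
tmp.write("name,city\nJos\xE9,Montr\xE9al\n".b)
tmp.close

cities = []
# External encoding ISO-8859-1 (the file), internal encoding UTF-8 (the strings we get).
CSV.foreach(tmp.path, headers: true, encoding: "iso-8859-1:utf-8") do |row|
  # With headers: true each row is a CSV::Row, so fields can be looked up by header.
  cities << row["city"]
end
tmp.unlink

p cities                # expected: ["Montréal"]
p cities.first.encoding # UTF-8
```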
Answer by kixorz
You can supply the source encoding directly in the file mode parameter:
CSV.foreach("file.csv", "r:windows-1250") do |row|
  # your code here
end
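A sketch of the mode-string approach with a hypothetical Windows-1250 (Central European) file. The sample text is encoded to Windows-1250 only to produce realistic bytes for the demo. The second positional argument of CSV.foreach is the File.open mode, so "r:windows-1250" sets the source encoding; here each field is converted to UTF-8 explicitly afterwards (a mode of "r:windows-1250:utf-8" would do that during reading instead).

```ruby
require "csv"
require "tempfile"

# Hypothetical Central European export stored in Windows-1250.
tmp = Tempfile.new(["file", ".csv"])
tmp.binmode
tmp.write("miasto,kraj\nŁódź,Polska\n".encode("Windows-1250"))
tmp.close

rows = []
# The mode string declares the file's encoding, so no UTF-8 is assumed.
CSV.foreach(tmp.path, "r:windows-1250") do |row|
  # Fields come back tagged Windows-1250; convert them for the rest of the app.
  rows << row.map { |field| field.encode("UTF-8") }
end
tmp.unlink

p rows # expected: [["miasto", "kraj"], ["Łódź", "Polska"]]
```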
Answer by Eliza A
Save the file in UTF-8. If for some reason you need to save it in a different encoding, specify that encoding when reading the file.
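If you control the file, converting it to UTF-8 once means later reads need no special options. A minimal sketch with a hypothetical convert_to_utf8 helper; it assumes you already know the source encoding (ISO-8859-1 by default here), because the bytes alone cannot reliably tell you which single-byte encoding was used.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: re-save a file in UTF-8, given its known source encoding.
def convert_to_utf8(src_path, dest_path, source_encoding: "ISO-8859-1")
  # binread gives raw bytes; force_encoding only relabels them,
  # then encode actually transcodes to UTF-8.
  text = File.binread(src_path).force_encoding(source_encoding)
  File.binwrite(dest_path, text.encode("UTF-8"))
end

src = Tempfile.new(["latin1", ".csv"])
src.binmode
src.write("name,city\nJos\xE9,Montr\xE9al\n".b)
src.close
dest = Tempfile.new(["utf8", ".csv"])
dest.close

convert_to_utf8(src.path, dest.path)

# The converted copy is plain UTF-8; the encoding is stated explicitly only so
# this demo does not depend on the machine's locale.
rows = CSV.read(dest.path, encoding: "UTF-8")
p rows # expected: [["name", "city"], ["José", "Montréal"]]

src.unlink
dest.unlink
```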
Answer by Gagan Gami
Add a second argument, "r:ISO-8859-1", as in File.open("input_file", "r:ISO-8859-1").
Answer by user3787971
I had this same problem and just used Google Spreadsheets, then downloaded the sheet as a CSV. That was the easiest solution.
Then I came across this gem:
https://github.com/singlebrook/utf8-cleaner
Now I don't need to worry about this issue at all. Hope this helps!
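The utf8-cleaner gem's own API is not shown here. As a stdlib-only alternative (a different approach from the gem, which this sketch does not use), String#scrub replaces invalid byte sequences so CSV can parse the text. It is lossy: the offending characters are discarded, so it only fits cases where the true encoding is unknown and losing a few characters is acceptable. The input below is hypothetical.

```ruby
require "csv"

# Hypothetical input: Latin-1 bytes wrongly labelled as UTF-8.
raw = "Jos\xE9,Montr\xE9al\n".b.force_encoding(Encoding::UTF_8)

# scrub swaps each invalid byte sequence for the given replacement
# (without an argument, UTF-8 strings get U+FFFD).
clean = raw.scrub("?")

p clean.valid_encoding? # prints true
rows = CSV.parse(clean)
p rows # expected: [["Jos?", "Montr?al"]]
```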
Answer by ToTenMilan
If you have only one (or a few) files, so you don't need to declare the encoding automatically for whatever input you receive, and the file's contents are readable as plain text (txt, csv, etc.) with fields separated by, e.g., semicolons, you can manually create a new file with a .csv extension, paste the contents of your file into it, and then parse it as usual.
Keep in mind that this is a workaround, but when you only need to parse one big Excel file (converted to some flavour of CSV) on Linux, it saves the time you would otherwise spend experimenting with all those fancy encodings.

