Ruby CSV.read: Illegal quoting in line x
Original question: http://stackoverflow.com/questions/9864064/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
CSV.read Illegal quoting in line x
Asked by JZ.
I am using ruby CSV.read with massive data. From time to time the library encounters poorly formatted lines, for instance:
"Illegal quoting in line 53657."
It would be easier to ignore the line and skip it than to go through each CSV file and fix the formatting. How can I do this?
Answered by Ray Baxter
I had this problem in a line like 123,456,a"b"c
The problem is that the CSV parser expects " characters, if they appear at all, to entirely surround a comma-delimited field.
Solution: use a quote character other than " that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
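As a rough illustration of what changing the quote character does, the sketch below parses two sample strings with CSV.parse_line instead of reading a file. The sample strings and variable names are made up for this example. It also shows a trade-off worth checking against your own data: fields that really are quoted are no longer treated as quoted.

```ruby
require 'csv'

# With "|" as the quote character, stray double quotes are ordinary characters.
stray = CSV.parse_line('123,456,a"b"c', quote_char: '|')

# Trade-off: a properly quoted field is no longer recognized as quoted,
# so the comma inside it splits the field and the quotes stay in the data.
quoted = CSV.parse_line('"x,y",1', quote_char: '|')

p stray
p quoted
```

If the data contains genuinely quoted fields with embedded commas, this approach will change how those rows are split.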
Answered by Will Madden
The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:
When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.
To enable it, pass it as an option to the CSV read/parse/new methods:
CSV.read(filename, liberal_parsing: true)
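To see the difference on the sample line from the earlier answer, a small sketch using CSV.parse_line on a string (the variable names are only for illustration, and Ruby 2.4 or later is assumed):

```ruby
require 'csv'

line = '123,456,a"b"c'

# Default (strict) parsing rejects a double quote inside an unquoted field.
strict_result =
  begin
    CSV.parse_line(line)
  rescue CSV::MalformedCSVError => e
    e.message
  end

# With liberal_parsing, the stray quotes are kept as literal characters.
liberal_result = CSV.parse_line(line, liberal_parsing: true)

p strict_result
p liberal_result
```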
Answered by DigitalRoss
Don't let CSV both read and parse the file.
Just read the file yourself and hand each line to CSV.parse_line, and then rescue any exceptions it throws.
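A minimal sketch of that idea follows. The helper name read_csv_skipping_bad_lines and its return shape are invented for this example and are not part of the CSV library. It assumes every record fits on one physical line, so a quoted field that contains an embedded newline would be treated as malformed and skipped.

```ruby
require 'csv'

# Hypothetical helper: parse each line on its own and skip the ones that fail.
# Returns [rows, skipped], where skipped holds the 1-based numbers of the
# lines that could not be parsed.
def read_csv_skipping_bad_lines(path, **csv_options)
  rows = []
  skipped = []
  File.foreach(path).with_index(1) do |line, lineno|
    begin
      row = CSV.parse_line(line, **csv_options)
      # Ignore blank lines, which yield nil or an empty row.
      rows << row unless row.nil? || row.empty?
    rescue CSV::MalformedCSVError
      skipped << lineno
    end
  end
  [rows, skipped]
end
```

Calling rows, skipped = read_csv_skipping_bad_lines(filename) then gives the parsed rows plus the line numbers that were dropped, which can be logged for later inspection.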
Answered by Tombart
Try forcing a quote character that cannot appear in the data, such as the null byte "\x00", in place of the double quote character ":
require 'csv'
CSV.foreach(file, headers: :first_row, quote_char: "\x00") do |line|
  p line
end
Answered by allknowingfrog
Apparently this error can also be caused by unprintable BOM characters. This thread suggests using a file mode to force a conversion, which is what finally worked for me.
require 'csv'
CSV.open(@filename, 'r:bom|utf-8') do |csv|
  # do something
end
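As a small sketch of the same file mode in use, the helper below (read_csv_stripping_bom is a made-up name for this example) opens a file with 'r:bom|utf-8' so that a leading UTF-8 byte order mark is dropped before parsing, then returns all rows:

```ruby
require 'csv'

# Hypothetical helper: read every row of a CSV file, stripping a leading
# UTF-8 BOM if one is present.
def read_csv_stripping_bom(path)
  CSV.open(path, 'r:bom|utf-8') do |csv|
    csv.read
  end
end
```

This matters most when the first field of the file is quoted, because the invisible BOM then sits in front of the opening quote.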

