Ruby CSV parsing string with escaped quotes

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14534522/

Date: 2020-09-06 05:42:20  Source: igfitidea

ruby csv

Asked by Andrew

I have a line in my CSV file that has some escaped quotes:

173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"

When I try to parse it with the Ruby CSV parser:

require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
  puts row
end

I get this error:

.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)

How can I get around this error?

Answered by joelparkerhenderson

The \" escape is typical of Unix tools, whereas Ruby's CSV expects a quote inside a quoted field to be doubled: ""
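As a quick illustration of that difference, here is a minimal sketch (the variable names are my own, not from the question) that feeds both quoting styles to CSV.parse_line. The exact error message varies between csv versions, so the sketch only relies on the exception class.

```ruby
require 'csv'

# Single-quoted heredocs keep backslashes literal, so these are the raw CSV lines.
backslash_style = <<~'LINE'.chomp
  173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
LINE
doubled_style = <<~'LINE'.chomp
  173,"Yukihiro ""The Ruby Guy"" Matsumoto","Japan"
LINE

# Doubled quotes are what Ruby's CSV understands inside a quoted field.
fields = CSV.parse_line(doubled_style)
puts fields[1]

# The backslash form is rejected under the default parsing options.
begin
  CSV.parse_line(backslash_style)
  rejected = false
rescue CSV::MalformedCSVError => e
  rejected = true
  puts "rejected: #{e.message}"
end
```

So the job of any workaround is to turn the first form into the second before CSV sees the line.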

To parse it:

require 'csv'
# Replace each backslash-escaped quote (\") with the doubled quote ("") that CSV expects.
text = File.read('test.csv').gsub(/\\"/, '""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
  puts row
end

Note: File.read loads the entire file into memory, so a very large CSV file will use a lot of RAM. In that case, consider reading the file one line at a time.

Note: if your CSV file may have a backslash in front of the backslash (an escaped backslash directly before a quote), use Andrew Grimm's suggestion, which replaces \" only when it is not preceded by another backslash:

gsub(/(?<!\\)\\"/, '""')
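To make the effect of the lookbehind concrete, here is a small sketch comparing the two substitutions on a made-up line (the sample data and variable names are assumptions for illustration). The second field ends in an escaped backslash right before its closing quote, which is the case the plain pattern gets wrong.

```ruby
require 'csv'

# Raw line: the second field ends in an escaped backslash (\\) just before its
# closing quote; the third field contains backslash-escaped quotes.
line = <<~'LINE'.chomp
  1,"dir\\","say \"hi\""
LINE

# Plain pattern: also rewrites the \" formed by the last backslash and the closing quote.
naive = line.gsub(/\\"/, '""')
# Lookbehind pattern: skips a \" that is itself preceded by a backslash.
guarded = line.gsub(/(?<!\\)\\"/, '""')

puts naive
puts guarded

# Only the guarded version is parsed here.
fields = CSV.parse_line(guarded)
p fields
```

Two caveats, as far as I can tell: CSV does not interpret backslashes, so the doubled backslash stays in the parsed field as-is, and the lookbehind only inspects one character, so a run of three backslashes before a quote would still be misjudged.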

Answered by the Tin Man

CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, a converter can be used to strip extra spaces on all fields in a row.

Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
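For reference, this is roughly what a converter looks like when the input is already well-formed. It is only a sketch: the strip_spaces lambda and the padded sample data are my own inventions, and it does nothing for the escaped-quote problem, because converters only see fields after parsing has succeeded.

```ruby
require 'csv'

# Hypothetical converter: trims surrounding whitespace from every string field.
strip_spaces = ->(field) { field.is_a?(String) ? field.strip : field }

# Well-formed sample data with extra spaces around some values.
padded = "ID,Name,Country\n173,  Matz  , Japan \n"

table = CSV.parse(padded, headers: true, converters: [strip_spaces])
row = table[0]
puts row['Name'].inspect
puts row['Country'].inspect
```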

This is my sample CSV file:

ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"

Preserving the line-by-line spirit of your CSV.foreach call, this is my example code for parsing it without CSV getting mad:

require 'csv'
require 'pp'

header = []
File.foreach('test.csv') do |csv_line|

  row = CSV.parse(csv_line.gsub('\"', '""')).first

  if header.empty?
    header = row.map(&:to_sym)
    next
  end

  row = Hash[header.zip(row)]
  pp row
  puts row[:Name]

end

And the resulting hash and name value:

{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto

I assumed you wanted a hash back because you specified the :headers option:

CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
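If you would rather let CSV build the hashes itself with those same options, a possible sketch is to fix the quotes first and then parse the whole text. The heredoc below stands in for the contents of the sample file, and the variable names are assumptions.

```ruby
require 'csv'
require 'pp'

# Stand-in for the sample file contents; the single-quoted heredoc keeps backslashes literal.
raw = <<~'CSV_TEXT'
  ID,Name,Country
  173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
CSV_TEXT

# Turn \" into "" so CSV accepts the embedded quotes.
fixed = raw.gsub(/\\"/, '""')

# Same options as in the question; each CSV::Row is converted to a plain Hash.
rows = CSV.parse(fixed, headers: true, header_converters: :symbol).map(&:to_h)
pp rows.first
```

Note that the :symbol header converter downcases the header names, so the keys come out as :id, :name and :country rather than :ID, :Name and :Country as in the manual version above.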

Answered by Karan Verma

Open the file in MS Excel and save it as MS-DOS Comma Separated (.csv).