Ruby unable to parse a CSV file: CSV::MalformedCSVError (Illegal quoting in line 1.)

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16772830/

Time: 2020-09-06 05:59:53  Source: igfitidea

ruby, csv, malformed

Asked by Jignesh Gohel

Ubuntu 12.04 LTS

Ruby ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

Rails 3.2.9

The following is the content of the CSV file I received:

"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"

However, when I try to parse the CSV file, I get the following error:

1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""} 

1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

Then I tried simplifying the data, i.e.:

"name","age","email"
"jignesh","30","[email protected]"

However, I still get the same error:

  1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
      from (irb):23
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

I then simplified the data again, like this:

name,age,email
jignesh,30,[email protected]

and it works. See the output below:

  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  [email protected]
   => nil 

But I will be receiving CSV files that contain quoted data, so removing the quotes is not the solution I am looking for. I am unable to figure out what is causing the error CSV::MalformedCSVError: Illegal quoting in line 1. in my earlier examples.

I have verified that there are no leading/trailing spaces in the CSV by enabling "Show whitespace characters" and "Show Line Endings" in my text editor. I have also verified the encoding as follows:

  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
  => #<Encoding:UTF-8> 

Note: I tried using CSV.read too, but I get the same error with that method.

Can anybody please help me solve this problem and understand where it is going wrong?

=====================

I just found the following post at http://www.ruby-forum.com/topic/448070 and tried the following:

  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)

  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end

and got the following output:

   =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
    =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]

which did not read the data properly, because the default col_sep is a comma character and the date field was split at its comma. So I tried using the quote_char option like this:

  arr_of_arrs = CSV.parse(file_data, :quote_char => "'")

but it ended up with the following error:

   CSV::MalformedCSVError (Illegal quoting in line 1.):

Thanks, Jignesh

Answered by Vadym Tyemirov

quote_chars = %w(" | ~ ^ & *)
begin
  @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
rescue CSV::MalformedCSVError
  quote_chars.empty? ? raise : retry 
end

It's not perfect, but it works most of the time.

N.B. CSV.parse takes the same options as CSV.read, so either a file or data from memory can be used.
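
The retry idea above can also be wrapped in a small helper that works on in-memory data. This is only a sketch: parse_with_fallback_quotes is a hypothetical name, not part of the CSV library, and the sample input is made up. The candidate list is the one from the snippet above.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): try each candidate
# quote character in turn until the text parses.
def parse_with_fallback_quotes(text, candidates = %w(" | ~ ^ & *))
  remaining = candidates.dup
  begin
    CSV.parse(text, quote_char: remaining.shift)
  rescue CSV::MalformedCSVError
    # Re-raise once every candidate has been tried.
    remaining.empty? ? raise : retry
  end
end

# Made-up input: a stray double quote inside an unquoted field.
rows = parse_with_fallback_quotes(%(a,b"c,d\n))
p rows
```

Keep in mind that once a fallback character such as | is in effect, any " characters are left in the field values as ordinary data, so this is closer to a workaround than a real fix.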

Answered by theUtherSide

Anand, thank you for the encoding suggestion. This solved the illegal quoting problem for me.

Note: If you want the iterator to skip over the header row, add headers: :first_row, like so:

CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row)
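
For context, the \xEF\xBB\xBF prefix visible in the question's debug output is a UTF-8 byte order mark (BOM), which would explain why the first field does not start with the quote character. The following is a minimal sketch of the encoding option at work; the temporary file and its contents are made up for illustration.

```ruby
require "csv"
require "tempfile"

# Create a small quoted CSV file that starts with a UTF-8 byte order mark.
file = Tempfile.new(["bom_demo", ".csv"])
file.close
File.binwrite(file.path, "\uFEFF\"name\",\"age\"\n\"jignesh\",\"30\"\n")

# The first three raw bytes are the BOM, the same prefix that shows up
# in the question's debug output.
first_bytes = File.binread(file.path, 3).bytes
puts first_bytes.map { |b| format("%02X", b) }.join(" ")

# "bom|utf-8" asks Ruby to drop the BOM while reading, so the first
# field really starts with the quote character.
rows = []
CSV.foreach(file.path, encoding: "bom|utf-8") { |row| rows << row }
p rows

file.unlink
```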

Answered by user2391694

I just had an issue like this and discovered that CSV does not like spaces between the col_sep and the quote character. Once I removed those, everything went fine. So I had:

12,  "N",  12, "Pacific/Majuro"

but once I gsubbed out the spaces using

.gsub(/,\s+"/, ',"')

resulting in

12,"N",  12,"Pacific/Majuro"

everything went fine.
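
As a sketch of that cleanup on the sample line above (note that the replacement string is written without backslashes, since a single-quoted Ruby string would keep them literally):

```ruby
require "csv"

line = '12,  "N",  12, "Pacific/Majuro"'

# Remove whitespace that sits between a comma and an opening quote.
cleaned = line.gsub(/,\s+"/, ',"')

# The raw line is expected to be rejected by the default parser.
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  puts "raw line rejected: #{e.message}"
end

fields = CSV.parse_line(cleaned)
p fields
```

The unquoted third field keeps its leading spaces, because the substitution only touches spaces that come directly before a quote.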

Answered by Gilg Him

From this thread: pass the option :quote_char => "|"

CSV.read(filename, :quote_char => "|")

Answered by Elena Tanasoiu

I had a problem with the trademark character that was throwing this error.

The trademark character (™) had been turned into \"! in my data (that is how its UTF-16LE bytes, 0x22 0x21, read as ASCII; its UTF-8 encoding is different), so it was the unclosed quotation mark that was throwing the error. So I did this:

.gsub!("\"!", "")

And then I tried creating my CSV object and it worked fine.
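
As a side note, the "! pair appears to match the trademark sign's UTF-16LE bytes rather than its UTF-8 bytes, which suggests the data was decoded with the wrong encoding somewhere along the way. A quick check, as a sketch:

```ruby
tm = "\u2122" # the trademark sign

utf8_bytes    = tm.encode("UTF-8").bytes
utf16le_bytes = tm.encode("UTF-16LE").bytes

puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")
puts utf16le_bytes.map { |b| format("%02X", b) }.join(" ")

# Reading the two UTF-16LE bytes as ASCII gives a double quote
# followed by an exclamation mark.
as_ascii = utf16le_bytes.pack("C*")
p as_ascii
```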

Answered by Donato

I attempted to read the file into a string and then parse the string into a CSV table, but received an exception:

CSV.parse(File.read('file.csv'), headers: true)
CSV::MalformedCSVError: Unclosed quoted field on line 1794.

None of the answers provided here worked for me. In fact, the one with the highest votes took so long to parse that I eventually terminated the execution. It was most likely raising many exceptions, which is costly on a large file.

Even more problematic, the error is not very helpful, since it is a large CSV file. Where exactly is line 1794? I opened the file in LibreOffice, which loaded it without any problems. Line 1794 was the last row of data in the CSV file, so apparently the problem had to do with the end of the file. I decided to inspect the contents as a string with File.read and noticed that the string ended with a carriage return character:

,\"\"\r

I decided to use chomp to remove the carriage return at the end of the file. Note that if $/ has not been changed from the default Ruby record separator, chomp also removes carriage return characters (that is, it will remove \n, \r, and \r\n).

CSV.parse(File.read('file.csv').chomp, headers: true)
 => #<CSV::Table mode:col_or_row row_count:1794>

And it worked. The problem was the \r character at the end of the file.
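
A minimal sketch of the same cleanup on in-memory data. The sample string is made up for illustration (the real file is not shown); it uses \n between rows and has a single stray \r at the very end.

```ruby
require "csv"

# Made-up data: "\n" between rows, plus one stray "\r" at the very end.
raw = "id,note\n1,\"\"\r"

# With the default record separator, chomp drops a trailing "\n",
# "\r", or "\r\n".
cleaned = raw.chomp

table = CSV.parse(cleaned, headers: true)
p table.size
p table[0]["note"]
```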

Answered by Ravindra

Try this hint:

  1. Open your CSV file in a text editor
  2. Select the whole file and copy it
  3. Open a new text file
  4. Paste the CSV data into the new file and Save the new file
  5. Import your new CSV file