pandas.errors.ParserError: Error could possibly be due to quotes being ignored when a multi-char delimiter is used

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/53066229/

Date: 2020-09-14 06:07:10  Source: igfitidea

Tags: python, pandas, csv, parsing

Asked by dark horse

I am getting a ParserError when I try to read a CSV file with pandas. Below are the error and the data set that triggered it.

pandas.errors.ParserError: Expected 10 fields in line 8, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
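For context: a `sep` longer than one character is interpreted as a regular expression and forces pandas' python parser engine, and the `read_csv` docs note that regex delimiters are prone to ignoring quoted data. A minimal sketch that reproduces this kind of message, assuming a hypothetical `;;`-separated sample whose last row has one field too many (the sample and separator are illustrative, not the asker's file):

```python
import io

import pandas as pd

# Hypothetical sample: ";;" is a two-character delimiter, and the last
# data row has three fields while the header has only two.
raw = "a;;b\n1;;2\n3;;4;;5\n"

error_message = ""
try:
    # A sep longer than one character is treated as a regular expression
    # and requires the python engine (requested explicitly here).
    pd.read_csv(io.StringIO(raw), sep=";;", engine="python")
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
```

As far as I can tell, the quote hint is appended whenever a multi-character delimiter is used and quoting is not disabled, so the field-count mismatch is usually the part worth investigating.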

Below is line 8, which triggers this error:

10/29/18 10:20,85505306,    Scott,20181029102023-file.csv,  22.49,-12.18,CITY,,12:15.0,51:00.0,ABCD,9898,320,D231

I am reading the CSV with the following command:

df.to_csv('file.csv', index=False)

Sample output of the csv file:

File_Received_Time  Label1  City    FileName    Label2  Label3  State   Unnamed: 12 cTimestamp  dTimestamp  Label4  Label5  Label6  Label7  Label8
10/29/18 10:20  56776   Paris   file1.csv   29  29  IL      29-10-2018 04:11:11     COL06   620 398 516 451
10/29/18 10:20  46069   Hongkong    file2.csv   61  58  VA      29-10-2018 04:03:17 28-10-2018 05:58:00 COL06   576 645 349 374
10/29/18 10:20  47240   Sydney  file3.csv   43  42  IL      29-10-2018 04:12:46     COL06   534 2047    56831   372
10/29/18 10:20  47432   NewYork file4.csv   55  61  OH          28-10-2018 09:01:00 COL06   514 2354    640 633
10/29/18 10:20  41794   London  file5.csv   39  29          29-10-2018 04:12:46 28-10-2018 09:01:00 COL06   470 2354    56831   550
10/29/18 10:20  49643   LA  file6.csv   55  43  TX      29-10-2018 04:05:18     COL06   523 2301    53942   403
10/29/18 10:20  54700   Shangai file7.csv   37  29  AZ      29-10-2018 04:12:15 28-10-2018 12:51:00 COL06   569 2683    53642   538
10/29/18 10:20  37134   Singapore   file8.csv   53  62  AZ      29-10-2018 04:09:16     COL06   560 391 54541   542
10/29/18 10:20  51144   Taiwan  file9.csv   43  33  TX      29-10-2018 04:12:15     COL06   469 472 458 481

Answer by Mayank Porwal

I am able to read the error record you pasted above:

To read a CSV file with pandas, use read_csv (to_csv writes a DataFrame out to CSV):

I pasted the failing record into a CSV file:

mayankp@mayank:~/Documents$ cat t1.csv
10/29/18 10:20,85505306,    Scott,20181029102023-file.csv,  22.49,-12.18,CITY,,12:15.0,51:00.0,ABCD,9898,320,D231

Then I read it with pandas as follows:

In [114]: df = pd.read_csv('/home/mayankp/Documents/t1.csv', header=None)

In [115]: df
Out[115]: 
               0         1          2                        3      4      5     6   7        8        9     10    11   12    13
0  10/29/18 10:20  85505306      Scott  20181029102023-file.csv  22.49 -12.18  CITY NaN  12:15.0  51:00.0  ABCD  9898  320  D231

It works fine. Let me know if this helps.
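The session above can be reproduced without a file on disk. A minimal sketch, assuming the record is inlined through `io.StringIO`; `skipinitialspace=True` is an optional extra (not part of the answer) that drops the spaces after each comma, so the third column comes back as `Scott` rather than `    Scott`:

```python
import io

import pandas as pd

# The record from the question, inlined so no file is needed.
record = (
    "10/29/18 10:20,85505306,    Scott,20181029102023-file.csv,"
    "  22.49,-12.18,CITY,,12:15.0,51:00.0,ABCD,9898,320,D231\n"
)

# header=None: the line is data, not column names, so pandas numbers the columns.
# skipinitialspace=True: strip whitespace that follows each delimiter.
df = pd.read_csv(io.StringIO(record), header=None, skipinitialspace=True)
print(df.shape)  # (1, 14)
```

The empty field between `CITY` and `12:15.0` becomes NaN in column 7, matching the output shown above.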

Answer by TubasPandas

I got the same error message. Removing the double quotes from the file solved the problem. I used the following command in the terminal:

cat merged.csv | tr -d '"' > merged.tsv
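A rough Python equivalent of this quote-stripping, assuming the goal is simply to delete every double-quote character before parsing. The helper name `read_csv_without_quotes` and the sample text are hypothetical. Deleting quotes also breaks fields that legitimately contain a quoted comma, so check the data first.

```python
import io

import pandas as pd


def read_csv_without_quotes(text: str) -> pd.DataFrame:
    # Delete every '"' before parsing, similar in spirit to `tr -d '"'`.
    # Caution: a quoted field such as "a,b" would be split into two fields.
    return pd.read_csv(io.StringIO(text.replace('"', "")))


# Hypothetical sample with a stray, unbalanced quote in the second row.
raw = 'a,b\n1,"2\n3,4\n'
df = read_csv_without_quotes(raw)
print(df)
```

If you would rather keep the quote characters in the data than delete them, `read_csv` also accepts `quoting=csv.QUOTE_NONE`, which treats quotes as ordinary characters.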

Hope it helps.

Answer by ryolait

So,

  • You are using to_csv instead of read_csv. See Mayank Porwal's comment and answer.
  • Your data may not be properly formatted. CSV stands for Comma-Separated Values, so separate the fields with commas before using read_csv (I'm not sure which dataset you used in your own tests; your question is misleading on that point).
  • For the core problem, carefully check the number of fields on each row. Every row should have the same number of fields; a mismatch would explain the error you get.
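Checking the field count of every row can be automated. A minimal sketch using Python's standard `csv` module; the helper name `mismatched_rows` is hypothetical, and it assumes the first row (the header) defines the expected number of fields:

```python
from __future__ import annotations

import csv
import io


def mismatched_rows(text: str, delimiter: str = ",") -> list[tuple[int, int]]:
    """Return (line_number, field_count) for every row whose field count
    differs from that of the first (header) row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    bad = []
    for row in reader:
        if expected is None:
            expected = len(row)  # the header row sets the expected width
        elif len(row) != expected:
            # reader.line_num counts source lines read so far, i.e. the
            # (last) line of the row just parsed
            bad.append((reader.line_num, len(row)))
    return bad


sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n"
print(mismatched_rows(sample))  # [(3, 4), (4, 2)]
```

For a file on disk, pass the text from `open(path, newline="").read()`; the `csv` docs recommend `newline=""` when reading CSV files. Blank lines show up as rows with 0 fields.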