Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/49632641/

Time: 2020-09-14 05:24:39  Source: igfitidea

Pandas parsing csv error - expected 1 fields found 9

Tags: python, python-3.x, pandas, csv, data-analysis

Asked by Baalateja Kataru

I'm trying to parse from a .csv file:

planets = pd.read_csv("planets.csv", sep=',')

But I always end up with this error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 9

This is what the first few lines of my csv file look like:

# This file was produced by the test
# Tue Apr  3 06:03:27 2018
#
# COLUMN pl_hostname:    Host Name
# COLUMN pl_discmethod:  Discovery Method
# COLUMN pl_pnum:        Number of Planets in System
# COLUMN pl_orbper:      Orbital Period [days]
# COLUMN pl_orbsmax:     Orbit Semi-Major Axis [AU])
# COLUMN st_dist:        Distance [pc]
# COLUMN st_teff:        Effective Temperature [K]
# COLUMN st_mass:        Stellar Mass [Solar mass] 
#
loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass
1,11 Com,Radial Velocity,1,326.03000000,1.290000,110.62,4742.00,2.70
2,11 UMi,Radial Velocity,1,516.22000000,1.540000,119.47,4340.00,1.80
3,14 And,Radial Velocity,1,185.84000000,0.830000,76.39,4813.00,2.20
4,14 Her,Radial Velocity,1,1773.40000000,2.770000,18.15,5311.00,0.90
5,16 Cyg B,Radial Velocity,1,798.50000000,1.681000,21.41,5674.00,0.99
6,18 Del,Radial Velocity,1,993.30000000,2.600000,73.10,4979.00,2.30
7,1RXS J160929.1-210524,Imaging,1,,330.000000,145.00,4060.00,0.85

Edit: this is line 13:

loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass

Edit: Thanks to @Rakesh, skipping the first 12 lines solved the problem:

planets = pd.read_csv("planets.csv", sep=',', skiprows=12)

Answered by OriolAbril

The function pandas.read_csv() gets the number of columns and their names from the first line. By default, it does not consider the possibility that the first lines are comments.

What is happening is that pandas reads the first line, splits it, and finds only one column, instead of splitting line 13, which is the first non-comment line. To solve this, the comment argument can be used.

planets = pd.read_csv("planets.csv", comment='#')

Compared to using skiprows, this allows the same code to load the planets.csv file even if the number of comment lines varies.
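As a minimal sketch of that point, the snippet below builds two small in-memory files with different numbers of comment lines and reads both with the same call. The sample data and the helper name load_planets are made up for illustration; only pd.read_csv and its comment parameter come from the answer.

```python
import io

import pandas as pd

# Two hypothetical versions of the file: same table, different comment headers.
SHORT_HEADER = """# produced by the test
#
loc_rowid,pl_hostname,pl_pnum
1,11 Com,1
2,11 UMi,1
"""

LONG_HEADER = """# produced by the test
# Tue Apr  3 06:03:27 2018
#
# COLUMN pl_hostname: Host Name
#
loc_rowid,pl_hostname,pl_pnum
1,11 Com,1
2,11 UMi,1
"""


def load_planets(text):
    # Hypothetical helper. comment="#" tells the parser to ignore everything
    # from '#' to the end of the line, so fully commented lines are dropped
    # before the header row is looked up.
    return pd.read_csv(io.StringIO(text), comment="#")


short_df = load_planets(SHORT_HEADER)
long_df = load_planets(LONG_HEADER)

# Both frames should hold the same table regardless of header length.
print(short_df.equals(long_df))
print(long_df)
```

One caveat worth keeping in mind: the comment character applies anywhere on a line, so an unquoted '#' inside a data value would cut that row short.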

Answered by Rakesh

Looks like you need skiprows. You can skip all the comments.

Example:

planets = pd.read_csv("planets.csv", sep=',', skiprows=12)
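If hard-coding 12 feels fragile, one option is to count the leading comment lines first and pass that number to skiprows. This is only a sketch: count_leading_comments is a hypothetical helper, the sample text is made up, and it assumes all comment lines sit at the top of the file and start with '#'.

```python
import io

import pandas as pd


def count_leading_comments(lines, marker="#"):
    # Hypothetical helper: count consecutive lines at the top of the file
    # that start with the comment marker.
    count = 0
    for line in lines:
        if not line.startswith(marker):
            break
        count += 1
    return count


# Made-up sample with four comment lines before the header row.
sample = (
    "# produced by the test\n"
    "#\n"
    "# COLUMN pl_pnum: Number of Planets in System\n"
    "#\n"
    "loc_rowid,pl_hostname,pl_pnum\n"
    "1,11 Com,1\n"
    "2,11 UMi,1\n"
)

n_skip = count_leading_comments(sample.splitlines())
planets = pd.read_csv(io.StringIO(sample), sep=",", skiprows=n_skip)
print(n_skip)
print(planets)
```

For a file on disk, the same helper could be fed an open file object, since iterating over it yields lines.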

Answered by Bill Armstrong

I've gotten this to work when I couldn't figure out the exact cause of the error:

planets = pd.read_csv('planets.csv', sep=',', error_bad_lines=False)
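Two things may be worth checking before relying on this. The error_bad_lines parameter was deprecated in pandas 1.3 in favour of on_bad_lines and is no longer accepted in pandas 2.x. Also, with a commented header like the one in the question, skipping bad lines can discard the real data rather than the comments. The sketch below assumes pandas 1.3 or later and uses made-up sample data.

```python
import io

import pandas as pd

# Made-up sample: three comment lines, then the real table.
sample = """# produced by the test
# Tue Apr  3 06:03:27 2018
#
loc_rowid,pl_hostname,pl_pnum
1,11 Com,1
2,11 UMi,1
"""

# on_bad_lines="skip" is the newer spelling of error_bad_lines=False.
skipped = pd.read_csv(io.StringIO(sample), on_bad_lines="skip")

# The first comment line is taken as a one-column header, so the
# three-field lines count as "bad" and are dropped without an error.
print(skipped.shape)
print(skipped)
```

So the call succeeds, but the resulting frame holds leftover comment text instead of the planet rows, which is why the comment or skiprows approaches are usually a better fit for this file.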

Answered by Harry_pb

In addition to the above answer, if you only have a problem with row 13, you may skip it.

pd.read_csv("planets.csv", skiprows=12, header=None)
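It may help to see what header=None does in combination with skiprows. In the sketch below the sample is made up and has only three comment lines, so skiprows=3 and skiprows=4 stand in for 12 and 13 in the real file.

```python
import io

import pandas as pd

# Made-up sample: three comment lines, the column-name line, two data rows.
sample = """# produced by the test
#
#
loc_rowid,pl_hostname,pl_pnum
1,11 Com,1
2,11 UMi,1
"""

# Skip only the comments: the column-name line becomes the first data row
# and the columns get integer labels.
names_as_row = pd.read_csv(io.StringIO(sample), skiprows=3, header=None)

# Skip the comments and the column-name line: only data rows remain.
data_only = pd.read_csv(io.StringIO(sample), skiprows=4, header=None)

print(names_as_row.iloc[0].tolist())
print(data_only.iloc[0].tolist())
```

In other words, skipping only the comment lines with header=None keeps the column names as an ordinary row, while skipping one more line drops them entirely.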

Answered by bruckerrlb

I just ran the following code using the csv data you provided, and it ran without issues:

import pandas as pd
planets = pd.read_csv("planets.csv", sep=',')
print(planets)

With that being said, there could be a few issues.

Firstly, you could enable delimiter sniffing with sep=None to let pandas figure out what the delimiter is. You could also set header=None (the keyword is header, not headers), so it would look like:

pd.read_csv("planets.csv", sep=None, header=None)
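A rough sketch of delimiter sniffing follows. The semicolon-separated sample is made up, and engine="python" is passed explicitly because sniffing is handled by the Python parsing engine; without it pandas is expected to fall back to that engine with a warning.

```python
import io

import pandas as pd

# Made-up sample that uses ';' instead of ',' as the delimiter.
sample = "loc_rowid;pl_hostname;pl_pnum\n1;11 Com;1\n2;11 UMi;1\n"

# sep=None asks pandas to detect the delimiter from the data.
planets = pd.read_csv(io.StringIO(sample), sep=None, engine="python")

print(planets.columns.tolist())
print(planets)
```

Sniffing looks at the start of the file, so it is unlikely to help on its own when the file begins with free-text comment lines as in the question.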

There could also be an encoding issue. You could try setting encoding to some of these values to see if the error persists: https://docs.python.org/3/library/codecs.html#standard-encodings