Pandas read_csv,读取缺少标题元素的csv文件
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/36828348/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas read_csv, reading a csv file with a missing header element
提问by Eduardo Vieira
I'm trying to import a csv file with pandas.read_csv. The file is as follows:
我正在尝试使用 pandas.read_csv 导入一个 csv 文件。该文件如下:
"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
"ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
"ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
"ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
"ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"
in a first attempt I ran:
在第一次尝试中,我跑了:
data = pd.read_csv('broken.csv')
and I got:
我得到了:
COL_A COL_B COL_C
ROW1COLA ROW1COLB ROW1COLC ROW1COLD
ROW2COLA ROW2COLB ROW2COLC ROW2COLD
ROW3COLA ROW3COLB ROW3COLC ROW3COLD
ROW4COLA ROW4COLB ROW4COLC ROW4COLD
ROW5COLA ROW5COLB ROW5COLC ROW5COLD
ROW6COLA ROW6COLB ROW6COLC ROW6COLD
ROW7COLA ROW7COLB ROW7COLC ROW7COLD
Setting index_col=False
设置 index_col=False
data = pd.read_csv('broken.csv',index_col=False)
i got
我有
COL_A COL_B COL_C
0 ROW1COLA ROW1COLB ROW1COLC
1 ROW2COLA ROW2COLB ROW2COLC
2 ROW3COLA ROW3COLB ROW3COLC
3 ROW4COLA ROW4COLB ROW4COLC
4 ROW5COLA ROW5COLB ROW5COLC
5 ROW6COLA ROW6COLB ROW6COLC
6 ROW7COLA ROW7COLB ROW7COLC
if I add prefix = 'X'
如果我添加前缀 = 'X'
data = pd.read_csv('broken.csv',index_col=False,prefix='X')
i get
我明白了
COL_A COL_B COL_C
0 ROW1COLA ROW1COLB ROW1COLC
1 ROW2COLA ROW2COLB ROW2COLC
2 ROW3COLA ROW3COLB ROW3COLC
3 ROW4COLA ROW4COLB ROW4COLC
4 ROW5COLA ROW5COLB ROW5COLC
5 ROW6COLA ROW6COLB ROW6COLC
6 ROW7COLA ROW7COLB ROW7COLC
Same with read_table
与 read_table 相同
data = pd.read_table('broken.csv',index_col=True,sep=',')
I want to know if there is any way that pandas automatically assigns a header and take values of the missing header column
我想知道是否有任何方法可以让 Pandas 自动分配标题并获取缺少的标题列的值
采纳答案by jezrael
I think you can use read_csv
with parameters header=0
which first row set to columns and then is overwritten by parameter names
to custom column names. Parameter sep=','
is omited, because it is by default:
我想你可以使用read_csv
带参数header=0
其中第一行设置为列,然后是覆盖由参数names
自定义列名。参数sep=','
被省略,因为它默认是:
import pandas as pd
import io
temp=u'''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
"ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
"ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
"ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
"ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"'''
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), header=0, names=['a','b','c','d'])
print df
a b c d
0 ROW1COLA ROW1COLB ROW1COLC ROW1COLD
1 ROW2COLA ROW2COLB ROW2COLC ROW2COLD
2 ROW3COLA ROW3COLB ROW3COLC ROW3COLD
3 ROW4COLA ROW4COLB ROW4COLC ROW4COLD
4 ROW5COLA ROW5COLB ROW5COLC ROW5COLD
5 ROW6COLA ROW6COLB ROW6COLC ROW6COLD
6 ROW7COLA ROW7COLB ROW7COLC ROW7COLD
More generic solution with parameters header=None
for no columns names from header with skiprows=[0]
for skip first row with missing name of last column:
更通用的解决方案,其参数header=None
为标题中没有列名的参数,skiprows=[0]
用于跳过第一行但最后一列的名称缺失:
import pandas as pd
import io
temp=u'''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
"ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
"ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
"ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
"ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"'''
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), header=None, skiprows=[0])
print df
0 1 2 3
0 ROW1COLA ROW1COLB ROW1COLC ROW1COLD
1 ROW2COLA ROW2COLB ROW2COLC ROW2COLD
2 ROW3COLA ROW3COLB ROW3COLC ROW3COLD
3 ROW4COLA ROW4COLB ROW4COLC ROW4COLD
4 ROW5COLA ROW5COLB ROW5COLC ROW5COLD
5 ROW6COLA ROW6COLB ROW6COLC ROW6COLD
6 ROW7COLA ROW7COLB ROW7COLC ROW7COLD
回答by MaxU
First column(s) without names/headers are considered as index column(s).
没有名称/标题的第一列被视为索引列。
You should also use index_col
parameter properly:
您还应该index_col
正确使用参数:
data = pd.read_table('broken.csv',index_col=[0],sep=',')
if your first column contains data instead of index, you can skip first row, specify names for your columns, and instruct read_csv
that you don't want to read headers:
如果您的第一列包含数据而不是索引,您可以跳过第一行,为您的列指定名称,并指示read_csv
您不想读取标题:
cols = ['col1','col2','col3','col4']
data = pd.read_table('broken.csv',sep=',', skiprows=[0], header=None, names=cols)