Original source: http://stackoverflow.com/questions/28382735/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Python Pandas does not read the first row of csv file
Asked by Tom
I have a problem reading a CSV (or txt) file with the pandas module. Because numpy's loadtxt function takes too much time, I decided to use pandas read_csv instead.
I want to build a numpy array from a txt file that has four space-separated columns and a very large number of rows (around 256^3; in this example, 64^3).
For reasons I don't understand, pandas's read_csv always seems to skip the first line (first row) of the csv (txt) file, so the result has one row fewer than the file.
Here is the code.
from __future__ import division
import numpy as np
import pandas as pd
ngridx = 4
ngridy = 4
ngridz = 4
size = ngridx*ngridy*ngridz
# Columns 0-2 hold the grid indices, column 3 holds random values
f = np.zeros((size,4))
a = np.arange(size)
f[:, 0] = np.floor_divide(a, ngridy*ngridz)
f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
f[:, 2] = np.fmod(a, ngridz)
f[:, 3] = np.random.rand(size)
print(f[0])
# Write the array as space-separated text, then read it back with pandas
np.savetxt('Testarray.txt',f,fmt='%6.16f')
g = pd.read_csv('Testarray.txt',delimiter=' ').values
print(g[0])
print(len(g[:,3]))
The f[0] and g[0] displayed in the output should match, but they don't, which indicates that pandas is skipping the first line of Testarray.txt. Also, the length of the loaded array g is less than the length of the array f.
I need help.
Thanks in advance.
Accepted answer by unutbu
By default, pd.read_csv uses header=0 (when the names parameter is also not specified), which means the first (i.e. 0th-indexed) line is interpreted as column names.
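To illustrate the note about the names parameter, here is a minimal sketch of how the default behaves with and without it. The sample text and the labels "a", "b", "c" are made up for the example.

```python
import io
import pandas as pd

text = "1 2 3\n4 5 6\n"

# Default: the header is taken from row 0, so "1 2 3" becomes the column labels.
default_df = pd.read_csv(io.StringIO(text), sep=" ")

# Passing names= (hypothetical labels) without header= keeps row 0 as data.
named_df = pd.read_csv(io.StringIO(text), sep=" ", names=["a", "b", "c"])

print(default_df.shape)  # one data row
print(named_df.shape)    # two data rows
```

Supplying names is therefore another way to keep the first line as data, at the cost of choosing labels yourself.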
If your data has no header, then use
pd.read_csv(..., header=None)
For example,
import io
import sys
import pandas as pd
if sys.version_info.major == 3:
    # Python3
    StringIO = io.StringIO
else:
    # Python2
    StringIO = io.BytesIO
text = '''\
1 2 3
4 5 6
'''
print(pd.read_csv(StringIO(text), sep=' '))
Without the header argument, the first line, 1 2 3, sets the column names:
   1  2  3
0  4  5  6
With header=None, the first line is treated as data:
print(pd.read_csv(StringIO(text), sep=' ', header=None))
prints
   0  1  2
0  1  2  3
1  4  5  6
Answer by RustProof Labs
If your file doesn't have a header row you need to tell Pandas so by using header=None in your call to pd.read_csv().
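Applied to the situation in the question, the fix might look like the sketch below. It assumes a small random array as a stand-in for the question's data and writes to a temporary directory rather than the working directory; the file name Testarray.txt is taken from the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
f = rng.random((64, 4))  # stand-in for the question's (size, 4) array

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Testarray.txt")
    np.savetxt(path, f, fmt="%6.16f")  # space-delimited, no header line

    # header=None tells pandas that the first line is data, not column names
    g = pd.read_csv(path, sep=" ", header=None).to_numpy()

print(g.shape == f.shape)  # row counts now agree
print(np.allclose(f, g))   # values round-trip within floating-point tolerance
```

If the format string produced padded fields with runs of spaces, sep=r"\s+" would likely be a safer choice than a single space.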