Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28382735/

Time: 2020-08-19 03:11:42  Source: igfitidea

Python Pandas does not read the first row of csv file

python numpy pandas load

Asked by Tom

I have a problem reading a CSV (or txt) file with the pandas module. Because numpy's loadtxt function takes too much time, I decided to use pandas' read_csv instead.

I want to make a numpy array from a txt file that has four space-separated columns and a very large number of rows (e.g. 256^3; in this example, it is 64^3).

The problem is that, for reasons I don't understand, pandas' read_csv always seems to skip the first line (first row) of the csv (txt) file, resulting in one fewer row of data.

Here is the code.

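from __future__ import division, print_function
import numpy as np
import pandas as pd

# Grid dimensions; the file will have ngridx*ngridy*ngridz rows.
ngridx = 4
ngridy = 4
ngridz = 4
size = ngridx*ngridy*ngridz
# Columns 0-2 hold the x, y, z grid indices; column 3 holds random values.
f = np.zeros((size,4))
a = np.arange(size)
f[:, 0] = np.floor_divide(a, ngridy*ngridz)
f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
f[:, 2] = np.fmod(a, ngridz)
f[:, 3] = np.random.rand(size)
print(f[0])
np.savetxt('Testarray.txt',f,fmt='%6.16f')
# read_csv with default arguments: this is the call that loses the first line.
g = pd.read_csv('Testarray.txt',delimiter=' ').values
print(g[0])
print(len(g[:,3]))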

The f[0] and g[0] displayed in the output should match, but they don't, indicating that pandas is skipping the first line of Testarray.txt. Also, the length of the loaded array g is less than the length of the array f.

I need help.

Thanks in advance.

Accepted answer by unutbu

By default, pd.read_csv uses header=0 (when the names parameter is also not specified), which means the first (i.e. 0th-indexed) line is interpreted as column names.
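
The role of the names parameter mentioned above can be illustrated with a minimal sketch. It assumes Python 3 and uses made-up column labels ("x", "y", "z") that are not part of the original question. When names is passed and header is left unspecified, the first line is read as data:

```python
import io

import pandas as pd

text = "1 2 3\n4 5 6\n"

# Passing names supplies the column labels explicitly, so pandas no longer
# takes them from the first line. The labels here are hypothetical.
df = pd.read_csv(io.StringIO(text), sep=" ", names=["x", "y", "z"])
print(df)
#    x  y  z
# 0  1  2  3
# 1  4  5  6
```

This is an alternative to header=None for cases where meaningful column labels are wanted.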

If your data has no header, then use

pd.read_csv(..., header=None)
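
Applied to the setup in the question, this might look like the sketch below. It is not the asker's exact script: the array is a small stand-in (8 rows instead of 64), and the file is written to a temporary directory, both of which are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for the question's array: 8 rows, 4 columns (hypothetical data).
f = np.arange(32, dtype=float).reshape(8, 4)

# Write the array in the same format the question uses, to a temporary path.
path = os.path.join(tempfile.mkdtemp(), "Testarray.txt")
np.savetxt(path, f, fmt="%6.16f")

# header=None keeps the first line as data instead of using it for column names.
g = pd.read_csv(path, sep=" ", header=None).values

print(g.shape == f.shape)       # True
print(np.allclose(g[0], f[0]))  # True
```

With header=None, the loaded array has the same number of rows as the original, and the first rows match.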


For example,

import io
import sys
import pandas as pd
if sys.version_info.major == 3:
    # Python3
    StringIO = io.StringIO 
else:
    # Python2
    StringIO = io.BytesIO

text = '''\
1 2 3
4 5 6
'''

print(pd.read_csv(StringIO(text), sep=' '))

Without header=None, the first line, 1 2 3, sets the column names:

   1  2  3
0  4  5  6

With header=None, the first line is treated as data:

print(pd.read_csv(StringIO(text), sep=' ', header=None))

prints

   0  1  2
0  1  2  3
1  4  5  6

Answer by RustProof Labs

If your file doesn't have a header row, you need to tell pandas so by passing header=None in your call to pd.read_csv().
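
One way to check whether a file's first row was consumed as a header is to inspect the column labels of the resulting DataFrame. The sketch below assumes Python 3 and uses made-up sample values. If the labels look like data, the first line was probably read as a header:

```python
import io

import pandas as pd

# Hypothetical sample: two data rows and no header row.
text = "1.5 2.5 3.5 4.5\n5.5 6.5 7.5 8.5\n"

# Default behavior: the first line becomes the column labels (as strings).
df_default = pd.read_csv(io.StringIO(text), sep=" ")
print(list(df_default.columns))  # ['1.5', '2.5', '3.5', '4.5']
print(len(df_default))           # 1

# With header=None: integer labels are generated and both rows are kept.
df_fixed = pd.read_csv(io.StringIO(text), sep=" ", header=None)
print(list(df_fixed.columns))    # [0, 1, 2, 3]
print(len(df_fixed))             # 2
```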