Python Pandas does not read the first row of a csv file

Question

Asked by Tom

I have a problem reading a CSV (or txt) file with the pandas module. Because numpy's loadtxt function takes too much time, I decided to use pandas read_csv instead.

I want to make a numpy array from a txt file that has four space-separated columns and a very large number of rows (like 256^3; in this example, it is 64^3).

The problem is that, for reasons I don't understand, pandas's read_csv always seems to skip the first line (first row) of the csv (txt) file, so the result has one row fewer than the file.

Here is the code.

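from __future__ import division
import numpy as np
import pandas as pd

# Grid dimensions; 4*4*4 = 64 rows in this small test
ngridx = 4
ngridy = 4
ngridz = 4
size = ngridx*ngridy*ngridz

# Columns 0-2 hold the (x, y, z) grid indices, column 3 holds random values
f = np.zeros((size, 4))
a = np.arange(size)
f[:, 0] = np.floor_divide(a, ngridy*ngridz)
f[:, 1] = np.fmod(np.floor_divide(a, ngridz), ngridy)
f[:, 2] = np.fmod(a, ngridz)
f[:, 3] = np.random.rand(size)
print(f[0])

# Write the array to a space-separated text file, then read it back
np.savetxt('Testarray.txt', f, fmt='%6.16f')
g = pd.read_csv('Testarray.txt', delimiter=' ').values
print(g[0])          # does not match f[0]
print(len(g[:, 3]))  # one less than size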

The f[0] and g[0] displayed in the output should match, but they don't, indicating that pandas is skipping the first line of Testarray.txt. Also, the length of the loaded array g is less than the length of the array f.

I need help.

Thanks in advance.

Answer 1

Accepted answer by unutbu

By default, pd.read_csv uses header=0 (when the names parameter is also not specified), which means the first (i.e. 0-indexed) line is interpreted as column names.

If your data has no header, then use

pd.read_csv(..., header=None)

For example,

import io
import sys
import pandas as pd
if sys.version_info.major == 3:
    # Python3
    StringIO = io.StringIO 
else:
    # Python2
    StringIO = io.BytesIO

text = '''\
1 2 3
4 5 6
'''

print(pd.read_csv(StringIO(text), sep=' '))

Without header=None, the first line, 1 2 3, is used as the column names:

   1  2  3
0  4  5  6

With header=None, the first line is treated as data:

print(pd.read_csv(StringIO(text), sep=' ', header=None))

prints

   0  1  2
0  1  2  3
1  4  5  6
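
Applied to the file from the question, the same fix might look like the sketch below. It assumes Python 3 with reasonably recent NumPy and pandas. The temporary directory and the names g_default and g_fixed are introduced here for illustration only, and the grid-index columns are replaced by random values to keep it short.

```python
import os
import tempfile

import numpy as np
import pandas as pd

size = 4 * 4 * 4                      # 64 rows, as in the question's test case
f = np.random.rand(size, 4)           # stand-in data: four columns of floats

# Assumption: a temporary directory is used so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "Testarray.txt")
    np.savetxt(path, f, fmt="%6.16f")

    # Default behaviour: the first line is consumed as column names
    g_default = pd.read_csv(path, delimiter=" ").values

    # header=None: every line, including the first, is treated as data
    g_fixed = pd.read_csv(path, delimiter=" ", header=None).values

print(g_default.shape)                # (63, 4): one row short
print(g_fixed.shape)                  # (64, 4): all rows present
print(np.allclose(f, g_fixed))        # compares the round-tripped data with f
```

With header=None the loaded array has the same number of rows as f, and its first row corresponds to f[0].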

Answer 2

Answered by RustProof Labs

If your file doesn't have a header row you need to tell Pandas so by using header=None in your call to pd.read_csv().
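
A related option, sketched here as an aside rather than as part of either answer: passing explicit column names through the names parameter also makes pandas treat the first line as data, since no header row is then inferred. The column labels x, y and z are hypothetical, chosen only for this example.

```python
import io

import pandas as pd

text = "1 2 3\n4 5 6\n"

# With names given and header left at its default, no line is used as a header
df = pd.read_csv(io.StringIO(text), sep=" ", names=["x", "y", "z"])

print(df)
print(df.shape)  # (2, 3): both lines were read as data
```

This can be convenient when the resulting DataFrame is used directly instead of being converted to a numpy array with .values.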
