pandas: read data without a header or index

Question

Asked by Василий Масов

Here is the .csv file:

0   0   1   1   1   0   1   1   0   1   1   1   1
0   1   1   0   1   0   1   1   0   1   0   0   1
0   0   1   1   0   0   1   1   1   0   1   1   1
0   1   1   1   1   1   1   1   1   1   1   1   2
0   1   1   1   0   1   1   1   1   1   1   1   1
0   0   0   1   1   1   0   1   0   0   0   1   1
0   0   0   0   1   1   0   0   1   0   1   0   2
0   1   1   0   1   1   1   1   0   1   1   1   1
0   0   1   0   0   0   0   0   0   1   1   0   1
0   1   1   1   0   1   1   0   0   0   0   1   1

where the first column should hold the row indices (0, 1, 2, 3, 4, ...), but for some reason they are all zeros. Is there a way to get a normal index when reading the csv file with pandas.read_csv?

I use

df = pd.read_csv(file,delimiter='\t',header=None,names=[1,2,3,4,5,6,7,8,9,10,11,12])

and get something like:

    1   2   3   4   5   6   7   8   9   10  11  12
0   0   1   1   1   0   1   1   0   1   1   1   1
0   1   1   0   1   0   1   1   0   1   0   0   1
0   0   1   1   0   0   1   1   1   0   1   1   1
0   1   1   1   1   1   1   1   1   1   1   1   2
0   1   1   1   0   1   1   1   1   1   1   1   1
0   0   0   1   1   1   0   1   0   0   0   1   1
0   0   0   0   1   1   0   0   1   0   1   0   2
0   1   1   0   1   1   1   1   0   1   1   1   1
0   0   1   0   0   0   0   0   0   1   1   0   1
0   1   1   1   0   1   1   0   0   0   0   1   1

It's nearly what I need, but the first column (the index) is still all zeros. Can pandas, for example, ignore this first column of zeros and automatically generate a new index, to get this:

  0 1 2 3 4 5 6 7 8 9 10 11 12
0 0 1 0 1 1 0 0 0 1 1  1  0  1
1 0 1 0 1 1 0 0 0 1 1  1  1  2
2 0 1 1 1 0 0 1 1 1 1  1  1  2
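
A likely cause of the zeros, shown as a minimal sketch: the call above passes 12 names, but each row has 13 fields. When a row has more fields than there are column names, read_csv uses the surplus leading field as the index, so the all-zero first column becomes the row labels. The snippet assumes a hypothetical two-row, tab-separated sample (the first two rows of the data above) held in an io.StringIO instead of the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the user's file: the first two rows of the
# question's data, converted to tab-separated text (13 fields per row).
rows = [
    "0 0 1 1 1 0 1 1 0 1 1 1 1",
    "0 1 1 0 1 0 1 1 0 1 0 0 1",
]
data = "\n".join(r.replace(" ", "\t") for r in rows) + "\n"

# 12 names for 13 fields: pandas treats the surplus leading field as the index,
# so the all-zero first column ends up as the row labels.
bad = pd.read_csv(io.StringIO(data), delimiter="\t", header=None,
                  names=list(range(1, 13)))
print(bad.index.tolist())  # [0, 0]

# Without names, every field becomes a column (labelled 0..12) and pandas
# generates a fresh default index.
good = pd.read_csv(io.StringIO(data), delimiter="\t", header=None)
print(good.index.tolist())  # [0, 1]
```

Passing 13 names (or none at all) lines the labels up with the fields, so no column is promoted to the index.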

Answer 1

Accepted answer by cs95

Why fuss over read_csv? Use np.loadtxt:

import numpy as np
import pandas as pd

pd.DataFrame(np.loadtxt(file, dtype=int))

   0   1   2   3   4   5   6   7   8   9   10  11  12
0   0   0   1   1   1   0   1   1   0   1   1   1   1
1   0   1   1   0   1   0   1   1   0   1   0   0   1
2   0   0   1   1   0   0   1   1   1   0   1   1   1
3   0   1   1   1   1   1   1   1   1   1   1   1   2
4   0   1   1   1   0   1   1   1   1   1   1   1   1
5   0   0   0   1   1   1   0   1   0   0   0   1   1
6   0   0   0   0   1   1   0   0   1   0   1   0   2
7   0   1   1   0   1   1   1   1   0   1   1   1   1
8   0   0   1   0   0   0   0   0   0   1   1   0   1
9   0   1   1   1   0   1   1   0   0   0   0   1   1

The default delimiter is whitespace, and no headers or indexes are read in by default. Column types are also not inferred, since dtype is specified as int. All in all, this is a very succinct and powerful alternative.
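
A self-contained version of the loadtxt approach, assuming a small hypothetical tab-separated sample in an io.StringIO rather than a real file. The ndmin=2 argument is an addition to the answer's one-liner: without it, a file containing a single row comes back as a 1-D array, which pd.DataFrame would turn into one column instead of one row.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: tab-separated rows like the question's file.
data = (
    "0\t0\t1\t1\t1\t0\t1\t1\t0\t1\t1\t1\t1\n"
    "0\t1\t1\t0\t1\t0\t1\t1\t0\t1\t0\t0\t1\n"
)

# delimiter defaults to None, i.e. split on any run of whitespace (tabs too);
# dtype=int skips type inference; ndmin=2 keeps a one-row file 2-D.
arr = np.loadtxt(io.StringIO(data), dtype=int, ndmin=2)
df = pd.DataFrame(arr)
print(df.shape)  # (2, 13)
print(df.index.tolist())  # [0, 1]
```

All 13 fields, including the column of zeros, become ordinary columns 0..12, and pd.DataFrame supplies the default 0, 1, 2, ... index.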

Answer 2

Answer by rafaelc

You might want index_col=False

import pandas as pd

df = pd.read_csv(file, delimiter='\t',
                 header=None,
                 index_col=False)

From the docs:

If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index.
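
A minimal sketch of both situations, assuming small hypothetical in-memory samples in place of real files. The first case is the question's tab-separated data with no header; the second mirrors the malformed file the docs describe, where every data line ends with an extra delimiter that the header line lacks.

```python
import io

import pandas as pd

# Case 1: tab-separated data without a header, as in the question.
data = (
    "0\t0\t1\t1\t1\t0\t1\t1\t0\t1\t1\t1\t1\n"
    "0\t1\t1\t0\t1\t0\t1\t1\t0\t1\t0\t0\t1\n"
)
df = pd.read_csv(io.StringIO(data), delimiter="\t", header=None, index_col=False)
print(df.shape)  # (2, 13)
print(df.index.tolist())  # [0, 1]

# Case 2: a trailing delimiter on every data line gives those lines one more
# field than the header. By default pandas would take the first field as the
# index; index_col=False keeps column "a" as data and drops the empty trailing field.
malformed = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"
fixed = pd.read_csv(io.StringIO(malformed), index_col=False)
print(fixed.columns.tolist())  # ['a', 'b', 'c']
print(fixed["a"].tolist())  # [4, 8]
```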
