Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50142569/

Time: 2020-09-14 05:31:30  Source: igfitidea

Pandas read data without header or index

python, pandas, csv, numpy

Asked by Василий Масов

Here is the .csv file:

0   0   1   1   1   0   1   1   0   1   1   1   1
0   1   1   0   1   0   1   1   0   1   0   0   1
0   0   1   1   0   0   1   1   1   0   1   1   1
0   1   1   1   1   1   1   1   1   1   1   1   2
0   1   1   1   0   1   1   1   1   1   1   1   1
0   0   0   1   1   1   0   1   0   0   0   1   1
0   0   0   0   1   1   0   0   1   0   1   0   2
0   1   1   0   1   1   1   1   0   1   1   1   1
0   0   1   0   0   0   0   0   0   1   1   0   1
0   1   1   1   0   1   1   0   0   0   0   1   1

The first column should hold the indices (0, 1, 2, 3, 4, ...), but for some reason they are all zeros. Is there a way to fix this when reading the CSV file with pandas.read_csv?

I use:

df = pd.read_csv(file, delimiter='\t', header=None, names=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

and get something like:

    1   2   3   4   5   6   7   8   9   10  11  12
0   0   1   1   1   0   1   1   0   1   1   1   1
0   1   1   0   1   0   1   1   0   1   0   0   1
0   0   1   1   0   0   1   1   1   0   1   1   1
0   1   1   1   1   1   1   1   1   1   1   1   2
0   1   1   1   0   1   1   1   1   1   1   1   1
0   0   0   1   1   1   0   1   0   0   0   1   1
0   0   0   0   1   1   0   0   1   0   1   0   2
0   1   1   0   1   1   1   1   0   1   1   1   1
0   0   1   0   0   0   0   0   0   1   1   0   1
0   1   1   1   0   1   1   0   0   0   0   1   1

That is nearly what I need, but the first column (the index) is still all zeros. Can pandas, for example, ignore this first column of zeros and automatically generate a new index, to get this:

  0 1 2 3 4 5 6 7 8 9 10 11 12
0 0 1 0 1 1 0 0 0 1 1  1  0  1
1 0 1 0 1 1 0 0 0 1 1  1  1  2
2 0 1 1 1 0 0 1 1 1 1  1  1  2

Accepted answer by cs95

Why fuss over read_csv? Use np.loadtxt:

pd.DataFrame(np.loadtxt(file, dtype=int))

   0   1   2   3   4   5   6   7   8   9   10  11  12
0   0   0   1   1   1   0   1   1   0   1   1   1   1
1   0   1   1   0   1   0   1   1   0   1   0   0   1
2   0   0   1   1   0   0   1   1   1   0   1   1   1
3   0   1   1   1   1   1   1   1   1   1   1   1   2
4   0   1   1   1   0   1   1   1   1   1   1   1   1
5   0   0   0   1   1   1   0   1   0   0   0   1   1
6   0   0   0   0   1   1   0   0   1   0   1   0   2
7   0   1   1   0   1   1   1   1   0   1   1   1   1
8   0   0   1   0   0   0   0   0   0   1   1   0   1
9   0   1   1   1   0   1   1   0   0   0   0   1   1

The default delimiter is whitespace, and no headers or indexes are read in by default. Column types are not inferred either, since the dtype is specified to be int. All in all, this is a very succinct and powerful alternative.
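
To make the behaviour above concrete, here is a minimal self-contained sketch. It assumes a small in-memory sample (the `text` string is a made-up stand-in for the real file) and shows that np.loadtxt splits on whitespace, that the resulting DataFrame gets a default 0..n-1 row index, and one possible way to drop the leading column of zeros if it is not wanted. The slicing step is my own illustration, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for the file: three whitespace-separated rows.
text = """0 0 1 1
0 1 1 0
0 0 1 1
"""

# np.loadtxt splits on any whitespace by default and reads no header or index.
arr = np.loadtxt(io.StringIO(text), dtype=int)

# The DataFrame constructor assigns default 0..n-1 labels to rows and columns.
df = pd.DataFrame(arr)

# Optional (assumption: the zero column is not needed): slice it off first.
trimmed = pd.DataFrame(arr[:, 1:])

print(df)
print(trimmed)
```

With the real file, passing its path instead of the `io.StringIO` object should work the same way, provided every row has the same number of fields.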

Answer by rafaelc

You might want index_col=False:

df = pd.read_csv(file,delimiter='\t', 
                 header=None, 
                 index_col=False) 

From the docs:

If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index
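
As a rough illustration of why the question's call ended up with zeros as the index, and how the call above avoids it, here is a small sketch. It assumes a made-up tab-separated sample with four columns (the `text` string is hypothetical, not the asker's file). Passing fewer names than there are data columns makes pandas treat the first column as the index. Leaving names out and passing index_col=False keeps every column as data and gives the rows a default 0..n-1 index.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample: 4 columns, the first one all zeros.
text = "0\t0\t1\t1\n0\t1\t1\t0\n0\t0\t1\t1\n"

# Fewer names than data columns: pandas uses the first column as the index.
implicit = pd.read_csv(io.StringIO(text), sep="\t", header=None, names=[1, 2, 3])

# No names plus index_col=False: all columns stay as data, rows get 0..n-1.
df = pd.read_csv(io.StringIO(text), sep="\t", header=None, index_col=False)

print(implicit.index.tolist())
print(df.index.tolist())
```

If the column of zeros should be discarded altogether, something like `df.iloc[:, 1:]` afterwards is one option.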