Original question: http://stackoverflow.com/questions/50142569/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas read data without header or index
Asked by Василий Масов
Here is the .csv file:
0 0 1 1 1 0 1 1 0 1 1 1 1
0 1 1 0 1 0 1 1 0 1 0 0 1
0 0 1 1 0 0 1 1 1 0 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 2
0 1 1 1 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 0 1 0 0 0 1 1
0 0 0 0 1 1 0 0 1 0 1 0 2
0 1 1 0 1 1 1 1 0 1 1 1 1
0 0 1 0 0 0 0 0 0 1 1 0 1
0 1 1 1 0 1 1 0 0 0 0 1 1
where the first column should hold the row indices (0, 1, 2, 3, 4, ...), but for some reason they are all zeros. Is there any way to fix this when reading the CSV file with pandas.read_csv?
I use
df = pd.read_csv(file,delimiter='\t',header=None,names=[1,2,3,4,5,6,7,8,9,10,11,12])
and get something like:
1 2 3 4 5 6 7 8 9 10 11 12
0 0 1 1 1 0 1 1 0 1 1 1 1
0 1 1 0 1 0 1 1 0 1 0 0 1
0 0 1 1 0 0 1 1 1 0 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 2
0 1 1 1 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 0 1 0 0 0 1 1
0 0 0 0 1 1 0 0 1 0 1 0 2
0 1 1 0 1 1 1 1 0 1 1 1 1
0 0 1 0 0 0 0 0 0 1 1 0 1
0 1 1 1 0 1 1 0 0 0 0 1 1
This is nearly what I need, but the first column (the index) is still all zeros. Can pandas, for example, ignore this first column of zeros and automatically generate a new index, to get this:
0 1 2 3 4 5 6 7 8 9 10 11 12
0 0 1 0 1 1 0 0 0 1 1 1 0 1
1 0 1 0 1 1 0 0 0 1 1 1 1 2
2 0 1 1 1 0 0 1 1 1 1 1 1 2
Accepted answer by cs95
Why fuss over read_csv? Use np.loadtxt:
pd.DataFrame(np.loadtxt(file, dtype=int))
0 1 2 3 4 5 6 7 8 9 10 11 12
0 0 0 1 1 1 0 1 1 0 1 1 1 1
1 0 1 1 0 1 0 1 1 0 1 0 0 1
2 0 0 1 1 0 0 1 1 1 0 1 1 1
3 0 1 1 1 1 1 1 1 1 1 1 1 2
4 0 1 1 1 0 1 1 1 1 1 1 1 1
5 0 0 0 1 1 1 0 1 0 0 0 1 1
6 0 0 0 0 1 1 0 0 1 0 1 0 2
7 0 1 1 0 1 1 1 1 0 1 1 1 1
8 0 0 1 0 0 0 0 0 0 1 1 0 1
9 0 1 1 1 0 1 1 0 0 0 0 1 1
The default delimiter is whitespace, and no headers/indexes are read in by default. Column types are also not inferred, since the dtype is specified to be int. All in all, this is a very succinct and powerful alternative.
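For readers who want to try this without the original file, here is a minimal sketch. It assumes an in-memory io.StringIO buffer (the name sample is hypothetical) holding the first three rows of the data in place of file. Since np.loadtxt splits on any whitespace by default, it should not matter whether the real file uses tabs or spaces.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the file: the first three rows of the sample data.
sample = io.StringIO(
    "0 0 1 1 1 0 1 1 0 1 1 1 1\n"
    "0 1 1 0 1 0 1 1 0 1 0 0 1\n"
    "0 0 1 1 0 0 1 1 1 0 1 1 1\n"
)

# loadtxt splits on whitespace and reads no header or index column.
arr = np.loadtxt(sample, dtype=int)

# Wrapping the array in a DataFrame gives default 0..n-1 row and column labels.
df = pd.DataFrame(arr)
print(df)
```

The leading column of zeros is kept as an ordinary data column labelled 0, and the row index is generated fresh by the DataFrame constructor.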
Answer by rafaelc
You might want index_col=False:
df = pd.read_csv(file,delimiter='\t',
header=None,
index_col=False)
From the docs:
If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index.
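A minimal sketch of this call, assuming a tab-delimited in-memory buffer (the names rows and sample are hypothetical) in place of file. Note that, unlike the call in the question, no names list is passed here. The question's output suggests that giving 12 names for 13 columns is what led pandas to treat the extra leading column as the index, though that reading is an inference from the output shown rather than something the thread states.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: three rows of the sample, joined with tabs.
rows = [
    "0 0 1 1 1 0 1 1 0 1 1 1 1",
    "0 1 1 0 1 0 1 1 0 1 0 0 1",
    "0 0 1 1 0 0 1 1 1 0 1 1 1",
]
sample = io.StringIO("\n".join("\t".join(r.split()) for r in rows) + "\n")

# header=None: the first line is data, so columns are labelled 0..12.
# index_col=False: do not use the first column as the index.
df = pd.read_csv(sample, delimiter="\t", header=None, index_col=False)
print(df)
```

With these arguments the zeros stay in column 0 as data, and the row index is a freshly generated 0, 1, 2, ...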