Preserving column order in Python Pandas DataFrame

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/15653688/

Time: 2020-08-18 20:38:20  Source: igfitidea

python, pandas

Asked by Hernan

Is there a way to preserve the order of the columns in a csv file when it is read and then written with Python Pandas? For example, in this code:

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)
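
One way to check this is to round-trip a small CSV in memory. The sketch below assumes a reasonably recent pandas and uses a hypothetical three-column file held in an io.StringIO buffer instead of a real filename. In it, read_csv keeps the columns in file order; the visible difference on write comes from to_csv also writing the index as an extra leading column unless index=False is passed.

```python
import io

import pandas as pd

# Hypothetical CSV content; the columns are deliberately not alphabetical.
source = "c,a,b\n1,2,3\n4,5,6\n"

data = pd.read_csv(io.StringIO(source))
print(list(data.columns))  # ['c', 'a', 'b'], the order in the file

# Default write: the index is added as an unnamed first column.
with_index = data.to_csv()
# index=False writes only the original columns, in the original order.
without_index = data.to_csv(index=False)

print(with_index.splitlines()[0])     # ,c,a,b
print(without_index.splitlines()[0])  # c,a,b
```

Comparing the first line of each output shows whether a difference comes from column order or from the extra index column.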

the output file might differ from the input because the column order is not preserved.

Accepted answer by CnrL

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, they are written in alphabetical order, but simply relabelled according to the list in cols. For example, this code:

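import pandas

# Build a small DataFrame with columns a, b, c
dfdict = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]}
df = pandas.DataFrame(dfdict)
# "cols" is the argument name used by pandas 0.11.0 (later renamed "columns")
df.to_csv("dfTest.txt", sep="\t", header=True, cols=["b", "a", "c"])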

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.version.version
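
In current releases the version string is exposed as pandas.__version__. The pandas.version.version attribute above belongs to older releases and may not be available in newer ones, so the sketch below uses the dunder attribute instead.

```python
import pandas as pd

# pandas.__version__ holds the installed version string, e.g. "2.2.2".
# (pandas.version.version is from older releases and may be missing in newer ones.)
print(pd.__version__)
```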

See the pandas documentation for to_csv.

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

UPDATE: It seems that the argument "cols" has been renamed to "columns", and that the argument "engine" has been deprecated (and is no longer available) in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
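
As a sketch of what the original example looks like in a recent pandas (assuming a release where, per the update above, the bug is fixed), it can be written with columns= and a keyword sep=. It writes to an in-memory buffer rather than dfTest.txt so the result can be inspected directly.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

buf = io.StringIO()
# "columns" replaces the old "cols" argument; values now follow their labels.
df.to_csv(buf, sep="\t", header=True, columns=["b", "a", "c"])
text = buf.getvalue()
print(text)
```

With a fixed pandas, the header reads b, a, c and the first data row is 0, 5, 1, 9 (tab-separated), so each value stays under its own label instead of being relabelled.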

Answer by Matti John

The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.

For example, if you have a csv with columns a, b, c, d:

import pandas as pd

data = pd.read_csv(filename)
# Write the columns explicitly in the desired order
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
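
The columns argument can also reorder (or subset) the columns on write. A minimal sketch, assuming a hypothetical in-memory file with columns a, b, c, d; index=False is added here only to keep the index column out of the output.

```python
import io

import pandas as pd

# Hypothetical file with columns a, b, c, d.
source = "a,b,c,d\n1,2,3,4\n"
data = pd.read_csv(io.StringIO(source))

# Write the columns in a different order than they were read.
reordered = data.to_csv(columns=["d", "b", "a", "c"], index=False)
print(reordered)
```

The header comes out as d,b,a,c and the data row as 4,2,1,3, so values follow their column labels.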

Answer by Lawrence Chernin

Another workaround is to do this:

import pandas as pd

data = pd.read_csv(filename)
data2 = data[['A', 'B', 'C']]  # put 'A', 'B', 'C' in the desired order
data2.to_csv(filename)
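
Indexing a DataFrame with a list of labels returns the columns in that list's order, so the order can also be built programmatically instead of typed out. A sketch with a hypothetical in-memory file, moving column 'C' to the front while keeping the remaining columns in file order:

```python
import io

import pandas as pd

data = pd.read_csv(io.StringIO("A,B,C\n1,2,3\n"))

# 'C' first, then every other column in the order it appeared in the file.
order = ["C"] + [col for col in data.columns if col != "C"]
data2 = data[order]

print(list(data2.columns))         # ['C', 'A', 'B']
print(data2.to_csv(index=False))
```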