Preserving column order in Python Pandas DataFrame
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/15653688/
Asked by Hernan
Is there a way to preserve the order of the columns in a csv file when reading and then writing it with Python Pandas? For example, in this code
import pandas as pd
data = pd.read_csv(filename)
data.to_csv(filename)
the output file might differ from the input because the column order is not preserved.
Accepted answer by CnrL
There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns when writing to a file, the data are written in alphabetical column order and merely relabelled according to the list in cols. For example, this code:
import pandas
dfdict={}
dfdict["a"]=[1,2,3,4]
dfdict["b"]=[5,6,7,8]
dfdict["c"]=[9,10,11,12]
df=pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])
results in this (incorrect) output:
b a c
0 1 5 9
1 2 6 10
2 3 7 11
3 4 8 12
You can check which version of pandas you have installed by executing:
pandas.__version__
See the pandas documentation for DataFrame.to_csv for the full list of arguments.
Actually, it seems that this is a known bug that will be fixed in an upcoming release (0.11.1):
https://github.com/pydata/pandas/issues/3489
UPDATE: There still hasn't been a new release of pandas, but a workaround that doesn't require a different version of pandas is described in this issue:
github.com/pydata/pandas/issues/3454
So changing the last line in the block of code above to the following will work correctly:
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')
UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is deprecated (no longer available) in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
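For readers on a recent pandas release, the example above might look roughly like the following sketch. It assumes the renamed "columns" argument mentioned in the update, passes the separator by keyword, and writes to an in-memory buffer (an assumption made here so the snippet does not touch the file system) instead of "dfTest.txt".

```python
import io

import pandas as pd

# Same data as in the example above.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Write to an in-memory buffer instead of "dfTest.txt" (assumption for illustration).
buf = io.StringIO()
df.to_csv(buf, sep="\t", header=True, columns=["b", "a", "c"])

text = buf.getvalue()
print(text)  # columns appear in the requested order: b, a, c
```

With the bug fixed, the values under each header should match the column they came from, so the "b" column holds 5 to 8 rather than 1 to 4.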
Answer by Matti John
The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.
For example, if you have a csv with columns a, b, c, d:
data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
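As a quick check of the claim that the order is normally preserved, here is a minimal round-trip sketch. The csv content and the in-memory buffers are made up for illustration, and index=False is an addition here so that the written file has the same columns as the one that was read.

```python
import io

import pandas as pd

# Hypothetical csv content whose columns are deliberately not alphabetical.
csv_text = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(csv_text))
print(list(data.columns))  # order as it appears in the file

# Plain round trip: write the frame back without the index column.
out = io.StringIO()
data.to_csv(out, index=False)
round_trip = out.getvalue()

# Explicit order via the columns keyword argument.
reordered = io.StringIO()
data.to_csv(reordered, index=False, columns=["a", "b", "c", "d"])
reordered_text = reordered.getvalue()
print(reordered_text)
```

Without index=False, to_csv also writes the row index as an extra first column, which is one reason a re-written file can look different from the original.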
Answer by Lawrence Chernin
Another workaround is to do this:
import pandas as pd
data = pd.read_csv(filename)
data2 = data[['A','B','C']]  # put 'A', 'B', 'C' in the desired order
data2.to_csv(filename)
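The selection approach can be tried without any files. The following is only a sketch: the sample csv text and the in-memory buffers are assumptions standing in for filename.

```python
import io

import pandas as pd

# Hypothetical input whose columns arrive as C, A, B.
data = pd.read_csv(io.StringIO("C,A,B\n1,2,3\n4,5,6\n"))

# Indexing with a list of labels returns a new frame with the columns in that order.
data2 = data[["A", "B", "C"]]

out = io.StringIO()
data2.to_csv(out, index=False)
result = out.getvalue()
print(result)
```

Because the reordering happens on the DataFrame itself, this does not depend on how to_csv handles its column argument.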

