Preserving column order in a Python Pandas DataFrame

Question

Asked by Hernan

Is there a way to preserve the order of the columns in a csv file when reading and then writing it with Python Pandas? For example, in this code:

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

The output file might differ from the original because the column order is not preserved.

Answer 1

Accepted answer by CnrL

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, they are written in alphabetical order, but simply relabelled according to the list in cols. For example, this code:

import pandas
dfdict={}
dfdict["a"]=[1,2,3,4]
dfdict["b"]=[5,6,7,8]
dfdict["c"]=[9,10,11,12]
df=pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.__version__

Documentation for to_csv is here

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is deprecated (no longer available) in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
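
For readers on a recent pandas release, the example above can be sketched with the renamed keyword. This is only an illustration, assuming a version where the bug is fixed; it writes to an in-memory buffer instead of "dfTest.txt" so nothing is left on disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# `columns` (formerly `cols`) selects the columns to write and their order.
# No `engine` argument is passed.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=True, columns=["b", "a", "c"])

lines = buffer.getvalue().splitlines()
header = lines[0].split("\t")     # ['', 'b', 'a', 'c'] (empty label for the index)
first_row = lines[1].split("\t")  # ['0', '5', '1', '9']
print(header)
print(first_row)
```

The values now follow the labels: column b starts with 5 and column a with 1, unlike the incorrect output shown earlier.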

Answer 2

Answer by Matti John

The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.

For example, if you have a csv with columns a, b, c, d:

data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
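
A quick way to check this on your own installation is a round trip through in-memory text. The sketch below uses made-up CSV content (the `source` string is purely illustrative) and passes index=False, because by default to_csv also writes the row index as an extra first column, which by itself makes the output differ from the input.

```python
import io

import pandas as pd

# Hypothetical CSV content whose columns are deliberately not alphabetical.
source = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(source))
original_order = list(data.columns)

# With no path given, to_csv returns the CSV text as a string.
round_trip = data.to_csv(index=False)
round_trip_header = round_trip.splitlines()[0].split(",")

# Pass `columns` only when a different order is wanted.
reordered = data.to_csv(index=False, columns=["a", "b", "c", "d"])
reordered_header = reordered.splitlines()[0].split(",")

print(original_order)
print(round_trip_header)
print(reordered_header)
```

If the first two printed lists match, the read/write cycle preserved the column order and the columns argument is only needed for reordering.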

Answer 3

Answer by Lawrence Chernin

Another workaround is to do this:

import pandas as pd
data = pd.read_csv(filename)
data2 = data[['A','B','C']]  # list 'A', 'B', 'C' in the desired order
data2.to_csv(filename)
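
The same idea as a self-contained sketch, using an in-memory CSV with illustrative values. The reindex call is shown as an alternative and is not part of the original answer; it differs in that a name missing from the frame becomes an empty (NaN) column, whereas the list selection raises a KeyError.

```python
import io

import pandas as pd

# Hypothetical input with columns stored as C, A, B.
source = "C,A,B\n3,1,2\n6,4,5\n"
data = pd.read_csv(io.StringIO(source))

desired_order = ["A", "B", "C"]

# Selecting with a list of labels returns a new DataFrame in that order.
data2 = data[desired_order]

# Alternative: reindex along the column axis.
data3 = data.reindex(columns=desired_order)

# index=False avoids writing the row index as an extra column.
output_lines = data2.to_csv(index=False).splitlines()
print(output_lines)
```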
