Python: Why does the column order change when appending pandas DataFrames?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/33797454/

Date: 2020-08-19 14:00:34  Source: igfitidea

Tags: python, csv, pandas

Asked by kingmakerking

I want to append (merge) all the csv files in a folder using Python pandas.

For example, say the folder has two CSV files, test1.csv and test2.csv, as follows:

A_Id    P_Id    CN1         CN2         CN3
AAA     111     702         709         740
BBB     222     1727        1734        1778

and

A_Id    P_Id    CN1         CN2         CN3
CCC     333     710         750         750
DDD     444     180         734         778

The Python script I wrote is as follows:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)

all_data.to_csv('testfolder/combined.csv')

Although combined.csv seems to contain all the appended rows, it looks like this:

       CN1    CN2    CN3   A_Id   P_Id
  0    710    750    750    CCC    333
  1    180    734    778    DDD    444
  0    702    709    740    AAA    111
  1   1727   1734   1778    BBB    222

Whereas it should look like this:

A_Id   P_Id   CN1    CN2    CN3
AAA    111    702    709    740
BBB    222    1727   1734   1778
CCC    333    710    750    750
DDD    444    180    734    778
  • Why are the first two columns moved to the end?
  • Why are the rows from test2.csv placed before the rows from test1.csv instead of after them?

What am I missing? And how can I get rid of the 0s and 1s in the first column?

P.S.: Since these are large CSV files, I chose to use pandas.

Accepted answer by kingmakerking

I tweaked the code as shown below; comments are inline.

#!/usr/bin/python
import glob

import pandas as pd

# Grab all the csv files in the folder. glob() returns paths in an arbitrary,
# filesystem-dependent order, so sort them to read test1.csv before test2.csv
# (this replaces reversing the list, which only works on some filesystems).
fileList = sorted(glob.glob('input_folder/*.csv'))

# Read each file into its own dataframe and collect them in a list.
# The first row of each file is used as the header (the read_csv default).
dfList = []
for path in fileList:
    df = pd.read_csv(path, index_col=None)
    dfList.append(df)

# Concatenate once at the end. Every file has the same columns, so their
# original order is preserved.
CombinedFrame = pd.concat(dfList)

# "Combined.csv" holds the rows of all the files; index=False drops the
# 0, 1, 0, 1 row labels that each file's dataframe carries.
CombinedFrame.to_csv('output_folder/Combined.csv', index=False)
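A self-contained sketch of the same approach, assuming the files are comma-separated (the question shows them column-aligned for readability). So that it runs without a folder, the two files are replaced by hypothetical in-memory strings read through io.StringIO, and the dictionary deliberately lists test2.csv first to mimic an unsorted glob() result. Passing ignore_index=True to pd.concat renumbers the rows 0..n-1 instead of keeping each file's own 0, 1 labels.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the question's two files, listed
# out of order to mimic an unsorted glob() result.
csv_files = {
    "testfolder/test2.csv": "A_Id,P_Id,CN1,CN2,CN3\nCCC,333,710,750,750\nDDD,444,180,734,778\n",
    "testfolder/test1.csv": "A_Id,P_Id,CN1,CN2,CN3\nAAA,111,702,709,740\nBBB,222,1727,1734,1778\n",
}


def combine_csvs(files):
    """Read the files in sorted-name order and stack their rows."""
    frames = [pd.read_csv(io.StringIO(files[name])) for name in sorted(files)]
    # ignore_index=True discards each file's own 0, 1, ... row labels
    # and numbers the combined rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


combined = combine_csvs(csv_files)
print(combined)
```

With real files, the dictionary would be replaced by sorted(glob.glob(...)) and each path passed straight to pd.read_csv.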

Answer by user6745154

Try this:

all_data = all_data.append(df)[df.columns.tolist()]
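The one-liner works because indexing a DataFrame with a list of column labels returns the columns in exactly that list's order. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this sketch uses pd.concat instead; the frame is hypothetical, and the scrambled column order is simulated with sort_index(axis=1) purely for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with the question's columns in file order.
df = pd.DataFrame(
    {"A_Id": ["AAA"], "P_Id": [111], "CN1": [702], "CN2": [709], "CN3": [740]}
)

# Stand-in for a combined frame whose columns came back alphabetically sorted.
scrambled = pd.concat([df, df]).sort_index(axis=1)

# Selecting with df's column list puts the columns back in the file's order.
restored = scrambled[df.columns.tolist()]
print(list(scrambled.columns))  # ['A_Id', 'CN1', 'CN2', 'CN3', 'P_Id']
print(list(restored.columns))   # ['A_Id', 'P_Id', 'CN1', 'CN2', 'CN3']
```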

Answer by Uzzy

I had the same issue and it was painful. I managed to solve it by reordering the columns to match the source dataframe right after each append to the final dataframe. It looks like this:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
    all_data = all_data[df.columns]

all_data.to_csv('testfolder/combined.csv') 

Since your question is almost two years old, I'm posting the solution that worked for me for anyone else who faces a similar issue.

Answer by bubbassauro

You can use reindex to restore the original column order:

all_data = all_data.append(df)
all_data = all_data.reindex(df.columns, axis=1)

I saw this here (more details in the link): https://github.com/pandas-dev/pandas/issues/4588#issuecomment-44421883
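One practical difference from selecting columns with all_data[cols]: reindex(..., axis=1) tolerates labels that are missing from the frame and fills them with NaN, while plain list selection raises a KeyError in current pandas (1.0 and later) if any label is absent. The frame below is hypothetical and deliberately lacks CN3.

```python
import pandas as pd

wanted = ["A_Id", "P_Id", "CN1", "CN2", "CN3"]

# Hypothetical combined frame: columns out of order and CN3 missing.
combined = pd.DataFrame({"CN1": [702], "A_Id": ["AAA"], "P_Id": [111], "CN2": [709]})

# reindex reorders the existing columns and adds CN3 filled with NaN.
reordered = combined.reindex(wanted, axis=1)
print(reordered)

# Plain label selection refuses to invent the missing column.
selection_error = None
try:
    combined[wanted]
except KeyError as exc:
    selection_error = exc
print(type(selection_error).__name__)
```

Which behavior is preferable depends on whether a missing column should be silently added or treated as an error.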

Answer by Shayan Amani

Starting from version 0.23.0, you can prevent the append() method from sorting the columns of the final appended DataFrame. In your case:

all_data = all_data.append(df, sort=False)
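DataFrame.append no longer exists in pandas 2.0; the equivalent today is pd.concat, which accepts the same sort parameter (defaulting to False in current pandas). Sorting only matters when the frames' columns are not aligned, so the two hypothetical frames below share A_Id and P_Id but each carries one extra column.

```python
import pandas as pd

# Hypothetical frames whose columns only partly overlap.
first = pd.DataFrame({"A_Id": ["AAA"], "P_Id": [111], "CN1": [702]})
second = pd.DataFrame({"A_Id": ["CCC"], "P_Id": [333], "CN2": [750]})

# sort=False keeps the columns in order of first appearance across the frames.
kept = pd.concat([first, second], sort=False, ignore_index=True)

# sort=True sorts the union of column labels, which moves P_Id to the end.
sorted_cols = pd.concat([first, second], sort=True, ignore_index=True)

print(list(kept.columns))         # ['A_Id', 'P_Id', 'CN1', 'CN2']
print(list(sorted_cols.columns))  # ['A_Id', 'CN1', 'CN2', 'P_Id']
```

Cells for columns a frame did not have (CN2 in the first row, CN1 in the second) are filled with NaN in both cases.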