How to append selected columns to pandas dataframe from df with different columns
Original URL: http://stackoverflow.com/questions/29335857/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by JPC
I want to append df1, df2 and df3 into one df_All, but each of the dataframes has different columns. How could I do this in a for loop (I have other stuff that I have to do in the for loop)?
import pandas as pd
import numpy as np
# DataFrame.from_items was removed in pandas 1.0; a dict keeps column order on Python 3.7+
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
list = ['df1','df2','df3']
df_All = pd.DataFrame()
for i in list:
    # doing something else as well ---
    df_All = df_All.append(i)


I want my df_All to have only columns A and B. Is there a way to do this in the loop above, something like appending only these two columns?
Accepted answer by EdChum
If I understand what you want, then you need to select just columns 'A' and 'B' from df3 and then use pd.concat:
In [35]:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1,df2,df3[['A','B']]]
pd.concat(df_list, ignore_index=True)
Out[35]:
A B
0 1 4
1 2 5
2 3 6
3 8 5
4 9 6
5 10 7
6 1 4
7 2 5
8 3 7
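Since the question asks for a loop, here is a minimal sketch of the same idea written as one: collect each frame's 'A' and 'B' columns in a plain Python list inside the loop and call pd.concat once at the end (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0). The names wanted_cols and pieces are illustrative only.

```python
import pandas as pd

# Same sample frames as in the question, built with the dict constructor.
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

wanted_cols = ['A', 'B']  # hypothetical name: the columns to keep from every frame
pieces = []               # hypothetical name: collects the selected columns

for df in [df1, df2, df3]:  # iterate over the DataFrame objects, not their names
    # ... other per-frame work would go here ...
    pieces.append(df[wanted_cols])  # list.append, not DataFrame.append

# Concatenate once after the loop; ignore_index renumbers the rows 0..n-1.
df_All = pd.concat(pieces, ignore_index=True)
print(df_All)
```

Appending to a Python list is cheap, and a single concat at the end avoids copying the growing frame on every iteration.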
Note that in your original code this is poor practice:
list = ['df1','df2','df3']
This shadows the built-in type list; plus, even if it were a valid variable name like df_list, you've created a list of strings and not a list of DataFrames.
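If the list of strings was only there to keep track of which frame is which, a dict mapping names to DataFrames does that without shadowing list. This is a minimal sketch; the variable names frames and selected are hypothetical.

```python
import pandas as pd

# Hypothetical example: map each name to the actual DataFrame object.
frames = {
    'df1': pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),
    'df2': pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]}),
}

# .items() yields both the name and the DataFrame, so columns can be selected.
selected = {name: df[['A', 'B']] for name, df in frames.items()}

# Passing a dict to pd.concat uses its keys as the outer index level,
# so every row remembers which frame it came from.
combined = pd.concat(selected)
print(combined.loc['df2'])
```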
If you want to determine the common columns, then you can use the Index.intersection method on the columns:
In [39]:
common_cols = df1.columns.intersection(df2.columns).intersection(df3.columns)
common_cols
Out[39]:
Index(['A', 'B'], dtype='object')
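The chained call above hard-codes three frames. A minimal sketch that generalises it to any non-empty list of DataFrames, assuming functools.reduce is acceptable, and then uses the result to concatenate (df_common is a hypothetical name):

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1, df2, df3]

# Fold Index.intersection over every frame's columns; this is the chained
# df1.columns.intersection(...).intersection(...) call for any number of frames.
common_cols = reduce(lambda acc, cols: acc.intersection(cols),
                     (df.columns for df in df_list))

# Select the common columns from each frame and stack the rows.
df_common = pd.concat([df[common_cols] for df in df_list], ignore_index=True)
print(list(common_cols))
```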
Answer by Alexander
You can also use set intersection to find the columns common to an arbitrary list of DataFrames and concatenate only those:
df_list = [df1, df2, df3]
# set(df) yields a DataFrame's column labels; sorted() gives a stable column order
common_cols = sorted(set.intersection(*(set(df) for df in df_list)))
df_new = pd.concat([df[common_cols] for df in df_list], ignore_index=True)
>>> df_new
A B
0 1 4
1 2 5
2 3 6
3 8 5
4 9 6
5 10 7
6 1 4
7 2 5
8 3 7
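pd.concat can also compute the column intersection itself: with join='inner', only the columns shared by all inputs are kept. This is a different route from the set-based approach above, shown as a sketch with the question's sample data (df_inner is a hypothetical name):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

# join='inner' keeps only the column labels present in every input frame,
# so the intersection does not have to be computed by hand.
df_inner = pd.concat([df1, df2, df3], join='inner', ignore_index=True)
print(df_inner)
```

Rows are still aligned by column label, so df2's B, A ordering does not matter.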

