Original source: http://stackoverflow.com/questions/45482755/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Compare headers of dataframes in pandas
Asked by Mtheitroadaine24
I am trying to compare the headers of two pandas dataframes and filter the columns that match. df1 is my big dataframe with two headers, df2 is sort of a dictionary where I have saved every column header I will need from df1.
So if df1 is something like this:
A B C D
a b c d
0.469112 -0.282863 -1.509059 -1.135632
1.212112 -0.173215 0.119209 -1.044236
-0.861849 -2.104569 -0.494929 1.071804
0.721555 -0.706771 -1.039575 0.271860
-0.424972 0.567020 0.276232 -1.087401
-0.673690 0.113648 -1.478427 0.524988
and df2 is something like this:
B D E
I need to get the output:
B D
-0.282863 -1.135632
-0.173215 -1.044236
-2.104569 1.071804
-0.706771 0.271860
0.567020 -1.087401
0.113648 0.524988
and also a list of the header elements that were not matching:
A C
as well as elements missing from df1:
E
So far I have tried the iloc command and a lot of different suggestions here on StackOverflow for comparing rows. Since I am comparing the headers, though, none of them worked.
EDIT: I have tried
df1.columns.intersection(df2.columns)
but the result is:
MultiIndex(levels=[[], []],
labels=[[], []])
Is this because of the multiple headers?
Answered by Zero
Here are a couple of methods, for the given df1 and df2:
In [1041]: df1.columns
Out[1041]: Index([u'A', u'B', u'C', u'D'], dtype='object')
In [1042]: df2.columns
Out[1042]: Index([u'B', u'D', u'E'], dtype='object')
Columns in both df1 and df2:
In [1046]: df1.columns.intersection(df2.columns)
Out[1046]: Index([u'B', u'D'], dtype='object')
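The intersection gives only the matching labels; to get the filtered frame the question asks for, the resulting Index can be used to select columns. A minimal sketch, assuming df1 has a single header row; the two frames built here are hypothetical stand-ins for the ones in the question:

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question (single header row).
df1 = pd.DataFrame(
    [[0.469112, -0.282863, -1.509059, -1.135632],
     [1.212112, -0.173215, 0.119209, -1.044236]],
    columns=["A", "B", "C", "D"],
)
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Labels present in both frames, then the matching columns of df1.
common = df1.columns.intersection(df2.columns)
filtered = df1[common]
print(filtered)
```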
Columns in df1 but not in df2:
In [1047]: df1.columns.difference(df2.columns)
Out[1047]: Index([u'A', u'C'], dtype='object')
Columns in df2 but not in df1:
In [1048]: df2.columns.difference(df1.columns)
Out[1048]: Index([u'E'], dtype='object')
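The session above uses a df1 with a single header row, while the question describes two header rows (a MultiIndex on the columns). One possible way to handle that case, not taken from the original answer, is to compare only the top level of the MultiIndex. This is a sketch under assumptions: the frames are rebuilt here with random data, and it assumes the labels stored in df2 correspond to the first header row of df1.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of df1 with two header rows (MultiIndex columns).
columns = pd.MultiIndex.from_arrays([list("ABCD"), list("abcd")])
df1 = pd.DataFrame(np.random.randn(6, 4), columns=columns)
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Flatten to the top header level so both sides are plain Index objects.
top = df1.columns.get_level_values(0)
wanted = df2.columns

matching = top.intersection(wanted)   # labels in both
unmatched = top.difference(wanted)    # labels in df1 but not in df2
missing = wanted.difference(top)      # labels in df2 but not in df1

# A boolean mask over the top level keeps both header rows in the result.
filtered = df1.loc[:, top.isin(wanted)]
print(filtered)
```

Selecting with a boolean mask rather than with the labels themselves keeps the second header row intact in the filtered frame.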