pandas: compare headers of dataframes

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/45482755/

Time: 2020-09-14 04:10:08  Source: igfitidea

Compare headers of dataframes in pandas

Tags: python, python-3.x, pandas

Asked by Mtheitroadaine24

I am trying to compare the headers of two pandas dataframes and filter the columns that match. df1 is my big dataframe with two header rows; df2 is a sort of dictionary in which I have saved every column header I will need from df1.

So if df1 is something like this:

    A         B         C         D
    a         b         c         d
 0.469112 -0.282863 -1.509059 -1.135632
 1.212112 -0.173215  0.119209 -1.044236
-0.861849 -2.104569 -0.494929  1.071804
 0.721555 -0.706771 -1.039575  0.271860
-0.424972  0.567020  0.276232 -1.087401
-0.673690  0.113648 -1.478427  0.524988

and df2 is something like this:

   B         D         E

I need to get the output:

     B          D
 -0.282863  -1.135632
 -0.173215  -1.044236
 -2.104569   1.071804
 -0.706771   0.271860
  0.567020  -1.087401
  0.113648   0.524988

and also a list of the header elements that were not matching:

A      C

as well as elements missing from df1:

E

So far I have tried the iloc command and a lot of different suggestions here on Stack Overflow for comparing rows. Since I am comparing headers rather than rows, though, none of them worked.

EDIT: I have tried

df1.columns.intersection(df2.columns)

but the result is:

MultiIndex(levels=[[], []],
           labels=[[], []])

Is this because of the multiple headers?

Answered by Zero

Here are a couple of methods, for the given df1 and df2:

In [1041]: df1.columns
Out[1041]: Index([u'A', u'B', u'C', u'D'], dtype='object')

In [1042]: df2.columns
Out[1042]: Index([u'B', u'D', u'E'], dtype='object')

Columns in both df1 and df2

In [1046]: df1.columns.intersection(df2.columns)
Out[1046]: Index([u'B', u'D'], dtype='object')
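
The intersection only returns the labels, while the question asks for the matching columns with their data. A minimal sketch of that extra step follows, assuming df1 has a single header row as in this answer; the two-row sample data and the empty df2 are stand-ins built for illustration, not the asker's real frames.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data (single header row).
df1 = pd.DataFrame(
    [[0.469112, -0.282863, -1.509059, -1.135632],
     [1.212112, -0.173215, 0.119209, -1.044236]],
    columns=["A", "B", "C", "D"],
)
# df2 only carries the wanted header names, so it has no rows.
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Labels present in both frames.
common = df1.columns.intersection(df2.columns)

# Indexing df1 with those labels keeps only the matching columns.
matched = df1[common]
print(matched)
```

Here matched holds columns B and D of df1, with all rows preserved.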

Columns in df1 not in df2

In [1047]: df1.columns.difference(df2.columns)
Out[1047]: Index([u'A', u'C'], dtype='object')

Columns in df2 not in df1

In [1048]: df2.columns.difference(df1.columns)
Out[1048]: Index([u'E'], dtype='object')
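
The session above uses a df1 with a single header row, whereas the question describes two header rows. In that case pandas stores the columns as a MultiIndex whose labels are tuples such as ('B', 'b'), and these never equal a plain 'B', which is a likely reason the asker's intersection came back empty. The sketch below compares only the top header level instead. It assumes the upper-case labels form level 0 and that matching on that level is what is wanted; the sample frames and the names top, only_df1 and only_df2 are made up for illustration.

```python
import pandas as pd

# Hypothetical df1 with two header rows, stored by pandas as a MultiIndex.
df1 = pd.DataFrame(
    [[0.469112, -0.282863, -1.509059, -1.135632],
     [1.212112, -0.173215, 0.119209, -1.044236]],
    columns=pd.MultiIndex.from_arrays([["A", "B", "C", "D"],
                                       ["a", "b", "c", "d"]]),
)
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Compare only the top header level against df2's flat labels
# (assumption: level 0 is the level that should be matched).
top = df1.columns.get_level_values(0)

common = top.intersection(df2.columns)    # in both
only_df1 = top.difference(df2.columns)    # in df1, not in df2
only_df2 = df2.columns.difference(top)    # in df2, not in df1

# A boolean mask over the top level keeps the matching columns of df1,
# with both header rows still attached.
matched = df1.loc[:, top.isin(common)]
print(matched)
print(list(only_df1), list(only_df2))
```

If only one header row is wanted in the result, matched.columns.droplevel(1) would give the flat top-level labels.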