pandas: Comparing the headers of DataFrames in pandas

Question

Asked by Mtheitroadaine24

I am trying to compare the headers of two pandas DataFrames and keep only the columns that match. df1 is my big DataFrame with two header rows; df2 is a sort of dictionary in which I have saved every column header I need from df1.

So if df1 is something like this:

    A         B         C         D
    a         b         c         d
 0.469112 -0.282863 -1.509059 -1.135632
 1.212112 -0.173215  0.119209 -1.044236
-0.861849 -2.104569 -0.494929  1.071804
 0.721555 -0.706771 -1.039575  0.271860
-0.424972  0.567020  0.276232 -1.087401
-0.673690  0.113648 -1.478427  0.524988
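To experiment with this, here is a minimal sketch that rebuilds the sample data, assuming the two header rows end up as a two-level column MultiIndex (which is what pd.read_csv(..., header=[0, 1]) would produce). Representing df2 as an empty DataFrame that only carries column labels is also an assumption.

```python
import pandas as pd

# Sample values copied from the question's df1.
data = [
    [0.469112, -0.282863, -1.509059, -1.135632],
    [1.212112, -0.173215, 0.119209, -1.044236],
    [-0.861849, -2.104569, -0.494929, 1.071804],
    [0.721555, -0.706771, -1.039575, 0.271860],
    [-0.424972, 0.567020, 0.276232, -1.087401],
    [-0.673690, 0.113648, -1.478427, 0.524988],
]

# Two header rows become a two-level MultiIndex on the columns:
# level 0 holds A..D, level 1 holds a..d.
columns = pd.MultiIndex.from_arrays([["A", "B", "C", "D"], ["a", "b", "c", "d"]])
df1 = pd.DataFrame(data, columns=columns)

# df2 only carries headers, so an empty frame with those columns is enough.
df2 = pd.DataFrame(columns=["B", "D", "E"])

print(df1.columns.nlevels)  # 2
```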

and df2 is something like this:

   B         D         E

I need to get the output:

     B          D
 -0.282863  -1.135632
 -0.173215  -1.044236
 -2.104569   1.071804
 -0.706771   0.271860
  0.567020  -1.087401
  0.113648   0.524988

and also a list of the header elements that did not match:

A      C

as well as the elements missing from df1:

E

So far I have tried iloc and many of the suggestions here on Stack Overflow for comparing rows, but none of them worked, because I am comparing headers rather than rows.

EDIT: I have tried

df1.columns.intersection(df2.columns)

but the result is:

MultiIndex(levels=[[], []],
           labels=[[], []])

Is this because of the multiple headers?
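A likely explanation for the empty result: with two header rows, df1's columns form a MultiIndex whose labels are tuples such as ('B', 'b'), and a tuple never equals a plain string like 'B' from df2. A minimal sketch, assuming df1 was loaded with two header rows, that compares only the top header row via get_level_values(0):

```python
import pandas as pd

# Two header rows -> the columns are a MultiIndex whose labels are tuples.
df1 = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.MultiIndex.from_arrays([["A", "B", "C", "D"], ["a", "b", "c", "d"]]),
)
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Each column label is a tuple, which never compares equal to 'B'.
print(df1.columns[1])     # ('B', 'b')
print(("B", "b") == "B")  # False

# Compare against the top header row only.
top = df1.columns.get_level_values(0)
common = top.intersection(df2.columns)
print(list(common))  # ['B', 'D']
```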

Answer 1

Answered by Zero

Here are a couple of methods, using the given df1 and df2:

In [1041]: df1.columns
Out[1041]: Index([u'A', u'B', u'C', u'D'], dtype='object')

In [1042]: df2.columns
Out[1042]: Index([u'B', u'D', u'E'], dtype='object')

Columns in both df1 and df2:

In [1046]: df1.columns.intersection(df2.columns)
Out[1046]: Index([u'B', u'D'], dtype='object')
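Once the common labels are known, indexing df1 with them gives the filtered frame the question asks for. A minimal sketch for the single-header case shown in this answer, with sample values from the question truncated to two rows:

```python
import pandas as pd

df1 = pd.DataFrame(
    [[0.469112, -0.282863, -1.509059, -1.135632],
     [1.212112, -0.173215, 0.119209, -1.044236]],
    columns=["A", "B", "C", "D"],
)
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Labels present in both frames, then select just those columns from df1.
common = df1.columns.intersection(df2.columns)
result = df1[common]  # keeps only columns B and D
print(result)
```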

Columns in df1 but not in df2:

In [1047]: df1.columns.difference(df2.columns)
Out[1047]: Index([u'A', u'C'], dtype='object')
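Index.difference attempts to sort its result by default (sort=None), so the unmatched labels come back sorted rather than in df1's order. If the original order matters, a boolean isin mask is one option; the sketch below uses a deliberately reordered, hypothetical df1 to make the difference visible:

```python
import pandas as pd

# Hypothetical df1 whose columns are deliberately not in alphabetical order.
df1 = pd.DataFrame(columns=["D", "C", "B", "A"])
df2 = pd.DataFrame(columns=["B", "D", "E"])

# difference() attempts to sort its result by default.
print(list(df1.columns.difference(df2.columns)))  # ['A', 'C']

# Order-preserving alternative: mask df1's columns with isin().
unmatched = df1.columns[~df1.columns.isin(df2.columns)].tolist()
print(unmatched)  # ['C', 'A']
```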

Columns in df2 but not in df1:

In [1048]: df2.columns.difference(df1.columns)
Out[1048]: Index([u'E'], dtype='object')
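To apply all three comparisons to the question's two-header df1, the same idea can be run against the top header row. The helper compare_headers below is a hypothetical name used only for illustration, and the sketch assumes the header row to match is level 0 of df1's columns:

```python
import pandas as pd


def compare_headers(df1, df2, level=0):
    """Hypothetical helper: compare df1's header row `level` with df2's headers.

    Returns (matched frame, labels only in df1, labels only in df2).
    """
    labels = df1.columns.get_level_values(level)
    wanted = df2.columns

    # Boolean mask over df1's columns; works for flat and MultiIndex columns.
    mask = labels.isin(wanted)
    matched = df1.loc[:, mask]

    only_in_df1 = labels[~mask].unique().tolist()
    only_in_df2 = wanted[~wanted.isin(labels)].tolist()
    return matched, only_in_df1, only_in_df2


columns = pd.MultiIndex.from_arrays([["A", "B", "C", "D"], ["a", "b", "c", "d"]])
df1 = pd.DataFrame([[0.469112, -0.282863, -1.509059, -1.135632]], columns=columns)
df2 = pd.DataFrame(columns=["B", "D", "E"])

matched, only_in_df1, only_in_df2 = compare_headers(df1, df2)
# Drop the second header row so the result has the single header B, D.
matched = matched.droplevel(1, axis=1)
print(matched)
print(only_in_df1)  # ['A', 'C']
print(only_in_df2)  # ['E']
```

Using a mask rather than the intersection result keeps df1's column order and avoids mixing tuple labels with plain strings.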
