pandas: element-wise maximum of two DataFrames, ignoring NaN

Question

Asked by DrTRD

I have two dataframes (df1 and df2) that each have the same rows and columns. I would like to take the maximum of these two dataframes, element-by-element. In addition, the result of any element-wise maximum with a number and NaN should be the number. The approach I have implemented so far seems inefficient:

import pandas as pd

def element_max(df1, df2):
    # NaN != NaN, so (df == df) is True only where a value is present.
    cond = df1 >= df2
    res = pd.DataFrame(index=df1.index, columns=df1.columns)
    # Both values present: take the larger one.
    res[(df1==df1)&(df2==df2)&(cond)]  = df1[(df1==df1)&(df2==df2)&(cond)]
    res[(df1==df1)&(df2==df2)&(~cond)] = df2[(df1==df1)&(df2==df2)&(~cond)]
    # Only one value present: take it (cond is False whenever either side is NaN).
    res[(df1==df1)&(df2!=df2)&(~cond)] = df1[(df1==df1)&(df2!=df2)]
    res[(df1!=df1)&(df2==df2)&(~cond)] = df2[(df1!=df1)&(df2==df2)]
    return res

Any other ideas? Thank you for your time.

Answer 1

Answered by EdChum

You can use where to test one DataFrame against the other: where the condition is True, the values from df are kept; where it is False, the values from df1 are used instead. A comparison involving NaN is always False, so a NaN in df1 is carried into the result; a further call to fillna(df) fills those NaN entries with the values from df and returns the desired DataFrame:

In [178]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3))
df.iloc[1,2] = np.nan
print(df)
df1 = pd.DataFrame(np.random.randn(5,3))
df1.iloc[0,0] = np.nan
print(df1)

          0         1         2
0  2.671118  1.412880  1.666041
1 -0.281660  1.187589       NaN
2 -0.067425  0.850808  1.461418
3 -0.447670  0.307405  1.038676
4 -0.130232 -0.171420  1.192321
          0         1         2
0       NaN -0.244273 -1.963712
1 -0.043011 -1.588891  0.784695
2  1.094911  0.894044 -0.320710
3 -1.537153  0.558547 -0.317115
4 -1.713988 -0.736463 -1.030797

In [179]:
df.where(df > df1, df1).fillna(df)

Out[179]:
          0         1         2
0  2.671118  1.412880  1.666041
1 -0.043011  1.187589  0.784695
2  1.094911  0.894044  1.461418
3 -0.447670  0.558547  1.038676
4 -0.130232 -0.171420  1.192321
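
The same where/fillna idea can be checked on a small fixed input. This is only a sketch: the helper name elementwise_max and the frames a and b are made up for illustration and do not come from the answer above.

```python
import numpy as np
import pandas as pd

def elementwise_max(left, right):
    # Keep left where it is strictly greater; otherwise take right.
    # Any comparison with NaN is False, so a NaN on either side selects right.
    picked = left.where(left > right, right)
    # Where right was NaN the pick is still NaN; fall back to left's value.
    return picked.fillna(left)

# Hypothetical sample data with NaN on one side, the other side, and both.
a = pd.DataFrame({"x": [1.0, np.nan, 5.0], "y": [np.nan, 2.0, np.nan]})
b = pd.DataFrame({"x": [3.0, 4.0, np.nan], "y": [np.nan, 1.0, 7.0]})

result = elementwise_max(a, b)
print(result)
```

A cell stays NaN only when both inputs are NaN at that position, which matches the behaviour asked for in the question.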

Answer 2

Answered by Andy Jones

A more readable way to do this in recent versions of pandas is concat-and-max:

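import numpy as np
import pandas as pd

A = pd.DataFrame([[1., 2., 3.]])
B = pd.DataFrame([[3., np.nan, 1.]])

# Stack the frames, then take the max per original index label (NaN is skipped).
# The older max(level=0) spelling was removed in pandas 2.0.
pd.concat([A, B]).groupby(level=0).max()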
# 
#           0    1    2
#      0  3.0  2.0  3.0 
#
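
Another option, not mentioned in either answer, is NumPy's fmax, which returns the non-NaN operand when only one side is NaN. The sketch below assumes both frames have identical shape and labels, as stated in the question, and works on the underlying arrays before rebuilding the DataFrame.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([[1.0, 2.0, 3.0]])
B = pd.DataFrame([[3.0, np.nan, 1.0]])

# np.fmax ignores NaN unless both operands are NaN.
# Assumption: A and B share the same shape, index and columns,
# so the positional result can reuse A's labels.
result = pd.DataFrame(
    np.fmax(A.to_numpy(), B.to_numpy()),
    index=A.index,
    columns=A.columns,
)
print(result)
```

Note that np.maximum, unlike np.fmax, propagates NaN, so it would not give the result the question asks for.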
