pandas: How to implement SQL COALESCE in pandas

Question

Asked by Anoop

I have a data frame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

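I want to add a new column 'D' that holds, for each row, the first non-null value among A, B and C (like SQL COALESCE(A, B, C)). The expected output is: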

     A     B   C    D
0  1.0   NaN   5    1.0
1  2.0  10.0  10    2.0
2  NaN   NaN   7    7.0

Thanks in advance!
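SQL COALESCE returns its first non-null argument, or NULL when every argument is null. As a reference for the expected semantics, here is a slow but explicit row-wise sketch, assuming every column of df is a candidate and the column order is the priority order (first_non_null is a hypothetical helper, not a pandas function):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})


def first_non_null(row):
    # Hypothetical helper: return the first value in the row that is not
    # NaN/None, mirroring SQL COALESCE; NaN if every value is missing.
    for value in row:
        if pd.notna(value):
            return value
    return np.nan


# Row-wise apply is slow on large frames, but it spells out the rule plainly.
df["D"] = df.apply(first_non_null, axis=1)
print(df)
```

The vectorized answers below compute the same column much faster.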

Answer 1

Answered by philshem

Another way is to explicitly fill column D from A, B and C, in that order.

df['D'] = np.nan
df['D'] = df.D.fillna(df.A).fillna(df.B).fillna(df.C)
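A self-contained variant of the same idea, which starts from A directly instead of first creating an all-NaN placeholder column (a small simplification, not part of the original answer). Series.fillna accepts another Series and fills the gaps by index alignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Take A, fill its missing values from B, then fill what is still missing from C.
df["D"] = df["A"].fillna(df["B"]).fillna(df["C"])
print(df)
```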

Answer 2

Answered by jezrael

I think you need bfill along the columns (axis=1), then select the first column with iloc:

df['D'] = df.bfill(axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0

Same as (older spelling; the method argument of fillna is deprecated since pandas 2.1):

df['D'] = df.fillna(method='bfill',axis=1).iloc[:,0]
print (df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
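Because bfill(axis=1) works over every column of the frame, it can help to pass an explicit list of candidate columns, which also sets their priority. A minimal sketch, where coalesce_columns is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})


def coalesce_columns(frame, columns):
    # Hypothetical helper: back-fill along each row over the listed columns
    # only, then keep the first of them. After the back-fill, that cell holds
    # the first non-null value in `columns` order.
    return frame[columns].bfill(axis=1).iloc[:, 0]


df["D"] = coalesce_columns(df, ["A", "B", "C"])
df["E"] = coalesce_columns(df, ["B", "A", "C"])  # same idea, but prefer B over A
print(df)
```

Restricting the selection this way also keeps a previously added column such as D out of later computations.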

Answer 3

Answered by piRSquared

Option 1: pandas

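# DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this runs only on older versions
df.assign(D=df.lookup(df.index, df.isnull().idxmin(1)))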

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
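Since DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of the same idea using the factorize-and-reindex pattern that the pandas indexing guide describes as a replacement for lookup (assuming pandas 1.2 or later):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Label of the first non-null column in each row: idxmin over the boolean
# isnull() mask picks the first False, i.e. the first non-missing value.
first_valid = df.isnull().idxmin(axis=1)

# Encode those labels as integer codes plus the unique labels they refer to.
codes, labels = pd.factorize(first_valid)

# Reorder the columns to match the codes, then take one cell per row.
picked = df.reindex(labels, axis=1).to_numpy()[np.arange(len(df)), codes]
df = df.assign(D=picked)
print(df)
```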

Option 2: NumPy

v = df.values
j = np.isnan(v).argmin(1)
df.assign(D=v[np.arange(len(v)), j])

     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
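The trick in option 2 is that argmin over a boolean array returns the position of the first False in each row, which is the first non-NaN column. A sketch with an extra all-NaN row added (hypothetical data, not from the question) to show how that edge case behaves:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan, np.nan],
                   "B": [np.nan, 10, np.nan, np.nan],
                   "C": [5, 10, 7, np.nan]})

v = df.to_numpy(dtype=float)
is_missing = np.isnan(v)

# First False per row = first non-NaN column. A row with no False at all
# (all NaN) gets position 0, the first occurrence of the minimum.
j = is_missing.argmin(axis=1)

# Column 0 of an all-NaN row is itself NaN, so that row stays NaN,
# just as SQL COALESCE returns NULL when every argument is NULL.
df = df.assign(D=v[np.arange(len(v)), j])
print(j)
print(df)
```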

Naive time test, over the given data and over larger data (timing charts not preserved).
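The original timing results did not survive. A minimal timeit harness for re-running such a comparison yourself on hypothetical random data (make_frame and the with_* wrappers are illustrative names; timings depend on your machine and pandas version, so none are quoted here):

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, seed=0):
    # Hypothetical test data: three float columns with roughly 30% NaNs.
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, 3))
    data[rng.random((n_rows, 3)) < 0.3] = np.nan
    return pd.DataFrame(data, columns=["A", "B", "C"])


def with_fillna(df):
    return df["A"].fillna(df["B"]).fillna(df["C"])


def with_bfill(df):
    return df.bfill(axis=1).iloc[:, 0]


def with_numpy(df):
    v = df.to_numpy()
    j = np.isnan(v).argmin(axis=1)
    return pd.Series(v[np.arange(len(v)), j], index=df.index)


for n in (3, 100_000):
    frame = make_frame(n)
    for fn in (with_fillna, with_bfill, with_numpy):
        # Average wall-clock time per call over 10 runs.
        seconds = timeit.timeit(lambda: fn(frame), number=10) / 10
        print(f"{n:>7} rows  {fn.__name__:<12} {seconds:.6f} s per call")
```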

Answer 4

Answered by yardsale8

Another approach is to use the combine_first method of pd.Series. Using your example df:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
>>> df
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

we have

>>> df.A.combine_first(df.B).combine_first(df.C)
0    1.0
1    2.0
2    7.0

We can use reduce to abstract this pattern so that it works with an arbitrary number of columns.

>>> from functools import reduce  # needed on Python 3
>>> cols = [df[c] for c in df.columns]
>>> reduce(lambda acc, col: acc.combine_first(col), cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64

Let's put this all together in a function.

>>> def coalesce(*args):
...     return reduce(lambda acc, col: acc.combine_first(col), args)
...
>>> coalesce(*cols)
0    1.0
1    2.0
2    7.0
Name: A, dtype: float64
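On Python 3, reduce must be imported from functools. A sketch of the same coalesce helper as a standalone script, with a guard for an empty call (reduce without an initial value raises TypeError on an empty sequence; raising ValueError with this message is an illustrative choice):

```python
from functools import reduce

import numpy as np
import pandas as pd


def coalesce(*series):
    # Left fold: keep what has been found so far and fill the remaining
    # gaps from the next Series; combine_first aligns on the index.
    if not series:
        raise ValueError("coalesce() needs at least one Series")
    return reduce(lambda acc, s: acc.combine_first(s), series)


df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Passing the columns by name makes the priority order explicit.
df["D"] = coalesce(*(df[name] for name in ["A", "B", "C"]))
print(df)
```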

Answer 5

Answered by Denys

pandas already has a Series method that does this:

df['D'] = df['A'].combine_first(df['C'])

Or chain the calls if you want to look up values from several columns in order:

df['D'] = df['A'].combine_first(df['B']).combine_first(df['C'])

This outputs the following:

>>> df
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
