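pandas: How to implement SQL coalesce in pandas
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43177685/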
How to implement sql coalesce in pandas
Asked by Anoop
I have a data frame like this:
df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7
I want to add a new column 'D' that holds the first non-null value across A, B, and C in each row (like SQL COALESCE). Expected output is
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
Thanks in advance!
Answered by philshem
Another way is to explicitly fill column D with A, B, C in that order.
df['D'] = np.nan
df['D'] = df.D.fillna(df.A).fillna(df.B).fillna(df.C)
Answered by jezrael
I think you need bfill, then select the first column with iloc:
df['D'] = df.bfill(axis=1).iloc[:,0]
print(df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
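This is the same as the following (the method argument of fillna is deprecated in recent pandas versions, so bfill is preferred):
df['D'] = df.fillna(method='bfill', axis=1).iloc[:, 0]
print(df)
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0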
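The bfill approach takes whatever column order the frame happens to have. A minimal sketch, assuming you want an explicit priority order instead: select the columns in that order before back-filling. The `priority` list below is a hypothetical choice for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Hypothetical priority order: prefer B, then A, then C.
priority = ["B", "A", "C"]

# Back-fill along each row of the reordered columns, then keep the first column,
# which now holds the first non-null value in priority order.
df["D"] = df[priority].bfill(axis=1).iloc[:, 0]
print(df)
```

Selecting the columns explicitly also avoids accidentally including an existing 'D' column in the back-fill if the statement is run twice.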
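Answered by piRSquared
Option 1: pandas (note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only runs on older versions)
df.assign(D=df.lookup(df.index, df.isnull().idxmin(1)))
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0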
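Since DataFrame.lookup is no longer available in pandas 2.0 and later, here is a rough sketch of the same label-based idea using positional NumPy indexing. The variable names are illustrative, and this is one possible substitute rather than the answer author's code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Label of the first non-null column in each row.
first_valid_col = df.notna().idxmax(axis=1)

# Translate those labels into integer column positions.
col_pos = df.columns.get_indexer(first_valid_col)
row_pos = np.arange(len(df))

# Pick one value per row from the underlying array.
result = df.assign(D=df.to_numpy()[row_pos, col_pos])
print(result)
```

If a row is entirely null, idxmax falls back to the first column, so D is NaN for that row.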
Option 2: numpy
v = df.values
j = np.isnan(v).argmin(1)
df.assign(D=v[np.arange(len(v)), j])
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0
Naive time test, over the given data and over larger data (the timing figures are not reproduced here).
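The original timing results are not available here, so the following is only a sketch of how such a naive comparison could be run with timeit. The function names `with_bfill` and `with_numpy`, the repeat factor of 1000, and the run count are all assumptions chosen for illustration; no particular timings are implied.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

def with_bfill(frame):
    # Back-fill along rows and take the first column.
    return frame.bfill(axis=1).iloc[:, 0]

def with_numpy(frame):
    # Find the first non-NaN position per row and index into the array.
    v = frame.to_numpy()
    j = np.isnan(v).argmin(1)
    return pd.Series(v[np.arange(len(v)), j], index=frame.index)

# Hypothetical "larger data": the example frame repeated 1000 times.
large = pd.concat([df] * 1000, ignore_index=True)

for name, func in [("bfill", with_bfill), ("numpy", with_numpy)]:
    seconds = timeit.timeit(lambda: func(large), number=3)
    print(f"{name}: {seconds:.4f} s for 3 runs")
```

Results will vary with the machine, the pandas version, and the shape of the data, so treat any numbers this prints as rough indications only.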
Answered by yardsale8
Another approach is to use the combine_first method of a pd.Series. Using your example df,
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
>>> df
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7
we have
>>> df.A.combine_first(df.B).combine_first(df.C)
0 1.0
1 2.0
2 7.0
We can use reduce to abstract this pattern to work with an arbitrary number of columns (on Python 3, reduce must be imported from functools).
>>> from functools import reduce
>>> cols = [df[c] for c in df.columns]
>>> reduce(lambda acc, col: acc.combine_first(col), cols)
0 1.0
1 2.0
2 7.0
Name: A, dtype: float64
Let's put this all together in a function.
>>> def coalesce(*args):
... return reduce(lambda acc, col: acc.combine_first(col), args)
...
>>> coalesce(*cols)
0 1.0
1 2.0
2 7.0
Name: A, dtype: float64
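As a usage sketch, the same function can be written as a standalone Python 3 script and its result assigned to the new column. The `.rename("D")` call is an optional addition, not part of the original answer; it only keeps the Series name consistent with the target column.

```python
from functools import reduce

import numpy as np
import pandas as pd

def coalesce(*series):
    # Fold the Series left to right: each step fills the gaps of the
    # accumulated result with values from the next Series.
    return reduce(lambda acc, col: acc.combine_first(col), series)

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Earlier arguments take priority over later ones.
df["D"] = coalesce(df["A"], df["B"], df["C"]).rename("D")
print(df)
```

Because combine_first aligns on the index, the Series passed in should share the same index for the result to line up row by row.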
Answered by Denys
There is already a method for Series in pandas that does this:
df['D'] = df['A'].combine_first(df['C'])
Or just chain the calls if you want to look up values sequentially:
df['D'] = df['A'].combine_first(df['B']).combine_first(df['C'])
This outputs the following:
>>> df
     A     B   C    D
0  1.0   NaN   5  1.0
1  2.0  10.0  10  2.0
2  NaN   NaN   7  7.0