pandas equivalent of np.where
Original question: http://stackoverflow.com/questions/38579532/
Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Asked by max
`np.where` has the semantics of a vectorized if/else (similar to Apache Spark's `when`/`otherwise` DataFrame method). I know that I can use `np.where` on pandas `Series`, but `pandas` often defines its own API to use instead of raw `numpy` functions, which is usually more convenient with `pd.Series`/`pd.DataFrame`.
Sure enough, I found `pandas.DataFrame.where`. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of `np.where` using pandas `where`:
# df is pd.DataFrame
# how to write this using df.where?
df['C'] = np.where((df['A']<0) | (df['B']>0), df['A']+df['B'], df['A']/df['B'])
Am I missing something obvious? Or is pandas `where` intended for a completely different use case, despite having the same name as `np.where`?
Accepted answer by Alex
Try:
(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
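To check that this matches the `np.where` call from the question, here is a minimal sketch. The sample values are made up for illustration; only the column names `A`, `B` and `C` come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0, -4.0],
                   "B": [2.0, -4.0, 6.0, -8.0]})

cond = (df["A"] < 0) | (df["B"] > 0)

# NumPy version from the question: pick A + B where cond holds, else A / B.
expected = np.where(cond, df["A"] + df["B"], df["A"] / df["B"])

# pandas version from the answer: start from A + B, replace with A / B
# wherever cond is False.
df["C"] = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

# Compare the two results element by element.
same = np.allclose(df["C"].to_numpy(), expected)
print(same)
```

Note that the expression on its own only returns a new `Series`; it has to be assigned to `df['C']` to reproduce the original statement.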
The difference between the `numpy` `where` and `DataFrame` `where` is that the default values are supplied by the `DataFrame` that the `where` method is being called on (docs).
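A small sketch of what that means in practice, using a made-up `Series` `s` (not from the original post): the object that `where` is called on supplies the values that are kept, and the second argument only fills the positions where the condition is False. If the second argument is left out, those positions become NaN.

```python
import pandas as pd

# Hypothetical data, for illustration only.
s = pd.Series([1.0, -2.0, 3.0])

# No replacement given: values are kept where the condition is True,
# and become NaN where it is False.
kept = s.where(s > 0)

# Explicit replacement: positions where the condition is False get 0.0.
filled = s.where(s > 0, 0.0)

print(kept)
print(filled)
```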
I.e.
np.where(m, A, B)
is roughly equivalent to
A.where(m, B)
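One reason this is only "roughly" equivalent is the return type. The sketch below uses made-up `Series` named `A`, `B` and `m` to mirror the placeholders above; the index labels are an assumption chosen to make the difference visible.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs that share the same index.
idx = ["x", "y", "z"]
A = pd.Series([1, 2, 3], index=idx)
B = pd.Series([10, 20, 30], index=idx)
m = pd.Series([True, False, True], index=idx)

from_numpy = np.where(m, A, B)   # plain ndarray, the index is dropped
from_pandas = A.where(m, B)      # Series, the index is preserved

print(type(from_numpy).__name__, from_numpy)
print(type(from_pandas).__name__, list(from_pandas.index))
```

The selected values are the same in both cases; the pandas version just keeps the labels attached.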
If you wanted a similar call signature using pandas, you could take advantage of the way method calls work in Python:
pd.DataFrame.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])
or without kwargs (note that the positional order of the arguments differs from the `numpy` `where` argument order):
pd.DataFrame.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
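A runnable sketch of the same idea, with made-up sample data. Since the operands here are `Series`, this sketch calls the method through `pd.Series` rather than `pd.DataFrame`; that substitution is my assumption about the safest choice, as calling `pd.DataFrame.where` with a `Series` as `self` may depend on the pandas version.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"A": [-1.0, 2.0], "B": [2.0, -4.0]})
cond = (df["A"] < 0) | (df["B"] > 0)

# Ordinary bound method call, as in the first snippet of the answer.
bound = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

# Same call through the class: the object goes first, then cond, then other.
unbound = pd.Series.where(df["A"] + df["B"], cond, df["A"] / df["B"])

print(bound.equals(unbound))
```

The argument order is (values if True, condition, values if False), whereas `np.where` takes (condition, values if True, values if False).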