Original source: http://stackoverflow.com/questions/38579532/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pandas equivalent of np.where
Asked by max
np.where has the semantics of a vectorized if/else (similar to Apache Spark's when/otherwise DataFrame method). I know that I can use np.where on a pandas Series, but pandas often defines its own API to use instead of raw numpy functions, and that API is usually more convenient with pd.Series/pd.DataFrame.
Sure enough, I found pandas.DataFrame.where. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of np.where using pandas where:
# df is pd.DataFrame
# how to write this using df.where?
df['C'] = np.where((df['A']<0) | (df['B']>0), df['A']+df['B'], df['A']/df['B'])
Am I missing something obvious? Or is pandas where intended for a completely different use case, despite having the same name as np.where?
Accepted answer by Alex
Try:
(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
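To check that the two spellings agree, here is a minimal sketch on a small made-up frame (the sample values are an assumption for illustration; only the column names A, B and C come from the question). One difference worth noticing: np.where returns a plain ndarray, while Series.where returns a Series that keeps the index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

cond = (df["A"] < 0) | (df["B"] > 0)

# NumPy spelling from the question: returns a plain ndarray.
via_numpy = np.where(cond, df["A"] + df["B"], df["A"] / df["B"])

# pandas spelling from the answer: returns a Series aligned on df's index.
via_pandas = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

df["C"] = via_pandas
```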
The difference between the numpy where and the DataFrame where is that the default values are supplied by the DataFrame that the where method is being called on (docs).
I.e.
np.where(m, A, B)
is roughly equivalent to
A.where(m, B)
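A small sketch of that correspondence, using made-up Series named A, B and m to mirror the symbols above (the values are assumptions for illustration). It also shows what happens when the second argument is left out: positions where the mask is False become NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs named after the symbols used above.
A = pd.Series([1, 2, 3, 4])
B = pd.Series([10, 20, 30, 40])
m = pd.Series([True, False, True, False])

# Keep A where m is True, take B elsewhere.
picked = A.where(m, B)

# The NumPy call selects the same values, but returns an ndarray.
same_values = np.where(m, A, B)

# Without `other`, positions where m is False are filled with NaN.
no_other = A.where(m)
```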
If you wanted a similar call signature using pandas, you could take advantage of the way method calls work in Python:
pd.DataFrame.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])
or without kwargs (note that the positional order of the arguments differs from the numpy where argument order):
pd.DataFrame.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
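A hedged sketch of the same trick, checked against the ordinary method call. Because the operands here are Series, this variation goes through pd.Series.where rather than pd.DataFrame.where; that substitution and the sample values are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

cond = (df["A"] < 0) | (df["B"] > 0)
if_true = df["A"] + df["B"]
if_false = df["A"] / df["B"]

# Ordinary bound-method call.
bound = if_true.where(cond, if_false)

# Unbound call, positional: the order is (self, cond, other),
# not np.where's (cond, x, y).
unbound_positional = pd.Series.where(if_true, cond, if_false)

# Unbound call with keywords, so the arguments can be written in np.where order.
unbound_keywords = pd.Series.where(cond=cond, self=if_true, other=if_false)
```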

