
Warning: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38579532/

Time: 2020-08-19 21:08:27  Source: igfitidea

pandas equivalent of np.where

python numpy pandas

Asked by max

np.where has the semantics of a vectorized if/else (similar to Apache Spark's when/otherwise DataFrame method). I know that I can use np.where on pandas Series, but pandas often defines its own API to use instead of raw numpy functions, which is usually more convenient with pd.Series/pd.DataFrame.

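As a minimal illustration of the vectorized if/else semantics (the Series below is made-up sample data), np.where picks elementwise between two values and returns a plain NumPy array, so the Series index is not carried over:

```python
import numpy as np
import pandas as pd

# Hypothetical sample Series, used only for illustration
s = pd.Series([-2, 0, 3], index=["x", "y", "z"])

# Elementwise if/else: "neg" where s < 0, "non-neg" everywhere else
labels = np.where(s < 0, "neg", "non-neg")

# The result is an ndarray; the labels "x", "y", "z" are not kept
print(type(labels).__name__)  # ndarray
print(labels.tolist())        # ['neg', 'non-neg', 'non-neg']
```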

Sure enough, I found pandas.DataFrame.where. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of np.where using pandas where:


import numpy as np
import pandas as pd
# df is a pd.DataFrame with numeric columns 'A' and 'B'
# How can this be written using df.where?
df['C'] = np.where((df['A'] < 0) | (df['B'] > 0), df['A'] + df['B'], df['A'] / df['B'])
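For reference, here is a self-contained version of the question's snippet. The three-row df is hypothetical sample data, not taken from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; any DataFrame with numeric columns 'A' and 'B' works
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -2.0, 5.0]})

# Vectorized if/else: A + B where (A < 0) or (B > 0), otherwise A / B
df["C"] = np.where((df["A"] < 0) | (df["B"] > 0), df["A"] + df["B"], df["A"] / df["B"])
print(df["C"].tolist())  # [3.0, -1.0, 8.0]
```

Both branches, df['A'] + df['B'] and df['A'] / df['B'], are computed for every row; the condition only chooses which value ends up in C.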

Am I missing something obvious? Or is pandas where intended for a completely different use case, despite having the same name as np.where?


Accepted answer by Alex

Try:


(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
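A quick check, on hypothetical three-row sample data, that this pandas expression produces the same values as the question's np.where call:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -2.0, 5.0]})
cond = (df["A"] < 0) | (df["B"] > 0)

# pandas: keep A + B where cond is True, take A / B where it is False
via_pandas = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

# NumPy version from the question, for comparison
via_numpy = np.where(cond, df["A"] + df["B"], df["A"] / df["B"])

print(via_pandas.tolist())                               # [3.0, -1.0, 8.0]
print(np.array_equal(via_pandas.to_numpy(), via_numpy))  # True
```

Because the pandas result is a Series indexed like df, it can be assigned directly with df['C'] = via_pandas.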

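The difference between numpy's where and DataFrame's where is that the default values are supplied by the DataFrame that the where method is being called on (docs): entries where the condition is True are kept from that object, and only the entries where it is False are taken from other.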

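To make this concrete, here is a minimal sketch on a made-up Series: the caller's own values survive wherever the condition is True, and other (NaN unless given) fills the rest:

```python
import pandas as pd

# Hypothetical sample Series
s = pd.Series([1, -2, 3])

# No `other` given: entries where the condition is False become NaN
print(s.where(s > 0).tolist())     # [1.0, nan, 3.0]

# With `other=0`: entries where the condition is False become 0
print(s.where(s > 0, 0).tolist())  # [1, 0, 3]
```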

In other words,


np.where(m, A, B)

is roughly equivalent to


A.where(m, B)
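The equivalence is only rough. One difference, shown in this sketch with hypothetical Series: np.where works purely by position and returns an ndarray, while Series.where aligns cond and other on index labels and returns a Series with the caller's index:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs sharing the labels a, b, c
A = pd.Series([10, 20, 30], index=["a", "b", "c"])
m = pd.Series([True, False, True], index=["a", "b", "c"])
# B holds the same labels, but in a different order
B = pd.Series([-2, -1, -3], index=["b", "a", "c"])

np_result = np.where(m, A, B)  # positional: ignores labels, returns an ndarray
pd_result = A.where(m, B)      # label-aligned: returns a Series with A's index

print(np_result.tolist())  # [10, -1, 30]
print(pd_result.tolist())  # [10, -2, 30]
```

When all inputs share the same index in the same order, as in the question, the two calls agree.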

If you wanted a similar call signature using pandas, you could take advantage of the way method calls work in Python: an instance method can be called through its class, with the instance passed explicitly as self:


pd.Series.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])

or without keyword arguments (note that the positional order of the arguments differs from numpy's where argument order):

pd.Series.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
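A sketch of the class-level calls on hypothetical sample data. Because the operands here are Series, it calls the method through pd.Series; both the keyword and the positional forms should match the ordinary bound call:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -2.0, 5.0]})
cond = (df["A"] < 0) | (df["B"] > 0)
if_true = df["A"] + df["B"]
if_false = df["A"] / df["B"]

# Ordinary bound method call
bound = if_true.where(cond, if_false)

# Same method called through the class, passing `self` explicitly
unbound_kw = pd.Series.where(self=if_true, cond=cond, other=if_false)
unbound_pos = pd.Series.where(if_true, cond, if_false)

print(bound.equals(unbound_kw), bound.equals(unbound_pos))  # True True
```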