Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/35666272/

Time: 2020-09-14 00:46:58

Equivalent of R/ifelse in Python/Pandas? Compare string columns?

Tags: python, r, pandas, numpy

Asked by zono

My goal is to compare two columns and add the result as a new column. R uses ifelse, but I need to know the pandas way.

R


> head(mau.payment)
  log_month user_id install_month payment
1   2013-06       1       2013-04       0
2   2013-06       2       2013-04       0
3   2013-06       3       2013-04   14994

> mau.payment$user.type <-ifelse(mau.payment$install_month == mau.payment$log_month, "install", "existing")
> head(mau.payment)
  log_month user_id install_month payment user.type
1   2013-06       1       2013-04       0  existing
2   2013-06       2       2013-04       0  existing
3   2013-06       3       2013-04   14994  existing
4   2013-06       4       2013-04       0  existing
5   2013-06       6       2013-04       0  existing
6   2013-06       7       2013-04       0  existing

Pandas


>>> maupayment
user_id  log_month  install_month
1        2013-06    2013-04              0
         2013-07    2013-04              0
2        2013-06    2013-04              0
3        2013-06    2013-04          14994

I tried a few approaches, but none of them worked. It seems that string comparison does not work.

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')

TypeError: 'str' object cannot be interpreted as an integer 

Could you help me please?




Pandas and NumPy versions:

>>> pd.version.version
'0.16.2'
>>> np.version.full_version
'1.9.2'


After updating the versions, it worked!

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
array(['existing', 'install', 'existing', ..., 'install', 'install',
       'install'], 
      dtype='<U8')

Accepted answer by jezrael

You have to upgrade pandas to the latest version; in version 0.17.1 it works fine.

Sample (the first value in column install_month has been changed so that it matches):

print(maupayment)
  log_month  user_id install_month  payment
1   2013-06        1       2013-06        0
2   2013-06        2       2013-04        0
3   2013-06        3       2013-04    14994

print(np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing'))
['install' 'existing' 'existing']
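
As a self-contained sketch of the same idea, the snippet below rebuilds the three sample rows by hand and stores the np.where result as a new column. The DataFrame construction is an assumption (the original data source is not shown), and the column name user_type is chosen here only to mirror the user.type column from the R code.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows above (assumed construction).
maupayment = pd.DataFrame({
    "log_month": ["2013-06", "2013-06", "2013-06"],
    "user_id": [1, 2, 3],
    "install_month": ["2013-06", "2013-04", "2013-04"],
    "payment": [0, 0, 14994],
})

# Element-wise string comparison yields a boolean Series;
# np.where picks 'install' where it is True and 'existing' where it is False.
same_month = maupayment["log_month"] == maupayment["install_month"]
maupayment["user_type"] = np.where(same_month, "install", "existing")

print(maupayment["user_type"].tolist())
# ['install', 'existing', 'existing']
```

Assigning the array back to a column is the step that corresponds to the `mau.payment$user.type <- ifelse(...)` assignment in R.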

Answer by Cybernetic

One option is to use an anonymous function in combination with pandas's apply function:

Set up some branching logic in a function:

def if_this_else_that(x, list_of_checks, yes_label, no_label):
    # Return yes_label when x is one of the checked values, otherwise no_label.
    if x in list_of_checks:
        res = yes_label
    else:
        res = no_label
    return res

This takes the x from the lambda (see below), a list of things to look for, the yes label, and the no label.

For example, say we are looking at the IMDB dataset (imdb_df):



...and I want to add a new column called "new_rating" that shows whether the movie is mature or not.


I can use the pandas apply function along with my branching logic above:

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))
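
Since the real IMDB data is not included here, the following is a minimal runnable sketch on a made-up stand-in for imdb_df (the four rating values are assumptions chosen for illustration). It applies the function row by row as above and, for comparison, computes the same labels with np.where and Series.isin, which avoids a Python-level call per row.

```python
import numpy as np
import pandas as pd

def if_this_else_that(x, list_of_checks, yes_label, no_label):
    # Same branching logic as above, written as one expression.
    return yes_label if x in list_of_checks else no_label

# Hypothetical stand-in for imdb_df; only the 'Rated' column matters here.
imdb_df = pd.DataFrame({"Rated": ["PG", "R", "PG-13", "NC-17"]})

# Row-by-row version, as in the answer.
imdb_df["new_rating"] = imdb_df["Rated"].apply(
    lambda x: if_this_else_that(x, ["PG", "PG-13"], "not mature", "mature")
)

# Vectorized version: isin builds the boolean mask in a single call.
imdb_df["new_rating_vec"] = np.where(
    imdb_df["Rated"].isin(["PG", "PG-13"]), "not mature", "mature"
)

print(imdb_df)
```

Both columns should hold identical labels; the vectorized form is usually the closer analogue of R's ifelse.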


There are also times when we need to combine this with another check. For example, some entries in the IMDB dataset are NaN. I can check for both NaN and the maturity rating as follows:

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: 'not provided' if x in ['nan'] else if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))

In this case my NaN was first converted to a string, but you can obviously do this with genuine NaNs as well.
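
For genuine NaN values, one possible approach (a sketch, not part of the original answer) is to test with Series.isna and combine the conditions through np.select, which takes the first matching condition for each row. The small DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a real missing value instead of the string 'nan'.
imdb_df = pd.DataFrame({"Rated": ["PG", np.nan, "R", "PG-13"]})

# Conditions are checked in order; the first True one decides the label.
conditions = [
    imdb_df["Rated"].isna(),                 # missing rating
    imdb_df["Rated"].isin(["PG", "PG-13"]),  # ratings treated as not mature
]
choices = ["not provided", "not mature"]

# Rows that match no condition fall back to the default label.
imdb_df["new_rating"] = np.select(conditions, choices, default="mature")

print(imdb_df["new_rating"].tolist())
# ['not mature', 'not provided', 'mature', 'not mature']
```

Note that `x in ['nan']` would not catch a real NaN, because NaN is a float and never equals the string 'nan'.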
