Equivalent of R's ifelse in Python/pandas? Comparing string columns?

Question

Asked by zono

My goal is to compare two columns and add a column holding the result. R uses ifelse for this, but I need to know the pandas way.

R

> head(mau.payment)
  log_month user_id install_month payment
1   2013-06       1       2013-04       0
2   2013-06       2       2013-04       0
3   2013-06       3       2013-04   14994

> mau.payment$user.type <-ifelse(mau.payment$install_month == mau.payment$log_month, "install", "existing")
> head(mau.payment)
  log_month user_id install_month payment user.type
1   2013-06       1       2013-04       0  existing
2   2013-06       2       2013-04       0  existing
3   2013-06       3       2013-04   14994  existing
4   2013-06       4       2013-04       0  existing
5   2013-06       6       2013-04       0  existing
6   2013-06       7       2013-04       0  existing

Pandas

>>> maupayment
user_id  log_month  install_month
1        2013-06    2013-04              0
         2013-07    2013-04              0
2        2013-06    2013-04              0
3        2013-06    2013-04          14994

I tried a few approaches, but none of them worked. It seems that string comparison does not work.

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')

TypeError: 'str' object cannot be interpreted as an integer

Could you help me please?

pandas and NumPy versions:

>>> pd.version.version
'0.16.2'
>>> np.version.full_version
'1.9.2'

After updating the versions, it worked!

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
array(['existing', 'install', 'existing', ..., 'install', 'install',
       'install'], 
      dtype='<U8')

Answer 1

Accepted answer by jezrael

You have to upgrade pandas to the latest version; in version 0.17.1 it works very well.

Sample (the first value in column install_month has been changed so that it matches):

print(maupayment)
  log_month  user_id install_month  payment
1   2013-06        1       2013-06        0
2   2013-06        2       2013-04        0
3   2013-06        3       2013-04    14994

print(np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing'))
['install' 'existing' 'existing']
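
For readers who want to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame by hand and stores the result as a new column, mirroring the R ifelse call in the question. The column name user_type is an assumption chosen for illustration, and the months are assumed to be stored as plain strings.

```python
import numpy as np
import pandas as pd

# Sample data copied from the answer above; months are assumed to be strings.
maupayment = pd.DataFrame({
    "log_month": ["2013-06", "2013-06", "2013-06"],
    "user_id": [1, 2, 3],
    "install_month": ["2013-06", "2013-04", "2013-04"],
    "payment": [0, 0, 14994],
})

# Element-wise comparison gives a boolean Series; np.where picks a label per row.
# "user_type" is a hypothetical column name, analogous to user.type in the R code.
maupayment["user_type"] = np.where(
    maupayment["log_month"] == maupayment["install_month"],
    "install",
    "existing",
)

print(maupayment)
```

Only the first row has matching months, so it is the only one labelled install.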

Answer 2

Answer by Cybernetic

One option is to use an anonymous function in combination with pandas's apply function:

Set up some branching logic in a function:

def if_this_else_that(x, list_of_checks, yes_label, no_label):
    # Return yes_label when x is one of the values to look for, else no_label.
    if x in list_of_checks:
        res = yes_label
    else:
        res = no_label
    return res

This takes the x from the lambda (see below), a list of things to look for, the yes label, and the no label.

For example, say we are looking at the IMDB dataset (imdb_df):

...and I want to add a new column called "new_rating" that shows whether the movie is mature or not.

I can use the pandas apply function along with my branching logic above:

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))
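
As a possible alternative to apply, the same membership check can be written in vectorized form with Series.isin and np.where, which is closer in spirit to R's ifelse. This is only a sketch: the small imdb_df below is made-up stand-in data, since the real dataset is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the IMDB data; only the "Rated" column matters here.
imdb_df = pd.DataFrame({"Rated": ["PG", "R", "PG-13", "NC-17"]})

# isin builds a boolean mask in one pass; np.where maps it to the two labels.
imdb_df["new_rating"] = np.where(
    imdb_df["Rated"].isin(["PG", "PG-13"]),
    "not mature",
    "mature",
)

print(imdb_df)
```

On large frames this avoids calling a Python function once per row, though apply remains the more flexible choice when the branching logic is complicated.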

There are also times we need to combine this with another check. For example, some entries in the IMDB dataset are NaN. I can check for both NaN and the maturity rating as follows:

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: 'not provided' if x in ['nan'] else if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))

In this case my NaN was first converted to a string, but you can obviously do this with genuine NaNs as well.
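
For genuine NaNs, one way to write the same check is with pd.isna instead of comparing against the string 'nan'. The sketch below assumes a recent pandas that provides pd.isna (older releases offer pd.isnull for the same purpose) and again uses made-up stand-in data for imdb_df.

```python
import numpy as np
import pandas as pd

def if_this_else_that(x, list_of_checks, yes_label, no_label):
    # Same branching helper as above, repeated so the example is self-contained.
    return yes_label if x in list_of_checks else no_label

# Hypothetical stand-in data with a real missing value in "Rated".
imdb_df = pd.DataFrame({"Rated": ["PG", np.nan, "R"]})

# Test for a missing value first, then fall back to the membership check.
imdb_df["new_rating"] = imdb_df["Rated"].apply(
    lambda x: "not provided"
    if pd.isna(x)
    else if_this_else_that(x, ["PG", "PG-13"], "not mature", "mature")
)

print(imdb_df)
```

The order of the checks matters: NaN is never found in the list of ratings, so without the pd.isna test a missing rating would silently be labelled mature.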
