Equivalent of R/ifelse in Python/Pandas? Compare string columns?
Original question: http://stackoverflow.com/questions/35666272/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by zono
My goal is to compare two columns and add a result column. R uses ifelse, but I need to know the pandas way.
R
> head(mau.payment)
log_month user_id install_month payment
1 2013-06 1 2013-04 0
2 2013-06 2 2013-04 0
3 2013-06 3 2013-04 14994
> mau.payment$user.type <-ifelse(mau.payment$install_month == mau.payment$log_month, "install", "existing")
> head(mau.payment)
log_month user_id install_month payment user.type
1 2013-06 1 2013-04 0 existing
2 2013-06 2 2013-04 0 existing
3 2013-06 3 2013-04 14994 existing
4 2013-06 4 2013-04 0 existing
5 2013-06 6 2013-04 0 existing
6 2013-06 7 2013-04 0 existing
Pandas
>>> maupayment
user_id log_month install_month
1 2013-06 2013-04 0
2013-07 2013-04 0
2 2013-06 2013-04 0
3 2013-06 2013-04 14994
I tried a few approaches, but none of them worked. It seems that string comparison does not work.
>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
TypeError: 'str' object cannot be interpreted as an integer
Could you help me please?
Pandas and numpy versions:
>>> pd.version.version
'0.16.2'
>>> np.version.full_version
'1.9.2'
After updating the versions, it worked!
>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
array(['existing', 'install', 'existing', ..., 'install', 'install',
'install'],
dtype='<U8')
Accepted answer by jezrael
You have to upgrade pandas to the latest version; in version 0.17.1 it works well.
Sample (the first value in column install_month is changed so that it matches):
print(maupayment)
log_month user_id install_month payment
1 2013-06 1 2013-06 0
2 2013-06 2 2013-04 0
3 2013-06 3 2013-04 14994
print(np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing'))
['install' 'existing' 'existing']
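For readers who want to reproduce this, here is a minimal self-contained sketch. The DataFrame is hypothetical sample data modelled on the output above, and the column name user_type is an assumption chosen to mirror the R example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the frame printed above.
maupayment = pd.DataFrame({
    "log_month": ["2013-06", "2013-06", "2013-06"],
    "user_id": [1, 2, 3],
    "install_month": ["2013-06", "2013-04", "2013-04"],
    "payment": [0, 0, 14994],
})

# Comparing two string columns element-wise yields a boolean Series;
# np.where picks "install" where it is True and "existing" elsewhere.
same_month = maupayment["log_month"] == maupayment["install_month"]
maupayment["user_type"] = np.where(same_month, "install", "existing")

print(maupayment["user_type"].tolist())
```

On a recent pandas/numpy install this should label only the first row as "install".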
Answered by Cybernetic
One option is to use an anonymous function in combination with Pandas's apply function.
Set up some branching logic in a function:
def if_this_else_that(x, list_of_checks, yes_label, no_label):
    if x in list_of_checks:
        res = yes_label
    else:
        res = no_label
    return res
This takes the x from the lambda (see below), a list of things to look for, the yes label, and the no label.
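Before wiring the function into apply, it can be tried on plain values. The sketch below restates the function in a compact form so that it runs on its own; the rating strings are just illustrative inputs.

```python
def if_this_else_that(x, list_of_checks, yes_label, no_label):
    """Return yes_label if x is in list_of_checks, otherwise no_label."""
    return yes_label if x in list_of_checks else no_label


# Illustrative calls with scalar inputs.
print(if_this_else_that("PG", ["PG", "PG-13"], "not mature", "mature"))
print(if_this_else_that("R", ["PG", "PG-13"], "not mature", "mature"))
```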
For example, say we are looking at the IMDB dataset (imdb_df) and I want to add a new column called "new_rating" that shows whether the movie is mature or not.
I can use the Pandas apply function along with my branching logic above:
imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))
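As a possible alternative to apply, the same labelling can be done without a Python-level function call per row, closer in spirit to R's ifelse. This is a sketch rather than the answer author's method: the small imdb_df below is a hypothetical stand-in for the real dataset, and it combines Series.isin with np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the IMDB data; only the "Rated" column matters here.
imdb_df = pd.DataFrame({"Rated": ["PG", "R", "PG-13", "NC-17"]})

# Series.isin performs the membership test for the whole column at once,
# and np.where maps True/False to the two labels.
is_family = imdb_df["Rated"].isin(["PG", "PG-13"])
imdb_df["new_rating"] = np.where(is_family, "not mature", "mature")

print(imdb_df["new_rating"].tolist())
```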
There are also times when we need to combine this with another check. For example, some entries in the IMDB dataset are NaN. I can check for both NaN and the maturity rating as follows:
imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: 'not provided' if x in ['nan'] else if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))
In this case my NaN was first converted to a string, but you can do the same with genuine NaNs as well.
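For genuine NaNs, a membership test against the string 'nan' will not match, so one option is to test with pd.isna first. The sketch below assumes a hypothetical imdb_df with a real missing value; label_rating is an illustrative helper name, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a genuine missing value in "Rated".
imdb_df = pd.DataFrame({"Rated": ["PG", np.nan, "R", "PG-13"]})


def label_rating(x):
    # pd.isna catches real missing values (NaN/None),
    # which a check like `x in ['nan']` would miss.
    if pd.isna(x):
        return "not provided"
    return "not mature" if x in ["PG", "PG-13"] else "mature"


imdb_df["new_rating"] = imdb_df["Rated"].apply(label_rating)
print(imdb_df["new_rating"].tolist())
```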