Set values of groups in pandas conditionally (python)
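Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow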
Original URL: http://stackoverflow.com/questions/17102647/
Question by ybb
I have a dataframe with the following columns:
duration  cost  channel
2         180   TV1
1         200   TV2
2         300   TV3
1         nan   TV1
2         nan   TV2
2         nan   TV3
2         nan   TV1
1         40    TV2
1         nan   TV3
Some of the cost values are NaN. To fill them, I need to do the following:
- group by channel
- within a channel, sum the available costs and divide by the number of all occurrences, including the rows with a NaN cost (average)
- if the channel has at least one NaN cost, reassign the cost of every row within that channel:
    - if duration = 1, cost = average * 1.5
    - if duration = 2, cost = average
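Steps 1 and 2 can be sketched as a per-channel aggregation. This is a minimal illustration, assuming the sample data above is loaded as `df` with NaN for the missing costs; `per_channel` is just an illustrative name. `sum()` skips NaN while `size` counts every row, which gives the "divide by all occurrences" average described in the list.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks a missing cost)
df = pd.DataFrame({
    'duration': [2, 1, 2, 1, 2, 2, 2, 1, 1],
    'cost': [180, 200, 300, np.nan, np.nan, np.nan, np.nan, 40, np.nan],
    'channel': ['TV1', 'TV2', 'TV3', 'TV1', 'TV2', 'TV3', 'TV1', 'TV2', 'TV3'],
})

# Per channel: sum of the known costs (sum() skips NaN) and total row count
per_channel = df.groupby('channel')['cost'].agg(['sum', 'size'])

# The question's "average": known costs divided by all rows, NaN rows included
per_channel['average'] = per_channel['sum'] / per_channel['size']
print(per_channel)
```

For the sample data this gives an average of 60 for TV1, 80 for TV2 and 100 for TV3.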
Example: the TV2 channel has 3 entries, one of which has a null cost, so I need to do the following:
average = (200 + 40) / 3 = 80
if duration = 1, cost = 80 * 1.5 = 120
if duration = 2, cost = 80
duration  cost  channel
2         180   TV1
1         120   TV2
2         300   TV3
1         nan   TV1
2         80    TV2
2         nan   TV3
2         nan   TV1
1         120   TV2
1         nan   TV3
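The TV2 arithmetic above can be checked with a few lines of pandas. This is a small sketch assuming the three TV2 costs are held in a plain Series (`tv2_cost` and the other variable names are hypothetical). It also shows why `Series.mean()` is not the right tool for this definition of "average": it skips NaN and divides by the number of known values (2), not by the number of entries (3).

```python
import numpy as np
import pandas as pd

# TV2 costs from the question: one entry is missing
tv2_cost = pd.Series([200, np.nan, 40])

# The question's "average": sum of the known costs / number of entries
average = tv2_cost.sum() / len(tv2_cost)   # (200 + 40) / 3

# Series.mean() skips NaN, so it divides by 2 instead of 3
skipna_mean = tv2_cost.mean()              # (200 + 40) / 2

cost_if_duration_1 = average * 1.5
cost_if_duration_2 = average
print(average, skipna_mean, cost_if_duration_1, cost_if_duration_2)
```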
I know I should do df.groupby('channel') and then apply a function to each group. The problem is that I need to modify not only the null values: if any cost within a group is null, I need to modify all cost values in that group.
Any tips would be appreciated.
Thanks!
Answer by Rutger Kassies
If I understand your problem correctly, you want something like this:
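import pandas as pd

def myfunc(group):
    # only modify cost if there are NaNs
    # (count() skips NaN, so it is smaller than len(group) when a cost is missing)
    if len(group) != group.cost.count():
        # set all cost values to the mean: sum of the known costs / number of rows
        group['cost'] = group.cost.sum() / len(group)
        # multiply by 1.5 if the duration equals 1
        # (chained assignment: has no effect under pandas Copy-on-Write)
        group['cost'][group.duration == 1] = group['cost'] * 1.5
    return group

# group_keys=False keeps the original flat row index in pandas >= 2.0
df.groupby('channel', group_keys=False).apply(myfunc)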
duration cost channel
0 2 60 TV1
1 1 120 TV2
2 2 100 TV3
3 1 90 TV1
4 2 80 TV2
5 2 100 TV3
6 2 60 TV1
7 1 120 TV2
8 1 150 TV3
Answer by Y.C.
In newer versions of pandas, the code should change to the following, which avoids the chained assignment:
def myfunc(group):
    # only modify cost if there are NaNs
    if len(group) != group.cost.count():
        # set all cost values to the mean
        group['cost'] = group.cost.sum() / len(group)
        # multiply by 1.5 if the duration equals 1
        # (the original answer used group.set_value(...), which was deprecated in
        # pandas 0.21 and removed in 1.0; a single .loc assignment replaces it)
        group.loc[group.duration == 1, 'cost'] = group['cost'] * 1.5
    return group
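As an alternative to `apply`, the same rule can be sketched without a per-group Python function by using `groupby(...).transform`, which broadcasts a per-group result back onto every row of that group. This is not what either answer wrote; it is a sketch assuming the question's sample data, with `average`, `has_nan` and `factor` as illustrative names. A channel with no NaN cost keeps its original values.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks a missing cost)
df = pd.DataFrame({
    'duration': [2, 1, 2, 1, 2, 2, 2, 1, 1],
    'cost': [180, 200, 300, np.nan, np.nan, np.nan, np.nan, 40, np.nan],
    'channel': ['TV1', 'TV2', 'TV3', 'TV1', 'TV2', 'TV3', 'TV1', 'TV2', 'TV3'],
})

# Per-row copy of the channel average: known costs summed, divided by all rows
average = df.groupby('channel')['cost'].transform(lambda s: s.sum() / len(s))

# Per-row flag: True if the row's channel contains at least one NaN cost
has_nan = df['cost'].isna().groupby(df['channel']).transform('any').astype(bool)

# duration 1 gets 1.5 x the average; any other duration (here 2) gets the average
factor = np.where(df['duration'] == 1, 1.5, 1.0)

# Keep the cost where the channel has no NaN; otherwise use average * factor
df['cost'] = df['cost'].where(~has_nan, average * factor)
print(df)
```

On the sample data this reproduces the costs shown in the first answer's output: 60, 120, 100, 90, 80, 100, 60, 120, 150.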

