Replace value for a selected cell in pandas DataFrame without using index
Source: Stack Overflow question, http://stackoverflow.com/questions/17729853/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow.
Asked by LondonRob
This is rather similar to this question, but with one key difference: I'm selecting the data I want to change not by its index but by some criteria.
If the criteria I apply return a single row, I'd expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn't work:
>>> import pandas as pd
>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009],
...                   'flavour':['strawberry','strawberry','banana','banana',
...                              'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009
>>> d[d.sales==24]
   day  flavour  sales  year
7  sun   banana     24  2009
>>> d[d.sales==24].sales = 100
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009
So rather than setting 2009 Sunday's banana sales to 100, nothing happens! What's the nicest way to do this? Ideally the solution should not use the row number, as you normally don't know that in advance!
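A minimal sketch of why the attempt above leaves d unchanged: boolean indexing such as d[d.sales == 24] builds a new DataFrame, so the assignment lands on that temporary object rather than on d. The variable names `selected`, `original_value` and `copied_value` are only for illustration, and the explicit .copy() is an assumption added here to make the independence visible without triggering version-dependent warnings.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

# Boolean indexing produces a separate DataFrame with the matching rows.
# The explicit .copy() makes that independence obvious.
selected = d[d.sales == 24].copy()

# Changing the selected rows does not touch the original frame.
selected['sales'] = 100

original_value = d.loc[7, 'sales']        # still the old value in d
copied_value = selected.loc[7, 'sales']   # updated only in the copy
```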
Accepted answer by waitingkuo
There are many ways to do that.
1
In [7]: d.sales[d.sales==24] = 100
In [8]: d
Out[8]:
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009
2
In [26]: d.loc[d.sales == 12, 'sales'] = 99
In [27]: d
Out[27]:
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009
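As a hedged illustration of option 2 applied to the original question: .loc takes a boolean row mask and a column label in a single indexing call, so the assignment is applied to d itself. This single-step form is generally considered the more robust choice, since the chained form in option 1 may emit warnings or fail to update d in newer pandas versions. The names `mask` and `n_matches` are illustrative only.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

# Boolean Series marking the rows to change; no row number is needed.
mask = d['sales'] == 24

# Number of rows the criteria selected (handy as a sanity check).
n_matches = int(mask.sum())

# Rows come from the mask, the column from the label, in one .loc call.
d.loc[mask, 'sales'] = 100
```

Checking `n_matches` before assigning is optional, but it guards against criteria that silently match zero rows or more rows than intended.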
3
In [28]: d.sales = d.sales.replace(23, 24)
In [29]: d
Out[29]:
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     24  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     24  2009
7  sun      banana    100  2009
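One point worth noting about option 3: replace works on values, not on selected rows, so every cell in the column equal to the old value is changed (two rows in the output above). A small sketch on the original data, using the dict form of Series.replace; the names `before` and `n_changed` are illustrative.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

before = d['sales'].copy()

# Map old value -> new value; all occurrences of 23 in the column change.
d['sales'] = d['sales'].replace({23: 24})

# Count how many cells were actually modified.
n_changed = int((before != d['sales']).sum())
```

If only one specific row should change, a row mask with .loc is likely the better fit than a value-based replace.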
Answer by ram
I'm not sure about older versions of pandas, but in 0.16 the value of a particular cell can be set based on multiple column values.
Extending the answer provided by @waitingkuo, the same operation can also be done based on the values of multiple columns.
d.loc[(d.day == 'sun') & (d.flavour == 'banana') & (d.year == 2009), 'sales'] = 100
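The multi-column condition can be sketched as a self-contained example. Each comparison is wrapped in parentheses because & binds more tightly than == in Python, and the combined mask is stored in an illustrative variable (`mask`) so the number of matching rows can be checked before assigning.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

# Combine element-wise conditions with & (and); use | for "or".
mask = (d['day'] == 'sun') & (d['flavour'] == 'banana') & (d['year'] == 2009)

# The three conditions together identify a single row here.
n_matches = int(mask.sum())

d.loc[mask, 'sales'] = 100
```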
Answer by elPastor
Old question, but I'm surprised nobody mentioned numpy's .where() functionality. In older pandas versions it could be called directly from the pandas module as pd.np.where, but the pd.np alias was deprecated in pandas 1.0 and later removed, so import numpy directly.
In this case the code would be:
import numpy as np
d.sales = np.where(d.sales == 24, 100, d.sales)
To my knowledge, this is one of the fastest ways to conditionally change data across a series.
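A runnable sketch of the np.where approach, with numpy imported directly. For comparison it also computes the same result with Series.mask, a pandas method that replaces values where the condition is True; that comparison is an addition here, not part of the original answer, and no timing claim is made for either form. The name `via_mask` is illustrative.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

# Pure-pandas equivalent: replace values where the condition holds.
via_mask = d['sales'].mask(d['sales'] == 24, 100)

# np.where(condition, value_if_true, value_if_false) returns a new array,
# which is then assigned back to the column.
d['sales'] = np.where(d['sales'] == 24, 100, d['sales'])
```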