Original source: http://stackoverflow.com/questions/41236370/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What is as_index in groupby in pandas?
Asked by Haritha
What exactly is the function of as_index in groupby in Pandas?
Answered by MYGz
print() is your friend when you don't understand something. It often clears up doubts.
Take a look:
import pandas as pd
df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})
print(df)
print(df.groupby('books', as_index=True).sum())
print(df.groupby('books', as_index=False).sum())
Output:
  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17
       price
books
bk1       36
bk2       30
bk3       17
  books  price
0   bk1     36
1   bk2     30
2   bk3     17
When as_index=True, the key(s) you use in groupby() will become an index in the new dataframe.
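To make the difference concrete, here is a minimal sketch that inspects the two results directly. It rebuilds the same df as above; the names indexed and flat are made up for this illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

# Hypothetical variable names, chosen only for this sketch.
indexed = df.groupby('books', as_index=True).sum()
flat = df.groupby('books', as_index=False).sum()

print(indexed.index.name)     # books
print(list(indexed.columns))  # ['price']
print(list(flat.columns))     # ['books', 'price']

# Moving the index back into a column yields the as_index=False layout.
print(indexed.reset_index().equals(flat))
```

In other words, as_index=False can be thought of as roughly "group, aggregate, then reset_index()".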
The benefits you get when you set the column as index are:
Speed. When you filter values based on the index column, e.g. df.loc['bk1'], it would be faster because the index column is hashed. It doesn't have to traverse the entire books column to find 'bk1'. It will just calculate the hash value of 'bk1' and find it in one go.
Ease. When as_index=True you can use the syntax df.loc['bk1'], which is shorter and faster, as opposed to df.loc[df.books=='bk1'], which is longer and slower.
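The two lookup styles can be sketched side by side as follows. This only shows that both return the same value; it does not measure speed, and the names indexed, flat, by_label and by_mask are assumptions introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

indexed = df.groupby('books', as_index=True).sum()
flat = df.groupby('books', as_index=False).sum()

# Label-based lookup on the index: returns the 'bk1' row as a Series.
by_label = indexed.loc['bk1']

# Boolean-mask lookup on an ordinary column: returns a one-row DataFrame.
by_mask = flat.loc[flat.books == 'bk1']

print(by_label['price'])         # 36
print(by_mask['price'].iloc[0])  # 36
```

Note that the label lookup gives back a Series while the mask gives back a DataFrame, so the two are not drop-in replacements for each other.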
Answered by Marc vT
When using groupby, as_index can be set to True or False depending on whether you want the column you grouped by to be the index of the output.
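import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

# Count rows per color; the group key 'colors' becomes the index.
# sort_values() replaces DataFrame.sort(), which no longer exists in current pandas.
new_group = table_r.groupby('colors', as_index=True).count().sort_values('price', ascending=False)
print(new_group)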
output:
        price  quantity
colors
orange      2         2
red         2         2
Now with as_index=False
   colors  price  quantity
0  orange      2         2
1     red      2         2
Note how colors is no longer the index once we set as_index=False; it is an ordinary column instead.
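The answer shows only the output for the as_index=False case. A sketch of the call that would produce it, assuming the same table_r as above and a current pandas version (so sort_values() rather than sort()), might look like this:

```python
import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

# Same aggregation as before, but the group key stays a regular column
# and the result gets a default integer index.
new_group = table_r.groupby('colors', as_index=False).count().sort_values('price', ascending=False)
print(new_group)
print(list(new_group.columns))  # ['colors', 'price', 'quantity']
```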