Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/41236370/

Date: 2020-08-20 00:39:58  Source: igfitidea

What is as_index in groupby in pandas?

python pandas

Asked by Haritha

What exactly is the function of as_index in groupby in Pandas?

Answered by MYGz

print() is your friend when you don't understand something. It often clears up doubts.

Take a look:

import pandas as pd

df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})

print(df)

print(df.groupby('books', as_index=True).sum())

print(df.groupby('books', as_index=False).sum())

Output:

  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books       
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17

When as_index=True, the key(s) you use in groupby() become the index of the new DataFrame. With as_index=False, they stay as regular columns and the result gets a default integer index.
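As a rough sketch of how the two settings relate: for a simple aggregation like the one above, as_index=False gives the same frame as grouping with as_index=True and then calling reset_index(). The variable names indexed and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})

# Group keys become the index of the result.
indexed = df.groupby('books', as_index=True).sum()

# Group keys stay as an ordinary column.
flat = df.groupby('books', as_index=False).sum()

# Moving the index back into a column should match the as_index=False result.
print(indexed.reset_index().equals(flat))
```

So the choice mostly comes down to whether you want to look rows up by the group key afterwards or keep working with a flat table.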

The benefits you get when you set the column as the index are:

  1. Speed. When you filter values based on the index column, e.g. df.loc['bk1'], the lookup is faster because the index is hashed. It doesn't have to traverse the entire books column to find 'bk1'; it computes the hash value of 'bk1' and finds it in one go.

  2. Ease. With as_index=True you can use the syntax df.loc['bk1'], which is shorter and faster, as opposed to df.loc[df.books=='bk1'], which is longer and slower.
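The two lookup styles from the list can be put side by side in a small sketch. It reuses the books/price data from the answer; by_index and by_column are illustrative names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})

by_index = df.groupby('books', as_index=True).sum()
by_column = df.groupby('books', as_index=False).sum()

# Label-based lookup on the index returns the matching row as a Series.
row = by_index.loc['bk1']
print(row['price'])

# With as_index=False the same row is selected with a boolean mask,
# which returns a one-row DataFrame.
matches = by_column.loc[by_column.books == 'bk1']
print(matches['price'].iloc[0])
```

Both lookups return the bk1 total of 36 shown in the output above. On a frame this small any speed difference is negligible; it only starts to matter on much larger data.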

Answered by Marc vT

When using groupby, as_index can be set to True or False depending on whether you want the column you grouped by to be the index of the output.

import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})
# Count rows per color; sort_values() replaces the removed DataFrame.sort()
new_group = table_r.groupby('colors', as_index=True).count().sort_values('price', ascending=False)
print(new_group)

Output:

        price  quantity
colors                 
orange      2         2
red         2         2

Now with as_index=False:

   colors  price  quantity
0  orange      2         2
1     red      2         2

Note how colors is no longer the index when we set as_index=False; it is an ordinary column and the rows get a default integer index.
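The same behaviour carries over to grouping by more than one key. As a minimal sketch (grouping by both colors and quantity is my own choice for illustration, not something from the answer), as_index=True produces a MultiIndex built from the keys, while as_index=False keeps every key as a plain column:

```python
import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

keys = ['colors', 'quantity']  # illustrative choice of group keys

# Both keys become levels of a MultiIndex on the result.
nested = table_r.groupby(keys, as_index=True).sum()
print(nested.index.names)

# Both keys remain ordinary columns next to the aggregated price.
flat = table_r.groupby(keys, as_index=False).sum()
print(list(flat.columns))
```

With the MultiIndex form, a single group can be selected with a tuple of labels, for example nested.loc[('orange', 500)].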