Original source: http://stackoverflow.com/questions/51866908/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Difference between "as_index = False", and "reset_index()" in pandas groupby
Asked by Rohith
I would like to know the difference between what these two do.
Data:
import pandas as pd
df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]})
as_index=False:
df_group1 = df.groupby("ID", as_index=False).sum()
reset_index():
df_group2 = df.groupby("ID").sum().reset_index()
Both of them give the exact same output.
  ID  value
0  A     18
1  B      6
2  C      6
Can anyone explain the difference and give an example that illustrates it?
Answered by qmeeus
When you use as_index=False, you indicate to groupby() that you don't want to set the column ID as the index (duh!). When both implementations yield the same results, use as_index=False because it will save you some typing and an unnecessary pandas operation ;)
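As a minimal sketch of that point, using the question's own data, the snippet below compares the two spellings. The variable names (with_index, flat_a, flat_b) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
    "value": [1, 2, 4, 3, 6, 7, 3, 4],
})

# Default behaviour (as_index=True): ID becomes the index of the result.
with_index = df.groupby("ID").sum()

# Two ways to get ID back as an ordinary column.
flat_a = df.groupby("ID", as_index=False).sum()
flat_b = with_index.reset_index()

print(with_index.index.name)      # name of the index on the default result
print(list(flat_a.columns))       # columns when as_index=False
print(flat_a.equals(flat_b))      # whether both spellings agree
```

For a plain aggregation like this one, the second spelling just does an extra step to reach the same frame.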
However, sometimes you want to apply more complicated operations to your groups. On those occasions, you may find that one is better suited than the other.
Example 1: You want to sum the values of three variables (i.e. columns) in a group on both axes.
Using as_index=True allows you to apply a sum over axis=1 without specifying the names of the columns, and then sum the values over axis 0. When the operation is finished, you can use reset_index(drop=True/False) to get the dataframe into the right form.
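One possible way to sketch Example 1 is shown below. The columns v1, v2 and v3 and their values are hypothetical, made up for illustration. Note that this sketch sums over axis 0 within each group first and over axis=1 second, the reverse of the order described above. The totals are the same, and this order avoids passing the string column ID into the row-wise sum.

```python
import pandas as pd

# Hypothetical data: v1, v2 and v3 are illustrative column names.
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C"],
    "v1": [1, 2, 3, 4],
    "v2": [10, 20, 30, 40],
    "v3": [100, 200, 300, 400],
})

# as_index=True is the default: ID moves to the index, so only the
# value columns remain in the aggregated frame.
per_group = df.groupby("ID").sum()

# Sum across the remaining columns without naming any of them.
totals = per_group.sum(axis=1)

# Turn the ID index back into a column and name the summed values.
result = totals.reset_index(name="total")
print(result)
```

With as_index=False, ID would stay among the columns of per_group, and the row-wise sum would need an explicit list of value columns.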
Example 2: You need to set a value for the group based on the columns in the groupby().
Setting as_index=False allows you to check the condition on an ordinary column rather than on an index, which is often much easier.
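A small sketch of Example 2, again on the question's data: the rule used here (flag groups A and C) and the column name flagged are assumptions chosen only to show the two access patterns.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
    "value": [1, 2, 4, 3, 6, 7, 3, 4],
})

# ID stays an ordinary column, so the condition reads like any other filter.
agg = df.groupby("ID", as_index=False).sum()
agg["flagged"] = agg["ID"].isin(["A", "C"])  # hypothetical rule

# Same rule when ID is the index: the condition goes through .index instead.
agg_idx = df.groupby("ID").sum()
agg_idx["flagged"] = agg_idx.index.isin(["A", "C"])

print(agg)
print(agg_idx)
```

Both versions work. The column form tends to be easier to combine with other column-based conditions.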
At some point, you might come across a KeyError when applying operations on groups. This is often because you are trying to use a column in your aggregate function that is currently an index of your GroupBy object.
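The simplest form of this error can be reproduced as follows. This is a sketch of one common case, not the only way to hit it: after a default groupby, ID is the index, so looking it up as a column fails.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
    "value": [1, 2, 4, 3, 6, 7, 3, 4],
})

by_index = df.groupby("ID").sum()  # ID is now the index, not a column

try:
    by_index["ID"]                 # column lookup on an index label
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Either read the values from the index, or move ID back into the columns.
ids_from_index = by_index.index.tolist()
ids_from_column = by_index.reset_index()["ID"].tolist()
print(ids_from_index, ids_from_column)
```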