Python ValueError: cannot insert ID, already exists
原文地址: http://stackoverflow.com/questions/41576242/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Dinosaurius
I have this data:
ID TIME
1 2
1 4
1 2
2 3
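A minimal sketch of loading this sample data into a pandas DataFrame, assuming the variable name df that the code in the question and answers uses:

```python
import pandas as pd

# Sample data from the question: one row per observation.
df = pd.DataFrame({"ID": [1, 1, 1, 2], "TIME": [2, 4, 2, 3]})
print(df)
```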
I want to group the data by ID and calculate the mean time and the size of each group.
ID MEAN_TIME COUNT
1 2.67 3
2 3.00 1
If I run this code, then I get an error "ValueError: cannot insert ID, already exists":
result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index()
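Why the error occurs: grouping by ID moves the ID values into the index (named ID), and aggregating the ID column itself also produces a column named ID, so reset_index() tries to insert a column whose name is already taken. The frame below is hand-built to mimic that shape (values copied from the expected output, not computed by groupby), as a small sketch of the clash:

```python
import pandas as pd

# Hand-built stand-in for the aggregation result: index named "ID" plus an
# "ID" column holding the per-group counts.
agg = pd.DataFrame(
    {"TIME": [8 / 3, 3.0], "ID": [3, 1]},
    index=pd.Index([1, 2], name="ID"),
)

try:
    agg.reset_index()  # tries to insert the index as a new column named "ID"
    collided = False
except ValueError as exc:
    collided = True
    print(exc)  # cannot insert ID, already exists
```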
Answer by jezrael
Use the parameter drop=True, which does not create a new column from the index but removes it instead:
result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index(drop=True)
print (result)
ID TIME
0 3 2.666667
1 1 3.000000
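A small sketch of what drop=True does, using a hand-built frame shaped like the aggregation result (index named ID plus an ID column of counts; values copied from the output above, not computed by groupby). The name clash disappears because the index is discarded, which also means the group labels 1 and 2 are no longer in the result and the ID column only holds the counts:

```python
import pandas as pd

agg = pd.DataFrame(
    {"ID": [3, 1], "TIME": [8 / 3, 3.0]},
    index=pd.Index([1, 2], name="ID"),
)

# drop=True throws the index away instead of inserting it as a column,
# leaving a default 0..n-1 index; the original group labels are gone.
result = agg.reset_index(drop=True)
print(result)
```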
But if you need the index as a new column, rename the old columns first:
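# Parentheses let the method chain span several lines without a SyntaxError.
result = (df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'})
            .rename(columns={'ID': 'COUNT', 'TIME': 'MEAN_TIME'})  # free the name "ID"
            .reset_index())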
print (result)
ID COUNT MEAN_TIME
0 1 3 2.666667
1 2 1 3.000000
Solution if you need to aggregate multiple columns:
result = df.groupby(['ID']).agg({'TIME':{'MEAN_TIME': 'mean'}, 'ID': {'COUNT': 'count'}})
result.columns = result.columns.droplevel(0)
print (result.reset_index())
ID COUNT MEAN_TIME
0 1 3 2.666667
1 2 1 3.000000
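The nested-dict renaming used above was deprecated in pandas 0.20 and raises SpecificationError ("nested renamer is not supported") from pandas 1.0 onward. A sketch of the same result with named aggregation (available since pandas 0.25), rebuilding the question's sample data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2], "TIME": [2, 4, 2, 3]})

# Named aggregation: each keyword is an output column, each value is
# (input column, aggregation function). Counting TIME instead of ID keeps the
# grouping key out of the aggregated columns, so reset_index() has no clash.
result = (
    df.groupby("ID")
    .agg(MEAN_TIME=("TIME", "mean"), COUNT=("TIME", "count"))
    .reset_index()
)
print(result)
```

Note that "count" skips missing TIME values; if TIME can contain NaN and you want the number of rows per group, "size" counts every row.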
Answer by piRSquared
I'd limit my groupby to just the TIME column.
df.groupby(['ID']).TIME.agg({'MEAN_TIME': 'mean', 'COUNT': 'count'}).reset_index()
ID MEAN_TIME COUNT
0 1 2.666667 3
1 2 3.000000 1
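Passing a dict of output names to SeriesGroupBy.agg, as above, was likewise deprecated in pandas 0.20 and raises SpecificationError from pandas 1.0. A sketch of the equivalent using keyword named aggregation on the TIME column (pandas 0.25+), with the question's sample data rebuilt so it runs standalone:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2], "TIME": [2, 4, 2, 3]})

# SeriesGroupBy named aggregation: keyword = output column name,
# value = aggregation function applied to TIME within each ID group.
result = df.groupby("ID")["TIME"].agg(MEAN_TIME="mean", COUNT="count").reset_index()
print(result)
```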