Aggregating Rows Pandas
Original question: http://stackoverflow.com/questions/51230581/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors on Stack Overflow.
Asked by Stefano Pozzi
I am quite new to pandas. I need to aggregate rows that share the same 'Name' and then compute the average of 'Rating' and 'NumsHelpful' (without counting NaN). 'Review' should be concatenated, while 'Weight(Pounds)' should remain untouched:
col names: ['Brand', 'Name', 'NumsHelpful', 'Rating', 'Weight(Pounds)', 'Review']
Name  'Brand'    'Name'
1534  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
1535  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
1536  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
1537  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
1538  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
1539  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
1540  Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz
      'NumsHelpful'  'Rating'  'Weight'
1534  NaN            2         4.5
1535  NaN            2         4.5
1536  NaN            NaN       4.5
1537  NaN            NaN       4.5
1538  2              NaN       4.5
1539  3              5         4.5
1540  5              NaN       4.5
      'Review'
1534  Yummy - Delish
1535  The best Bloody Mary mix! - The best Bloody Ma...
1536  Best Taste by far - I've tried several if not ...
1537  Best bloody mary mix ever - This is also good ...
1538  Outstanding - Has a small kick to it but very ...
1539  OMG! So Good! - Spicy, terrific Bloody Mary mix!
1540  Good stuff - This is the best
So the output should be something like this:
'Brand'    'Name'                               'NumsHelpful'  'Rating'
Zing Zang  Zing Zang Bloody Mary Mix, 32 fl oz  3.33           3
'Weight'  'Review'
4.5       Review1 / Review2 / ... / ReviewN
How should I proceed? Thanks.
Answered by jezrael
Use DataFrameGroupBy.agg with a dictionary that maps columns to aggregation functions. The columns Weight and Brand are aggregated with first, which takes the first value in each group:
d = {'NumsHelpful': 'mean',
     'Review': '/'.join,
     'Weight': 'first',
     'Brand': 'first',
     'Rating': 'mean'}
df = df.groupby('Name').agg(d).reset_index()
print(df)
Name NumsHelpful \
0 Zing Zang Bloody Mary Mix, 32 fl oz 3.333333
Review Weight Brand \
0 Yummy - Delish/The best Bloody Mary mix! - The... 4.5 Zing Zang
Rating
0 3.0
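For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The frame is a reconstruction of the seven sample rows from the question; the review texts are replaced by the placeholders Review1 to Review7 (an assumption, since the originals are truncated), and the variable name out is only for illustration.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's sample rows; review texts are placeholders.
df = pd.DataFrame({
    'Brand': ['Zing Zang'] * 7,
    'Name': ['Zing Zang Bloody Mary Mix, 32 fl oz'] * 7,
    'NumsHelpful': [np.nan, np.nan, np.nan, np.nan, 2, 3, 5],
    'Rating': [2, 2, np.nan, np.nan, np.nan, 5, np.nan],
    'Weight': [4.5] * 7,
    'Review': ['Review%d' % i for i in range(1, 8)],
})

# One aggregation per column; 'mean' skips NaN by default.
d = {'NumsHelpful': 'mean',
     'Review': '/'.join,
     'Weight': 'first',
     'Brand': 'first',
     'Rating': 'mean'}

out = df.groupby('Name').agg(d).reset_index()
print(out)
```

With these rows, NumsHelpful averages 2, 3 and 5, and Rating averages 2, 2 and 5, because the NaN entries are left out of both means.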
Also, in pandas 0.23.1 you get this warning:
FutureWarning: 'Name' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
The solution is to remove the index name Name:
df.index.name = None
Or:
df = df.rename_axis(None)
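A small sketch of the situation the warning describes. How the asker's frame ended up with an index named Name is not stated; the set_index(..., drop=False) call below is only an assumed way to reproduce it, and the data is made up for illustration.

```python
import pandas as pd

# Hypothetical frame where 'Name' is both the index name and a column.
df = pd.DataFrame({'Name': ['a', 'a', 'b'], 'Rating': [2.0, 4.0, 5.0]})
df = df.set_index('Name', drop=False)

# Clear the index name so that 'Name' refers to the column only.
df = df.rename_axis(None)  # same effect as: df.index.name = None

res = df.groupby('Name').agg({'Rating': 'mean'}).reset_index()
print(res)
```

As the warning itself announces, later pandas versions are expected to treat this ambiguity as an error rather than a warning, so clearing the index name before grouping is the safer habit.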
Another possible solution is not to aggregate these columns with first, but to add them to the groupby keys instead:
d = {'NumsHelpful':'mean', 'Review':'/'.join, 'Rating':'mean'}
df = df.groupby(['Name', 'Weight','Brand']).agg(d).reset_index()
Both solutions return the same output if Weight and Brand have the same value in every row of a group.
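The equivalence can be checked on a toy frame. This is only a sketch with made-up data; the names first_based, key_based and same are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data: Weight and Brand are constant within each Name.
df = pd.DataFrame({
    'Brand': ['A', 'A', 'B'],
    'Name': ['x', 'x', 'y'],
    'Weight': [4.5, 4.5, 1.0],
    'NumsHelpful': [2.0, np.nan, 3.0],
    'Rating': [2.0, 4.0, 5.0],
    'Review': ['r1', 'r2', 'r3'],
})

# Variant 1: keep Weight and Brand by taking the first value per group.
d1 = {'NumsHelpful': 'mean', 'Review': '/'.join,
      'Weight': 'first', 'Brand': 'first', 'Rating': 'mean'}
first_based = df.groupby('Name').agg(d1).reset_index()

# Variant 2: make Weight and Brand part of the grouping keys.
d2 = {'NumsHelpful': 'mean', 'Review': '/'.join, 'Rating': 'mean'}
key_based = df.groupby(['Name', 'Weight', 'Brand']).agg(d2).reset_index()

# Compare with a common column order, since the two variants order columns differently.
cols = sorted(first_based.columns)
same = first_based[cols].equals(key_based[cols])
print(same)
```

One difference to keep in mind: by default groupby drops rows whose key is NaN, so if Weight or Brand contains missing values, the second variant loses those rows while the first does not.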
EDIT:
If you need to convert a string (object) column to numeric, first try converting it with astype:
df['Weight(Pounds)'] = df['Weight(Pounds)'].astype(float)
If that fails, use to_numeric with the parameter errors='coerce' to convert non-parseable strings to NaN:
df['Weight(Pounds)'] = pd.to_numeric(df['Weight(Pounds)'], errors='coerce')
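The two conversion steps can be combined as a fallback. The sketch below uses a made-up series containing one non-numeric string; the value 'unknown' is an assumption for illustration, not something taken from the asker's data.

```python
import pandas as pd

# Made-up column with one value that cannot be parsed as a number.
s = pd.Series(['4.5', '2', 'unknown'], name='Weight(Pounds)')

try:
    # Strict conversion: raises ValueError on the first bad string.
    converted = s.astype(float)
except ValueError:
    # Lenient conversion: unparseable strings become NaN.
    converted = pd.to_numeric(s, errors='coerce')

print(converted)
```

Because the coerced values become NaN, a later 'mean' aggregation simply skips them.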
Answered by jpp
You can aggregate with a different function for each column using groupby + agg, together with a dictionary mapping series to functions. For example:
d = {'Rating': 'mean',
     'NumsHelpful': 'mean',
     'Review': ' | '.join,
     'Weight(Pounds)': 'first'}
res = df.groupby('Name').agg(d)
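One caveat with plain str.join: it raises a TypeError if a group contains a missing review, because NaN is a float. The thread does not say whether Review has missing values, so the following is only a sketch under that assumption, with a hypothetical helper named join_reviews and made-up data.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing review.
df = pd.DataFrame({
    'Name': ['x', 'x', 'x'],
    'Rating': [2.0, np.nan, 4.0],
    'Review': ['good', np.nan, 'great'],
})

def join_reviews(reviews):
    """Join the non-missing reviews of one group with ' | '."""
    return ' | '.join(reviews.dropna().astype(str))

res = df.groupby('Name').agg({'Rating': 'mean', 'Review': join_reviews})
print(res)
```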