Aggregating rows in Pandas

Question

Asked by Stefano Pozzi

I am quite new to pandas. I need to aggregate rows that share the same 'Name', then take the average of 'Rating' and 'NumsHelpful' (ignoring NaN). 'Review' should be concatenated, while 'Weight(Pounds)' should remain untouched:

col names: ['Brand', 'Name', 'NumsHelpful', 'Rating', 'Weight(Pounds)', 'Review']

Name             'Brand'                             'Name'
1534             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1535             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1536             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1537             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1538             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1539             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   
1540             Zing Zang                Zing Zang Bloody Mary Mix, 32 fl oz   

        'NumsHelpful'     'Rating'       'Weight'
1534          NaN            2              4.5   
1535          NaN            2              4.5   
1536          NaN            NaN            4.5   
1537          NaN            NaN            4.5   
1538          2              NaN            4.5   
1539          3              5              4.5   
1540          5              NaN            4.5   

                        'Review'
1534                                     Yummy - Delish  
1535  The best Bloody Mary mix! - The best Bloody Ma...  
1536  Best Taste by far - I've tried several if not ...  
1537  Best bloody mary mix ever - This is also good ...  
1538  Outstanding - Has a small kick to it but very ...  
1539   OMG! So Good! - Spicy, terrific Bloody Mary mix!  
1540                      Good stuff - This is the best

So the output should be something like this:

 'Brand'                'Name'                   'NumsHelpful'    'Rating' 
Zing Zang    Zing Zang Bloody Mary Mix, 32 fl oz     3.33             3

 'Weight'               'Review'
   4.5      Review1 / Review2 / ... / ReviewN

How should I proceed? Thanks.

Answer 1

Answered by jezrael

Use DataFrameGroupBy.agg with a dictionary that maps columns to aggregation functions. The columns Weight and Brand are aggregated with first, which takes the first value in each group:

d = {'NumsHelpful':'mean', 
     'Review':'/'.join, 
     'Weight':'first',
     'Brand':'first', 
     'Rating':'mean'}
df = df.groupby('Name').agg(d).reset_index()
print (df)
                                  Name  NumsHelpful  \
0  Zing Zang Bloody Mary Mix, 32 fl oz     3.333333   

                                              Review  Weight      Brand  \
0  Yummy - Delish/The best Bloody Mary mix! - The...     4.5  Zing Zang   

   Rating  
0     3.0
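
For readers who want to run this end to end, here is a minimal sketch on a hypothetical four-row miniature of the question's data (the review texts are shortened and the frame is built by hand, so it is an assumption rather than the asker's actual data). It shows that 'mean' skips NaN by default, which is what the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data; reviews are shortened.
df = pd.DataFrame({
    "Brand": ["Zing Zang"] * 4,
    "Name": ["Zing Zang Bloody Mary Mix, 32 fl oz"] * 4,
    "NumsHelpful": [np.nan, 2, 3, 5],
    "Rating": [2, 2, np.nan, 5],
    "Weight": [4.5] * 4,
    "Review": ["Yummy", "The best", "Outstanding", "Good stuff"],
})

# One aggregation per column: 'mean' ignores NaN, 'first' keeps the
# first value in the group, and '/'.join concatenates the review strings.
d = {
    "NumsHelpful": "mean",
    "Review": "/".join,
    "Weight": "first",
    "Brand": "first",
    "Rating": "mean",
}

out = df.groupby("Name").agg(d).reset_index()
print(out)
```

Both averages are taken over three values, because each of the two columns has exactly one NaN that is left out of the mean.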

Also, in pandas version 0.23.1 you get:

FutureWarning: 'Name' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version

The solution is to remove the index name Name:

df.index.name = None

Or:

df = df.rename_axis(None)
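
A small sketch of the situation, assuming a hypothetical two-column frame whose index happens to be named Name as well. Clearing the index name before grouping removes the ambiguity, so groupby can only resolve 'Name' to the column.

```python
import pandas as pd

# Hypothetical frame: the index is named "Name" and so is a column.
df = pd.DataFrame(
    {"Name": ["a", "a", "b"], "Rating": [2.0, 4.0, 5.0]},
    index=pd.Index([10, 11, 12], name="Name"),
)

# Drop the index name (same effect as df.index.name = None).
df = df.rename_axis(None)

# 'Name' now refers unambiguously to the column.
res = df.groupby("Name")["Rating"].mean()
print(res)
```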

Another possible solution is not to aggregate these columns with first, but to add them to the groupby keys:

d = {'NumsHelpful':'mean',  'Review':'/'.join, 'Rating':'mean'}
df = df.groupby(['Name', 'Weight','Brand']).agg(d).reset_index()

Both solutions return the same output if Weight and Brand have the same value within each group.
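
The equivalence can be checked on a small frame. The sketch below uses hypothetical data with two products, each with a constant Weight and Brand, and compares the two results after putting their columns in the same order. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Weight and Brand are constant within each Name.
df = pd.DataFrame({
    "Name": ["A mix", "A mix", "B mix", "B mix"],
    "Brand": ["A", "A", "B", "B"],
    "Weight": [4.5, 4.5, 2.0, 2.0],
    "NumsHelpful": [np.nan, 2, 3, 5],
    "Rating": [2, np.nan, 5, 4],
    "Review": ["ok", "fine", "great", "good"],
})

# Variant 1: group by Name only, keep Weight and Brand via 'first'.
d_first = {"NumsHelpful": "mean", "Review": "/".join,
           "Weight": "first", "Brand": "first", "Rating": "mean"}
by_first = df.groupby("Name").agg(d_first).reset_index()

# Variant 2: make Weight and Brand part of the grouping keys.
d_keys = {"NumsHelpful": "mean", "Review": "/".join, "Rating": "mean"}
by_keys = df.groupby(["Name", "Weight", "Brand"]).agg(d_keys).reset_index()

# Same values; only the column order differs, so align it first.
cols = ["Name", "Brand", "Weight", "NumsHelpful", "Rating", "Review"]
pd.testing.assert_frame_equal(by_first[cols], by_keys[cols])
```

If Weight or Brand varied inside a product, the second variant would produce one row per distinct combination, while the first would still produce one row per Name.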

EDIT:

If you need to convert a string (object) column to numeric, first try converting it with astype:

df['Weight(Pounds)'] = df['Weight(Pounds)'].astype(float)

If that fails, use to_numeric with the parameter errors='coerce' to convert non-parseable strings to NaN:

df['Weight(Pounds)'] = pd.to_numeric(df['Weight(Pounds)'], errors='coerce')
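
To illustrate the difference between the two conversions, here is a short sketch on a hypothetical Series containing one value that cannot be parsed (the string "unknown" is an assumption made for the example).

```python
import pandas as pd

# Hypothetical weight column read in as strings, with one bad entry.
raw = pd.Series(["4.5", "4.5", "unknown"])

# astype(float) is strict: a single bad string makes it raise.
try:
    raw.astype(float)
except ValueError as exc:
    print("astype failed:", exc)

# to_numeric with errors='coerce' turns the bad entry into NaN instead.
weights = pd.to_numeric(raw, errors="coerce")
print(weights)
```

The coerced NaN values are then skipped by 'mean' in the aggregation, just like the missing ratings.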

Answer 2

Answered by jpp

You can aggregate with a different function for each column using groupby + agg, together with a dictionary mapping column names to functions. For example:

d = {'Rating': 'mean',
     'NumsHelpful': 'mean',
     'Review': ' | '.join,
     'Weight(Pounds)': 'first'}

res = df.groupby('Name').agg(d)
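
One caveat that the answers do not cover: str.join raises a TypeError when a group contains a missing review, because NaN is a float rather than a string. A possible workaround, sketched here on hypothetical data, is to drop the missing values inside the aggregation function before joining.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing review.
df = pd.DataFrame({
    "Name": ["Mix", "Mix", "Mix"],
    "Rating": [2, np.nan, 5],
    "NumsHelpful": [np.nan, 2, 4],
    "Weight(Pounds)": [4.5, 4.5, 4.5],
    "Review": ["Yummy", np.nan, "Good stuff"],
})

d = {
    "Rating": "mean",
    "NumsHelpful": "mean",
    # Drop NaN reviews first so that join only sees strings.
    "Review": lambda s: " | ".join(s.dropna()),
    "Weight(Pounds)": "first",
}

res = df.groupby("Name").agg(d)
print(res)
```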
