Pandas - possible to aggregate two columns using two different aggregations?
Original question: http://stackoverflow.com/questions/18837659/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by marcus adamski
I'm loading a csv file, which has the following columns: date, textA, textB, numberA, numberB
I want to group by the columns date, textA and textB, applying "sum" to numberA but "min" to numberB.
import pandas as pd
data = pd.read_table("file.csv", sep=",", thousands=',')
grouped = data.groupby(["date", "textA", "textB"], as_index=False)
...but I cannot see how to then apply two different aggregate functions, to two different columns?
I.e. sum(numberA), min(numberB)
Answered by unutbu
The agg method can accept a dict, in which case the keys indicate the column to which the function is applied:
grouped.agg({'numberA':'sum', 'numberB':'min'})
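Applied to the columns named in the question, the call might look like the sketch below. The rows are made up for illustration and stand in for file.csv, whose contents are not shown in the question.

```python
import pandas as pd

# Hypothetical rows standing in for file.csv (not data from the question).
data = pd.DataFrame({
    "date": ["2013-09-01", "2013-09-01", "2013-09-02"],
    "textA": ["x", "x", "y"],
    "textB": ["p", "p", "q"],
    "numberA": [1, 2, 3],
    "numberB": [10, 5, 7],
})

# as_index=False keeps the group keys as ordinary columns in the result.
grouped = data.groupby(["date", "textA", "textB"], as_index=False)

# Each dict key names a column; each value names the aggregation for it.
result = grouped.agg({"numberA": "sum", "numberB": "min"})
print(result)
```

Here the first two rows share the same group keys, so they collapse into one row with numberA summed and numberB reduced to its minimum.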
For example,
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'number A': np.arange(8),
                   'number B': np.arange(8) * 2})
grouped = df.groupby('A')
print(grouped.agg({
    'number A': 'sum',
    'number B': 'min'}))
yields
     number B  number A
A                      
bar         2         9
foo         0        19
This also shows that Pandas can handle spaces in column names. I'm not sure what the origin of the problem was, but literal spaces should not have posed a problem. If you wish to investigate this further,
print(df.columns)
without reassigning the column names, will show us the repr of the names. Maybe there was a hard-to-see character in the column name that looked like a space (or some other character) but was actually a u'\xa0' (NO-BREAK SPACE), for example.
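As a rough sketch of that check, the snippet below builds a hypothetical frame whose second column name contains a NO-BREAK SPACE, then calls Python's repr on each name so the hidden character becomes visible. The column names and the cleanup step are assumptions for illustration, not something taken from the original question.

```python
import pandas as pd

# Hypothetical frame: the second column name contains a NO-BREAK SPACE (\xa0)
# that looks like an ordinary space when displayed.
df = pd.DataFrame({"number A": [1, 2], "number\xa0B": [3, 4]})

# repr() escapes the NO-BREAK SPACE, so it shows up as \xa0.
names = [repr(c) for c in df.columns]
print(names)

# One possible cleanup: swap the NO-BREAK SPACE for an ordinary space.
df.columns = df.columns.str.replace("\xa0", " ", regex=False)
print(list(df.columns))
```

After the cleanup, both columns can be referenced with ordinary spaces in the dict passed to agg.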

