Pandas: apply different functions to different columns
Original question: http://stackoverflow.com/questions/26434123/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
Asked by pbreach
When using df.mean() I get a result where the mean for each column is given. Now let's say I want the mean of the first column and the sum of the second. Is there a way to do this? I don't want to have to disassemble and reassemble the DataFrame.
My initial idea was to do something along the lines of pandas.groupby.agg(), like so:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((10, 2)), columns=['A', 'B'])
df.apply({'A': np.mean, 'B': np.sum}, axis=0)
Traceback (most recent call last):
  File "<ipython-input-81-265d3e797682>", line 1, in <module>
    df.apply({'A':np.mean, 'B':np.sum}, axis=0)
  File "C:\Users\Patrick\Anaconda\lib\site-packages\pandas\core\frame.py", line 3471, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "C:\Users\Patrick\Anaconda\lib\site-packages\pandas\core\frame.py", line 3560, in _apply_standard
    results[i] = func(v)
TypeError: ("'dict' object is not callable", u'occurred at index A')
But clearly this doesn't work. It seems like passing a dict would be an intuitive way of doing this, but is there another way (again without disassembling and reassembling the DataFrame)?
Accepted answer by rocarvaj
I think you can use the agg method with a dictionary as the argument. For example:
df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})
df =
   A  B
0  0  3
1  1  4
2  2  5
df.agg({'A': 'mean', 'B': sum})
A     1.0
B    12.0
dtype: float64
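As a minimal sketch of the same idea, the dictionary can use string function names throughout, and a list of names requests several aggregations for a single column. The variable names per_column and multi are illustrative only, and the exact printed layout may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})

# One aggregation per column: the result is a Series indexed by column name.
per_column = df.agg({'A': 'mean', 'B': 'sum'})

# Lists request several aggregations per column: the result is a DataFrame
# with one row per function name and NaN where a function was not requested.
multi = df.agg({'A': ['mean', 'max'], 'B': ['sum']})

print(per_column)
print(multi)
```

Using string names such as 'sum' rather than the builtin sum lets pandas dispatch to its own Series methods.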
Answer by Bill Letson
You can try a closure:
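import numpy as np
import pandas as pd

def multi_func(functions):
    # Return a function that looks up the reducer for each column by name.
    def f(col):
        return functions[col.name](col)
    return f

df = pd.DataFrame(np.random.random((10, 2)), columns=['A', 'B'])
result = df.apply(multi_func({'A': np.mean, 'B': np.sum}))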
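A variation of the closure above, run on fixed data so the result is easy to check by hand. The default parameter is a hypothetical extension that is not part of the original answer: it is used for any column that has no entry in the dictionary, where the original would raise a KeyError.

```python
import numpy as np
import pandas as pd

def multi_func(functions, default=np.mean):
    """Return a function that reduces each column with its own reducer.

    `default` is a hypothetical extension (not in the original answer): it is
    used for columns that have no entry in `functions`.
    """
    def f(col):
        # df.apply passes each column as a Series whose .name is the label.
        return functions.get(col.name, default)(col)
    return f

df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [10.0, 20.0, 30.0],
                   'C': [0.0, 4.0, 8.0]})

# 'A' uses the mean, 'B' uses the sum, 'C' falls back to the default (mean).
result = df.apply(multi_func({'A': np.mean, 'B': np.sum}))
print(result)
```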
Answer by Pedro M Duarte
I just faced this situation myself and came up with the following:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([['one', 'two'], ['three', 'four'], ['five', 'six']],
   ...:                   columns=['A', 'B'])
In [3]: df
Out[3]:
       A     B
0    one   two
1  three  four
2   five   six
In [4]: converters = {'A': lambda x: x[:1], 'B': lambda x: x.replace('o', '')}
In [5]: new = pd.DataFrame.from_dict({col: series.apply(converters[col])
   ...:                               if col in converters else series
   ...:                               for col, series in df.items()})
In [6]: new
Out[6]:
   A    B
0  o   tw
1  t  fur
2  f  six
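Note that this answer transforms each value element by element, whereas the question asked for one reduced value per column. For the element-wise case, one possible alternative is to copy the frame and overwrite only the columns that have a converter. This is a sketch, not the answerer's method; it relies on Series.map applying a function to each value.

```python
import pandas as pd

df = pd.DataFrame([['one', 'two'], ['three', 'four'], ['five', 'six']],
                  columns=['A', 'B'])
converters = {'A': lambda x: x[:1], 'B': lambda x: x.replace('o', '')}

# Work on a copy so the original frame is left untouched.
new = df.copy()
for col, func in converters.items():
    # Series.map calls func on every element of the column.
    new[col] = new[col].map(func)

print(new)
```

Columns without an entry in converters are carried over unchanged by the copy, so no conditional is needed.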

