Pandas: how to apply multiple functions to a DataFrame
Note: this page reproduces a Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Original question: http://stackoverflow.com/questions/22128218/
Asked by devan0
Is there a way to apply a list of functions to each column in a DataFrame, the way DataFrameGroupBy.agg does? I found an ugly way to do it:
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(one=np.random.uniform(0, 10, 100), two=np.random.uniform(0, 10, 100)))
df.groupby(np.ones(len(df))).agg(['mean', 'std'])
        one                 two
       mean       std      mean       std
1  4.802849  2.729528  5.487576  2.890371
Answered by unutbu
For Pandas 0.20.0 or newer, use df.agg (thanks to ayhan for pointing this out):
In [11]: df.agg(['mean', 'std'])
Out[11]: 
           one       two
mean  5.147471  4.964100
std   2.971106  2.753578
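As a minimal sketch of the same call with other inputs: df.agg also accepts callables in the list, and a dict that maps column names to lists of functions. The sample data and the helper name value_range are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Small hypothetical frame so the results are easy to check by hand.
df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [10.0, 20.0, 60.0]})

def value_range(col):
    """Hypothetical custom aggregation: max minus min of a column."""
    return col.max() - col.min()

# Mix built-in names and a callable; the callable's row is labelled
# with the function's name ('value_range').
summary = df.agg(['mean', 'std', value_range])

# A dict picks different functions per column; combinations that were
# not requested show up as NaN.
per_column = df.agg({'one': ['mean'], 'two': ['min', 'max']})

print(summary)
print(per_column)
```

A named function is used rather than a lambda so that the row label in the result is readable.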
For older versions, you could use:
In [61]: df.groupby(lambda idx: 0).agg(['mean','std'])
Out[61]: 
        one               two          
       mean       std    mean       std
0  5.147471  2.971106  4.9641  2.753578
Another way would be:
In [68]: pd.DataFrame({col: [getattr(df[col], func)() for func in ('mean', 'std')] for col in df}, index=('mean', 'std'))
Out[68]: 
           one       two
mean  5.147471  4.964100
std   2.971106  2.753578
Answered by Doctor J
In the general case where you have arbitrary functions and column names, you could do this:
df.apply(lambda r: pd.Series({'mean': r.mean(), 'std': r.std()})).transpose()
         mean       std
one  5.366303  2.612738
two  4.858691  2.986567
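The same pattern can be wrapped in a small helper that takes a dict of label-to-function pairs. This is only a sketch: the name summarize, the 'span' function, and the sample data are hypothetical.

```python
import pandas as pd

def summarize(df, funcs):
    """Apply each named function to every column; returns one row per column."""
    # df.apply passes each column as a Series; returning a Series from the
    # lambda yields one result row per function, so transpose afterwards.
    return df.apply(
        lambda col: pd.Series({name: f(col) for name, f in funcs.items()})
    ).transpose()

# Hypothetical sample data and functions.
df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [10.0, 20.0, 60.0]})
funcs = {
    'mean': lambda c: c.mean(),
    'span': lambda c: c.max() - c.min(),
}

result = summarize(df, funcs)
print(result)
```

Because the labels come from the dict keys, lambdas can be used without ending up with unreadable row or column names.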
Answered by Souvik Daw
I chained three operations on a single column, and it works:
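import re
# replace newline characters with spaces, then trim surrounding whitespace
rem_newline = lambda x: re.sub('\n', ' ', x).strip()
# lowercase and trim surrounding whitespace
lower_strip = lambda x: x.lower().strip()
# chain the two applies, then split on the first '(' into separate columns
df = df['users_name'].apply(lower_strip).apply(rem_newline).str.split('(', n=1, expand=True)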
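For plain string clean-up like this, the chained applies can probably be replaced by pandas' vectorized .str methods. The sketch below is a substitution, not the answer's own method, and the sample rows are invented; only the column name users_name comes from the answer.

```python
import pandas as pd

# Hypothetical sample data for the 'users_name' column.
df = pd.DataFrame({'users_name': ['  Alice\nSmith (admin) ', 'BOB (guest)\n']})

cleaned = (
    df['users_name']
    .str.lower()                          # lowercase
    .str.strip()                          # trim surrounding whitespace
    .str.replace('\n', ' ', regex=False)  # literal newline -> space
    .str.strip()
)

# Split on the first '(' into two columns, as in the answer.
parts = cleaned.str.split('(', n=1, expand=True)
print(parts)
```

Unlike the lambda version, the .str methods pass missing values through as NaN instead of raising an error on them.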
Answered by Sergio Lucero
I am using pandas to analyze Chilean legislation drafts. In my dataframe, the list of authors is stored as a string. The answer above did not work for me (using pandas 0.20.3). So I used my own logic and came up with this:
df.authors.apply(eval).apply(len).sum()
Concatenated applies! A pipeline!! The first apply transforms
"['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']"
into the obvious list; the second apply counts the number of lawmakers involved in each project. I want the total number of (lawmaker, project number) pairs, so I can presize an array in which I will study which parties work on what.
Interestingly, this works! Even more interestingly, that last call fails if one gets too ambitious and does this instead:
df.authors.apply(eval).apply(len).apply(sum)
with an error:
TypeError: 'int' object is not iterable
coming from deep within /site-packages/pandas/core/series.py in apply
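The error is what one would expect: after .apply(len) every element is an int, and .apply(sum) calls the built-in sum on each of those ints separately, whereas .sum() reduces the whole Series at once. A minimal sketch with made-up data follows (the second row is invented). It also swaps eval for ast.literal_eval, which only parses Python literals; that is a substitution, not what the answer uses.

```python
import ast
import pandas as pd

# First row is the example string from the answer; the second is hypothetical.
df = pd.DataFrame({'authors': [
    "['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']",
    "['Bellolio Avaria: Jaime']",
]})

# Parse each string into a list, then count the authors per project.
counts = df['authors'].apply(ast.literal_eval).apply(len)

# Series.sum() adds up the per-project counts.
total = counts.sum()

# .apply(sum) instead calls sum() on each int, which is not iterable.
try:
    counts.apply(sum)
    failed = False
except TypeError:
    failed = True

print(total, failed)
```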

