Pandas: How to apply multiple functions to a DataFrame

Question

Asked by devan0

Is there a way to apply a list of functions to each column in a DataFrame like the DataFrameGroupBy.agg function does? I found an ugly way to do it like this:

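import numpy as np
import pandas as pd

df = pd.DataFrame(dict(one=np.random.uniform(0, 10, 100), two=np.random.uniform(0, 10, 100)))
# Put every row into one group (constant key), then aggregate that single group
df.groupby(np.ones(len(df))).agg(['mean', 'std'])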

        one                 two
       mean       std      mean       std
1  4.802849  2.729528  5.487576  2.890371

Answer 1

Answered by unutbu

For Pandas 0.20.0 or newer, use df.agg (thanks to ayhan for pointing this out):

In [11]: df.agg(['mean', 'std'])
Out[11]: 
           one       two
mean  5.147471  4.964100
std   2.971106  2.753578
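As a small illustration of what df.agg accepts, here is a minimal sketch. It assumes a tiny hand-made frame (the values are invented for the example, not the random data from the question) and shows both a list of function names and a per-column dict:

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [10.0, 20.0, 30.0, 40.0]})

# A list of function names is applied to every column
summary = df.agg(["mean", "std"])

# A dict lets each column get its own list of functions;
# cells for functions not requested on a column are NaN
per_column = df.agg({"one": ["mean", "max"], "two": ["min"]})

print(summary)
print(per_column)
```

With a list, the result has one row per function and one column per original column, which matches the layout of the output above.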

For older versions, you could use a groupby that puts every row into a single group:

In [61]: df.groupby(lambda idx: 0).agg(['mean','std'])
Out[61]: 
        one               two          
       mean       std    mean       std
0  5.147471  2.971106  4.9641  2.753578

Another way would be:

In [68]: pd.DataFrame({col: [getattr(df[col], func)() for func in ('mean', 'std')] for col in df}, index=('mean', 'std'))
Out[68]: 
           one       two
mean  5.147471  4.964100
std   2.971106  2.753578
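The comprehension above looks functions up by method name with getattr. A minimal sketch of the same idea for arbitrary callables follows; the helper name summarize and the funcs mapping are hypothetical names introduced here, and the data is invented:

```python
import numpy as np
import pandas as pd

def summarize(df, funcs):
    """Apply each callable in `funcs` (label -> function) to every column of `df`."""
    return pd.DataFrame(
        {col: [func(df[col]) for func in funcs.values()] for col in df},
        index=list(funcs),
    )

# Illustrative data, not from the original question
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [10.0, 20.0, 30.0, 40.0]})

result = summarize(df, {"mean": np.mean, "range": lambda s: s.max() - s.min()})
print(result)
```

Each callable receives a whole column as a Series, so any function that reduces a Series to a scalar can be used.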

Answer 2

Answered by Doctor J

In the general case where you have arbitrary functions and column names, you could do this:

df.apply(lambda r: pd.Series({'mean': r.mean(), 'std': r.std()})).transpose()

         mean       std
one  5.366303  2.612738
two  4.858691  2.986567
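To show what "arbitrary functions" might look like with this pattern, here is a hedged sketch. The function describe_column and the statistic n_above_mean are hypothetical names made up for the example, and the data is invented:

```python
import pandas as pd

# Illustrative data, not from the original question
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [10.0, 20.0, 30.0, 100.0]})

def describe_column(col):
    # `col` is one column of the frame, passed in as a Series
    return pd.Series({
        "median": col.median(),
        "n_above_mean": (col > col.mean()).sum(),
    })

# apply() calls describe_column once per column; transpose puts columns on the rows
stats = df.apply(describe_column).transpose()
print(stats)
```

The keys of the returned Series become the labels of the result, so the labels do not have to match pandas method names.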

Answer 3

Answered by Souvik Daw

I tried chaining three functions on a single column, and it works:

import re

# Replace newline characters with spaces, then trim
rem_newline = lambda x: re.sub('\n', ' ', x).strip()

# Lowercase and trim surrounding whitespace
lower_strip = lambda x: x.lower().strip()

# Chain both applies, then split on the first '(' into two columns
df = df['users_name'].apply(lower_strip).apply(rem_newline).str.split('(', n=1, expand=True)
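The answer does not show its input, so the following sketch assumes a hypothetical two-row users_name column to make the chain concrete. It also shows, as an alternative not taken from the answer, the same steps written with pandas' vectorized .str methods:

```python
import re
import pandas as pd

# Hypothetical sample data; the original frame is not shown in the answer
df = pd.DataFrame({"users_name": [" Alice\nSmith (admin) ", "BOB (guest)"]})

rem_newline = lambda x: re.sub("\n", " ", x).strip()
lower_strip = lambda x: x.lower().strip()

# The chain from the answer: two element-wise applies, then a split
chained = df["users_name"].apply(lower_strip).apply(rem_newline).str.split("(", n=1, expand=True)

# The same steps using vectorized .str methods instead of apply
vectorized = (
    df["users_name"].str.lower().str.strip()
    .str.replace("\n", " ", regex=False).str.strip()
    .str.split("(", n=1, expand=True)
)

print(chained)
```

Both versions should give the same two-column result on this sample; which one reads better is a matter of taste.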

Answer 4

Answered by Sergio Lucero

I am using pandas to analyze Chilean legislation drafts. In my dataframe, the list of authors is stored as a string. The answer above did not work for me (using pandas 0.20.3), so I used my own logic and came up with this:

df.authors.apply(eval).apply(len).sum()

Chained applies! A pipeline!! The first apply transforms

"['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']"

into the corresponding list; the second apply counts the number of lawmakers involved in the project. Summing those counts gives the number of (lawmaker, project number) pairs, which I want so I can presize an array where I will study which parties work on what.

Interestingly, this works! Even more interestingly, that last call fails if one gets too ambitious and does this instead:

df.authors.apply(eval).apply(len).apply(sum)

with an error:

TypeError: 'int' object is not iterable

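coming from deep within /site-packages/pandas/core/series.py, in apply. Series.apply calls the given function on each element, so apply(sum) ends up calling sum on a single int, whereas .sum() reduces the whole Series to one number.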
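A minimal sketch of this pipeline on a hypothetical two-row frame (the second row is invented for the example) may make the difference between .sum() and .apply(sum) easier to see. It swaps eval for ast.literal_eval, which is a substitution made here and not part of the answer: literal_eval only parses Python literals and does not execute code.

```python
import ast
import pandas as pd

# Hypothetical two-row frame; each cell holds a list serialized as a string
df = pd.DataFrame({"authors": [
    "['Barros Montero: Ramón', 'Bellolio Avaria: Jaime', 'Gahona Salazar: Sergio']",
    "['Bellolio Avaria: Jaime']",
]})

# Parse each string into a list, then count the authors per project
authors_per_project = df["authors"].apply(ast.literal_eval).apply(len)

# Series.sum() reduces the whole Series to a single number
total_pairs = authors_per_project.sum()

# Series.apply(sum) calls sum() on each element, and an int is not iterable
try:
    authors_per_project.apply(sum)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(total_pairs, error_message)
```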
