Pandas: Mean of columns with the same names
Original question: http://stackoverflow.com/questions/40311987/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Asked by user3635284
I have a dataframe with columns like:
['id','name','foo1', 'foo1', 'foo1', 'foo2','foo2', 'foo3']
I would like to get a new dataframe where columns sharing the same name are averaged:
['id','name','foo1', 'foo2','foo3']
Here column foo1 would be the average of the three columns named foo1 in the original dataframe, foo2 would be the average of the two columns named foo2, and foo3 would be just foo3.
Note: id and name are not numeric and I have to keep them.
Accepted answer by rojeeer
The basic idea is to group the columns by name and take the mean of each group.
I saw some of the comments on your question and have tried to give you several ways to achieve the goal. (Solution (3) is the best I found!)
(1) Quick solution. If you have only a few non-numeric columns and they have unique names, e.g., the columns id and name, you can do the following.
First, set ['id', 'name'] as the index to preserve them:
df = df.set_index(['id', 'name'])
Then use the DataFrame.groupby function on the columns, setting axis=1 (to iterate over the columns), and apply the mean function to each group:
df.groupby(by=df.columns, axis=1).mean()
And finally, reset the index to recover the ['id', 'name'] columns:
df = df.reset_index()
Here is some sample code:
In [35]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))
In [36]: df = df.set_index(['id', 'name'])
In [37]: df = df.groupby(by=df.columns, axis=1).mean()
In [38]: df = df.reset_index()
In [39]: df
Out[39]:
id name c1 c2 c3
0 001 a 1 55 1000
1 002 b 2 110 2000
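A note for newer pandas: as far as I know, passing axis=1 to DataFrame.groupby has been deprecated since pandas 2.1. A minimal sketch of the same idea without it, assuming the same sample data as above, is to transpose, group on the index labels, and transpose back. The variable names numeric and averaged are only for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["001", "a", 1, 10, 100, 1000], ["002", "b", 2, 20, 200, 2000]],
    columns=["id", "name", "c1", "c2", "c2", "c3"],
)

# Move the non-numeric columns into the index so they are left untouched.
numeric = df.set_index(["id", "name"])

# Transpose so the duplicated column labels become index labels, group equal
# labels together (sort=False keeps first-seen order), average each group,
# then transpose back and restore id and name as ordinary columns.
averaged = numeric.T.groupby(level=0, sort=False).mean().T.reset_index()

print(averaged)
```

The result has the columns id, name, c1, c2, c3, where c2 holds the row-wise mean of the two original c2 columns.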
(2) Complete solution. If you have many non-numeric columns with unique names, you can do the following.
First, transpose your dataframe:
df2 = df.transpose()
Then group by its index (with axis=0), handling each group carefully: for numeric groups, return their mean; for non-numeric groups, return their first row:
df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])
And finally, transpose back:
df = df2.transpose()
Here is some sample code:
In [98]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))
In [99]: df2 = df.transpose()
In [100]: df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])
In [101]: df3 = df2.transpose()
In [102]: df3
Out[102]:
c1 c2 c3 id name
A 1 55 1000 001 a
B 2 110 2000 002 b
In [103]: df
Out[103]:
id name c1 c2 c2 c3
A 001 a 1 10 100 1000
B 002 b 2 20 200 2000
Note that this requires import numbers.
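If the transposed frame becomes awkward to work with (transposing mixed columns gives object dtype), one possible alternative is to split the columns by dtype instead of inspecting the first value of each group. This is a different technique from the one above, sketched under the assumption that every column to be averaged has a numeric dtype; the names numeric, other and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [["001", "a", 1, 10, 100, 1000], ["002", "b", 2, 20, 200, 2000]],
    columns=["id", "name", "c1", "c2", "c2", "c3"],
    index=list("AB"),
)

# Split the frame by dtype; this works even when column names repeat.
numeric = df.select_dtypes(include="number")
other = df.select_dtypes(exclude="number")

# Average numeric columns that share a label: transpose, group, transpose back.
averaged = numeric.T.groupby(level=0, sort=False).mean().T

# Put the untouched non-numeric columns back in front of the averaged ones.
result = pd.concat([other, averaged], axis=1)

print(result)
```

Unlike the transposed version, this keeps id and name first and leaves the non-numeric columns exactly as they were.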
More notes:
(3) All in one! This solution is the best I found:
df.groupby(by=df.columns, axis=1).apply(lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0])
I tried to handle each group directly on the un-transposed dataframe, that is,
df.groupby(by=df.columns, axis=1).apply(gf)
where
gf = lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0]
My earlier attempts failed because I did not handle the axis carefully. You must set axis=1 for the mean function, and return a column for non-numeric groups.
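For pandas versions where groupby(axis=1) is no longer available, the same per-group rule (row-wise mean for numeric groups, first column otherwise) can be sketched with a plain loop over the unique column names. This is only an illustration: the helper name mean_duplicate_columns is hypothetical, and it checks column dtypes rather than the first value of each group, which is an assumption on my part rather than what the one-liner above does.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def mean_duplicate_columns(df):
    """Average numeric columns sharing a name; keep the first copy of the others."""
    pieces = {}
    for name in df.columns.unique():  # unique names, in first-seen order
        # A boolean mask always yields a DataFrame, even for a single column.
        block = df.loc[:, df.columns == name]
        if all(is_numeric_dtype(dtype) for dtype in block.dtypes):
            pieces[name] = block.mean(axis=1)  # row-wise mean over the group
        else:
            pieces[name] = block.iloc[:, 0]  # non-numeric: keep the first column
    return pd.DataFrame(pieces)


df = pd.DataFrame(
    [["001", "a", 1, 10, 100, 1000], ["002", "b", 2, 20, 200, 2000]],
    columns=["id", "name", "c1", "c2", "c2", "c3"],
    index=list("AB"),
)
result = mean_duplicate_columns(df)
print(result)
```

The column order of the original dataframe is preserved, and the index (A, B here) is carried through unchanged.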
Thanks!