Original source: http://stackoverflow.com/questions/31698861/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Add column to the end of Pandas DataFrame containing average of previous data
Asked by LinnK
I have a DataFrame ave_data that contains the following:
ave_data
Time F7 F8 F9
00:00:00 43.005593 -56.509746 25.271271
01:00:00 55.114918 -59.173852 31.849262
02:00:00 63.990762 -64.699492 52.426017
I want to add another column to this DataFrame, containing the average of the values in columns F7, F8 and F9 for each row.
The ave_data DataFrame might change size as my code reads from different Excel files later, so the method needs to be generic (i.e. always add the column containing the average as the last column in the DataFrame, not as column number 4).
desired output
Time F7 F8 F9 Average
00:00:00 43.005593 -56.509746 25.271271 3.92
01:00:00 55.114918 -59.173852 31.849262 9.26
02:00:00 63.990762 -64.699492 52.426017 17.24
Accepted answer by EdChum
You can take a copy of your df using copy() and then call mean, passing axis=1 and numeric_only=True so that the mean is calculated row-wise and non-numeric columns are ignored. When you assign the result as follows, the new column is always added at the end:
In [68]:
summary_ave_data = df.copy()
summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
Out[68]:
Time F7 F8 F9 average
0 2015-07-29 00:00:00 43.005593 -56.509746 25.271271 3.922373
1 2015-07-29 01:00:00 55.114918 -59.173852 31.849262 9.263443
2 2015-07-29 02:00:00 63.990762 -64.699492 52.426017 17.239096
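For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand (the Time column is kept as plain strings, which is an assumption; the real data comes from Excel) and checks that the new column lands last regardless of how many columns the frame has.

```python
import pandas as pd

# Sample values copied from the question; Time is stored as strings here (assumption).
ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

summary_ave_data = ave_data.copy()

# axis=1 averages across the columns of each row;
# numeric_only=True leaves the non-numeric Time column out of the mean.
summary_ave_data["average"] = summary_ave_data.mean(numeric_only=True, axis=1)

# Assigning a column name that does not exist yet appends it as the last column.
print(summary_ave_data.columns.tolist())
print(summary_ave_data)
```

Because the mean is taken over whatever numeric columns are present, the same two lines should keep working if later files contain more or fewer F columns.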
Answer by kt-0
@LaangeHaare or anyone else who is curious: I just tested it, and the copy part of the accepted answer seems unnecessary (maybe I am missing something...), so you could simplify this to:
df['average'] = df.mean(numeric_only=True, axis=1)
I would have simply added this as a comment, but I don't have the reputation.
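A small sketch of what the copy changes in practice, using made-up values rather than the question's data: both variants compute the same averages, and the only difference is whether the original frame is modified. Whether that matters depends on whether you still need the frame without the extra column.

```python
import pandas as pd

# Made-up illustrative values, not taken from the question.
df = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00"],
    "F7": [1.0, 2.0],
    "F8": [3.0, 6.0],
})

# Variant 1: work on a copy, so df itself keeps its original columns.
with_copy = df.copy()
with_copy["average"] = with_copy.mean(numeric_only=True, axis=1)
columns_before = df.columns.tolist()  # still ['Time', 'F7', 'F8']

# Variant 2: assign directly, which adds the column to df itself.
df["average"] = df.mean(numeric_only=True, axis=1)
columns_after = df.columns.tolist()  # now ends with 'average'

print(columns_before)
print(columns_after)
```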
Answer by Sergey Zaitsev
In the general case, if you would like to use specific columns, you can use:
df['average'] = df[['F7','F8']].mean(axis=1)
where axis=1 stands for a row-wise operation (the column values in each row are used to calculate the mean stored in the 'average' column).
Then you may want to sort by this column:
df.sort_values(by='average', ascending=False, inplace=True)
where inplace=True means the sort is applied to the dataframe itself instead of returning a sorted copy.
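Put together, the two steps could look like the sketch below. The column list cols is a hypothetical choice for illustration, and the sort here returns a new frame instead of using inplace=True, which is an alternative to the call above rather than a correction of it.

```python
import pandas as pd

# Sample values copied from the question.
df = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

cols = ["F7", "F8"]  # hypothetical subset; F9 is deliberately left out

# Row-wise mean over the selected columns only.
df["average"] = df[cols].mean(axis=1)

# Without inplace=True, sort_values returns a sorted copy and leaves df's row order alone.
ranked = df.sort_values(by="average", ascending=False)

print(ranked)
```

Keeping the sorted result in a separate variable makes it easy to compare against the original order.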
Answer by johnDanger
df.assign is specifically for this purpose. It returns a copy to avoid changing the original dataframe and/or raising SettingWithCopyWarning. It works as follows:
data_with_ave = ave_data.assign(average = ave_data.mean(axis=1, numeric_only=True))
This function can also create multiple columns at the same time:
data_with_ave = ave_data.assign(
    average=ave_data.mean(axis=1, numeric_only=True),
    median=ave_data.median(axis=1, numeric_only=True),
)
As of pandas 0.23 (on Python 3.6 or later), you can even reference a column just created to create another, by passing a callable that receives the dataframe being built:
data_with_ave = ave_data.assign(
    average=ave_data.mean(axis=1, numeric_only=True),
    isLarge=lambda df: df['average'] > 10,
)
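As a runnable check of the assign approach, here is a short sketch on the question's sample data. It assumes a pandas version recent enough to let a later keyword refer to a column created earlier in the same call, and it confirms that ave_data itself is left unchanged.

```python
import pandas as pd

# Sample values copied from the question.
ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

data_with_ave = ave_data.assign(
    # Computed up front from the original frame.
    average=ave_data.mean(axis=1, numeric_only=True),
    # The lambda receives the frame being built, so 'average' already exists in it.
    isLarge=lambda df: df["average"] > 10,
)

print(data_with_ave)
print(ave_data.columns.tolist())  # original frame has no new columns
```

The new columns are appended in keyword order, so average and isLarge end up as the last two columns.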