Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/31698861/

Date: 2020-08-19 10:26:43  Source: igfitidea

Add column to the end of Pandas DataFrame containing average of previous data

Tags: python, pandas, dataframe, calculated-columns

Asked by LinnK

I have a DataFrame, ave_data, that contains the following:

ave_data

Time        F7           F8            F9  
00:00:00    43.005593    -56.509746    25.271271  
01:00:00    55.114918    -59.173852    31.849262  
02:00:00    63.990762    -64.699492    52.426017

I want to add another column to this DataFrame, containing the average of the values in columns F7, F8 and F9 for each row.

The ave_data DataFrame might change size, because my code will later read from different Excel files, so the method needs to be generic (i.e. always add the column containing the average as the last column in the DataFrame, not specifically as column number 4).

Desired output:

Time        F7           F8            F9           Average
00:00:00    43.005593    -56.509746    25.271271    3.92
01:00:00    55.114918    -59.173852    31.849262    9.26
02:00:00    63.990762    -64.699492    52.426017    17.24

Accepted answer by EdChum

You can take a copy of your df (ave_data in the question) using copy(), then call mean() and pass axis=1 and numeric_only=True so that the mean is calculated row-wise and non-numeric columns are ignored. When you assign the result as follows, the new column is always added at the end:

In [68]:

summary_ave_data = df.copy()
summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
Out[68]:
                 Time         F7         F8         F9    average
0 2015-07-29 00:00:00  43.005593 -56.509746  25.271271   3.922373
1 2015-07-29 01:00:00  55.114918 -59.173852  31.849262   9.263443
2 2015-07-29 02:00:00  63.990762 -64.699492  52.426017  17.239096
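For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample data by hand and applies the same two lines. Two assumptions: the real data would come from Excel files, and Time is stored here as plain strings rather than timestamps.

```python
import pandas as pd

# Sample values typed in from the question (assumption: the real data is read from Excel).
ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],  # kept as strings for simplicity
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

# Work on a copy so that ave_data itself is left untouched.
summary_ave_data = ave_data.copy()

# axis=1 averages across the columns of each row;
# numeric_only=True skips the non-numeric Time column.
summary_ave_data["average"] = summary_ave_data.mean(numeric_only=True, axis=1)

print(summary_ave_data)
```

Because the new column is created by assignment under a new name, it lands after the existing columns no matter how many F columns the frame has.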

Answer by kt-0

@LaangeHaare, or anyone else who is curious: I just tested it, and the copy step in the accepted answer seems unnecessary if you don't need to keep the original DataFrame unchanged (maybe I am missing something...)

So you could simplify this to:

df['average'] = df.mean(numeric_only=True, axis=1)

I would have simply added this as a comment, but I don't have the reputation.
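A small sketch of the difference, using the question's sample values typed in by hand (the variable name original is made up for the comparison). Direct assignment changes df itself, so copy() only matters when the unmodified frame is still needed afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

# Hypothetical snapshot, kept only so the two frames can be compared below.
original = df.copy()

# Direct assignment: the new column is added to df itself, as the last column.
df["average"] = df.mean(numeric_only=True, axis=1)

print(list(original.columns))  # snapshot taken before the assignment
print(list(df.columns))        # df now ends with the new column
```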

Answer by Sergey Zaitsev

In the general case, if you want to use specific columns, you can use:

df['average'] = df[['F7','F8']].mean(axis=1)

where axis=1 means a row-wise operation (the column values in each row are used to calculate the mean stored in the 'average' column).
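As a sketch of how column selection could be made generic for files with a varying number of columns: the explicit list works when the names are known, and DataFrame.filter with a regex is one possible way to pick columns by pattern. The pattern ^F\d+$ and the column name average_f7_f8 are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

# Explicit list: average only the named columns, row by row.
df["average_f7_f8"] = df[["F7", "F8"]].mean(axis=1)

# Assumed naming scheme: every data column is called "F" followed by digits.
f_columns = list(df.filter(regex=r"^F\d+$").columns)
df["average"] = df[f_columns].mean(axis=1)

print(f_columns)
print(df)
```

Selecting the columns by name or pattern also means numeric_only is not needed, since the Time column never enters the calculation.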

Then you may want to sort by this column:

df.sort_values(by='average',ascending=False, inplace=True)

where inplace=True sorts the DataFrame itself instead of returning a sorted copy.
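The two behaviours can be compared side by side. This is a minimal sketch on the question's sample values; the names sorted_copy and index_after_copy_sort are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})
df["average"] = df[["F7", "F8", "F9"]].mean(axis=1)

# Without inplace: a sorted copy is returned and df keeps its row order.
sorted_copy = df.sort_values(by="average", ascending=False)
index_after_copy_sort = list(df.index)

# With inplace=True: df itself is reordered and the call returns None.
result = df.sort_values(by="average", ascending=False, inplace=True)

print(index_after_copy_sort)
print(list(df.index))
```

Reassigning the returned frame, as in df = df.sort_values(by='average', ascending=False), has the same visible effect as inplace=True.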

Answer by johnDanger

df.assign is designed for this purpose. It returns a new DataFrame with the added column, which avoids changing the original DataFrame and avoids raising SettingWithCopyWarning. It works as follows:

data_with_ave = ave_data.assign(average = ave_data.mean(axis=1, numeric_only=True))
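To see that assign leaves the source frame alone, here is a minimal runnable sketch with the question's sample values typed in by hand (an assumption, since the real data is read from Excel):

```python
import pandas as pd

ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

# assign returns a new DataFrame; ave_data is not modified.
data_with_ave = ave_data.assign(average=ave_data.mean(axis=1, numeric_only=True))

print(list(ave_data.columns))       # no 'average' column here
print(list(data_with_ave.columns))  # 'average' is the last column
```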

This function can also create multiple columns at the same time:

data_with_ave = ave_data.assign(
                    average = ave_data.mean(axis=1, numeric_only=True),
                    median = ave_data.median(axis=1, numeric_only=True)
)

As of pandas 0.23 (on Python 3.6 or later), you can even reference a column just created to create another:

data_with_ave = ave_data.assign(
                    average = ave_data.mean(axis=1, numeric_only=True),
                    isLarge = lambda df: df['average'] > 10
)
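A runnable version of this last pattern is sketched below on the question's sample values, assuming pandas 0.23 or later on Python 3.6 or later. The lambda receives the frame as it stands after the earlier keyword arguments have been applied, which is why it can see the new 'average' column.

```python
import pandas as pd

ave_data = pd.DataFrame({
    "Time": ["00:00:00", "01:00:00", "02:00:00"],
    "F7": [43.005593, 55.114918, 63.990762],
    "F8": [-56.509746, -59.173852, -64.699492],
    "F9": [25.271271, 31.849262, 52.426017],
})

data_with_ave = ave_data.assign(
    # Computed first, from the original frame.
    average=ave_data.mean(axis=1, numeric_only=True),
    # Evaluated afterwards on the intermediate frame that already has 'average'.
    isLarge=lambda df: df["average"] > 10,
)

print(data_with_ave[["average", "isLarge"]])
```

With this data, only the last row has an average above 10.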