pandas: mean of specific columns

Question

Asked by Keithx

I have a pandas DataFrame like this:

How can I calculate the mean (or min/max, median) of a specific column for rows where Cluster == 1 or Cluster == 2?

Thanks!

Answer 1

Answered by Yaron

You can create new df with only the relevant rows, using:

newdf = df[df['cluster'].isin([1, 2])]

newdf.mean(axis=0)

To calculate the mean of a specific column, you can use:

newdf["page"].mean()
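
The question also asks about min/max and median. As a minimal sketch, the same filtered frame can produce all of them in one call. The sample data and the column names "Cluster" and "page" are assumptions here, since the original DataFrame is not shown.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "Cluster": [1, 1, 2, 2, 3, 3],
    "page": [10, 20, 30, 40, 50, 60],
})

# Keep only the rows whose Cluster is 1 or 2
subset = df[df["Cluster"].isin([1, 2])]

# Several summary statistics for one column in a single call;
# the result is a Series indexed by the statistic names
stats = subset["page"].agg(["mean", "min", "max", "median"])
print(stats)
```

Individual methods such as subset["page"].median() work as well; agg with a list is just convenient when several statistics are wanted together.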

Answer 2

Answered by sparc_spread

If you meant to take the mean only where Cluster is 1 or 2, then the other answers here address your issue. If you meant to take a separate mean for each value of Cluster, you can use pandas' aggregation functions, including groupby and agg:

df.groupby("Cluster").mean()

is the simplest and will take means of all columns, grouped by Cluster.

df.groupby("Cluster").agg({"duration" : np.mean})

is an example where you are taking the mean of just one specific column, grouped by cluster. You can also use np.min, np.max, np.median, etc.

The groupby method produces a GroupBy object, which resembles a DataFrame but is not one. Think of it as the DataFrame, grouped and waiting for an aggregation to be applied. The GroupBy object has simple built-in aggregation functions that apply to all columns (the mean() in the first example), and also a more general aggregation function (the agg() in the second example) that you can use to apply specific functions in a variety of ways. One way of using it is to pass a dict that maps column names to functions, so that specific functions are applied to specific columns.
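
To illustrate the dict form of agg described above, here is a small sketch. The data is made up, and the "page" column is borrowed from the other answers as an assumption; only "Cluster" and "duration" appear in this answer. Function names are given as strings, which pandas also accepts, so NumPy does not need to be imported.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Cluster": [1, 1, 2, 2, 3],
    "duration": [4.0, 6.0, 10.0, 20.0, 7.0],
    "page": [1, 3, 5, 7, 9],
})

# Map each column to the aggregation(s) wanted for it
per_cluster = df.groupby("Cluster").agg({
    "duration": ["mean", "median"],
    "page": ["min", "max"],
})
print(per_cluster)
```

The result has one row per Cluster value and two-level column labels such as ("duration", "mean"), so a single figure can be read with per_cluster.loc[1, ("duration", "mean")].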

Answer 3

Answered by evan54

Simple intuitive answer

First pick the rows of interest, then average, then pick the columns of interest.

clusters_of_interest = [1, 2]
columns_of_interest = ['page']

# rows of interest
newdf = df[ df.CLUSTER.isin(clusters_of_interest) ]
# average and pick columns of interest
newdf.mean(axis=0)[ columns_of_interest ]

More advanced

# Create a GroupBy object keyed on the values in the 'CLUSTER' column
grp = df.groupby('CLUSTER')
# apply functions of interest to all cluster groupings
data_agg = grp.agg( ['mean' , 'max' , 'min' ] )

This is also a good link which describes aggregation techniques. Note that the "simple answer" averages over clusters 1 and 2 together (or whatever is listed in clusters_of_interest), while the .agg function averages separately over each group of rows sharing the same CLUSTER value.
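
The difference between the two approaches is easiest to see on unevenly sized clusters. The following is a sketch with made-up numbers; the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical data: cluster 1 has three rows, cluster 2 has one
df = pd.DataFrame({
    "CLUSTER": [1, 1, 1, 2, 3],
    "page": [1.0, 2.0, 3.0, 10.0, 100.0],
})

clusters_of_interest = [1, 2]
selected = df[df["CLUSTER"].isin(clusters_of_interest)]

# "Simple" approach: one mean over all selected rows pooled together
pooled_mean = selected["page"].mean()

# groupby approach: one mean per cluster
per_cluster_mean = selected.groupby("CLUSTER")["page"].mean()

print(pooled_mean)
print(per_cluster_mean)
```

Here the pooled mean is (1 + 2 + 3 + 10) / 4 = 4.0, while the per-cluster means are 2.0 for cluster 1 and 10.0 for cluster 2. The pooled figure is weighted toward the larger cluster.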

Answer 4

Answered by jotasi

You can do it in one line, using boolean indexing. For example you can do something like:

import numpy as np
import pandas as pd

# This will just produce an example DataFrame
df = pd.DataFrame({'a':np.arange(30), 'Cluster':np.ones(30,dtype=int)})
df.loc[10:19, "Cluster"] *= 2
df.loc[20:,   "Cluster"] *= 3

# This line is all you need
df.loc[(df['Cluster']==1)|(df['Cluster']==2), 'a'].mean()

The boolean indexing array is True for the rows in the desired clusters. a is just the name of the column to compute the mean over.
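
As a small sketch of how the mask itself is built: each comparison needs its own parentheses because | binds more tightly than == in Python, and for a longer list of clusters isin produces the same mask more compactly. The frame below is an assumed equivalent of the example above, built with np.repeat.

```python
import numpy as np
import pandas as pd

# Same shape as the example: 'a' is 0..29, Cluster is 1, 2, 3 in blocks of ten
df = pd.DataFrame({"a": np.arange(30), "Cluster": np.repeat([1, 2, 3], 10)})

# Explicit boolean OR; the parentheses around each comparison are required
mask_or = (df["Cluster"] == 1) | (df["Cluster"] == 2)

# Equivalent mask built with isin
mask_isin = df["Cluster"].isin([1, 2])

# Mean of column 'a' over the rows where the mask is True
mean_a = df.loc[mask_isin, "a"].mean()
print(mask_or.equals(mask_isin), mean_a)
```

Both masks select rows 0 through 19, so the mean of a is 9.5.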
