Python: calculating the mean of selected rows of selected columns in a pandas DataFrame

Question

Asked by impossible

I have a pandas DataFrame df with, say, 100 rows and 10 columns (the actual data is huge). I also have a list row_index that specifies which rows to include in the mean. I want to calculate the mean over, say, columns 2, 5, 6, 7 and 8. Is there a DataFrame function that can do this?

What I know is to write a for loop, get the row for each element in row_index and accumulate the mean. Is there a direct function that takes row_list, column_list and axis, e.g. df.meanAdvance(row_list, column_list, axis=0)?

I have seen DataFrame.mean(), but I don't think it helps here.

  a b c d q 
0 1 2 3 0 5
1 1 2 3 4 5
2 1 1 1 6 1
3 1 0 0 0 0

I want the mean of rows 0, 2, 3 for each of the columns a, b, d:

  a b d
0 1 1 2

Answer 1

Accepted answer by PdevG

To select rows of your dataframe you can use iloc; you can then select the columns you want using square brackets.

For example:

 df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c'])

gives the following dataframe:

   a  b  c
3  1  2  3
4  1  2  3
5  1  2  3
6  1  2  3
7  1  2  3

To select only the third and fifth rows you can do:

df.iloc[[2,4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3
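Note that iloc selects by position, not by index label, which is why positions 2 and 4 return the rows labelled 5 and 7 here. A minimal sketch of the difference, assuming the same df as above (the names by_position and by_label are only for illustration):

```python
import pandas as pd

# Same frame as above: five identical rows, index labels 3..7
df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=['a', 'b', 'c'])

by_position = df.iloc[[2, 4]]  # third and fifth rows, counted from 0
by_label = df.loc[[5, 7]]      # rows whose index labels are 5 and 7

# Both selections pick the same two rows
print(by_position.equals(by_label))
```

If your list of rows holds index labels rather than positions, .loc is probably the one to use.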

If you then want only columns b and c, use the following command:

df[['b', 'c']].iloc[[2,4]]

which yields:

   b  c
5  2  3
7  2  3

To get the mean of this subset of your dataframe, use the mean method. Specify axis=0 for the mean of each column, or axis=1 for the mean of each row.

Thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0)

returns:

b    2.0
c    3.0
dtype: float64

as we should expect from the input dataframe.

For your code you can then do:

 df[column_list].iloc[row_index_list].mean(axis=0)
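pandas has no built-in df.meanAdvance, but the one-liner above can be wrapped in a small function with a similar call shape. A sketch using the data from the question; mean_of_selection is a hypothetical helper name, not a pandas API, and it assumes the rows are given as positions and the columns as names:

```python
import pandas as pd

def mean_of_selection(df, row_positions, column_names, axis=0):
    """Hypothetical helper: mean over the given row positions and column names."""
    return df[column_names].iloc[row_positions].mean(axis=axis)

# Data from the question
df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
)

# Mean of rows 0, 2, 3 for columns a, b, d
result = mean_of_selection(df, [0, 2, 3], ['a', 'b', 'd'])
print(result)
```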

EDIT after comment. New question in the comment: I have to store these means in another df/matrix. I have lists L1, L2, L3, L4, ..., LX that give the row indices whose mean I need for columns C[1, 2, 3]. For example, L1 = [0, 2, 3] means I need the mean of rows 0, 2, 3, stored in the 1st row of a new df/matrix. Then L2 = [1, 4], for which I again calculate the mean and store it in the 2nd row of the new df/matrix. Continuing in the same way up to LX, I want the new df to have X rows and len(C) columns. The columns stay the same for L1..LX. Could you help me with this?

Answer:

If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b'):

First you loop over all the lists of rows, collecting the means as pd.Series objects. Then you concatenate the resulting list of Series along axis=1 and take the transpose to get the right format. Here L is assumed to be the list of row lists [L1, L2, ..., LX].

dfs = list()
for l in L:
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))

mean_matrix = pd.concat(dfs, axis=1).T
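For reference, a self-contained version of the same idea that can be run as-is. The contents of L here are assumed example input (two lists of row positions), not values from the original comment:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=['a', 'b', 'c'])

# Assumed example input: each inner list holds the row positions for one output row
L = [[0, 2, 3], [1, 4]]

# One Series of column means per list of rows
means = [df[['a', 'b']].iloc[rows].mean(axis=0) for rows in L]

# Concatenating along axis=1 gives one column per list; transpose for one row per list
mean_matrix = pd.concat(means, axis=1).T

print(mean_matrix.shape)  # (number of lists, number of columns)
```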

Answer 2

Answer by mfitzp

You can select specific columns from a DataFrame by passing a list of column positions to .iloc, for example:

df.iloc[:, [2,5,6,7,8]]

This returns a DataFrame containing those numbered columns (note: this uses 0-based indexing, so 2 refers to the 3rd column).

To take the mean down each of those columns, you could use:

# Mean along 0 (vertical) axis: return mean for specified columns, calculated across all rows
df.iloc[:, [2,5,6,7,8]].mean(axis=0)

To take the mean across those columns for each row, you could use:

# Mean along 1 (horizontal) axis: return mean for each row, calculated across specified columns
df.iloc[:, [2,5,6,7,8]].mean(axis=1)
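To make the two axes concrete, here is a small sketch on the 4x5 frame from the question. Since that frame has only five columns, it assumes column positions [0, 1, 3] (a, b, d) instead of [2, 5, 6, 7, 8]:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
)

subset = df.iloc[:, [0, 1, 3]]   # columns a, b, d by position

col_means = subset.mean(axis=0)  # one value per selected column (3 values)
row_means = subset.mean(axis=1)  # one value per row (4 values)

print(col_means)
print(row_means)
```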

You can also supply specific indices for both axes to return a subset of the table:

df.iloc[[1,2,3,4], [2,5,6,7,8]]

For your specific example, you would do:

import pandas as pd
import numpy as np

df = pd.DataFrame( 
np.array([[1,2,3,0,5],[1,2,3,4,5],[1,1,1,6,1],[1,0,0,0,0]]),
columns=["a","b","c","d","q"],
index = [0,1,2,3]
)

#I want mean of 0, 2, 3 rows for each a, b, d columns
#. a b d
#0 1 1 2

df.iloc[ [0,2,3], [0,1,3] ].mean(axis=0)

Which outputs:

a    1.0
b    1.0
d    2.0
dtype: float64

Alternatively, to access via column names, first select on those:

df[ ['a','b','d'] ].iloc[ [0,2,3] ].mean(axis=0)
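Rows and columns can also be selected by label in a single .loc call. A minimal sketch on the question's data; it relies on the index labels here being 0..3, so that labels and positions happen to coincide:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 0, 5], [1, 2, 3, 4, 5], [1, 1, 1, 6, 1], [1, 0, 0, 0, 0]],
    columns=['a', 'b', 'c', 'd', 'q'],
    index=[0, 1, 2, 3],
)

# Label-based: rows labelled 0, 2, 3 and columns named a, b, d
by_label = df.loc[[0, 2, 3], ['a', 'b', 'd']].mean(axis=0)

# Position-based equivalent for this particular frame
by_position = df.iloc[[0, 2, 3], [0, 1, 3]].mean(axis=0)

print(by_label.equals(by_position))
```

With a non-default index (for example labels 3..7), the two calls would select different rows, so pick the one that matches how your row list was built.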

To answer the second part of your question (from the comments), you can join multiple Series or DataFrames together using pd.concat. It is faster to accumulate them in a list and then pass the list to pd.concat in one go, e.g.

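dfs = []
for ix in idxs:  # idxs: list of row-position lists, e.g. [[0, 2, 3], [1, 2]]
    dfm = df.iloc[ ix, [0,1,3] ].mean(axis=0)  # mean of columns a, b, d over those rows
    dfs.append(dfm)

dfm_summary = pd.concat(dfs, axis=1) # Stack horizontally: one column per row list (use .T for one row per list)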
