pandas 熊猫数据框所有列的平均值?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/33835926/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 00:15:48  来源:igfitidea点击:

mean of all the columns of a panda dataframe?

python-3.xpandaspython-3.4

提问by Michael

I'm trying to calculate the mean of all the columns of a DataFrame but it looks like having a value in the B column of row 6 prevents from calculating the mean on the C column. Why?

我正在尝试计算 DataFrame 的所有列的平均值,但看起来第 6 行的 B 列中有一个值阻止计算 C 列的平均值。为什么?

import pandas as pd
from decimal import Decimal
d = [
    {'A': 2, 'B': None, 'C': Decimal('628.00')},
    {'A': 1, 'B': None, 'C': Decimal('383.00')},
    {'A': 3, 'B': None, 'C': Decimal('651.00')},
    {'A': 2, 'B': None, 'C': Decimal('575.00')},
    {'A': 4, 'B': None, 'C': Decimal('1114.00')},
    {'A': 1, 'B': 'TEST', 'C': Decimal('241.00')},
    {'A': 2, 'B': None, 'C': Decimal('572.00')},
    {'A': 4, 'B': None, 'C': Decimal('609.00')},
    {'A': 3, 'B': None, 'C': Decimal('820.00')},
    {'A': 5, 'B': None, 'C': Decimal('1223.00')}
]

df = pd.DataFrame(d)

In : df
Out:
   A     B        C
0  2  None   628.00
1  1  None   383.00
2  3  None   651.00
3  2  None   575.00
4  4  None  1114.00
5  1  TEST   241.00
6  2  None   572.00
7  4  None   609.00
8  3  None   820.00
9  5  None  1223.00

Tests:

测试:

# no mean for C column
In : df.mean()
Out:
A    2.7
dtype: float64

# mean for C column when row 6 is left out of the DF
In : df.head(5).mean()
Out:
A      2.4
B      NaN
C    670.2
dtype: float64

# no mean for C column when row 6 is part of the DF
In : df.head(6).mean()
Out:
A    2.166667
dtype: float64

dtypes:

数据类型:

In : df.dtypes
Out:
A     int64
B    object
C    object
dtype: object

In : df.head(5).dtypes
Out:
A     int64
B    object
C    object
dtype: object

采纳答案by Anton Protopopov

You could use particular columns if you need only columns with numbers:

如果您只需要带有数字的列,则可以使用特定的列:

In [90]: df[['A','C']].mean()
Out[90]: 
A      2.7
C    681.6
dtype: float64

or to change type as @jezrael advice in comment:

或者在评论中将类型更改为@jezrael 建议:

df['C'] = df['C'].astype(float)

Probably df.meantrying to convert all object to numeric and if it's fall then it's roll back and calculate only for actual numbers

可能df.mean试图将所有对象转换为数字,如果它下降,那么它会回滚并仅计算实际数字