Original URL: http://stackoverflow.com/questions/33835926/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Mean of all the columns of a pandas DataFrame?
Asked by Michael
I'm trying to calculate the mean of all the columns of a DataFrame, but it looks like having a value in the B column of row 6 prevents the mean of the C column from being calculated. Why?
import pandas as pd
from decimal import Decimal
d = [
{'A': 2, 'B': None, 'C': Decimal('628.00')},
{'A': 1, 'B': None, 'C': Decimal('383.00')},
{'A': 3, 'B': None, 'C': Decimal('651.00')},
{'A': 2, 'B': None, 'C': Decimal('575.00')},
{'A': 4, 'B': None, 'C': Decimal('1114.00')},
{'A': 1, 'B': 'TEST', 'C': Decimal('241.00')},
{'A': 2, 'B': None, 'C': Decimal('572.00')},
{'A': 4, 'B': None, 'C': Decimal('609.00')},
{'A': 3, 'B': None, 'C': Decimal('820.00')},
{'A': 5, 'B': None, 'C': Decimal('1223.00')}
]
df = pd.DataFrame(d)
In : df
Out:
A B C
0 2 None 628.00
1 1 None 383.00
2 3 None 651.00
3 2 None 575.00
4 4 None 1114.00
5 1 TEST 241.00
6 2 None 572.00
7 4 None 609.00
8 3 None 820.00
9 5 None 1223.00
Tests:
# no mean for C column
In : df.mean()
Out:
A 2.7
dtype: float64
# mean for C column when row 6 is left out of the DF
In : df.head(5).mean()
Out:
A 2.4
B NaN
C 670.2
dtype: float64
# no mean for C column when row 6 is part of the DF
In : df.head(6).mean()
Out:
A 2.166667
dtype: float64
dtypes:
In : df.dtypes
Out:
A int64
B object
C object
dtype: object
In : df.head(5).dtypes
Out:
A int64
B object
C object
dtype: object
Accepted answer by Anton Protopopov
If you only need the columns that hold numbers, you can select them explicitly:
In [90]: df[['A','C']].mean()
Out[90]:
A 2.7
C 681.6
dtype: float64
Or change the column's type, as @jezrael advises in a comment:
df['C'] = df['C'].astype(float)
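As a minimal sketch of that conversion, the snippet below builds a three-row frame shaped like the one in the question (the smaller data set is only for illustration), casts the Decimal column to float, and then takes the mean of the two numeric columns:

```python
import pandas as pd
from decimal import Decimal

# Illustrative three-row frame: B mixes None and a string, C holds Decimal objects.
df = pd.DataFrame({
    'A': [2, 1, 3],
    'B': [None, 'TEST', None],
    'C': [Decimal('628.00'), Decimal('383.00'), Decimal('651.00')],
})

# Decimal values are stored with dtype object, so pandas does not treat C as numeric.
dtype_before = df['C'].dtype

# Cast each Decimal to a Python float; the column becomes float64.
df['C'] = df['C'].astype(float)
dtype_after = df['C'].dtype

# Select the numeric columns explicitly so the string column B plays no part.
means = df[['A', 'C']].mean()
print(dtype_before, dtype_after)
print(means)
```

Note that casting to float gives up the exact decimal arithmetic that Decimal provides, which may matter for monetary values.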
Probably df.mean tries to convert all the object columns to numeric and, if that fails, rolls back and calculates the mean only for the columns that are actually numeric.
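Whether object columns are silently skipped seems to depend on the pandas version, and newer releases may raise an error instead. A hedged sketch that avoids relying on that fallback: it checks which columns pandas considers numeric with select_dtypes and asks for numeric columns only via the numeric_only parameter of DataFrame.mean. The three-row frame is illustrative, not the data from the question.

```python
import pandas as pd
from decimal import Decimal

# Illustrative frame: C is left as Decimal objects on purpose.
df = pd.DataFrame({
    'A': [2, 1, 4],
    'B': [None, 'TEST', None],
    'C': [Decimal('628.00'), Decimal('241.00'), Decimal('609.00')],
})

# B and C both have dtype object, so only A is reported as numeric.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# numeric_only=True skips every object column, including the Decimal column C.
numeric_means = df.mean(numeric_only=True)
print(numeric_means)
```

This makes it visible that C is excluded because of its object dtype, not because of the string in B, so converting C to float is what brings it into the result.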