pandas: how to sum over certain rows of a data frame in Python
Original URL: http://stackoverflow.com/questions/33700529/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to sum over certain rows of a data frame in Python
Asked by Ana
I have a data frame A, and I would like to sum over the rows whose row index value is greater than or equal to 10. If this is not possible, I can live with code that sums over rows 2-3 too.
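import pandas as pd

# Sample data from the question, built directly as a DataFrame
A = pd.DataFrame({
    "Tier": ["up to 2M", "5M", "10M", "15M"],
    "Oct": [4, 3, 6, 1],
    "Nov": [5, 2, 0, 3],
    "Dec": [10, 7, 2, 5],
})

# Attempt: column totals over every row, keeping only the month columns
tenplus = pd.Series(A.sum(axis=0), index=A.columns[1:])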
But this sums over the whole table. One thing I could do is build another data frame from rows 2-3 and sum over them, but I would prefer to learn the best practice!
Thanks!
Accepted answer by ali_m
You can use normal slice indexing to select the rows you want to sum over:
print(df)
#        Tier  Oct  Nov  Dec
# 0  up to 2M    4    5   10
# 1        5M    3    2    7
# 2       10M    6    0    2
# 3       15M    1    3    5
# select the last two rows
print(df[2:4])
#   Tier  Oct  Nov  Dec
# 2  10M    6    0    2
# 3  15M    1    3    5
# sum over them
print(df[2:4].sum())
# Tier    10M15M
# Oct          7
# Nov          3
# Dec          7
# dtype: object
As you can see, summing the Tier column gives a meaningless result, since "summing" strings just concatenates them. It would make more sense to sum over only the last three columns:
# select the last two rows and the last 3 columns
print(df.loc[2:4, ['Oct', 'Nov', 'Dec']])
#    Oct  Nov  Dec
# 2    6    0    2
# 3    1    3    5
# sum over them
print(df.loc[2:4, ['Oct', 'Nov', 'Dec']].sum())
# Oct    7
# Nov    3
# Dec    7
# dtype: int64
# alternatively, use df.iloc[2:4, 1:] to select by column index rather than name
You can read more about how indexing works in pandas in the pandas indexing documentation.
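The question also asks about picking rows by index value (greater than or equal to 10) rather than by position. One way to sketch that is a boolean mask on the index. This is only an illustration under assumptions: the frame is rebuilt by hand as `df`, the cutoff is 2 because the sample index only runs from 0 to 3, and the names `threshold` and `numeric_cols` are hypothetical rather than from the original post.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question
df = pd.DataFrame({
    "Tier": ["up to 2M", "5M", "10M", "15M"],
    "Oct": [4, 3, 6, 1],
    "Nov": [5, 2, 0, 3],
    "Dec": [10, 7, 2, 5],
})

threshold = 2  # hypothetical cutoff; the question uses 10 on a larger frame
numeric_cols = ["Oct", "Nov", "Dec"]

# Boolean mask: True for rows whose index label is >= threshold
mask = df.index >= threshold

# Keep the masked rows and the numeric columns, then total each column
tenplus = df.loc[mask, numeric_cols].sum()
print(tenplus)
```

A mask like this depends on the index labels, not on row positions, so it should keep working if the rows are reordered or the index is not contiguous.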
Answer by Andy Hayden
sum has an axis argument. Pass axis=1 to sum across the columns, which gives one total per row:
In [11]: df
Out[11]:
       Tier  Oct  Nov  Dec
0  up to 2M    4    5   10
1        5M    3    2    7
2       10M    6    0    2
3       15M    1    3    5
In [12]: df.sum(axis=1)
Out[12]:
0    19
1    12
2     8
3     9
dtype: int64
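Note: in older pandas versions this silently discards the non-numeric columns; pandas 2.0 and later no longer drop them by default, so pass numeric_only=True or select the columns yourself. You can filter the non-numeric columns out explicitly before summing: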
In [13]: df[['Oct', 'Nov', 'Dec']].sum(axis=1)
Out[13]:
0    19
1    12
2     8
3     9
dtype: int64
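To make the difference between the two axes concrete, here is a small sketch that applies both to the same two-row subset. It assumes the frame is rebuilt by hand as `df`; the variable names `subset`, `col_totals`, `row_totals` and `all_rows` are illustrative only.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question
df = pd.DataFrame({
    "Tier": ["up to 2M", "5M", "10M", "15M"],
    "Oct": [4, 3, 6, 1],
    "Nov": [5, 2, 0, 3],
    "Dec": [10, 7, 2, 5],
})

# .loc slices by label and includes both end labels, so 2:3 keeps rows 2 and 3
subset = df.loc[2:3, ["Oct", "Nov", "Dec"]]

# axis=0 collapses the rows: one total per column
col_totals = subset.sum(axis=0)

# axis=1 collapses the columns: one total per row
row_totals = subset.sum(axis=1)

# numeric_only=True skips the string column without listing columns by name
all_rows = df.sum(axis=1, numeric_only=True)

print(col_totals)
print(row_totals)
print(all_rows)
```

For the question as asked (one total per month over selected rows), axis=0 on a row subset is the form that matches; axis=1 answers a different question, namely the total across months for each tier.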