Sum a list of Pandas DataFrames
Note: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and distribute it under the same license.
Original question: http://stackoverflow.com/questions/45983321/
Asked by blahblahblah
Is there a way to sum multiple pandas DataFrames using syntax similar to pd.concat([df1, df2, df3, df4])? I understand from the documentation that I can do df1.add(df2, fill_value=0), but I have a long list of DataFrames I need to sum and was wondering if I could do it without writing a loop.
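The two-frame addition with fill_value=0 that the question refers to can be sketched with DataFrame.add. The frames `left` and `right` below are made up for illustration and are not data from the question; the point is only how fill_value=0 treats labels that appear in one frame but not the other.

```python
import pandas as pd

# Hypothetical frames for illustration; their row labels only partly overlap.
left = pd.DataFrame({"val": [1.0, 2.0]}, index=["x", "y"])
right = pd.DataFrame({"val": [10.0, 20.0]}, index=["y", "z"])

# Plain addition aligns on labels and yields NaN where a label is missing on one side.
plain = left + right

# add(..., fill_value=0) substitutes 0 for the missing side before adding.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

Only the label "y" is present in both frames, so it is the only row that survives plain addition without becoming NaN.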
Somewhat related question/answer: Pandas sum multiple dataframes (Stack Overflow)
Example of what the result should look like:
import numpy as np
import pandas as pd

# Three MultiIndexes that only partly overlap
idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])
idx3 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'D'), ('b', 'A'), ('b', 'C')])
np.random.seed([3, 1415])
# One-column frames of random values, one per index
df1 = pd.DataFrame(np.random.randn(4, 1), idx1, ['val'])
df2 = pd.DataFrame(np.random.randn(4, 1), idx2, ['val'])
df3 = pd.DataFrame(np.random.randn(4, 1), idx3, ['val'])
[Output: df1, df2, and df3 displayed as tables; values not preserved]
The result should look like:
[Output: result table not preserved]
Answered by jezrael
Use reduce with add and the parameter fill_value=0:
import numpy as np
import pandas as pd

np.random.seed(12)
# Three frames with the same row index but different column sets
a = pd.DataFrame(np.random.randint(3, size=(5, 3)), columns=list('abc'))
b = pd.DataFrame(np.random.randint(3, size=(5, 2)), columns=list('ab'))
c = pd.DataFrame(np.random.randint(3, size=(5, 2)), columns=list('ac'))
print(a)
   a  b  c
0  2  1  1
1  2  0  0
2  2  1  0
3  1  1  1
4  2  2  2
print(b)
   a  b
0  0  1
1  0  0
2  1  2
3  1  2
4  0  1
print(c)
   a  c
0  2  0
1  2  2
2  2  0
3  0  2
4  1  1
from functools import reduce
dfs = [a, b, c]
# Fold the list pairwise: (a + b) + c, treating missing entries as 0
d = reduce(lambda x, y: x.add(y, fill_value=0), dfs)
print(d)
   a    b    c
0  4  2.0  1.0
1  4  0.0  2.0
2  5  3.0  0.0
3  2  3.0  3.0
4  3  3.0  3.0
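The same pattern should carry over to the MultiIndex frames from the question. The sketch below rebuilds those frames and wraps the reduce call in a helper; the name `sum_frames` is an assumption made for this example and is not part of pandas.

```python
from functools import reduce

import numpy as np
import pandas as pd


def sum_frames(frames):
    """Add a list of DataFrames label by label, treating missing entries as 0.

    Hypothetical helper for illustration; it only wraps reduce + DataFrame.add.
    """
    return reduce(lambda x, y: x.add(y, fill_value=0), frames)


# Frames from the question: one shared column, partly overlapping MultiIndex rows.
idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])
idx3 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'D'), ('b', 'A'), ('b', 'C')])

np.random.seed([3, 1415])
df1 = pd.DataFrame(np.random.randn(4, 1), idx1, ['val'])
df2 = pd.DataFrame(np.random.randn(4, 1), idx2, ['val'])
df3 = pd.DataFrame(np.random.randn(4, 1), idx3, ['val'])

# One row per index entry that appears in any of the three frames.
total = sum_frames([df1, df2, df3])
print(total)
```

Because no initial value is passed to reduce, an empty list raises a TypeError; callers who expect that case would need to handle it separately.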