原文地址: http://stackoverflow.com/questions/38472276/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas sum multiple dataframes
Asked by cppgnlearner
I have multiple dataframes each with a multi-level-index and a value column. I want to add up all the dataframes on the value columns.
df1 + df2
Not every index entry is present in each dataframe, so I get NaN on any row that is not present in all the dataframes.
How can I overcome this and treat rows that are missing from a dataframe as having a value of 0?
E.g., I want to get
val
a 2
b 4
c 3
d 3
from pd.DataFrame({'val':{'a': 1, 'b':2, 'c':3}}) + pd.DataFrame({'val':{'a': 1, 'b':2, 'd':3}})
instead of
val
a 2
b 4
c NaN
d NaN
Answered by piRSquared
Use the add method with the fill_value=0 parameter.
import pandas as pd

df1 = pd.DataFrame({'val':{'a': 1, 'b':2, 'c':3}})
df2 = pd.DataFrame({'val':{'a': 1, 'b':2, 'd':3}})
# Labels missing from one frame are treated as 0 before adding
df1.add(df2, fill_value=0)
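The question asks about multiple dataframes, and the same call can be folded over a list of them. What follows is a minimal sketch rather than part of the original answer; the helper name `sum_frames` and the third frame `df3` are hypothetical and introduced only for illustration.

```python
from functools import reduce

import pandas as pd


def sum_frames(frames):
    """Add dataframes pairwise, treating labels missing from one side as 0.

    `sum_frames` is a hypothetical helper, not a pandas function.
    """
    return reduce(lambda left, right: left.add(right, fill_value=0), frames)


df1 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}})
df2 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})
df3 = pd.DataFrame({'val': {'a': 10, 'e': 5}})  # hypothetical third frame

total = sum_frames([df1, df2, df3])
print(total)
```

Because alignment introduces missing values before they are filled, the resulting column will generally be float rather than integer.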
MultiIndex example
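import numpy as np
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])
# Seed the random generator so the example values are reproducible
np.random.seed([3,1415])
df1 = pd.DataFrame(np.random.randn(4, 1), idx1, ['val'])
df2 = pd.DataFrame(np.random.randn(4, 1), idx2, ['val'])
df1
df2
# Index tuples present in only one frame are filled with 0 before adding
df1.add(df2, fill_value=0)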
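An alternative that is sometimes convenient when there are many frames is to stack them and sum the rows that share an index tuple. This is a sketch of a different technique (`pd.concat` followed by `groupby(...).sum()`), not the method from the answer above; the fixed values below are made up so the result is easy to check by hand, and the level positions `[0, 1]` assume a two-level index.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])

# Illustrative values (assumed for this sketch, not taken from the original post)
df1 = pd.DataFrame({'val': [1.0, 2.0, 3.0, 4.0]}, index=idx1)
df2 = pd.DataFrame({'val': [10.0, 20.0, 30.0, 40.0]}, index=idx2)

# Stack all rows, then sum the rows that share the same index tuple
combined = pd.concat([df1, df2])
total = combined.groupby(level=[0, 1]).sum()
print(total)

# For this data, the result matches the pairwise add with fill_value=0
pairwise = df1.add(df2, fill_value=0)
```

Index tuples that appear in only one frame keep their single value, which has the same effect as treating the missing side as 0.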