Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37947479/

Date: 2020-09-14 01:26:04  Source: igfitidea

pandas: sum two rows of dataframe without rearranging dataframe?

python, pandas

Asked by ale19

I have a dataframe and I'm trying to sum two rows without messing up the order of the rows.

> import pandas as pd
> test = {'counts' : pd.Series([10541,4143,736,18,45690], index=['Daylight','Dawn','Other / unknown','Uncoded & errors','Total']), 'percents' : pd.Series([23.07,9.07,1.61,0.04,100], index=['Daylight','Dawn','Other / unknown','Uncoded & errors','Total'])}

> testdf = pd.DataFrame(test)

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00

I want this output:

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      754      1.65   <-- sum of 'Other / unknown' and 'Uncoded & errors'
Total              45690    100.00

This is as close as I've been able to get:

> sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()

     counts   percents
0    754.00   1.65       

> sum_ = sum_.rename(index={0: 'Other / unknown'})

                counts   percents
Other / unknown 754.00   1.65   

> testdf.drop(['Other / unknown', 'Uncoded & errors'],inplace=True)
> testdf = pd.concat([testdf, sum_])

Daylight         10541  23.07
Dawn             4143   9.07
Total            45690  100
Other / unknown  754    1.65

But this does not preserve the original row order: the merged row ends up after 'Total'.

I could slice the dataframe and insert the sum_ row between 'Dawn' and 'Total', but that breaks if the row labels or the row order ever change (this is an annual brochure, so the table design may change from year to year). I'm trying to do this robustly.
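
One way to make the slicing idea label-driven rather than position-driven is to look up where the merged rows currently sit with Index.get_loc and splice the summed row back in at that spot. This is a minimal sketch assuming a unique row index and numeric columns; merge_rows is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def merge_rows(df, labels, new_label):
    """Hypothetical helper (not part of pandas): sum the rows in `labels`
    into one row named `new_label`, placed where the earliest of those
    rows used to be. Assumes a unique index and numeric columns."""
    labels = list(labels)  # a tuple inside .loc would be read as (rows, cols)
    # Position of the earliest merged row, found by label, not hard-coded.
    pos = min(df.index.get_loc(label) for label in labels)
    # Summing mixed int/float columns yields a float row.
    merged = df.loc[labels].sum().to_frame(name=new_label).T
    rest = df.drop(index=labels)
    # Every dropped row sat at or after `pos`, so rest.iloc[:pos] is exactly
    # the block of rows that preceded the merged rows in the original frame.
    out = pd.concat([rest.iloc[:pos], merged, rest.iloc[pos:]])
    # Restore the original column dtypes (e.g. keep 'counts' as integers).
    return out.astype(df.dtypes.to_dict())


row_labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100]},
    index=row_labels,
)

result = merge_rows(testdf, ['Other / unknown', 'Uncoded & errors'], 'Other / unknown')
print(result)  # rows: Daylight, Dawn, Other / unknown, Total
```

Because the position is recomputed from the labels on every call, this keeps working if rows are added or reordered, as long as the labels being merged still exist.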

Answered by MaxU

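Use groupby(..., sort=False).sum(). Map 'Uncoded & errors' to 'Other / unknown' so both rows share a label, then group by that label; with sort=False the groups keep the order in which they first appear instead of being sorted: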

In [84]: (testdf.reset_index()
   ....:        .replace({'index': {'Uncoded & errors':'Other / unknown'}})
   ....:        .groupby('index', sort=False).sum()
   ....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
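
The same grouping can also be written without reset_index by relabelling the index directly: DataFrame.rename(index=...) takes a dict of old to new labels and does not require the result to be unique, and groupby(level=0, sort=False) then sums rows that share a label. This is an alternative spelling, not MaxU's exact code; the merge_map dict is an assumed way of keeping the relabelling in one place from year to year.

```python
import pandas as pd

testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100]},
    index=['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total'],
)

# Hypothetical mapping: each label that should be folded into another one.
merge_map = {'Uncoded & errors': 'Other / unknown'}

# rename() produces a duplicate 'Other / unknown' label; groupby on the index
# level sums the duplicates, and sort=False keeps first-appearance order.
result = testdf.rename(index=merge_map).groupby(level=0, sort=False).sum()
print(result)

# With the default sort=True the groups come back in sorted label order.
sorted_index = testdf.rename(index=merge_map).groupby(level=0).sum().index.tolist()
print(sorted_index)  # ['Dawn', 'Daylight', 'Other / unknown', 'Total']
```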

Answered by peterfields

Although I prefer MaxU's answer, you can also try summing in-place:

testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']

Then drop the now-redundant row by its label:

testdf.drop(['Uncoded & errors'], inplace=True)

In [28]: testdf
Out[28]: 
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
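
The in-place approach preserves order because the target row never moves; only the source rows are dropped. A minimal sketch that wraps it in a function returning a new frame (instead of using inplace=True) and accepting several source rows; fold_into is a hypothetical name, not a pandas API.

```python
import pandas as pd


def fold_into(df, target, sources):
    """Hypothetical helper: add the rows in `sources` onto the row `target`,
    then drop the sources. `target` keeps its original position."""
    sources = list(sources)
    out = df.copy()  # work on a copy rather than mutating the caller's frame
    # Column-wise total of the target and source rows, written back into
    # the target row, which stays where it is.
    out.loc[target] = out.loc[[target] + sources].sum()
    return out.drop(index=sources)


testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100]},
    index=['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total'],
)

result = fold_into(testdf, 'Other / unknown', ['Uncoded & errors'])
print(result)
```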