Original question: http://stackoverflow.com/questions/46891001/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas : Sum multiple columns and get results in multiple columns
Asked by Akio Omi
I have a "sample.txt" like this.
idx A B C D cat
J 1 2 3 1 x
K 4 5 6 2 x
L 7 8 9 3 y
M 1 2 3 4 y
N 4 5 6 5 z
O 7 8 9 6 z
With this dataset, I want to compute sums both across rows and across columns. Summing rows (grouped by cat) is not a big deal. I got a result like this.
### MY CODE ###
import pandas as pd
df = pd.read_csv('sample.txt',sep="\t",index_col='idx')
df.info()
df2 = df.groupby('cat').sum()
print( df2 )
The result is like this.
A B C D
cat
x 5 7 9 3
y 8 10 12 7
z 11 13 15 11
But I don't know how to write code to get a result like this (simply adding the values in columns A and B, and likewise columns C and D).
AB CD
J 3 4
K 9 8
L 15 12
M 3 7
N 9 11
O 15 15
Could anybody help me write the code?
By the way, I don't want to do it like this. (It looks too dull, but if it is the only way, I'll accept it.)
df2 = df['A'] + df['B']
df3 = df['C'] + df['D']
df = pd.DataFrame([df2,df3],index=['AB','CD']).transpose()
print( df )
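For anyone who wants to run the snippets above without a sample.txt file, here is a minimal sketch that rebuilds the frame from an inline string and writes the same pairwise addition with a dict of Series instead of transpose(). The use of io.StringIO and the names raw and pair_sums are my assumptions, not part of the question.

```python
import io
import pandas as pd

# Inline, tab-separated copy of the sample data from the question
raw = (
    "idx\tA\tB\tC\tD\tcat\n"
    "J\t1\t2\t3\t1\tx\n"
    "K\t4\t5\t6\t2\tx\n"
    "L\t7\t8\t9\t3\ty\n"
    "M\t1\t2\t3\t4\ty\n"
    "N\t4\t5\t6\t5\tz\n"
    "O\t7\t8\t9\t6\tz\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t", index_col="idx")

# Build the result directly from a dict of Series; each key becomes a column
pair_sums = pd.DataFrame({"AB": df["A"] + df["B"], "CD": df["C"] + df["D"]})
print(pair_sums)
```

Building from a dict keeps the original idx labels as the index and avoids the extra transpose step.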
Answered by piRSquared
When you pass a dictionary or callable to groupby, it gets applied to the labels of an axis. I specified axis 1, which is the columns.
d = dict(A='AB', B='AB', C='CD', D='CD')
df.groupby(d, axis=1).sum()
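A note for newer pandas: as far as I know, the axis=1 argument of DataFrame.groupby was deprecated in pandas 2.1, so the one-liner above may warn or fail depending on your version. One way to avoid axis=1 is to transpose, group the rows by the same dictionary, and transpose back. This is only a sketch; the three-row frame and the name grouped are illustrative assumptions.

```python
import pandas as pd

# Small illustrative frame shaped like the question's data (first three rows)
df = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9], "D": [1, 2, 3],
     "cat": ["x", "x", "y"]},
    index=["J", "K", "L"],
)
d = dict(A="AB", B="AB", C="CD", D="CD")

# Keep only the columns named in the mapping so the transpose stays numeric,
# then group the transposed rows (the original columns) by the mapping.
grouped = df[list(d)].T.groupby(d).sum().T
print(grouped)
```

Selecting df[list(d)] first matters here: transposing a frame that still contains the string column cat would turn every column into object dtype.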
Answered by jezrael
Answered by Alex S
Does this do what you need? By using axis=1 with DataFrame.apply, you can use the data that you want in a row to construct a new column. Then you can drop the columns that you don't want anymore.
In [1]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[1, 2, 3, 4], [1, 2, 3, 4]])
In [6]: df
Out[6]:
A B C D
0 1 2 3 4
1 1 2 3 4
In [7]: df['CD'] = df.apply(lambda x: x['C'] + x['D'], axis=1)
In [8]: df
Out[8]:
A B C D CD
0 1 2 3 4 7
1 1 2 3 4 7
In [13]: df.drop(['C', 'D'], axis=1)
Out[13]:
A B CD
0 1 2 7
1 1 2 7
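Two things worth noting about the session above. Row-wise apply calls the lambda once per row, and for a plain sum, adding the columns directly gives the same values. Also, drop returns a new frame rather than modifying df, so the result has to be assigned if you want to keep it. A small sketch on the same toy frame (the name result is mine):

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B", "C", "D"], data=[[1, 2, 3, 4], [1, 2, 3, 4]])

# Column-wise addition; same values as the row-wise apply
df["AB"] = df["A"] + df["B"]
df["CD"] = df["C"] + df["D"]

# drop returns a new DataFrame, so keep the result in a variable
result = df.drop(columns=["A", "B", "C", "D"])
print(result)
```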