Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22195127/

Add values to one column of a pandas dataframe based on the values in another

Tags: python, numpy, pandas

Asked by juniper-

Say I have two matrices, an original and a reference:

import pandas as pa

print("Original Data Frame")
# Create the original dataframe
oldcols = {'col1': ['a', 'a', 'b', 'b'], 'col2': ['c', 'd', 'c', 'd'], 'col3': [1, 2, 3, 4]}
a = pa.DataFrame(oldcols)
print("Original Table:")
print(a)

# Create the reference dataframe
print("Reference Table:")
b = pa.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})
print(b)

Where the tables look like this:

Original Data Frame
Original Table:
  col1 col2  col3
0    a    c     1
1    a    d     2
2    b    c     3
3    b    d     4

Reference Table:
  col1 col2  col3
0    x    c    10
1    x    d    20

Now I want to add to the third column (col3) of the original table (a) the value from the reference table (b), taken from the row where the second columns (col2) of the two tables match. So the first row of table a should have 10 added to its col3, because the row of table b where col2 is 'c' has a value of 10 in col3. Make sense? Here's some code that does that:

col3 = []
for ix, row in a.iterrows():
    # The boolean-mask selection returns a one-element Series, not a scalar
    col3 += [row['col3'] + b[b['col2'] == row['col2']]['col3']]

a['col3'] = col3
print("Output Table:")
print(a)

Yielding the following output:

Output Table:
  col1 col2  col3
0    a    c  [11]
1    a    d  [22]
2    b    c  [13]
3    b    d  [24]

My question is, is there a more elegant way to do this? Also, the results in 'col3' should not be lists. Solutions using numpy are also welcome.
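
For reference, the brackets in the output above appear because a boolean-mask selection such as b[b['col2'] == 'c']['col3'] returns a one-element Series rather than a scalar. The following is a minimal sketch, assuming every col2 value in a occurs exactly once in b, that pulls out the scalar with .iloc[0]. It imports pandas under the conventional alias pd rather than the pa used in the question, and the names match and loop_col3 are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})

# Boolean-mask selection keeps the Series container even for a single match.
match = b[b['col2'] == 'c']['col3']
print(type(match).__name__)

# .iloc[0] extracts the first (here: only) matching value as a scalar,
# so each list element is a plain number instead of a Series.
# Assumption: every col2 value in a has a match in b; otherwise .iloc[0] raises.
loop_col3 = [
    row['col3'] + b.loc[b['col2'] == row['col2'], 'col3'].iloc[0]
    for _, row in a.iterrows()
]
a['col3'] = loop_col3
print(a)
```

This still loops row by row, so it only removes the list wrapping; it is not faster than the original loop.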

Accepted answer by behzad.nouri

I did not quite understand your description of what you are trying to do, but the output you have shown can be generated by first merging the two data frames and then applying a few simple operations:

>>> df = a.merge(b.filter(['col2', 'col3']), how='left',
                 left_on='col2', right_on='col2', suffixes=('', '_'))
>>> df
  col1 col2  col3  col3_
0    a    c     1     10
1    b    c     3     10
2    a    d     2     20
3    b    d     4     20

[4 rows x 4 columns]
>>> df['col3_'] = df['col3_'].fillna(0)  # in case there are no matches
>>> df.col3 += df.col3_
>>> df
  col1 col2  col3  col3_
0    a    c    11     10
1    b    c    13     10
2    a    d    22     20
3    b    d    24     20

[4 rows x 4 columns]
>>> df.drop('col3_', axis=1, inplace=True)
>>> df
  col1 col2  col3
0    a    c    11
1    b    c    13
2    a    d    22
3    b    d    24

[4 rows x 3 columns]
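
As a possible alternative to the merge steps above, the same result can be sketched with Series.map, which looks up each col2 value of a in a Series built from b. This is a sketch under the assumption that col2 values in b are unique (map cannot look up against a duplicated index). The names lookup and result are illustrative, and pandas is imported as pd rather than pa.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})
b = pd.DataFrame({'col1': ['x', 'x'], 'col2': ['c', 'd'], 'col3': [10, 20]})

# Build a lookup Series: index = col2 key, values = amount to add.
# Assumption: each col2 value appears only once in b.
lookup = b.set_index('col2')['col3']

# map() replaces each key in a['col2'] with its lookup value; keys without a
# match become NaN, which fillna(0) turns into "add nothing".
result = a.copy()
result['col3'] = a['col3'] + a['col2'].map(lookup).fillna(0).astype(int)
print(result)
```

Unlike the merge, this keeps the rows of a in their original order and never creates a temporary col3_ column.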

If the values in col2 in b are not unique, then you probably also need something like:

>>> b.groupby('col2', as_index=False)['col3'].sum()
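
To illustrate why the aggregation matters: if a key occurs more than once in the reference table, a plain merge would produce one output row per matching pair and so duplicate rows of a. The sketch below uses a hypothetical reference table b_dup (not from the question) in which 'c' appears twice, sums the duplicates first, and then merges. The names b_dup and totals are illustrative, and pandas is imported as pd rather than pa.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})

# Hypothetical reference table in which the key 'c' appears twice.
b_dup = pd.DataFrame({'col1': ['x', 'x', 'x'],
                      'col2': ['c', 'c', 'd'],
                      'col3': [10, 5, 20]})

# Collapse duplicate keys so each col2 value maps to a single total.
totals = b_dup.groupby('col2', as_index=False)['col3'].sum()

# Left merge keeps every row of a; the '_' suffix marks the reference column.
df = a.merge(totals, how='left', on='col2', suffixes=('', '_'))
df['col3'] = df['col3'] + df['col3_'].fillna(0)
df = df.drop(columns='col3_')
print(df)
```

Whether summing is the right way to combine duplicate keys depends on the data; another aggregation such as a mean or the first value could be substituted in the groupby step.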