pandas: add values to one column of a pandas DataFrame based on the values in another column

Question

Asked by juniper-

Say I have two matrices, an original and a reference:

import pandas as pa
print("Original Data Frame")
# Create a dataframe
oldcols = {'col1':['a','a','b','b'], 'col2':['c','d','c','d'], 'col3':[1,2,3,4]}
a = pa.DataFrame(oldcols)
print("Original Table:")
print(a)

print("Reference Table:")
b = pa.DataFrame({'col1':['x','x'], 'col2':['c','d'], 'col3':[10,20]})
print(b)

Where the tables look like this:

Original Data Frame
Original Table:
  col1 col2  col3
0    a    c     1
1    a    d     2
2    b    c     3
3    b    d     4

Reference Table:
  col1 col2  col3
0    x    c    10
1    x    d    20

Now I want to add to the third column (col3) of the original table (a) the value from the reference table (b), taken from the row where the second columns (col2) of the two tables match. So the first row of table a should have 10 added to its col3, because the row of table b where col2 is 'c' has a value of 10 in col3. Make sense? Here's some code that does that:

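col3 = []
for ix, row in a.iterrows():
    # The boolean filter returns a one-element Series rather than a scalar,
    # which is why the output cells below look like lists.
    col3 += [row['col3'] + b[b['col2'] == row['col2']]['col3']]

a['col3'] = col3
print("Output Table:")
print(a)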

Yielding the following output:

Output Table:
  col1 col2  col3
0    a    c  [11]
1    a    d  [22]
2    b    c  [13]
3    b    d  [24]

My question is, is there a more elegant way to do this? Also, the results in 'col3' should not be lists. Solutions using numpy are also welcome.

Answer 1

Accepted answer by behzad.nouri

I did not quite understand your description of what you are trying to do, but the output you have shown can be generated by first merging the two data frames and then applying a few simple operations:

>>> df = a.merge(b.filter(['col2', 'col3']), how='left',
                 left_on='col2', right_on='col2', suffixes=('', '_'))
>>> df
  col1 col2  col3  col3_
0    a    c     1     10
1    b    c     3     10
2    a    d     2     20
3    b    d     4     20

[4 rows x 4 columns]
>>> df['col3_'] = df['col3_'].fillna(0)  # in case there are no matches
>>> df.col3 += df.col3_
>>> df
  col1 col2  col3  col3_
0    a    c    11     10
1    b    c    13     10
2    a    d    22     20
3    b    d    24     20

[4 rows x 4 columns]
>>> df.drop('col3_', axis=1, inplace=True)
>>> df
  col1 col2  col3
0    a    c    11
1    b    c    13
2    a    d    22
3    b    d    24

[4 rows x 3 columns]
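
The same result can probably also be reached without a merge, by looking up each col2 value of a in a Series built from b. The following is only a minimal sketch, not the answerer's method: it assumes col2 is unique in b, and the names `lookup`, `increment`, and `result` are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})
b = pd.DataFrame({'col1': ['x', 'x'],
                  'col2': ['c', 'd'],
                  'col3': [10, 20]})

# Hypothetical name: a Series mapping each col2 key in b to its col3 value.
# Assumes the keys in b['col2'] are unique.
lookup = b.set_index('col2')['col3']

# Look up the increment for every row of a; keys missing from b give NaN,
# which is replaced by 0 so those rows stay unchanged.
increment = a['col2'].map(lookup).fillna(0).astype(int)

# Build a new frame instead of modifying a in place.
result = a.assign(col3=a['col3'] + increment)
print(result)
```

Unlike the merge, this keeps the rows of a in their original order and leaves scalars, not lists, in col3.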

If the values in col2 of b are not unique, then you probably also need something like:

>>> b.groupby('col2', as_index=False)['col3'].aggregate(sum)
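
To illustrate why the aggregation matters, here is a small sketch with a made-up reference table `b_dup` (not from the question) in which the key 'c' appears twice. Merging it directly repeats the matching rows of a, while collapsing it to one row per key first keeps one output row per row of a. Summing the duplicates is an assumption; another aggregate may suit your data better.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'],
                  'col2': ['c', 'd', 'c', 'd'],
                  'col3': [1, 2, 3, 4]})

# Made-up reference table with a repeated key 'c' (hypothetical data).
b_dup = pd.DataFrame({'col2': ['c', 'c', 'd'], 'col3': [10, 5, 20]})

# Merging directly emits one output row per matching row of b_dup,
# so the two 'c' rows of a are each duplicated.
naive = a.merge(b_dup, on='col2', how='left', suffixes=('', '_'))

# Collapse b_dup to one row per key first, summing col3 within each key.
b_unique = b_dup.groupby('col2', as_index=False)['col3'].sum()

merged = a.merge(b_unique, on='col2', how='left', suffixes=('', '_'))
merged['col3'] = merged['col3'] + merged['col3_'].fillna(0)
merged = merged.drop(columns='col3_')

print(len(naive), len(merged))
print(merged)
```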
