Python: How to create a new column and insert row values while iterating through a pandas dataframe

Question

Asked by sequence_hard

I am trying to create a function that iterates through a pandas dataframe row by row. I want to create a new column based on row values of other columns. My original dataframe could look like this:

Now I want to create a new column filled with the row values of Column A - Column B at each index position, so that the result looks like this:

 df:

       A   B   A-B
    0  1   2   -1
    1  3   4   -1
    2  2   2    0

The solution I have works, but only when I do NOT use it in a function:

for index, row in df.iterrows():
    print(index)
    df['A-B'] = df['A'] - df['B']

This gives me the desired output, but when I try to use it as a function, I get an error.

def test(x):
    for index, row in df.iterrows():
        print(index)
        df['A-B'] = df['A'] - df['B']
    return df

df.apply(test)

ValueError: cannot copy sequence with size 4 to array axis with dimension 3

What am I doing wrong here and how can I get it to work?

Answer 1

Accepted answer by Anton Protopopov

It's because the apply method works on columns by default. Change axis to 1 if you'd like to go through rows:

axis: {0 or 'index', 1 or 'columns'}, default 0
0 or 'index': apply function to each column
1 or 'columns': apply function to each row

df.apply(test, axis=1)
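
To make the difference between the two axis values concrete, here is a minimal sketch. The small dataframe is rebuilt from the values shown in the question, and the helper name describe is hypothetical, chosen only for illustration. It returns the index labels of whatever Series apply hands to it, so you can see whether the function received a column or a row.

```python
import pandas as pd

# Sample data rebuilt from the values shown in the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

def describe(series):
    # Hypothetical helper: report the index labels of the Series received.
    return ",".join(str(label) for label in series.index)

# axis=0 (default): one call per column, so each Series is indexed by row label.
per_column = df.apply(describe)

# axis=1: one call per row, so each Series is indexed by column name.
per_row = df.apply(describe, axis=1)

print(per_column)
print(per_row)
```

With the default axis the function sees the row labels 0, 1, 2, which is why looking up x['A'] only makes sense once axis=1 is passed.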

EDIT

I thought that you needed to do some complex manipulation with each row. If you just need to subtract the columns from each other:

df['A-B'] = df.A - df.B

Answer 2

Answer by agold

As indicated by Anton, you should call apply with the axis=1 parameter. However, it is then not necessary to loop through the rows as you did in the function test, since the apply documentation mentions:

Objects passed to functions are Series objects

So you could simplify the function to:

def test(x):
    x['A-B']=x['A']-x['B']
    return x

and then run:

df.apply(test,axis=1)

Note that you named the parameter of test x, but did not use x in the function test at all.

Finally, note that you can do column-wise operations with pandas (i.e. without a for loop) simply like this:

df['A-B']=df['A']-df['B']
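
As a rough check that the column-wise form gives the same values as the row-wise one, here is a short sketch. The sample data is again assumed from the question, and row_wise is a hypothetical name used only for the comparison.

```python
import pandas as pd

# Sample data rebuilt from the values shown in the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

# Column-wise subtraction: the two Series are aligned on the index and
# subtracted element by element, with no explicit Python loop.
df["A-B"] = df["A"] - df["B"]

# Row-wise equivalent, shown only for comparison.
row_wise = df.apply(lambda row: row["A"] - row["B"], axis=1)

print(df)
print(row_wise.equals(df["A-B"]))
```

For a plain arithmetic combination of columns like this one, the column-wise form is usually the simpler and faster choice; the row-wise apply is mainly useful when each row needs logic that cannot be expressed on whole columns.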
