Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34128984/

Date: 2020-08-19 14:29:38  Source: igfitidea

How to create new column and insert row values while iterating through pandas data frame

Tags: python, pandas, iteration, dataframe

Asked by sequence_hard

I am trying to create a function that iterates through a pandas dataframe row by row. I want to create a new column based on row values of other columns. My original dataframe could look like this:

df:

   A   B
0  1   2
1  3   4
2  2   2

Now I want to create a new column filled with the row values of Column A - Column B at each index position, so that the result looks like this:

 df:

       A   B   A-B
    0  1   2   -1
    1  3   4   -1
    2  2   2    0

The solution I have works, but only when I do NOT use it in a function:

for index, row in df.iterrows():
    print(index)
    df['A-B'] = df['A'] - df['B']

This gives me the desired output, but when I try to use it as a function, I get an error.

def test(x):
    for index, row in df.iterrows():
        print(index)
        df['A-B'] = df['A'] - df['B']
    return df

df.apply(test)

ValueError: cannot copy sequence with size 4 to array axis with dimension 3

What am I doing wrong here and how can I get it to work?

Accepted answer by Anton Protopopov

That's because the apply method works on columns by default. Change axis to 1 if you'd like to go through rows:

axis: {0 or 'index', 1 or 'columns'}, default 0

  • 0 or 'index': apply function to each column
  • 1 or 'columns': apply function to each row

df.apply(test, axis=1)
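
To make the difference between the two axis values concrete, here is a minimal sketch. It assumes the small example frame from the question; the names col_sums and row_diffs are only illustrative.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

# axis=0 (the default): the function receives each column as a Series,
# indexed by the row labels 0, 1, 2.
col_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series,
# indexed by the column labels "A" and "B".
row_diffs = df.apply(lambda row: row["A"] - row["B"], axis=1)

print(col_sums.tolist())   # [6, 8]
print(row_diffs.tolist())  # [-1, -1, 0]
```

With the default axis, a lookup such as row["A"] inside the function would fail, because the Series being passed in is a column and has no label "A".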

EDIT

I thought that you needed to do some complex manipulation with each row. If you just need to subtract columns from each other:

df['A-B'] = df.A - df.B

Answer by agold

As Anton indicated, you should call the apply function with the axis=1 parameter. However, you then don't need to loop through the rows as you did in the function test, since the apply documentation mentions:

Objects passed to functions are Series objects

So you could simplify the function to:

def test(x):
    x['A-B']=x['A']-x['B']
    return x

and then run:

df.apply(test,axis=1)
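
As a runnable sketch of the row-wise approach, the snippet below uses the example frame from the question. The function name add_difference and the variable result are illustrative, and copying the row before changing it is an extra precaution added here rather than part of the answer.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

def add_difference(row):
    # `row` is a Series indexed by column name. Work on a copy so the
    # object handed in by apply is left untouched.
    row = row.copy()
    row["A-B"] = row["A"] - row["B"]
    return row

# apply builds a new DataFrame from the returned rows;
# assign it to a name to keep the result.
result = df.apply(add_difference, axis=1)
print(result)
```

In this sketch df itself does not gain the new column; only the returned frame has it, so the call needs to be assigned to something.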

Note that you named the parameter of test x, but never actually used x inside the function test.

Finally, I should note that pandas can do column-wise operations directly (i.e. without a for loop), simply like this:

df['A-B']=df['A']-df['B']
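
For completeness, a self-contained version of the column-wise approach might look like the sketch below. It assumes the example frame from the question; the assign variant and the name df2 are additions for illustration, not part of the original answers.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

# Subtract whole columns at once; pandas aligns the values by index.
df["A-B"] = df["A"] - df["B"]

# Alternative that returns a new DataFrame instead of modifying one in place.
# The ** unpacking is needed because "A-B" is not a valid keyword name.
df2 = df[["A", "B"]].assign(**{"A-B": lambda d: d["A"] - d["B"]})

print(df["A-B"].tolist())  # [-1, -1, 0]
```

Both forms avoid an explicit Python loop over the rows, which is usually both shorter and faster than iterrows or a row-wise apply.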
