How to create new column and insert row values while iterating through pandas data frame
Original source: http://stackoverflow.com/questions/34128984/
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Asked by sequence_hard
I am trying to create a function that iterates through a pandas dataframe row by row. I want to create a new column based on row values of other columns. My original dataframe could look like this:
df:
   A  B
0  1  2
1  3  4
2  2  2
Now I want to create a new column that holds, at each index position, the value of column A minus column B, so that the result looks like this:
df:
   A  B  A-B
0  1  2   -1
1  3  4   -1
2  2  2    0
The solution I have works, but only when I do NOT use it in a function:
for index, row in df.iterrows():
    print(index)
    df['A-B']=df['A']-df['B']
This gives me the desired output, but when I try to use it as a function, I get an error.
def test(x):
    for index, row in df.iterrows():
        print(index)
        df['A-B']=df['A']-df['B']
    return df
df.apply(test)
ValueError: cannot copy sequence with size 4 to array axis with dimension 3
What am I doing wrong here and how can I get it to work?
Accepted answer by Anton Protopopov
It's because the apply method works on columns by default. Change axis to 1 if you'd like to go through rows:
axis: {0 or 'index', 1 or 'columns'}, default 0
- 0 or 'index': apply function to each column
- 1 or 'columns': apply function to each row
df.apply(test, axis=1)
EDIT
I assumed that you needed to do some complex manipulation with each row. If you just need to subtract the columns from each other:
df['A-B'] = df.A - df.B
Answer by agold
As Anton indicated, you should call the apply function with the axis=1 parameter. However, it is then not necessary to loop through the rows as you did in the function test, since the apply documentation mentions:
Objects passed to functions are Series objects
So you could simplify the function to:
def test(x):
    x['A-B']=x['A']-x['B']
    return x
and then run:
df.apply(test,axis=1)
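One detail worth spelling out: apply returns a new dataframe rather than changing df, so the result has to be assigned to a name if you want to keep it. Below is a self-contained sketch of the row-wise approach. The function name add_difference and the explicit row.copy() are my own choices for illustration, not something the original answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

def add_difference(row):
    # Work on a copy so the Series handed over by apply is left untouched
    row = row.copy()
    row["A-B"] = row["A"] - row["B"]
    return row

# apply builds a new dataframe from the returned rows; df itself is unchanged
result = df.apply(add_difference, axis=1)
print(result)
```

Here result contains the columns A, B and A-B, while df still has only A and B.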
Note that you named the parameter of test x, but never used x inside the function test at all.
Finally, I should point out that you can do column-wise operations with pandas (i.e. without a for loop) by simply doing this:
df['A-B']=df['A']-df['B']
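As a rough check that the column-wise version gives the same values as a row loop, here is a small runnable sketch. The list looped is only built for comparison and is not something you would normally need.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 4, 2]})

# Column-wise: subtract the whole columns at once, no explicit loop
df["A-B"] = df["A"] - df["B"]

# The same values computed row by row, for comparison only
looped = [int(row["A"] - row["B"]) for _, row in df.iterrows()]

print(df)
print(looped)
```

Both approaches produce the same numbers; the column-wise form is shorter and avoids calling Python code once per row.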