Original source: http://stackoverflow.com/questions/23330654/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Update a dataframe in pandas while iterating row by row
Asked by AMM
I have a pandas data frame that looks like this (it's a pretty big one):
                date      exer  exp     ifor         mat
    1092  2014-03-17  American    M  528.205  2014-04-19
    1093  2014-03-17  American    M  528.205  2014-04-19
    1094  2014-03-17  American    M  528.205  2014-04-19
    1095  2014-03-17  American    M  528.205  2014-04-19
    1096  2014-03-17  American    M  528.205  2014-05-17
Now I would like to iterate row by row. As I go through each row, the value of ifor in that row can change depending on some conditions, and I need to look up another dataframe.
How do I update the dataframe as I iterate? I tried a few things, but none of them worked.
    for i, row in df.iterrows():
        if <something>:
            row['ifor'] = x
        else:
            row['ifor'] = y

        df.ix[i]['ifor'] = x
None of these approaches seem to work. I don't see the values updated in the dataframe.
Answered by CT Zhu
You should assign the value with df.ix[i, 'exp'] = X or df.loc[i, 'exp'] = X instead of df.ix[i]['ifor'] = x. (.ix was later deprecated and removed from pandas, so prefer .loc in current versions.)
Otherwise you are assigning to a temporary object (which may be a copy rather than a view), and you should get a warning:
-c:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
That said, the loop would probably be better replaced by a vectorized algorithm that makes full use of the DataFrame, as @Phillip Cloud suggested.
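As a rough illustration of what a vectorized replacement might look like, the sketch below uses numpy.where to fill ifor in one step. The condition (exer == "American") and the values x and y are assumptions made up for the example; the question does not say what the real condition is.

```python
import numpy as np
import pandas as pd

# Toy data; the column names follow the question, the values are made up.
df = pd.DataFrame(
    {
        "exer": ["American", "American", "European"],
        "ifor": [528.205, 528.205, 528.205],
    },
    index=[1092, 1093, 1094],
)

# Hypothetical condition and replacement values (assumptions for illustration).
is_american = df["exer"] == "American"
x, y = 1.0, 2.0

# One vectorized assignment replaces the whole row-by-row loop.
df["ifor"] = np.where(is_american, x, y)
```

This only works when the new value of a row does not depend on values written earlier in the same pass; otherwise an explicit loop may still be needed.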
Answered by rakke
You can assign values in the loop using df.set_value:
    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.set_value(i, 'ifor', ifor_val)
If you don't need the row values you could simply iterate over the indices of df, but I kept the original for-loop in case you need the row value for something not shown here.
Update
df.set_value() has been deprecated since version 0.21.0; you can use df.at[] instead:
    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.at[i, 'ifor'] = ifor_val
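To make the pattern concrete, here is a small runnable sketch that reads from the row yielded by iterrows() and writes back through df.at, using a second dataframe as the lookup the question mentions. The lookup table, its layout, and all values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"exp": ["M", "W", "M"], "ifor": [528.205, 528.205, 528.205]},
    index=[1092, 1093, 1094],
)

# Hypothetical lookup table, indexed by the 'exp' code.
lookup = pd.DataFrame({"ifor": [530.0]}, index=["M"])

for i, row in df.iterrows():
    # Read from `row` (a copy of the data), but write back through df.at.
    if row["exp"] in lookup.index:
        df.at[i, "ifor"] = lookup.at[row["exp"], "ifor"]
```

Rows whose exp code is missing from the lookup table are left untouched.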
Answered by GoingMyWay
A method you can use is itertuples(). It iterates over DataFrame rows as namedtuples, with the index value as the first element of the tuple, and it is much faster than iterrows(). With itertuples(), each row contains its Index in the DataFrame, and you can use loc to set the value.
    for row in df.itertuples():
        if <something>:
            df.at[row.Index, 'ifor'] = x
        else:
            df.at[row.Index, 'ifor'] = y

        # Equivalent, but slower for a single cell:
        # df.loc[row.Index, 'ifor'] = x
Thanks @SantiStSupery, using .at is much faster than .loc.
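A minimal runnable version of the itertuples() approach could look like the following. The condition on exer and the values x and y are placeholders chosen for the example, not something the answer specifies.

```python
import pandas as pd

df = pd.DataFrame(
    {"exer": ["American", "European", "American"], "ifor": [528.205] * 3},
    index=[1092, 1093, 1094],
)
x, y = 1.0, 2.0  # hypothetical replacement values

for row in df.itertuples():
    # row.Index is the index label; the other columns are attributes (row.exer).
    df.at[row.Index, "ifor"] = x if row.exer == "American" else y
```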
Answered by piRSquared
A pandas DataFrame object should be thought of as a Series of Series. In other words, you should think of it in terms of columns. This matters because when you use pd.DataFrame.iterrows you are iterating through rows as Series. But these are not the Series that the data frame is storing, so they are new Series that are created for you while you iterate. That implies that when you attempt to assign to them, those edits won't end up reflected in the original data frame.
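The effect can be seen in a short experiment. The sketch below uses a made-up frame with mixed column types, where each row yielded by iterrows() has to be built as a new Series; assigning to it leaves the frame as it was.

```python
import pandas as pd

df = pd.DataFrame({"exp": ["M", "M"], "ifor": [528.205, 528.205]})

for i, row in df.iterrows():
    row["ifor"] = 0.0  # modifies only the temporary Series

# The values stored in the frame are still the original ones.
unchanged = df["ifor"].tolist()
```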
Ok, now that that is out of the way: What do we do?
Suggestions prior to this post include:
pd.DataFrame.set_value is deprecated as of pandas version 0.21
pd.DataFrame.ix is deprecated
pd.DataFrame.loc is fine but can work on array indexers, and you can do better
My recommendation
Use pd.DataFrame.at
    for i in df.index:
        if <something>:
            df.at[i, 'ifor'] = x
        else:
            df.at[i, 'ifor'] = y
You can even change this to:
    for i in df.index:
        df.at[i, 'ifor'] = x if <something> else y
Response to comment
"And what if I need to use the value of the previous row for the if condition?"
    j = df.columns.get_loc('ifor')  # integer position of the 'ifor' column
    for i in range(1, len(df) + 1):
        # i - 1 is the integer position of the row being written
        if <something>:
            df.iat[i - 1, j] = x
        else:
            df.iat[i - 1, j] = y
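One way to read the previous row inside the condition is sketched below with positional access through iat. The rule itself (cap each value at the previous value plus 2) is invented for the example; only the indexing pattern is the point.

```python
import pandas as pd

df = pd.DataFrame({"ifor": [1.0, 5.0, 2.0, 7.0]})
j = df.columns.get_loc("ifor")  # integer position of the column

# Start at 1 so that row i - 1 always exists.
for i in range(1, len(df)):
    previous = df.iat[i - 1, j]  # already-updated value of the previous row
    # Hypothetical rule: a value may exceed the previous one by at most 2.
    if df.iat[i, j] > previous + 2:
        df.iat[i, j] = previous + 2
```

Because each step reads a value the loop may have just written, this kind of update is hard to vectorize directly, which is a common reason for keeping an explicit loop.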
Answered by Duane
    for i, row in df.iterrows():
        if <something>:
            df.at[i, 'ifor'] = x
        else:
            df.at[i, 'ifor'] = y
Answered by Pranzell
Well, if you are going to iterate anyhow, why not use the simplest method of all, df['Column'].values[i]?
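    df['Column'] = ''
    for i in range(len(df)):
        # Note: this writes through the underlying NumPy array. It may fail or
        # have no effect if pandas hands back a copy or a read-only array.
        df['Column'].values[i] = something/update/new_value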
Or, if you want to compare the new values with the old ones or anything like that, why not store them in a list and then assign the list to the column at the end?
    mylist, df['Column'] = [], ''
    for <condition>:
        mylist.append(something/update/new_value)
    df['Column'] = mylist
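The list-based variant might look like this in runnable form. The column name ifor_new and the rounding rule are assumptions for the example; the key point is that the list ends up with exactly one entry per row before it is assigned.

```python
import pandas as pd

df = pd.DataFrame({"ifor": [528.205, 530.0, 525.5]})

new_values = []
for old in df["ifor"]:
    # Hypothetical rule: truncate each value to a whole number.
    new_values.append(float(int(old)))

# Assign the whole column once; the list must have one entry per row.
df["ifor_new"] = new_values
```

Keeping the old column next to the new one makes it straightforward to compare the two afterwards.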
Answered by Shazir Jabbar
Increment the MAX number from a column. For example:
    df1 = [sort_ID, Column1, Column2]
    print(df1)
My output:
    Sort_ID  Column1  Column2
    12       a        e
    45       b        f
    65       c        g
    78       d        h
    MAX = df1['Sort_ID'].max()  # This returns my max number
Now I need to create a column in df2 and fill it with values that count up from MAX.
    Sort_ID  Column1  Column2
    79       a1       e1
    80       b1       f1
    81       c1       g1
    82       d1       h1
Note: df2 will initially contain only Column1 and Column2. We need the Sort_ID column to be created, continuing incrementally from the MAX of df1.
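One possible way to build that column without a loop is sketched below. The frames are reconstructed from the tables above, and max_id is just a local name introduced for the example.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Sort_ID": [12, 45, 65, 78],
        "Column1": ["a", "b", "c", "d"],
        "Column2": ["e", "f", "g", "h"],
    }
)
df2 = pd.DataFrame(
    {
        "Column1": ["a1", "b1", "c1", "d1"],
        "Column2": ["e1", "f1", "g1", "h1"],
    }
)

max_id = int(df1["Sort_ID"].max())

# Number the rows of df2 consecutively, starting right after max_id,
# and place the new column first to match the desired layout.
new_ids = list(range(max_id + 1, max_id + 1 + len(df2)))
df2.insert(0, "Sort_ID", new_ids)
```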