How to update values in a specific row in a Python Pandas DataFrame?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/24036911/
Asked by Alexander
With the nice indexing methods in Pandas I have no problems extracting data in various ways. On the other hand I am still confused about how to change data in an existing DataFrame.
In the following code I have two DataFrames and my goal is to update values in a specific row in the first df from values of the second df. How can I achieve this?
import pandas as pd
df = pd.DataFrame({'filename' : ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n' : [None, None]})
df2 = pd.DataFrame({'filename' : 'test2.dat', 'n':16}, index=[0])
# this overwrites the first row but we want to update the second
# df.update(df2)
# this does not update anything
df.loc[df.filename == 'test2.dat'].update(df2)
print(df)
gives
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13  None
[2 rows x 3 columns]
but how can I achieve this:
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
[2 rows x 3 columns]
Accepted answer by FooBar
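First of all, pandas updates using the index. When an update call does not change anything, check the index labels on both the left-hand side and the right-hand side. In your second attempt, df.loc[...] with a boolean mask also returns a new object, so update() modifies that temporary object rather than df. If you do not want to change the indices to follow your identification logic, you can do something along the lines of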
>>> df.loc[df.filename == 'test2.dat', 'n'] = df2[df2.filename == 'test2.dat'].loc[0]['n']
>>> df
Out[331]:
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
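To illustrate the point about index alignment, here is a minimal sketch using the question's sample data. It relabels the update row so that its index label matches the row to be changed, after which update() hits the intended row. The names target_labels and aligned are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Index labels of the rows in df whose filename matches the update record.
target_labels = df.index[df['filename'] == 'test2.dat']

# Copy df2 and give its single row the label of the row we want to change.
aligned = df2.copy()
aligned.index = target_labels

# update() modifies df in place, matching on index label and column name.
df.update(aligned)
print(df)
```

Without the relabeling step, df2's row carries label 0, so update() would write into the first row of df, which is the behavior described in the question.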
If you want to do this for the whole table, I suggest a method I believe is superior to the ones in the other answers: since your identifier is filename, set filename as your index, and then use update() as you wanted to. Both the merge and the apply() approach contain unnecessary overhead:
>>> df.set_index('filename', inplace=True)
>>> df2.set_index('filename', inplace=True)
>>> df.update(df2)
>>> df
Out[292]:
            m     n
filename
test0.dat  12  None
test2.dat  13    16
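A minimal sketch of the same idea as a standalone script, assuming the question's sample frames. It avoids inplace=True and calls reset_index() at the end so that filename comes back as an ordinary column. The names indexed and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Use filename as the index on both sides so update() aligns on it.
indexed = df.set_index('filename')
indexed.update(df2.set_index('filename'))

# Turn filename back into a regular column.
result = indexed.reset_index()
print(result)
```

This relies on filename being unique in df; with duplicate filenames the alignment is ambiguous and update() may not behave as intended.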
Answer by chrisb
There are probably a few ways to do this, but one approach would be to merge the two dataframes together on the filename column, then populate the column 'n' from the right dataframe if a match was found. (Add m to the merge keys only if both frames carry it; the question's df2 does not.) Here df1 is the question's first DataFrame, called df there. The n_x, n_y in the code refer to the left/right dataframes in the merge.
In[100] : df = pd.merge(df1, df2, how='left', on='filename')
In[101] : df
Out[101]:
    filename   m   n_x  n_y
0  test0.dat  12  None  NaN
1  test2.dat  13  None   16
In[102] : df['n'] = df['n_y'].fillna(df['n_x'])
In[103] : df = df.drop(['n_x','n_y'], axis=1)
In[104] : df
Out[104]:
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
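For reference, a self-contained sketch of the merge approach using the question's sample data. It assumes merging on filename only, because the question's df2 has no m column, and it uses the suffixes argument (my choice, not the answer's) so the old and new n columns get descriptive names.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Left merge keeps every row of df; rows without a match get NaN in n_new.
merged = pd.merge(df, df2, how='left', on='filename',
                  suffixes=('_old', '_new'))

# Prefer the new value where one exists, otherwise fall back to the old one.
merged['n'] = merged['n_new'].fillna(merged['n_old'])
merged = merged.drop(columns=['n_old', 'n_new'])
print(merged)
```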
Answer by zach
If you have one large dataframe and only a few update values I would use apply like this:
import pandas as pd
df = pd.DataFrame({'filename' : ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n' : [None, None]})
data = {'filename' : 'test2.dat', 'n':16}
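def update_vals(row, data=data):
    # overwrite n only in rows whose filename matches the update record
    if row['filename'] == data['filename']:
        row['n'] = data['n']
    return row
# apply returns a new DataFrame, so assign the result back
df = df.apply(update_vals, axis=1)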
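apply with axis=1 calls the function once per row in Python, which can be slow on a large frame. As a possible alternative (a swapped-in technique, not zach's method), the sketch below keeps the few updates in a dict keyed by filename and uses Series.map. The names updates and new_n are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})

# Hypothetical lookup table: filename -> new value for n.
updates = {'test2.dat': 16}

# map() yields the new value where the filename is a key, NaN elsewhere.
new_n = df['filename'].map(updates)

# Keep the mapped value where present, otherwise keep the existing n.
df['n'] = new_n.where(new_n.notna(), df['n'])
print(df)
```

This only covers the case of a single target column; for several columns at once, the index-based update() from the accepted answer is probably the simpler route.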