Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/24036911/

Date: 2020-08-19 03:51:58  Source: igfitidea

How to update values in a specific row in a Python Pandas DataFrame?

Tags: python, pandas

Asked by Alexander

With the nice indexing methods in Pandas I have no problems extracting data in various ways. On the other hand I am still confused about how to change data in an existing DataFrame.

In the following code I have two DataFrames and my goal is to update values in a specific row in the first df from values of the second df. How can I achieve this?

import pandas as pd
df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# this overwrites the first row but we want to update the second
# df.update(df2)

# this does not update anything
df.loc[df.filename == 'test2.dat'].update(df2)

print(df)

gives

    filename   m     n
0  test0.dat  12  None
1  test2.dat  13  None

[2 rows x 3 columns]

but how can I achieve this:

    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16

[2 rows x 3 columns]

Accepted answer by FooBar

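First of all, pandas updates by aligning on the index. When an update command does not change anything, check both the left-hand side and the right-hand side. In the question, df.loc[df.filename == 'test2.dat'] returns a new object, so update() modifies that temporary rather than df, and the only row of df2 carries index 0 while the target row in df has index 1. If you would rather not change the indices to match your identification logic, you can do something along the lines of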

>>> df.loc[df.filename == 'test2.dat', 'n'] = df2[df2.filename == 'test2.dat'].loc[0]['n']
>>> df
Out[331]: 
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
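
The same idea can be written as a self-contained script. This is only a sketch built on the question's two frames; the names mask and new_n are introduced here for illustration and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Pull the replacement value out of df2 as a scalar (first match only).
new_n = df2.loc[df2['filename'] == 'test2.dat', 'n'].iloc[0]

# The boolean mask picks the rows to change. Selecting rows and column
# in a single .loc call writes into df itself, not into a temporary copy.
mask = df['filename'] == 'test2.dat'
df.loc[mask, 'n'] = new_n

print(df)
```

Selecting the rows and the column in one .loc call is what makes the assignment stick; chaining two indexing steps on the left-hand side may write to a temporary object instead.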

If you want to do this for the whole table, I suggest a method I believe is superior to the previously mentioned ones: since your identifier is filename, set filename as your index, and then use update() as you wanted to. Both merge and the apply() approach contain unnecessary overhead:

>>> df.set_index('filename', inplace=True)
>>> df2.set_index('filename', inplace=True)
>>> df.update(df2)
>>> df
Out[292]: 
            m     n
filename           
test0.dat  12  None
test2.dat  13    16
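
If you need filename back as an ordinary column afterwards, a variant without inplace=True might look like the sketch below. The names left, right and result are illustrative, and the reset_index() step is an addition that the answer above does not include.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Index both frames by the identifier so update() can align on it.
left = df.set_index('filename')
right = df2.set_index('filename')

# update() modifies `left` in place and returns None. Only non-NA
# values from `right` overwrite the matching cells in `left`.
left.update(right)

# Turn the index back into a regular 'filename' column.
result = left.reset_index()
print(result)
```

Because update() aligns on the index labels, the row for test2.dat is changed no matter where it sits in either frame.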

Answer by chrisb

There are probably a few ways to do this, but one approach would be to merge the two dataframes on the filename column (the question's df2 has no m column, so m cannot be part of the merge key here), then populate the column 'n' from the right dataframe if a match was found. The n_x, n_y in the code refer to the left/right dataframes in the merge.

In[100] : df = pd.merge(df, df2, how='left', on='filename')

In[101] : df
Out[101]: 
    filename   m   n_x  n_y
0  test0.dat  12  None  NaN
1  test2.dat  13  None   16

In[102] : df['n'] = df['n_y'].fillna(df['n_x'])

In[103] : df = df.drop(['n_x','n_y'], axis=1)

In[104] : df
Out[104]: 
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
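
For reference, the merge steps can be collected into one script. This is a sketch using the question's frames; the suffixes ('_old', '_new') and the names merged and result are choices made here for readability and replace the default n_x / n_y names.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# A left merge keeps every row of df. The overlapping column 'n' is
# split into n_old (from df) and n_new (from df2).
merged = pd.merge(df, df2, how='left', on='filename',
                  suffixes=('_old', '_new'))

# Prefer the new value where a match was found, else keep the old one.
merged['n'] = merged['n_new'].fillna(merged['n_old'])

# Drop the two helper columns.
result = merged.drop(columns=['n_old', 'n_new'])
print(result)
```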

Answer by zach

If you have one large dataframe and only a few update values I would use apply like this:

import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})

data = {'filename': 'test2.dat', 'n': 16}

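def update_vals(row, data=data):
    # Work on a copy so the row object handed in by apply() is not mutated
    row = row.copy()
    if row['filename'] == data['filename']:
        row['n'] = data['n']
    return row

# apply() returns a new DataFrame, so assign the result back
df = df.apply(update_vals, axis=1)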
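
With more than one value to change, the same row-wise idea could be driven by a lookup table. The sketch below is an illustration only: the updates dict, the extra test1.dat row and its values are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test1.dat', 'test2.dat'],
                   'm': [12, 14, 13], 'n': [None, None, None]})

# Hypothetical lookup table: filename -> new value for column 'n'.
updates = {'test2.dat': 16, 'test1.dat': 7}

def update_vals(row, updates=updates):
    # Copy first so the object passed in by apply() is left untouched.
    row = row.copy()
    if row['filename'] in updates:
        row['n'] = updates[row['filename']]
    return row

# apply() builds a new DataFrame from the returned rows.
df = df.apply(update_vals, axis=1)
print(df)
```

Note that apply with axis=1 calls the function once per row, so the cost grows with the number of rows in the frame rather than with the number of updates.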