How to update values in a specific row in a Python Pandas DataFrame?

Question

Asked by Alexander

With the nice indexing methods in Pandas I have no problems extracting data in various ways. On the other hand I am still confused about how to change data in an existing DataFrame.

In the following code I have two DataFrames and my goal is to update values in a specific row in the first df from values of the second df. How can I achieve this?

import pandas as pd
df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# this overwrites the first row but we want to update the second
# df.update(df2)

# this does not update anything
df.loc[df.filename == 'test2.dat'].update(df2)

print(df)

gives

    filename   m     n
0  test0.dat  12  None
1  test2.dat  13  None

[2 rows x 3 columns]

but how can I achieve this:

    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16

[2 rows x 3 columns]

Answer 1

Accepted answer by FooBar

First of all, pandas updates using the index. When an update command does not update anything, check both the left-hand side and the right-hand side. If you would rather not change the indices to follow your identification logic, you can do something along the lines of:

>>> df.loc[df.filename == 'test2.dat', 'n'] = df2[df2.filename == 'test2.dat'].loc[0]['n']
>>> df
Out[331]: 
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
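
A self-contained sketch of the same single-row assignment, using the frames from the question. The name new_n is introduced here only for illustration, and the lookup assumes df2 holds exactly one row for the filename. The replacement value is selected by label rather than by positional index 0, and the assignment goes through one .loc call on df itself so that the original frame is modified:

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Look up the replacement value in df2 (assumes exactly one matching row).
new_n = df2.loc[df2['filename'] == 'test2.dat', 'n'].iloc[0]

# Select rows with a boolean mask and the target column in a single .loc call.
df.loc[df['filename'] == 'test2.dat', 'n'] = new_n

print(df)
```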

If you want to do this for the whole table, I suggest a method I believe is superior to the previously mentioned ones: since your identifier is filename, set filename as your index, and then use update() as you wanted to. Both merge and the apply() approach contain unnecessary overhead:

>>> df.set_index('filename', inplace=True)
>>> df2.set_index('filename', inplace=True)
>>> df.update(df2)
>>> df
Out[292]: 
            m     n
filename           
test0.dat  12  None
test2.dat  13    16
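
The same idea as a runnable sketch that leaves the question's original frames untouched. The names left, right and result are hypothetical, chosen for this example, and reset_index() is added only to turn filename back into an ordinary column afterwards:

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Index both frames by the identifier so update() aligns rows by filename.
left = df.set_index('filename')
right = df2.set_index('filename')

# update() modifies `left` in place, taking non-NA values from `right`
# wherever index labels and column names match.
left.update(right)

# Restore filename as a regular column.
result = left.reset_index()
print(result)
```

Because update() aligns on labels, rows of df whose filename does not appear in df2 keep their existing values.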

Answer 2

Answer by chrisb

There are probably a few ways to do this, but one approach would be to merge the two dataframes together on the filename/m column, then populate the column 'n' from the right dataframe if a match was found. The n_x, n_y in the code refer to the left/right dataframes in the merge.

In[100] : df = pd.merge(df1, df2, how='left', on=['filename','m'])

In[101] : df
Out[101]: 
    filename   m   n_x  n_y
0  test0.dat  12  None  NaN
1  test2.dat  13  None   16

In[102] : df['n'] = df['n_y'].fillna(df['n_x'])

In[103] : df = df.drop(['n_x','n_y'], axis=1)

In[104] : df
Out[104]: 
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16
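
A runnable variant of the merge approach using the frames exactly as defined in the question. One assumption differs from the session above: the question's df2 has no m column, so this sketch merges on filename only. The name merged is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})
df2 = pd.DataFrame({'filename': 'test2.dat', 'n': 16}, index=[0])

# Left merge keeps every row of df; the overlapping column n is
# suffixed to n_x (from df) and n_y (from df2).
merged = pd.merge(df, df2, how='left', on='filename')

# Prefer the value from df2 and fall back to the value from df.
merged['n'] = merged['n_y'].fillna(merged['n_x'])

# Drop the helper columns created by the merge.
merged = merged.drop(columns=['n_x', 'n_y'])
print(merged)
```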

Answer 3

Answer by zach

If you have one large dataframe and only a few update values I would use apply like this:

import pandas as pd

df = pd.DataFrame({'filename': ['test0.dat', 'test2.dat'],
                   'm': [12, 13], 'n': [None, None]})

data = {'filename' :  'test2.dat', 'n':16}

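def update_vals(row, data=data):
    # Overwrite n only in the row whose filename matches.
    if row['filename'] == data['filename']:
        row['n'] = data['n']
    return row

# apply() returns a new DataFrame, so assign the result back.
df = df.apply(update_vals, axis=1)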
