
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/34499584/

Time: 2020-09-14 00:26:27  Source: igfitidea

Use of loc to update a dataframe python pandas

Tags: python, pandas, dataframe, updating, loc

Asked by Data Enthusiast

I have a pandas DataFrame (df) with the following column structure:


month a b c d

This dataframe has data for, say, Jan, Feb, Mar and Apr. A, B, C and D are numeric columns. For the month of Feb, I want to recalculate column A and update it in the dataframe, i.e. for month = Feb, A = B + C + D.


Code I used:


 df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D'] 

This ran without errors, but it did not change the values in column A for February. The console showed this message:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
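The behaviour described above can be reproduced on a tiny hypothetical frame (the values are made up for illustration). Boolean-mask indexing such as df[df['month'] == 'Feb'] returns a new DataFrame, so the following ['A'] = ... writes into that temporary object and df itself is left unchanged. The sketch silences the warning only to keep the output readable; the exact warning class depends on the pandas version.

```python
import warnings

import pandas as pd

# Hypothetical data with the question's column layout
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "A": [0, 0, 0, 0],
    "B": [1, 2, 3, 4],
    "C": [10, 20, 30, 40],
    "D": [100, 200, 300, 400],
})

feb = df["month"] == "Feb"

with warnings.catch_warnings():
    # Older pandas emits SettingWithCopyWarning, Copy-on-Write pandas emits
    # ChainedAssignmentError (also a warning); both are silenced here.
    warnings.simplefilter("ignore")
    # Chained assignment: df[feb] is a new object, so 'A' is set on that copy
    df[feb]["A"] = df[feb]["B"] + df[feb]["C"] + df[feb]["D"]

print(df.loc[feb, "A"].tolist())  # [0]: the original frame was not modified
```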


I tried to use .loc, but I had already called .reset_index() on the dataframe I am working on, and I am not sure how to set the index and use .loc. I followed the documentation, but it was not clear to me. Could you please help me out here? Here is an example dataframe:


 import pandas as pd
 import numpy as np
 dates = pd.date_range('1/1/2000', periods=8)
 df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) 

I want to update, say, one date: 2000-01-03. I can't share a snippet of my data because it is real-time data.
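For the date-indexed example frame, .loc accepts the row label directly, so no reset_index() or set_index() is needed. A minimal sketch, assuming the goal is A = B + C + D for 2000-01-03; the seeded generator is an addition for reproducibility and is not part of the original question.

```python
import numpy as np
import pandas as pd

# Same shape as the question's example frame; seeded so results are repeatable
rng = np.random.default_rng(0)
dates = pd.date_range("1/1/2000", periods=8)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=["A", "B", "C", "D"])
original = df.copy()  # kept only to confirm that a single row changes

day = pd.Timestamp("2000-01-03")  # exact row label in the DatetimeIndex

# One .loc call with row label and column label writes into df itself
df.loc[day, "A"] = df.loc[day, "B"] + df.loc[day, "C"] + df.loc[day, "D"]

print(df.loc[day])
```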


Answered by Anton Protopopov

As the warning suggests, you should use loc[row_index, col_index]. When you subset your data with a condition, you get the matching index values. You just need to pass that condition as row_index and then, after a comma, the column name:


df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D'] 
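The same pattern works whatever the index looks like after reset_index(): the boolean mask carries the row labels, so .loc only needs the mask and the column name. A sketch on hypothetical data that stores the mask once and uses sum(axis=1) over the three columns. This is an alternative to the chained +, not the answer's exact code, and it differs in one respect: sum skips NaN values by default, while + propagates them.

```python
import pandas as pd

# Hypothetical data; reset_index(drop=True) mimics the question and leaves a
# default RangeIndex, which .loc handles like any other index
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "A": [0, 0, 0, 0],
    "B": [1, 2, 3, 4],
    "C": [10, 20, 30, 40],
    "D": [100, 200, 300, 400],
}).reset_index(drop=True)

feb = df["month"] == "Feb"  # boolean Series aligned with df's index

# Row selector (mask) and column label in a single .loc call: updates df in place
df.loc[feb, "A"] = df.loc[feb, ["B", "C", "D"]].sum(axis=1)

print(df)
```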

Answered by DeepSpace

While it is not the most elegant solution, this is how I would achieve your goal (without explicitly iterating over the rows):


df.ix[df['month'] == 'Feb', 'a'] = df[df['month'] == 'Feb']['b'] + df[df['month'] == 'Feb']['c']  

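Note: ix has been deprecated since pandas v0.20.0 in favour of iloc/loc, and it was removed in pandas 1.0, so on current versions use df.loc in place of df.ix in the line above.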
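Translating the line above to current pandas only requires swapping df.ix for df.loc on the left-hand side; the right-hand side merely reads data, so chained indexing there does not cause the copy problem. A sketch on hypothetical lowercase-column data. The original answer adds only b and c, whereas the question asked for b + c + d, so d is included here.

```python
import pandas as pd

# Hypothetical data using the lowercase column names from the question
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 2.0, 3.0],
    "c": [0.5, 0.5, 0.5],
    "d": [10.0, 20.0, 30.0],
})

is_feb = df["month"] == "Feb"

# .loc replaces the removed .ix; reading via df[is_feb]["b"] is harmless,
# only assigning through a chained index writes to a copy
df.loc[is_feb, "a"] = df[is_feb]["b"] + df[is_feb]["c"] + df[is_feb]["d"]

print(df)
```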

注意:自 Pandas v0.20.0ix起已弃用iloc/ loc