Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21463589/

Time: 2020-09-13 21:38:32  Source: igfitidea

Pandas: Chained assignments

Tags: python, pandas, copy, chained-assignment

Asked by Zhubarb

I have been reading this link on "Returning a view versus a copy". I do not really understand how the chained assignment concept in Pandas works and how the usage of .ix(), .iloc(), or .loc() affects it.

I get SettingWithCopyWarning warnings for the following lines of code, where data is a Pandas dataframe and amount is a column (Series) name in that dataframe:

data['amount'] = data['amount'].astype(float)

data["amount"].fillna(data.groupby("num")["amount"].transform("mean"), inplace=True)

data["amount"].fillna(mean_avg, inplace=True)

Looking at this code, is it obvious that I am doing something suboptimal? If so, can you let me know the replacement code lines?

I am aware of the note below and would like to think that the warnings in my case are false positives:

The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid assignment. There may be false positives; situations where a chained assignment is inadvertently reported.

EDIT: the code leading to the first copy warning.

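def function1(row, date, qty):
    # Return the row's quantity, multiplied by a looked-up rate unless the currency is 'A'.
    try:
        if row['currency'] == 'A':
            result = row[qty]
        else:
            # Rate for this row's date, taken from the column named after its currency
            rate = lookup[lookup['Date'] == row[date]][row['currency']]
            result = float(rate) * float(row[qty])
        return result
    except ValueError:  # raised when a value cannot be converted to float
        # Nothing is returned here, so the row's result becomes None
        print("The current row causes an exception:")

data['amount'] = data.apply(lambda row: function1(row, date, qty), axis=1)
data['amount'] = data['amount'].astype(float)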

Answered by Jeff

The point of SettingWithCopy is to warn the user that you may be doing something that will not update the original data frame as one might expect.

Here, data is a dataframe, possibly of a single dtype (or not). You are then taking a reference to data['amount'], which is a Series, and updating it. This probably works in your case because you are returning the same dtype of data as already existed.

However, it could create a copy and update that copy of data['amount'], which you would not see; then you would be wondering why the original is not updating.
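
To make the "update lands on a copy" case concrete, here is a minimal sketch. The frame df and its values are made up for illustration and are not the asker's data. Boolean indexing returns a new object, so assigning into it leaves the original untouched, while a single .loc call writes to the original.

```python
import pandas as pd

# Hypothetical toy frame, only for illustration
df = pd.DataFrame({"num": [1, 1, 2], "amount": [10.0, 20.0, 30.0]})

# Boolean indexing returns a new object, so this assignment lands on that copy
# (older pandas versions emit SettingWithCopyWarning at this point).
subset = df[df["num"] == 1]
subset["amount"] = 0.0
unchanged = df["amount"].tolist()  # the original frame still holds the old values

# A single .loc call selects and assigns on the original frame in one step.
df.loc[df["num"] == 1, "amount"] = 0.0
updated = df["amount"].tolist()
```

After the first assignment the original column is unchanged; only the .loc form modifies it.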

Pandas returns a copy of an object in almost all method calls. The inplace operations are a convenience that works, but in general they do not make it clear that data is being modified, and they could potentially work on copies.

It is much clearer to do this:

data['amount'] = data["amount"].fillna(data.groupby("num")["amount"].transform("mean"))

data["amount"] = data['amount'].fillna(mean_avg)
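
A self-contained version of the two lines above might look like the following sketch. The sample values are invented, and mean_avg is assumed here to be the overall mean of the column, since the question does not show how it was computed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 3 has no known amount at all
data = pd.DataFrame({
    "num": [1, 1, 2, 2, 3],
    "amount": [10.0, np.nan, 30.0, 50.0, np.nan],
})

# Fill each missing amount with the mean of its "num" group,
# assigning the result back instead of using inplace=True.
group_mean = data.groupby("num")["amount"].transform("mean")
data["amount"] = data["amount"].fillna(group_mean)

# Assumption: mean_avg is the overall column mean, used for groups with no values.
mean_avg = data["amount"].mean()
data["amount"] = data["amount"].fillna(mean_avg)
```

The group-wise fill handles rows whose group has at least one known value; the second fill catches the rest.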

One further plus of working on copies: you can chain operations, which is not possible with inplace ones.

e.g.

data['amount'] = data['amount'].fillna(mean_avg)*2
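
As a slightly longer sketch of the same idea, several steps can be chained before a single assignment back to the column. The data and the choice of mean_avg (the column mean) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
data = pd.DataFrame({"amount": [1.0, np.nan, 3.0]})
mean_avg = data["amount"].mean()  # assumption: fill value is the column mean

# Each method returns a new Series, so the calls can be chained
# and the final result assigned back in one statement.
data["amount"] = data["amount"].fillna(mean_avg).mul(2).astype(int)
```

With inplace=True the first call would return None, so nothing could be chained after it.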

And just an FYI: inplace operations are neither faster nor more memory efficient. My 2c: they should be banned, but it is too late for that API.

You can of course turn this off:

pd.set_option('chained_assignment',None)

FYI, pandas runs its entire test suite with this option set to raise (so we know if chaining is happening).
