pandas: changing a Series in place in a DataFrame after applying a function

Question

Asked by Infinity

I'm trying to use pandas to change one of my columns in place, using a simple function.

After reading the whole DataFrame, I tried to apply a function to one Series:

wanted_data.age.apply(lambda x: x+1)

And it works great. The only problem occurs when I try to put the result back into my DataFrame:

wanted_data.age = wanted_data.age.apply(lambda x: x+1)

or:

wanted_data['age'] = wanted_data.age.apply(lambda x: x+1)

Both throw the following warning:

> C:\Anaconda\lib\site-packages\pandas\core\generic.py:1974:
> SettingWithCopyWarning: A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
> value instead
> 
> See the the caveats in the documentation:
> http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
>   self[name] = value

Of course, I can set the DataFrame using the long form:

wanted_data.loc[:, 'age'] = wanted_data.age.apply(lambda x: x+1)

But is there no other, easier and syntactically nicer way to do it?

Thanks!

Answer 1

Accepted answer by Alexander

Use loc:

wanted_data.loc[:, 'age'] = wanted_data.age.apply(lambda x: x + 1)
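As a rough, self-contained sketch of the line above: the question does not show how wanted_data was created, so the data and the all_data frame here are made up, and the filtering step with .copy() is only an assumption about where the warning might come from (a frame derived from another frame by slicing).

```python
import pandas as pd

# Hypothetical data; the question does not show how wanted_data was built.
all_data = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [25, 31, 47]})

# Assumption: wanted_data is a filtered subset. Taking an explicit copy makes
# it an independent DataFrame, so later assignments are unambiguous.
wanted_data = all_data[all_data["age"] > 30].copy()

# The form from this answer: assign through .loc with a row and column indexer.
wanted_data.loc[:, "age"] = wanted_data["age"].apply(lambda x: x + 1)

print(wanted_data["age"].tolist())  # [32, 48]
print(all_data["age"].tolist())     # [25, 31, 47], the source frame is untouched
```

If wanted_data really is meant to be independent of the frame it was cut from, the explicit copy is probably the piece that matters most; the .loc assignment then behaves the same as the shorter forms in the question.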

Answer 2

Answer by Irfanullah

I would suggest wanted_data['age'] = wanted_data['age'].apply(lambda x: x + 1), then saving the file with wanted_data.to_csv(fname, index=False), where "fname" is the name of the file to be updated.
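A minimal sketch of that suggestion, under the assumption that the data came from a CSV file. The CSV content is invented, and in-memory buffers stand in for the real file so the example runs without touching the disk; in practice fname would be a path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the file the question read from.
source = io.StringIO("name,age\nAnn,25\nBob,31\n")
wanted_data = pd.read_csv(source)

# Apply the function and assign the result back to the column.
wanted_data["age"] = wanted_data["age"].apply(lambda x: x + 1)

# fname would normally be a path on disk; a StringIO buffer is used here
# so the sketch is self-contained.
fname = io.StringIO()
wanted_data.to_csv(fname, index=False)

csv_text = fname.getvalue()
print(csv_text)
```

Passing index=False keeps the row index out of the written file, so reading it back gives the same two columns.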

Answer 3

Answer by Thanasis Mattas

I cannot comment, so I'll leave this as an answer.

Because of the way chained indexing is handled internally, you may get back a deep copy instead of a reference to your initial DataFrame (for more, see chained assignment; this is a very good source. A bare .loc[] always returns a reference). In that case you are not assigning back to your DataFrame, but to a copy of it. On the other hand, your expression may return a reference to your initial DataFrame, and then mutating it will mutate the initial DataFrame too. pandas prints this warning to draw attention to the situation, so that the user can decide whether this is the intended behavior or not.
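To make the copy-versus-original distinction concrete, here is a small sketch with made-up data. It avoids chained assignment itself (whose behavior differs between pandas versions) and instead contrasts an explicit copy with a single .loc call; the names df, mask and subset are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [25, 31, 47]})
mask = df["age"] > 30

# Boolean filtering produces a new object; .copy() makes that explicit.
subset = df[mask].copy()
subset["age"] = subset["age"].apply(lambda x: x + 1)

# Only the copy was modified, so df still holds the original ages.
ages_after_copy_edit = df["age"].tolist()

# A single .loc call selects rows and column in one step and writes to df itself.
df.loc[mask, "age"] = df.loc[mask, "age"].apply(lambda x: x + 1)
ages_after_loc_edit = df["age"].tolist()

print(ages_after_copy_edit)  # [25, 31, 47]
print(ages_after_loc_edit)   # [25, 32, 48]
```

The warning in the question is about exactly this ambiguity: pandas cannot always tell which of the two outcomes the user intended.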

If you know what you're doing, you can silence the warning using:

# Temporarily disable the chained-assignment warning inside this block only
with pd.option_context("mode.chained_assignment", None):
    wanted_data.age = wanted_data.age.apply(lambda x: x + 1)

If you think that this is an important matter (e.g. there is a possibility of unintentionally mutating the initial DataFrame), you can set the above option to "raise", so that an error is raised instead of a warning.

Also, I think the use of the term "inplace" is not fully correct. "inplace" is an argument that some methods accept in order to mutate an object without assigning it back to itself (the assignment is handled internally), and apply() does not support this feature.
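A short sketch of that difference, with invented data: sort_values is used here only as one example of a method that accepts inplace=True, while apply has to be assigned back by hand.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob", "Ann", "Cid"], "age": [31, 25, 47]})

# sort_values accepts inplace=True: it mutates df and returns None.
result = df.sort_values("age", inplace=True)

# apply has no inplace parameter: it returns a new Series and leaves df alone...
incremented = df["age"].apply(lambda x: x + 1)
ages_before_assignment = df["age"].tolist()

# ...until the result is explicitly assigned back.
df["age"] = incremented

print(result)                  # None
print(ages_before_assignment)  # [25, 31, 47]
print(df["age"].tolist())      # [26, 32, 48]
```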
