Python: Pandas won't fillna() inplace

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address: StackOverflow, http://stackoverflow.com/questions/21998354/

Time: 2020-08-19 00:03:28  Source: igfitidea

Pandas won't fillna() inplace

Tags: python, pandas, dataframe

Asked by Beau Bristow

I'm trying to fill NAs with "" on 4 specific columns in a data frame that are string/object types. I can assign these columns to a new variable as I fillna(), but when I fillna() inplace the underlying data doesn't change.

a_n6 = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 4 columns):
PROV LAST     1542  non-null values
PROV FIRST    1542  non-null values
PROV MID      1542  non-null values
SPEC NM       1542  non-null values
dtypes: object(4)

but

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("", inplace=True)
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 7 columns):
NPI           1103  non-null values
PIN           1542  non-null values
PROV FIRST    1541  non-null values
PROV LAST     1542  non-null values
PROV MID      1316  non-null values
SPEC NM       1541  non-null values
flag          439  non-null values
dtypes: float64(2), int64(1), object(4)

It's just one row, but still frustrating. What am I doing wrong?

Answered by Jeff

You are filling a copy (which you then can't see).

Either:

  • don't fillna inplace (there is no performance gain from doing something inplace)

for example

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]] = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")

or preferably

a_n6.fillna({'PROV LAST': '', 'PROV FIRST': '',
            'PROV MID': '', 'SPEC NM': ''}, inplace=True)

Here's a more in-depth explanation: "Pandas: Chained assignments"
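
A minimal sketch of why the inplace call seems to do nothing, assuming a small made-up frame (the column names and values are hypothetical, not the asker's data). Selecting a list of columns builds a new DataFrame, so the fill lands on that new object rather than on the original:

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the asker's frame.
df = pd.DataFrame({"name": ["a", None, "c"], "score": [1.0, np.nan, 3.0]})

# Selecting with a list of columns returns a new DataFrame, so the inplace
# fill modifies that new object and df keeps its missing value.
subset = df[["name"]]
subset.fillna("", inplace=True)
missing_after_inplace = int(df["name"].isna().sum())

# Assigning the filled result back writes it into df.
df[["name"]] = df[["name"]].fillna("")
missing_after_assign = int(df["name"].isna().sum())

print(missing_after_inplace, missing_after_assign)
```

Depending on the pandas version, the inplace call on the subset may also emit a warning about operating on a copy; either way the original frame is only updated by the assignment.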

Answered by C8H10N4O2

Use a dict as the value argument to fillna()

As mentioned in the comment by @rhkarls on @Jeff's answer, using .loc indexed to a list of columns won't support inplace operations, which I too find frustrating. Here's a workaround.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,np.nan],
                   'b':[6,7,8,np.nan,np.nan],
                   'x':[11,12,13,np.nan,np.nan],
                   'y':[16,np.nan,np.nan,19,np.nan]})
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   NaN
#2  3.0  8.0  13.0   NaN
#3  4.0  NaN   NaN  19.0
#4  NaN  NaN   NaN   NaN

Let's say we want to fillna for x and y only, not a and b.

I would expect using .loc to work (as in an assignment), but it doesn't, as mentioned earlier:

# doesn't work
df.loc[:,['x','y']].fillna(0, inplace=True)
print(df) # nothing changed

However, the documentation says that the value argument to fillna() can be:

alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled).

It turns out that using a dict of values will work:

# works
df.fillna({'x':0, 'y':0}, inplace=True)
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   0.0
#2  3.0  8.0  13.0   0.0
#3  4.0  NaN   0.0  19.0
#4  NaN  NaN   0.0   0.0

Also, if you have a lot of columns in your subset, you could use a dict comprehension, as in:

df.fillna({x:0 for x in ['x','y']}, inplace=True) # also works
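
The documentation passage quoted above also allows a Series as the value. As a rough sketch under that reading (the frame and the choice of column means as fill values are hypothetical, just for illustration), a Series indexed by column name gives each listed column its own fill value and leaves the other columns alone:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan],
                   "x": [10.0, np.nan, 30.0],
                   "y": [np.nan, 4.0, 6.0]})

# A Series indexed by column name: one fill value per listed column.
fill_values = df[["x", "y"]].mean()

# Only x and y are filled; a is not in fill_values, so its NaN stays.
df.fillna(fill_values, inplace=True)
print(df)
```

Here x is filled with 20.0 and y with 5.0 (their column means), while the missing value in a is untouched.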

Answered by user2677285

A workaround is to save the fillna results in another variable and assign it back like this:

na_values_filled = X.fillna(0)
X = na_values_filled

My exact example (which I couldn't get to work otherwise) was a case where I wanted to fillna in only the first line of every group. Like this:

groups = one_train.groupby("installation_id")
first_indexes_filled = one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'].fillna(0)
one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'] =  first_indexes_filled

My case may be unnecessarily complicated, but I think the general "save results, then assign back" method should work as a workaround for the failing inplace=True.
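
A self-contained sketch of the same "save results, then assign back" idea, assuming made-up data and a simpler row selector than the one above: a boolean mask from duplicated() marks the first row of each group, which is a swapped-in technique, not the answer's groupby/first_valid_index approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names mirror the answer's example.
one_train = pd.DataFrame({
    "installation_id": ["a", "a", "b", "b"],
    "clicks": [np.nan, np.nan, 5.0, np.nan],
})

# True on the first row of each installation_id, False on later rows.
first_rows = ~one_train.duplicated("installation_id")

# Fill on the selected rows, then assign the result back to the same rows.
first_rows_filled = one_train.loc[first_rows, "clicks"].fillna(0)
one_train.loc[first_rows, "clicks"] = first_rows_filled
print(one_train)
```

Only the first row of group "a" changes (NaN becomes 0); the later NaN rows in both groups are left as they were.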