Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/21998354/
Pandas won't fillna() inplace
Asked by Beau Bristow
I'm trying to fill NAs with "" on 4 specific columns in a data frame that are string/object types. I can assign these columns to a new variable as I fillna(), but when I fillna() inplace the underlying data doesn't change.
a_n6 = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
a_n6
gives me:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 4 columns):
PROV LAST 1542 non-null values
PROV FIRST 1542 non-null values
PROV MID 1542 non-null values
SPEC NM 1542 non-null values
dtypes: object(4)
but
a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("", inplace=True)
a_n6
gives me:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 7 columns):
NPI 1103 non-null values
PIN 1542 non-null values
PROV FIRST 1541 non-null values
PROV LAST 1542 non-null values
PROV MID 1316 non-null values
SPEC NM 1541 non-null values
flag 439 non-null values
dtypes: float64(2), int64(1), object(4)
It's just one row, but still frustrating. What am I doing wrong?
Answered by Jeff
You are filling a copy (which you then can't see).
Either:
- don't fillna inplace (there is no performance gain from doing something inplace)
For example:
a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]] = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
or preferably:
a_n6.fillna({'PROV LAST': '', 'PROV FIRST': '',
'PROV MID': '', 'SPEC NM': ''}, inplace=True)
- assign the copy to a new variable first (a_n6[[list_of_fields]] is a copy in a multi-dtype object); see here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
Here's a more in-depth explanation: Pandas: Chained assignments
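To make the "filling a copy" point concrete, here is a minimal sketch. The small frame `df` and its values are hypothetical stand-ins for `a_n6`, not the asker's data. It fills a column selection in place, shows that the original frame still has missing values, and then assigns the filled result back.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a_n6: two string columns plus a numeric one.
df = pd.DataFrame({
    "PROV LAST": ["Smith", None],
    "PROV MID": [None, "Q"],
    "NPI": [1.0, np.nan],
})
cols = ["PROV LAST", "PROV MID"]

# Selecting a list of columns builds a new DataFrame; filling it in place
# changes only that new object, not df.
subset = df[cols]
subset.fillna("", inplace=True)
still_missing = int(df[cols].isna().sum().sum())
print(still_missing)  # 2: df itself was not touched

# Assigning the filled result back updates df.
df[cols] = df[cols].fillna("")
print(int(df[cols].isna().sum().sum()))  # 0
```

Depending on the pandas version, the in-place call on `subset` may emit a warning about operating on a copy; the assignment form avoids that.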
Answered by C8H10N4O2
Use a dict as the value argument to fillna()
As mentioned in the comment by @rhkarls on @Jeff's answer, using .loc indexed to a list of columns won't support inplace operations, which I too find frustrating. Here's a workaround.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4,np.nan],
'b':[6,7,8,np.nan,np.nan],
'x':[11,12,13,np.nan,np.nan],
'y':[16,np.nan,np.nan,19,np.nan]})
print(df)
# a b x y
#0 1.0 6.0 11.0 16.0
#1 2.0 7.0 12.0 NaN
#2 3.0 8.0 13.0 NaN
#3 4.0 NaN NaN 19.0
#4 NaN NaN NaN NaN
Let's say we want to fillna for x and y only, not a and b.
I would expect using .loc to work (as in an assignment), but it doesn't, as mentioned earlier:
# doesn't work
df.loc[:,['x','y']].fillna(0, inplace=True)
print(df) # nothing changed
However, the documentation says that the value argument to fillna() can be:
alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled).
It turns out that using a dict of values will work:
# works
df.fillna({'x':0, 'y':0}, inplace=True)
print(df)
# a b x y
#0 1.0 6.0 11.0 16.0
#1 2.0 7.0 12.0 0.0
#2 3.0 8.0 13.0 0.0
#3 4.0 NaN 0.0 19.0
#4 NaN NaN 0.0 0.0
Also, if you have a lot of columns in your subset, you could use a dict comprehension, as in:
df.fillna({x:0 for x in ['x','y']}, inplace=True) # also works
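The same dict-based fill also works without inplace=True. As a small sketch (the frame below is a shortened, hypothetical variant of the example above), `dict.fromkeys` builds the per-column mapping and `fillna` returns a new DataFrame, leaving the original untouched:

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical variant of the example frame.
df = pd.DataFrame({
    "a": [1, np.nan],
    "b": [np.nan, 7],
    "x": [11, np.nan],
    "y": [np.nan, 19],
})

# Map only the columns to be filled; 'a' and 'b' are not listed,
# so their missing values are left alone.
fill_values = dict.fromkeys(["x", "y"], 0)

# Without inplace=True, fillna returns a new DataFrame and df is unchanged.
filled = df.fillna(fill_values)
print(filled)
```

Reassigning with `df = df.fillna(fill_values)` gives the same end state as the in-place call.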
Answered by user2677285
A workaround is to save the fillna results in another variable and assign them back like this:
na_values_filled = X.fillna(0)
X = na_values_filled
My exact example (which I couldn't get to work otherwise) was a case where I wanted to apply fillna to only the first row of every group. Like this:
groups = one_train.groupby("installation_id")
first_indexes_filled = one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'].fillna(0)
one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'] = first_indexes_filled
My case may be unnecessarily complicated, but I think the general "save results, then assign back" method should work as a workaround for the failing inplace=True.
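For readers who want to try the "save results, then assign back" pattern on grouped data, here is a self-contained sketch. The data is hypothetical, and it picks the first row of each group with `groupby(...).head(1)` rather than the `first_valid_index` call used above, so it is an illustration of the pattern and not the answerer's exact method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups whose first 'clicks' value is missing.
one_train = pd.DataFrame({
    "installation_id": ["a", "a", "b", "b"],
    "clicks": [np.nan, np.nan, np.nan, 5.0],
})

# Index labels of the first row in each group (original index is preserved).
first_rows = one_train.groupby("installation_id").head(1).index

# Fill on the selection, save the result, then assign it back through .loc.
first_rows_filled = one_train.loc[first_rows, "clicks"].fillna(0)
one_train.loc[first_rows, "clicks"] = first_rows_filled
print(one_train)
```

Only the first row of each group is filled; the second row of group "a" keeps its NaN.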

