Python pandas won't fillna() inplace

Question

Asked by Beau Bristow

I'm trying to fill NAs with "" on 4 specific columns in a data frame that are string/object types. I can assign these columns to a new variable as I fillna(), but when I fillna() inplace the underlying data doesn't change.

a_n6 = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 4 columns):
PROV LAST     1542  non-null values
PROV FIRST    1542  non-null values
PROV MID      1542  non-null values
SPEC NM       1542  non-null values
dtypes: object(4)

but

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("", inplace=True)
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 7 columns):
NPI           1103  non-null values
PIN           1542  non-null values
PROV FIRST    1541  non-null values
PROV LAST     1542  non-null values
PROV MID      1316  non-null values
SPEC NM       1541  non-null values
flag          439  non-null values
dtypes: float64(2), int64(1), object(4)

It's just one row, but still frustrating. What am I doing wrong?

Answer 1

Answered by Jeff

You are filling a copy (which you then can't see).

Either:

don't fillna inplace (there is no performance gain from doing something inplace)

for example

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]] = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")

or preferably

a_n6.fillna({'PROV LAST': '', 'PROV FIRST': '',
            'PROV MID': '', 'SPEC NM': ''}, inplace=True)

Alternatively, assign the copy to a new variable first (a_n6[[list_of_fields]] is a copy in a multi-dtype object); see here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy

Here's a more in-depth explanation: Pandas: Chained assignments

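A minimal sketch of the copy behaviour described above, using a small made-up frame (the values and the shortened column list are assumptions for illustration, not the asker's data). Selecting a list of columns builds a new DataFrame, so filling that object in place leaves the original untouched, while assigning the filled result back updates it. Depending on the pandas version, the chained call may also emit a warning.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a_n6: one numeric column, two string columns.
df = pd.DataFrame({
    "NPI": [1.0, np.nan, 3.0],
    "PROV LAST": ["Smith", "Jones", "Lee"],
    "PROV MID": ["A", np.nan, None],
})
cols = ["PROV LAST", "PROV MID"]

# Chained call: df[cols] is a new DataFrame, and only that temporary is filled.
df[cols].fillna("", inplace=True)
missing_before = int(df[cols].isna().sum().sum())
print(missing_before)  # 2, the original frame is unchanged

# Assign the filled copy back to the same columns.
df[cols] = df[cols].fillna("")
missing_after = int(df[cols].isna().sum().sum())
npi_missing = int(df["NPI"].isna().sum())
print(missing_after)   # 0
print(npi_missing)     # 1, columns outside cols are not touched
```

The assign-back form keeps the fill limited to the listed columns, which is what the question was after.
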
Answer 2

Answered by C8H10N4O2

Use a `dict` as the `value` argument to `fillna()`

As mentioned in the comment by @rhkarls on @Jeff's answer, using .loc indexed to a list of columns won't support inplace operations, which I too find frustrating. Here's a workaround.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,np.nan],
                   'b':[6,7,8,np.nan,np.nan],
                   'x':[11,12,13,np.nan,np.nan],
                   'y':[16,np.nan,np.nan,19,np.nan]})
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   NaN
#2  3.0  8.0  13.0   NaN
#3  4.0  NaN   NaN  19.0
#4  NaN  NaN   NaN   NaN

Let's say we want to fillna for x and y only, not a and b.

I would expect using .loc to work (as in an assignment), but it doesn't, as mentioned earlier:

# doesn't work
df.loc[:,['x','y']].fillna(0, inplace=True)
print(df) # nothing changed

However, the documentation says that the value argument to fillna() can be:

alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled).

It turns out that using a dict of values will work:

# works
df.fillna({'x':0, 'y':0}, inplace=True)
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   0.0
#2  3.0  8.0  13.0   0.0
#3  4.0  NaN   0.0  19.0
#4  NaN  NaN   0.0   0.0

Also, if you have a lot of columns in your subset, you could use a dict comprehension, as in:

df.fillna({x:0 for x in ['x','y']}, inplace=True) # also works

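The documentation quoted above also allows a Series as the value. As a sketch under that reading, the variation below fills x and y with their own column means instead of 0. The names col_means and filled are illustrative only, and the call is made without inplace so the original frame is left alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [6, 7, 8, np.nan, np.nan],
                   'x': [11, 12, 13, np.nan, np.nan],
                   'y': [16, np.nan, np.nan, 19, np.nan]})

# Series indexed by column name: one fill value per column to be filled.
col_means = df[['x', 'y']].mean()   # x -> 12.0, y -> 17.5

# Columns missing from the Series (a and b) keep their NaN values.
filled = df.fillna(col_means)
print(filled)
```

Because only x and y appear in the Series index, this behaves like the dict form: a and b are skipped.
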
Answer 3

Answered by user2677285

A workaround is to save the fillna result in another variable and assign it back, like this:

na_values_filled = X.fillna(0)
X = na_values_filled

My exact example (which I couldn't get to work otherwise) was a case where I wanted to fillna in only the first line of every group. Like this:

groups = one_train.groupby("installation_id")
first_indexes_filled = one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'].fillna(0)
one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'] =  first_indexes_filled

My case may be unnecessarily complicated, but I think the general "save results, then assign back" method should work as a workaround for the failing inplace=True.

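The "save results, then assign back" pattern can be sketched on a small made-up frame. Note the assumptions: the data is invented, and the first row of each group is picked with a boolean mask from DataFrame.duplicated rather than with the groupby/first_valid_index call used above, so this is a simplified stand-in and not the answer's exact method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two installations, several rows each.
one_train = pd.DataFrame({
    "installation_id": ["a", "a", "b", "b", "b"],
    "clicks": [np.nan, np.nan, np.nan, 3.0, np.nan],
})

# True for the first row of each installation_id, False for later rows.
first_rows = ~one_train.duplicated("installation_id")

# Save the filled values, then assign them back to the same selection.
first_rows_filled = one_train.loc[first_rows, "clicks"].fillna(0)
one_train.loc[first_rows, "clicks"] = first_rows_filled
print(one_train)
```

Only rows 0 and 2 are filled here; the remaining missing values in clicks are left as they were.
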