Fillna in multiple columns in Python Pandas

Question

Asked by ozzy

I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.

Consider this small fictitious example:

import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so NaN comes from numpy directly
df = pd.DataFrame({'Name':['Hyman','Sue',np.nan,'Bob','Alice','John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City':['Seattle','SF','LA','OC',np.nan,np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.',inplace=True)
df['City'].fillna('.',inplace=True)
df.fillna(0,inplace=True)

Since this is a small dataframe, 3 lines is probably OK. In my real example (which I cannot share here for data confidentiality reasons), I have many more string columns and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?

Answer 1

Accepted answer by Anton Protopopov

You could use apply on your columns, checking whether each column's dtype is numeric or not via dtype.kind:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Hyman
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
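To make the dtype.kind check more concrete, here is a minimal sketch that rebuilds the example frame from the question and shows the kind code of each column before filling. The names kinds and fill_by_kind are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# dtype.kind is a one-character code, for example:
# 'b' bool, 'i' signed int, 'u' unsigned int, 'f' float, 'c' complex
kinds = {col: df[col].dtype.kind for col in df.columns}
print(kinds)

def fill_by_kind(col):
    """Fill numeric columns with 0 and every other column with '.'."""
    if col.dtype.kind in 'biufc':
        return col.fillna(0)
    return col.fillna('.')

# apply calls fill_by_kind once per column and returns a new DataFrame
res = df.apply(fill_by_kind)
print(res)
```

Because apply returns a new frame, df itself is left unchanged; assign the result back to df if you want to overwrite it.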

Answer 2

Answer by Bob Baxley

You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df.fillna(0,inplace=True)
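A possible variant of the same idea, sketched under the assumption that every non-numeric column should get '.': select the numeric columns with select_dtypes and treat the rest as string columns. The names num_cols, other_cols and filled are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# Columns with a numeric dtype (ints and floats)
num_cols = df.select_dtypes(include='number').columns
# Everything else is treated as a string column (an assumption of this sketch)
other_cols = df.columns.difference(num_cols)

filled = df.copy()
filled[num_cols] = filled[num_cols].fillna(0)
filled[other_cols] = filled[other_cols].fillna('.')
print(filled)
```

Selecting on the numeric side avoids depending on how a given pandas version labels string columns.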

Answer 3

Answer by latorrefabian

Define a function:

def myfillna(series):
    # Compare dtypes with ==, not `is`; pd.np was removed in pandas 2.0
    if series.dtype == float:
        return series.fillna(0)
    elif series.dtype == object:
        return series.fillna('.')
    else:
        # Leave columns of any other dtype untouched
        return series

You can add other elif statements if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the dataframe:

df = df.apply(myfillna)

Assigning the result back to df has the same effect as filling in place.

Answer 4

Answer by Rob Bulmahn

Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

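This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can just chain another fillna on the end. Note that fillna returns None when inplace=True, so the chained form has to drop inplace and assign the result back: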

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
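With many columns, the column-to-value dict passed to fillna could also be built from the dtypes rather than typed by hand. This is a minimal sketch using the example frame from the question; fill_values and filled are illustrative names, and the rule "numeric gets 0, everything else gets '.'" is an assumption carried over from the question.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# Map each column to its fill value: 0 for numeric columns, '.' otherwise
fill_values = {
    col: 0 if is_numeric_dtype(df[col]) else '.'
    for col in df.columns
}

# A single fillna call with a dict fills each column with its own value
filled = df.fillna(fill_values)
print(filled)
```

The dict can be edited afterwards if particular columns need a different replacement.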
