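Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow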
Original URL: http://stackoverflow.com/questions/34913590/
Fillna in multiple columns in place in Python Pandas
Question by ozzy
I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.
Consider this small fictitious example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})
Now, I can do it in 3 lines:
df['Name'] = df['Name'].fillna('.')
df['City'] = df['City'].fillna('.')
df.fillna(0, inplace=True)
Since this is a small DataFrame, three lines are probably OK. In my real example (which I cannot share here for data confidentiality reasons), I have many more string and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?
Accepted answer by Anton Protopopov
You could use apply on your columns, checking whether each column's dtype is numeric via dtype.kind:
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))
print(res)
    Name    A      B     City
0  Hyman  1.0   0.25  Seattle
1    Sue  2.1   0.00       SF
2      .  0.0   0.00       LA
3    Bob  4.7   4.00       OC
4  Alice  5.6  12.20        .
5   John  6.8  14.40        .
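For reference, dtype.kind is a one-character code: the letters in 'biufc' stand for boolean, signed integer, unsigned integer, float and complex, while columns holding Python strings with object dtype report 'O' and therefore take the '.' branch. The sketch below rebuilds the example frame, inspects those codes and runs the same one-liner; `kinds` is only an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})

# One-character dtype code per column ('f' = float, 'O' = Python object)
kinds = {col: df[col].dtype.kind for col in df.columns}
print(kinds)

# Same test as in the answer: numeric kinds get 0, everything else gets '.'
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))
print(res)
```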
Answer by Bob Baxley
You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.
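# List the string columns by hand...
# str_cols = ['Name', 'City']
# ...or pick out the object-dtype columns automatically
str_cols = df.columns[df.dtypes == object]
df[str_cols] = df[str_cols].fillna('.')
# Everything still missing is numeric, so fill it with 0
df.fillna(0, inplace=True)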
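A sketch of the same approach that swaps in DataFrame.select_dtypes instead of comparing df.dtypes by hand. It assumes the string columns have object dtype. One side effect worth knowing: only numeric columns receive 0, so other column types (dates, for example) are left untouched rather than being caught by a blanket df.fillna(0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})

# Columns holding Python objects (the strings in this example)
str_cols = df.select_dtypes(include='object').columns
# Columns of any numeric dtype (ints, floats, ...)
num_cols = df.select_dtypes(include='number').columns

df[str_cols] = df[str_cols].fillna('.')
df[num_cols] = df[num_cols].fillna(0)
print(df)
```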
Answer by latorrefabian
Define a function:
import numpy as np

def myfillna(series):
    if series.dtype == np.dtype(float):
        return series.fillna(0)
    elif series.dtype == np.dtype(object):
        return series.fillna('.')
    else:
        return series
You can add other elif branches if you want to fill columns of a different dtype in some other way. Now apply this function over all columns of the DataFrame:
df = df.apply(myfillna)
Reassigning the result to df gives the same end result as filling in place.
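If the elif chain grows long, one option is to keep the fill values in a lookup table keyed by dtype.kind, so supporting another dtype means adding one dict entry. The sketch below is a hypothetical variant of myfillna (`FILL_BY_KIND` and `fill_by_kind` are illustrative names, not pandas API); kinds missing from the table are returned unchanged, mirroring the else branch.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup: dtype.kind code -> fill value
# 'f' = float, 'O' = Python object (the strings in this example)
FILL_BY_KIND = {'f': 0, 'O': '.'}

def fill_by_kind(series):
    """Fill NaNs according to the series' dtype kind; leave other kinds alone."""
    value = FILL_BY_KIND.get(series.dtype.kind)
    if value is None:
        return series
    return series.fillna(value)

df = pd.DataFrame({'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})

df = df.apply(fill_by_kind)
print(df)
```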
Answer by Rob Bulmahn
Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else who happens upon this.
If you have multiple columns but only want to replace the NaN values in a subset of them, you can use:
df.fillna({'Name':'.', 'City':'.'}, inplace=True)
This also allows you to specify a different replacement for each column. And if you want to go ahead and fill all remaining NaN values, you can just throw another fillna on the end:
# inplace=True returns None, so chain the calls without it and reassign the result
df = df.fillna({'Name': '.', 'City': '.'}).fillna(0)
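Combining the dict form with the dtype checks from the other answers, the replacement dict can be built from the column dtypes instead of being typed out, so one fillna call covers every column. This is a sketch that assumes every object column should get '.' and every other column should get 0; `fill_values` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Hyman', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})

# Map each column to its replacement: '.' for object (string) columns, 0 otherwise
fill_values = {col: ('.' if df[col].dtype == object else 0) for col in df.columns}

# A single call fills every column with its own value
df = df.fillna(fill_values)
print(df)
```

With many columns this keeps the fill rules in one place, and fill_values can still be edited by hand for any column that needs a special value.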