Python: Find all columns of a DataFrame in Pandas whose type is float, or a particular type?

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/21720022/
Date: 2020-08-18 23:28:59  Source: igfitidea

Find all columns of dataframe in Pandas whose type is float, or a particular type?

Tags: python, pandas, dataframe, data-cleaning

Asked by Yu Shen

I have a dataframe, df, in which some columns are of type float64 while the others are of type object. Because of the mixed dtypes, I cannot use

df.fillna('unknown') #getting error "ValueError: could not convert string to float:"

as the error happened with the columns whose type is float64 (what a misleading error message!)

So I wish I could do something like

for col in df.columns[<dtype == object>]:
    df[col] = df[col].fillna("unknown")

So my question is whether there is any such filter expression that I can use with df.columns.

I guess alternatively, less elegantly, I could do:

import numpy as np

for col in df.columns:
    if df[col].dtype == np.dtype('O'):  # object dtype
        df[col] = df[col].fillna('')
        # still puzzled: only the empty string works as a replacement; 'unknown' fails for certain
        # values with "ValueError: Error parsing datetime string "unknown" at position 0"

I would also like to know why, when I replace '' with 'unknown' in the code above, it works for some cells but fails on one cell with the error "ValueError: Error parsing datetime string "unknown" at position 0".

Thanks a lot!

Yu

Accepted answer by Andy Hayden

You can see what the dtype is for all the columns using the dtypes attribute:

In [11]: df = pd.DataFrame([[1, 'a', 2.]])

In [12]: df
Out[12]: 
   0  1  2
0  1  a  2

In [13]: df.dtypes
Out[13]: 
0      int64
1     object
2    float64
dtype: object

In [14]: df.dtypes == object
Out[14]: 
0    False
1     True
2    False
dtype: bool

To access the object columns:

In [15]: df.loc[:, df.dtypes == object]
Out[15]: 
   1
0  a
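The same boolean mask can also index df.columns directly, which is close to the filter expression the question asks for. A minimal sketch, assuming a small made-up frame (the column names and values are illustrative and not from the original post):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one int, one object, and one float column.
df = pd.DataFrame({
    "a": [1, 2],
    "b": pd.Series(["x", None], dtype=object),
    "c": [2.0, np.nan],
})

# df.dtypes is a Series indexed by column name, so comparing it with a dtype
# gives a boolean mask aligned with df.columns.
object_cols = df.columns[df.dtypes == object]
float_cols = df.columns[df.dtypes == np.float64]

# Fill missing values only in the object columns; float columns keep NaN.
for col in object_cols:
    df[col] = df[col].fillna("unknown")
```

Here object_cols and float_cols are ordinary Index objects, so they can be looped over or passed back to df[...] to select those columns.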

I think the most explicit approach is the following (I'm not sure that inplace would work here):

In [16]: df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('')

That said, I recommend you use NaN for missing data.

Answer by RNA

This is more concise:

import numpy as np

# select the float64 columns (np.float was removed in NumPy 1.24; use the dtype name instead)
df_float = df.select_dtypes(include=['float64'])
# select the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])
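A self-contained sketch of both selections, assuming a small made-up frame (the column names are illustrative). It passes the dtype name 'float64' as a string, which avoids depending on NumPy aliases that differ between versions:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: int64, object, and float64 columns.
df = pd.DataFrame({
    "a": [1, 2],
    "b": pd.Series(["x", None], dtype=object),
    "c": [2.0, np.nan],
})

# Keep only the float64 columns.
float_df = df.select_dtypes(include=["float64"])

# Drop every numeric column (ints and floats), leaving the object column.
non_numeric_df = df.select_dtypes(exclude=[np.number])

print(list(float_df.columns))        # ['c']
print(list(non_numeric_df.columns))  # ['b']
```

select_dtypes returns a new DataFrame, so the original df is left unchanged.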

Answer by Jaroslav Bezděk

As @RNA said, you can use pandas.DataFrame.select_dtypes. Using the example from your question, the code would look like this:

for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna('unknown')
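As a possible variation on the loop above, the object columns can be filled in a single assignment. This is only a sketch on a made-up frame (the names "name" and "score" are assumptions for illustration), and it shows that the float column keeps its NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with one object column and one float column.
df = pd.DataFrame({
    "name": pd.Series(["x", None], dtype=object),
    "score": [2.0, np.nan],
})

# Column labels of the object-dtype columns.
obj_cols = df.select_dtypes(include=["object"]).columns

# Fill missing values in those columns only; other columns are untouched.
df[obj_cols] = df[obj_cols].fillna("unknown")
```

After this runs, "name" contains "unknown" in place of the missing value, while "score" is still float64 with one NaN.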