Python: find all DataFrame columns of type float, or of a particular type, in pandas?

Question

Asked by Yu Shen

I have a DataFrame, df, in which some columns have dtype float64 and the others have dtype object. Because of the mixed dtypes, I cannot use

df.fillna('unknown') #getting error "ValueError: could not convert string to float:"

as the error happened with the columns whose type is float64 (what a misleading error message!)

so I wish I could do something like

for col in df.columns[<dtype == object>]:
    df[col] = df[col].fillna("unknown")

So my question is whether there is a filter expression like this that I can use with df.columns.

I guess alternatively, less elegantly, I could do:

import numpy as np

for col in df.columns:
    if df[col].dtype == np.dtype('O'):  # object dtype
        df[col] = df[col].fillna('')
        # still puzzled: only the empty string works as a replacement; 'unknown' fails for
        # certain values with "ValueError: Error parsing datetime string "unknown" at position 0"

I would also like to know why, when '' is replaced with 'unknown' in the code above, it works for some cells but fails on one cell with the error "ValueError: Error parsing datetime string "unknown" at position 0".

Thanks a lot!

Yu

Answer 1

Accepted answer by Andy Hayden

You can see what the dtype is for all the columns using the dtypes attribute:

In [11]: df = pd.DataFrame([[1, 'a', 2.]])

In [12]: df
Out[12]: 
   0  1  2
0  1  a  2

In [13]: df.dtypes
Out[13]: 
0      int64
1     object
2    float64
dtype: object

In [14]: df.dtypes == object
Out[14]: 
0    False
1     True
2    False
dtype: bool

To access the object columns:

In [15]: df.loc[:, df.dtypes == object]
Out[15]: 
   1
0  a

I think the most explicit way is the following (I'm not sure that inplace would work here):

In [16]: df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('')

That said, I recommend you use NaN for missing data.
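
The boolean mask from df.dtypes can also index df.columns directly, which is close to the filter expression the question asks for. The following is a minimal sketch, not the answerer's code: the sample data and the names `price`, `label` and `object_cols` are made up for illustration, and the object dtype is requested explicitly because newer pandas versions may infer a dedicated string dtype instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question); object dtype is set explicitly.
df = pd.DataFrame({
    "price": [1.5, np.nan, 3.0],
    "label": pd.Series(["a", None, "c"], dtype=object),
})

# Boolean mask over the columns: True where the column dtype is object.
object_cols = df.columns[df.dtypes == object]

# Fill missing values only in the object columns; float columns keep their NaN.
for col in object_cols:
    df[col] = df[col].fillna("unknown")

print(list(object_cols))
print(df)
```

Because the float column is never touched, its missing values stay as NaN.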

Answer 2

Answered by RNA

This is more concise:

import numpy as np

# select the float columns (the np.float alias was removed in NumPy 1.24)
df_float = df.select_dtypes(include=['float64'])
# select the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])
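
To make the include/exclude behaviour concrete, here is a small self-contained sketch. The DataFrame and its column names (`count`, `ratio`, `name`) are assumptions for illustration, not data from the question; the object dtype is set explicitly so the result does not depend on how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer, one float and one object column.
df = pd.DataFrame({
    "count": [1, 2],
    "ratio": [0.5, np.nan],
    "name": pd.Series(["x", None], dtype=object),
})

# Only float64 columns.
float_cols = df.select_dtypes(include=["float64"]).columns.tolist()
# All numeric columns (integers and floats).
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
# Everything that is not numeric.
other_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(float_cols, numeric_cols, other_cols)
```

select_dtypes returns a new DataFrame, so taking `.columns` from the result gives the matching column names without a manual loop over dtypes.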

Answer 3

Answered by Jaroslav Bezděk

As @RNA said, you can use pandas.DataFrame.select_dtypes. Applied to the example in your question, the code would look like this:

for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna('unknown')
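
As a possible variation on the loop above, fillna also accepts a dict that maps column names to fill values, so the object columns can be filled in a single call. This is a sketch under assumptions: the sample data and the names `score`, `city`, `team` and `filled` are hypothetical, and the object dtype is set explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "score": [1.0, np.nan],
    "city": pd.Series(["Oslo", None], dtype=object),
    "team": pd.Series([None, "red"], dtype=object),
})

# Names of the object-dtype columns.
object_cols = df.select_dtypes(include=["object"]).columns

# Map each object column to its fill value; columns not in the dict are left alone.
filled = df.fillna({col: "unknown" for col in object_cols})

print(filled)
```

This returns a new DataFrame and leaves df unchanged, which may be preferable when the original missing-value markers are still needed.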
