Notice: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35465741/

Time: 2020-09-14 00:42:51  Source: igfitidea

Pandas: convert column with empty strings to float

python pandas

Asked by LateCoder

In my application, I receive a pandas DataFrame (say, block), that has a column called est. This column can contain a mix of strings or floats. I need to convert all values in the column to floats and have the column type be float64. I do so using the following code:

block[est].convert_objects(convert_numeric=True)
block[est].astype('float')

This works for most cases. However, in one case, est contains only empty strings. In this case, the first statement executes without error, but the empty strings in the column remain empty strings. The second statement then raises an error: ValueError: could not convert string to float:

How can I modify my code to handle a column with all empty strings?

Edit: I know I can just do block[est].replace("", np.NaN), but I was wondering if there's some way to do it with just convert_objects or astype that I'm missing.
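For reference, the replace-based workaround mentioned in the edit can be sketched as follows. This is a minimal sketch rather than the asker's actual code: the helper name `empty_to_float` and the sample Series are assumptions made for illustration, and it was written with a recent pandas in mind, not 0.16.2.

```python
import numpy as np
import pandas as pd


def empty_to_float(col):
    """Hypothetical helper: map "" to NaN, then cast the column to float64."""
    return col.replace("", np.nan).astype("float64")


# Illustrative data: one column of only empty strings, one mixed column.
all_empty = pd.Series(["", ""], dtype=object, name="eps")
mixed = pd.Series(["", "1.5", 2.0], dtype=object, name="eps")

all_empty_result = empty_to_float(all_empty)
mixed_result = empty_to_float(mixed)

print(all_empty_result.dtype)
print(mixed_result.tolist())
```

Note that this only handles empty strings. Any other non-numeric string (for example 'a') would still make astype raise a ValueError.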

Clarification: For project-specific reasons, I need to use pandas 0.16.2.

Here's an interaction with some sample data that demonstrates the failure:

>>> block = pd.DataFrame({"eps":["", ""]})
>>> block = block.convert_objects(convert_numeric=True)
>>> block["eps"]
0
1
Name: eps, dtype: object
>>> block["eps"].astype('float')
...
ValueError: could not convert string to float:

Answered by mcrrnz

It's easier to do it using:

pandas.to_numeric

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.to_numeric.html

import pandas as pd
df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})

df['eps'] = pd.to_numeric(df['eps'], errors='coerce')

errors='coerce' converts any value that cannot be parsed as a number to NaN instead of raising an error.

df['eps'].astype('float')
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64
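Applied to the case from the question, a column holding only empty strings, the same call might look like the sketch below. The frame mirrors the question's sample data. Note that pandas.to_numeric was introduced in pandas 0.17.0, so this does not help on 0.16.2.

```python
import pandas as pd

# Same shape as the question's sample: a column of only empty strings.
block = pd.DataFrame({"eps": ["", ""]}, dtype=object)

# errors="coerce" turns every unparseable value, including "", into NaN,
# so the resulting column is already float64 and needs no further astype.
block["eps"] = pd.to_numeric(block["eps"], errors="coerce")

print(block["eps"].dtype)
```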

Then you can apply other functions without getting errors:

df['eps'].round()
0    1.0
1    2.0
2    2.0
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64

Answered by Alexander

import numpy as np
import pandas as pd

def convert_float(val):
    # Return val as a float, or NaN if it cannot be parsed as a number.
    try:
        return float(val)
    except ValueError:
        return np.nan

df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})
>>> df.eps.apply(convert_float)
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64
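One caveat with this approach: float(None) raises TypeError rather than ValueError, so a column that also contains None would still fail. A possible variant is sketched below. The name `convert_float_safe` and the sample data are assumptions for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def convert_float_safe(val):
    """Hypothetical variant: also map None and other non-castable types to NaN."""
    try:
        return float(val)
    except (ValueError, TypeError):
        return np.nan


# Illustrative data mixing numeric strings, a float, None, and bad strings.
df = pd.DataFrame({"eps": ["1", 1.6, None, "a", ""]}, dtype=object)
result = df["eps"].apply(convert_float_safe)

print(result)
```

Because every element comes back as a Python float or NaN, the resulting Series has dtype float64. It does call a Python function once per row, though, so it is likely to be slower than a vectorized conversion on large columns.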