Pandas: Converting a column with empty strings to float

Question

Asked by LateCoder

In my application, I receive a pandas DataFrame (say, block), that has a column called est. This column can contain a mix of strings or floats. I need to convert all values in the column to floats and have the column type be float64. I do so using the following code:

block["est"] = block["est"].convert_objects(convert_numeric=True)
block["est"] = block["est"].astype('float')

This works for most cases. However, in one case, est contains only empty strings. In this case, the first statement executes without error, but the empty strings in the column remain empty strings. The second statement then raises an error: ValueError: could not convert string to float:.

How can I modify my code to handle a column with all empty strings?

Edit: I know I can just do block["est"].replace("", np.nan), but I was wondering if there's some way to do it with just convert_objects or astype that I'm missing.
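
The replace-based workaround mentioned in this edit can be sketched as below. This is a minimal illustration rather than the asker's actual code: the sample frame reuses the "eps" data shown later in the question, and dtype=object is set explicitly (an assumption) so the column is stored the way older pandas versions store strings.

```python
import numpy as np
import pandas as pd

# Column that holds only empty strings, as in the failing case.
# dtype=object is an assumption made to mirror older pandas behaviour.
block = pd.DataFrame({"eps": ["", ""]}, dtype=object)

# Replace empty strings with NaN first; NaN is a valid float,
# so the subsequent cast no longer raises ValueError.
block["eps"] = block["eps"].replace("", np.nan).astype("float64")

print(block["eps"].dtype)
```

Only empty strings are handled here; other non-numeric strings such as 'a' would still make astype raise.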

Clarification: For project-specific reasons, I need to use pandas 0.16.2.

Here's an interaction with some sample data that demonstrates the failure:

>>> block = pd.DataFrame({"eps":["", ""]})
>>> block = block.convert_objects(convert_numeric=True)
>>> block["eps"]
0
1
Name: eps, dtype: object
>>> block["eps"].astype('float')
...
ValueError: could not convert string to float:

Answer 1

Answered by mcrrnz

It's easier to do this with pandas.to_numeric, which is available from pandas 0.17.0:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.to_numeric.html

import pandas as pd
df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})

df['eps'] = pd.to_numeric(df['eps'], errors='coerce')

errors='coerce' converts any value that cannot be parsed as a number to NaN:

>>> df['eps'].astype('float')
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64

Then you can apply other functions without getting errors:

>>> df['eps'].round()
0    1.0
1    2.0
2    2.0
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64
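
Applied to the all-empty column from the question, the same call might look like the sketch below. It assumes pandas 0.17.0 or later, where pd.to_numeric exists, so it does not cover the asker's 0.16.2 constraint. dtype=object is set explicitly (an assumption) to mirror how older pandas stores string columns.

```python
import pandas as pd

# The failing case from the question: every value is an empty string.
# dtype=object is an assumption made to mirror older pandas behaviour.
block = pd.DataFrame({"eps": ["", ""]}, dtype=object)

# errors="coerce" turns each unparseable value, including "", into NaN,
# and the resulting column is numeric.
block["eps"] = pd.to_numeric(block["eps"], errors="coerce")

print(block["eps"].dtype)
print(block["eps"].isna().all())
```

Because the coerced column is already numeric, a further astype('float') call should not be needed in this case.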

Answer 2

Answered by Alexander

import numpy as np
import pandas as pd

def convert_float(val):
    # Return val as a float, or NaN if it cannot be parsed as one.
    try:
        return float(val)
    except ValueError:
        return np.nan

df = pd.DataFrame({'eps': ['1', 1.6, '1.6', 'a', '', 'a1']})
>>> df.eps.apply(convert_float)
0    1.0
1    1.6
2    1.6
3    NaN
4    NaN
5    NaN
Name: eps, dtype: float64
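
One caveat with this helper: float(None) raises TypeError rather than ValueError, so a column containing None would still fail. A possible variant is sketched below. The name convert_float_safe and the sample series are hypothetical and are not part of the original answer.

```python
import numpy as np
import pandas as pd

def convert_float_safe(val):
    """Hypothetical variant of convert_float that also tolerates None."""
    try:
        return float(val)
    except (TypeError, ValueError):
        # float(None) raises TypeError; float('a') and float('') raise ValueError.
        return np.nan

# Illustrative data mixing numeric strings, a float, None, and bad strings.
s = pd.Series(['1', 1.6, None, '', 'a1'], dtype=object)
result = s.apply(convert_float_safe)

print(result)
```

Since this relies only on Series.apply and the built-in float, it should not depend on pd.to_numeric being available.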
