python: convert numerical data in a pandas DataFrame to floats in the presence of strings

Question

Asked by natsuki_2002

I've got a pandas dataframe with a column 'cap'. This column mostly consists of floats but has a few strings in it, for instance at index 1.

df =
    cap
0    5.2
1    na
2    2.2
3    7.6
4    7.5
5    3.0
...

I import my data from a csv file like so:

df = DataFrame(pd.read_csv(myfile.file))

Unfortunately, when I do this, the column 'cap' is imported entirely as strings. I would like floats to be identified as floats and strings as strings. Trying to convert this using:

df['cap'] = df['cap'].astype(float)

throws up an error:

could not convert string to float: na

Is there any way to make all the numbers into floats but keep the 'na' as a string?

Answer 1

Accepted answer by Acorbe

Here is a possible workaround

First, define a function that converts a value to float when it can be parsed as a number and otherwise returns it unchanged:

def to_number(s):
    try:
        s1 = float(s)
        return s1
    except ValueError:
        return s

Then apply it row by row.

Example:

Given a DataFrame df whose first column holds the values a and 2, both stored as strings, we do the conversion via

converted = df.apply(lambda f: to_number(f.iloc[0]), axis=1)

converted
0    a
1    2

A direct check on the types:

type(converted.iloc[0])
str

type(converted.iloc[1])
float
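When only one column needs converting, mapping the function over that column is simpler than applying it row by row with axis=1. A minimal sketch, assuming a column named cap whose entries are all strings (the sample values are made up for illustration); catching TypeError as well is an addition to the original function:

```python
import pandas as pd


def to_number(s):
    """Return float(s) when s parses as a number; otherwise return s unchanged."""
    try:
        return float(s)
    except (TypeError, ValueError):
        # TypeError (added here) also lets non-string objects such as None pass through
        return s


# Hypothetical data: every entry is a string, as after the CSV import in the question
df = pd.DataFrame({"cap": ["5.2", "na", "2.2"]})

# Convert element-wise on the single column instead of row by row
df["cap"] = df["cap"].map(to_number)

print(df["cap"].tolist())
```

The column still holds a mix of floats and strings, so it keeps a non-numeric dtype; that trade-off is what the next answer warns about.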

Answer 2

Answer by Andy Hayden

Calculations on columns of float64 dtype (rather than object) are much more efficient, so float64 is usually preferred; it also lets you do numeric calculations that fail on object columns. For this reason, it is recommended to use NaN for missing data (rather than your own placeholder, or None).
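To see the difference, here is a small sketch with made-up values: the same numbers stored once as an object Series containing the 'na' placeholder and once as float64 with NaN marking the missing entry.

```python
import numpy as np
import pandas as pd

# Sample values for illustration only
as_object = pd.Series([5.2, "na", 2.2], dtype=object)  # placeholder string keeps dtype object
as_float = pd.Series([5.2, np.nan, 2.2])               # NaN keeps the column float64

print(as_object.dtype)  # object
print(as_float.dtype)   # float64

# Reductions skip NaN by default (skipna=True), so the sum just works
total = as_float.sum()
print(total)
```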

Is this really the answer you want?

In [11]: df.sum()  # all strings
Out[11]: 
cap    5.2na2.27.67.53.0
dtype: object

In [12]: df.apply(lambda f: to_number(f.iloc[0]), axis=1).sum()  # floats and 'na' strings
TypeError: unsupported operand type(s) for +: 'float' and 'str'

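You should use convert_numeric to coerce to floats (note that DataFrame.convert_objects has since been removed from pandas; pd.to_numeric with errors='coerce' is the current equivalent):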

In [21]: df.convert_objects(convert_numeric=True)
Out[21]: 
   cap
0  5.2
1  NaN
2  2.2
3  7.6
4  7.5
5  3.0
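In current pandas versions the same coercion can be sketched with pd.to_numeric, applied to the one column. This assumes, as in the question, that the cap column was read in as strings; the frame below is built by hand to stand in for the CSV.

```python
import pandas as pd

# Hypothetical frame mirroring the question: every value arrived as a string
df = pd.DataFrame({"cap": ["5.2", "na", "2.2", "7.6", "7.5", "3.0"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN
df["cap"] = pd.to_numeric(df["cap"], errors="coerce")

print(df["cap"].dtype)         # float64
print(df["cap"].isna().sum())  # 1
```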

Or read it in directly as a csv, by appending 'na' to the list of values to be considered NaN:

In [22]: pd.read_csv(myfile.file, na_values=['na'])
Out[22]: 
   cap
0  5.2
1  NaN
2  2.2
3  7.6
4  7.5
5  3.0
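The effect of na_values can be checked without a file by feeding read_csv an in-memory buffer. A minimal sketch; the CSV text is a stand-in for myfile.file. Lowercase 'na' is not among pandas' default missing-value markers, which is why it has to be listed explicitly.

```python
import io

import pandas as pd

# Stand-in for the CSV file (header line followed by the sample values)
csv_text = "cap\n5.2\nna\n2.2\n7.6\n7.5\n3.0\n"

# Without na_values, the single 'na' entry prevents the column from parsing as numbers
raw = pd.read_csv(io.StringIO(csv_text))

# Treating 'na' as a missing-value marker lets the column parse as float64
clean = pd.read_csv(io.StringIO(csv_text), na_values=["na"])

print(raw["cap"].dtype)
print(clean["cap"].dtype)
```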

In either case, sum (and many other pandas functions) will now work:

In [23]: df.sum()
Out[23]:
cap    25.5
dtype: float64

As Jeff advises:

repeat 3 times fast: object==bad, float==good

Answer 3

Answer by reabow

I tried an alternative to the above:

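import numpy as np

for idx, item in data['col'].items():
    try:
        float(item)
    except (TypeError, ValueError):
        # not parseable as a number: set NaN via .loc (avoids chained assignment)
        data.loc[idx, 'col'] = np.nan
data['col'] = data['col'].astype(float)  # remaining entries are numeric strings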
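Instead of looping, the entries that fail to parse can be found in one vectorised step, which is handy for checking what a coercion will discard. A sketch, assuming a string column named col as in the loop above; the sample values are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({"col": ["5.2", "na", "2.2", "x", "3.0"]})

# Parse what can be parsed; everything else becomes NaN
parsed = pd.to_numeric(data["col"], errors="coerce")

# Entries that were present in the input but failed to parse
bad = data.loc[parsed.isna() & data["col"].notna(), "col"]
print(bad.tolist())  # ['na', 'x']

data["col"] = parsed
```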

Answer 4

Answer by Victor Grau Serrat

First of all, the way you import your CSV is redundant. Instead of doing:

df = DataFrame(pd.read_csv(myfile.file))

You can do directly:

df = pd.read_csv(myfile.file)

Then, to convert the column to float and turn anything that is not a number into NaN:

df['cap'] = pd.to_numeric(df['cap'], errors='coerce')
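pd.to_numeric only accepts one-dimensional input (a scalar, list, array or Series), so passing the whole DataFrame fails. If several columns need converting, one option is to apply it column by column; DataFrame.apply forwards extra keyword arguments to the function. A sketch with a hypothetical second column named vol:

```python
import pandas as pd

# Hypothetical frame with two string columns, as if read from a CSV
df = pd.DataFrame({"cap": ["5.2", "na", "2.2"], "vol": ["10", "12", "?"]})

# errors="coerce" is passed through to pd.to_numeric for every column
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```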