Converting currency with $ to numbers in Python pandas

Question

Asked by kevin

I have the following data in pandas dataframe:

    state        1st          2nd           3rd
0   California   $11,593,820  $109,264,246  $8,496,273
1   New York     $10,861,680  $45,336,041   $6,317,300
2   Florida      $7,942,848   $69,369,589   $4,697,244
3   Texas        $7,536,817   $61,830,712   $5,736,941

I want to perform some simple analysis (e.g., sum, groupby) on the three columns (1st, 2nd, 3rd), but the data type of those three columns is object (i.e., string).

So I used the following code for data conversion:

data = data.convert_objects(convert_numeric=True)

But the conversion does not work, perhaps because of the dollar sign. Any suggestions?
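A minimal sketch to reproduce the problem, assuming the values are dollar strings such as "$11,593,820" (reconstructed from the numbers in the answers' output). DataFrame.convert_objects was deprecated in pandas 0.17 and removed in 1.0; its suggested replacement, pd.to_numeric, fails on these strings in the same way, because "$" and "," are not part of a numeric literal:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: every money value is a string
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

# With the default errors="raise", the first unparsable string raises ValueError
try:
    pd.to_numeric(df["1st"])
except ValueError as exc:
    print("ValueError:", exc)

# With errors="coerce", every unparsable string silently becomes NaN instead
coerced = pd.to_numeric(df["1st"], errors="coerce")
print(coerced.isna().all())  # True
```

So the characters have to be stripped before any numeric conversion, which is what all of the answers do.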

Answer 1

Accepted answer by dagrha

@EdChum's answer is clever and works well. But since there's more than one way to bake a cake, why not use a regex? For example:

df[df.columns[1:]] = df[df.columns[1:]].replace('[$,]', '', regex=True).astype(float)

To me, that is a little bit more readable.
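For a self-contained check, here is a sketch of the regex approach on the question's data (rebuilt by hand from the values in the answers' output), followed by the kind of sum the question asks for. The pattern '[$,]' is a character class that matches a single "$" or ","; inside brackets "$" loses its end-of-string meaning, so it needs no escaping:

```python
import pandas as pd

# Question's data as strings (reconstructed from the answers' output)
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

money_cols = df.columns[1:]  # every column except "state"

# Remove every "$" and "," in one pass, then cast the cleaned strings to float
df[money_cols] = df[money_cols].replace("[$,]", "", regex=True).astype(float)

print(df.dtypes)
print(df[money_cols].sum())
```

Casting to float rather than int64 also tolerates missing values (NaN), which astype("int64") would reject.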

Answer 2

Answer by EdChum

You can use the vectorised str methods to replace the unwanted characters and then cast the type to int:

In [81]:
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str.replace('$', '', regex=False)).apply(lambda x: x.str.replace(',', '', regex=False)).astype(np.int64)
df

Out[81]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941

The dtype change is now confirmed:

In [82]:

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
state    4 non-null object
1st      4 non-null int64
2nd      4 non-null int64
3rd      4 non-null int64
dtypes: int64(3), object(1)
memory usage: 160.0+ bytes
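A self-contained sketch of the same str.replace approach, with the data rebuilt by hand from the output above. It passes regex=False explicitly: the default of Series.str.replace's regex parameter changed from True to False in pandas 2.0, and spelling it out keeps "$" a literal character (rather than the regex end-of-string anchor) on every version:

```python
import pandas as pd

# Question's data as strings (reconstructed from the output above)
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

money_cols = df.columns[1:]  # every column except "state"

# Strip "$" and "," as literal substrings, then cast the digit strings to int64
df[money_cols] = df[money_cols].apply(
    lambda s: s.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
).astype("int64")

print(df)
print(df.dtypes)
```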

Another way:

In [108]:

df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str[1:].str.split(',').str.join('')).astype(np.int64)
df
Out[108]:
            state       1st        2nd      3rd
index                                          
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941
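To see what the split/join chain does, here is a sketch that applies each step to a hypothetical two-value Series and keeps the intermediate results. Note that .str[1:] drops the first character unconditionally, so this approach assumes every value starts with exactly one "$":

```python
import pandas as pd

s = pd.Series(["$11,593,820", "$109,264,246"])

no_dollar = s.str[1:]               # slice off the leading "$"
parts = no_dollar.str.split(",")    # split on the thousands separator into lists
joined = parts.str.join("")         # glue the digit groups back together
numbers = joined.astype("int64")    # now a numeric column

print(parts.tolist())    # [['11', '593', '820'], ['109', '264', '246']]
print(numbers.tolist())  # [11593820, 109264246]
```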

Answer 3

Answer by sushmit

You can also use locale as follows:

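import locale
import pandas as pd
# '' selects the user's default locale; locale.atof only removes ',' if that
# locale uses ',' as its thousands separator (e.g. an English/US locale)
locale.setlocale(locale.LC_ALL, '')
# strip the leading '$', then parse the rest according to the locale
df['1st'] = df['1st'].map(lambda x: locale.atof(x.strip('$')))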

Note: the above code was tested with Python 3 on Windows.
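A sketch that extends the locale approach to all three columns. It relies on locale.atof removing the current locale's thousands separator, so it only works under a locale whose separator is ","; the candidate locale names below are assumptions (typical spellings on Linux/macOS and Windows), and the sketch reports rather than guesses when none of them is installed. Setting only LC_NUMERIC is a narrower choice than the answer's LC_ALL, but it is the category that supplies the separators locale.atof uses:

```python
import locale

import pandas as pd

# Assumed locale names: which of these exist depends on the operating system
CANDIDATE_LOCALES = ["en_US.UTF-8", "en_US.utf8", "English_United States.1252", ""]


def set_comma_locale():
    """Set LC_NUMERIC to the first candidate whose thousands separator is ",".

    Returns the name that was set, or None if no candidate qualifies.
    """
    for name in CANDIDATE_LOCALES:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # not installed on this machine
        if locale.localeconv()["thousands_sep"] == ",":
            return name
    return None


# Question's data as strings (reconstructed from the values in the answers' output)
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

chosen = set_comma_locale()
if chosen is None:
    print('No installed locale uses "," as its thousands separator')
else:
    for col in df.columns[1:]:
        # Strip the "$", then let locale.atof drop the "," separators and parse
        df[col] = df[col].map(lambda x: locale.atof(x.strip("$")))
    print(df)
```

Because the result depends on which locales the machine has installed, the regex or str.replace answers are the more portable choice.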