Converting currency with $ to numbers in Python pandas
Original question: http://stackoverflow.com/questions/32464280/
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverflow.
Asked by kevin
I have the following data in pandas dataframe:
        state          1st           2nd         3rd
0  California  $11,593,820  $109,264,246  $8,496,273
1    New York  $10,861,680   $45,336,041  $6,317,300
2     Florida   $7,942,848   $69,369,589  $4,697,244
3       Texas   $7,536,817   $61,830,712  $5,736,941
I want to perform some simple analysis (e.g., sum, groupby) with three columns (1st, 2nd, 3rd), but the data type of those three columns is object (or string).
So I used the following code for data conversion:
data = data.convert_objects(convert_numeric=True)
But the conversion does not work, perhaps because of the dollar sign. Any suggestions?
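The symptom described in the question can be reproduced with a small sketch. The frame below is a hypothetical reconstruction whose values match the converted output shown in the answers, and `pd.to_numeric` stands in for `convert_objects`, which is no longer available in recent pandas releases. With `errors="coerce"`, every string that contains `$` or `,` fails to parse and becomes NaN:

```python
import pandas as pd

# Hypothetical reconstruction of the asker's frame (currency stored as strings).
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
    "3rd": ["$8,496,273", "$6,317,300", "$4,697,244", "$5,736,941"],
})

# errors="coerce" turns every value that cannot be parsed as a number into NaN.
coerced = df[["1st", "2nd", "3rd"]].apply(pd.to_numeric, errors="coerce")

# Nothing was converted: the "$" and "," characters block numeric parsing.
print(coerced.isna().all().all())  # True
```

So the currency symbol and the thousands separators have to be removed before the columns can be cast to a numeric type.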
Accepted answer by dagrha
@EdChum's answer is clever and works well. But since there's more than one way to bake a cake, why not use a regex? For example:
df[df.columns[1:]] = df[df.columns[1:]].replace('[$,]', '', regex=True).astype(float)
To me, that is a little bit more readable.
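As a self-contained sketch of the regex approach (the sample frame and the name `money_cols` are assumptions made for illustration), the character class `[$,]` matches either a literal dollar sign or a comma, so one `replace` call strips both before the cast:

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's frame.
df = pd.DataFrame({
    "state": ["California", "New York", "Florida", "Texas"],
    "1st": ["$11,593,820", "$10,861,680", "$7,942,848", "$7,536,817"],
    "2nd": ["$109,264,246", "$45,336,041", "$69,369,589", "$61,830,712"],
})

# Every column except the first ("state") holds currency strings.
money_cols = df.columns[1:]

# Inside a character class "$" is a literal, so "[$,]" matches "$" or ",".
df[money_cols] = df[money_cols].replace(r"[$,]", "", regex=True).astype(float)

# Numeric operations such as sum now work on the converted columns.
print(df["1st"].sum())  # 37935165.0
```

Casting to `float` rather than an integer type also tolerates values with cents, should any appear in the data.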
Answer by EdChum
You can use the vectorised str methods to replace the unwanted characters and then cast the type to int:
In [81]:
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str.replace('$', '', regex=False).str.replace(',', '', regex=False)).astype(np.int64)
df
Out[81]:
            state       1st        2nd      3rd
index
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941
The dtype change is now confirmed:
In [82]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
state 4 non-null object
1st 4 non-null int64
2nd 4 non-null int64
3rd 4 non-null int64
dtypes: int64(3), object(1)
memory usage: 160.0+ bytes
Another way:
In [108]:
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str[1:].str.split(',').str.join('')).astype(np.int64)
df
Out[108]:
            state       1st        2nd      3rd
index
0      California  11593820  109264246  8496273
1        New York  10861680   45336041  6317300
2         Florida   7942848   69369589  4697244
3           Texas   7536817   61830712  5736941
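One caveat with the `str.replace` approach: `$` is also the end-of-string anchor in regular expressions, and the default of the `regex` argument of `Series.str.replace` has changed between pandas versions. A minimal sketch that sidesteps the question by passing `regex=False` explicitly follows; the helper name `strip_currency` and the two-row sample frame are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's frame.
df = pd.DataFrame({
    "state": ["California", "New York"],
    "1st": ["$11,593,820", "$10,861,680"],
    "2nd": ["$109,264,246", "$45,336,041"],
    "3rd": ["$8,496,273", "$6,317,300"],
})

def strip_currency(col):
    """Remove "$" and "," from a string column, treating both as literals."""
    return (col.str.replace("$", "", regex=False)
               .str.replace(",", "", regex=False))

money_cols = ["1st", "2nd", "3rd"]
df[money_cols] = df[money_cols].apply(strip_currency).astype(np.int64)

# With integer columns, row-wise arithmetic works as expected.
df["total"] = df[money_cols].sum(axis=1)
print(df.loc[0, "total"])  # 129354339
```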
Answer by sushmit
You can also use locale as follows:
import locale
import pandas as pd
locale.setlocale(locale.LC_ALL,'')
df['1st'] = df['1st'].map(lambda x: locale.atof(x.strip('$')))
Note: the above code was tested with Python 3 on Windows.
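Because `locale.setlocale(locale.LC_ALL, '')` picks up whatever locale the machine is configured with, the result of `locale.atof` can differ between systems. The sketch below is one way to make that dependency explicit. The helper names `set_us_locale` and `parse_currency` are hypothetical, the candidate locale names are assumptions about what a given system provides, and the code falls back to plain string stripping when no locale with a comma thousands separator is active:

```python
import locale

def set_us_locale():
    """Try a few US English locale names; return True if one was activated."""
    for name in ("en_US.UTF-8", "en_US", "English_United States.1252"):
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return True
        except locale.Error:
            continue
    return False

def parse_currency(text, symbol="$"):
    """Parse a string such as '$11,593,820' into a float."""
    cleaned = text.strip().strip(symbol)
    # Use the locale-aware parser only if the active locale groups with ",".
    if locale.localeconv()["thousands_sep"] == ",":
        return locale.atof(cleaned)
    # Fallback for locales without a comma separator (for example "C").
    return float(cleaned.replace(",", ""))

set_us_locale()
print(parse_currency("$11,593,820"))  # 11593820.0
```

With pandas, the same helper can be applied to a column with df['1st'].map(parse_currency).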