Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/28277137/
How to convert datatype:object to float64 in python?
Asked by Ning Chen
I am going around in circles and have tried so many different approaches that I suspect my core understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.
I import the DataFrame from SQL, and it seems that some float64 columns are converted to object. As a result, I cannot do any calculations, and I have failed to convert the object columns back to float64.
df.head()
Date       WD  Manpower  2nd    CTR    2ndU  T1   T2   T3   T4
2013/4/6   6   NaN       2,645  5.27%  0.29  407  533  454  368
2013/4/7   7   NaN       2,118  5.89%  0.31  257  659  583  369
2013/4/13  6   NaN       2,470  5.38%  0.29  354  531  473  383
2013/4/14  7   NaN       2,033  6.77%  0.37  396  748  681  458
2013/4/20  6   NaN       2,690  5.38%  0.29  361  528  541  381
df.dtypes
WD float64
Manpower float64
2nd object
CTR object
2ndU float64
T1 object
T2 object
T3 object
T4 object
T5 object
dtype: object
SQL table:
Accepted answer by EdChum
You can convert most of the columns by just calling convert_objects:
In [36]:
df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[36]:
Date object
WD int64
Manpower float64
2nd object
CTR object
2ndU float64
T1 int64
T2 int64
T3 int64
T4 float64
dtype: object
For columns '2nd' and 'CTR' we can call the vectorised str methods to replace the thousands separator and remove the '%' sign, and then call astype to convert:
In [39]:
df['2nd'] = df['2nd'].str.replace(',','').astype(int)
df['CTR'] = df['CTR'].str.replace('%','').astype(np.float64)
df.dtypes
Out[39]:
Date object
WD int64
Manpower float64
2nd int32
CTR float64
2ndU float64
T1 int64
T2 int64
T3 int64
T4 object
dtype: object
In [40]:
df.head()
Out[40]:
   Date       WD  Manpower  2nd   CTR   2ndU  T1   T2   T3   T4
0  2013/4/6   6   NaN       2645  5.27  0.29  407  533  454  368
1  2013/4/7   7   NaN       2118  5.89  0.31  257  659  583  369
2  2013/4/13  6   NaN       2470  5.38  0.29  354  531  473  383
3  2013/4/14  7   NaN       2033  6.77  0.37  396  748  681  458
4  2013/4/20  6   NaN       2690  5.38  0.29  361  528  541  381
Or you can do the string handling operations above without the call to astype and then call convert_objects to convert everything in one go.
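The string clean-up described above can be sketched as a self-contained script. This is only an illustration: the sample values are copied from the question, the frame is built with dtype=object to mimic the situation after the SQL import, and astype is used for the final conversion rather than convert_objects.

```python
import pandas as pd

# Sample rows copied from the question; dtype=object mimics the imported frame.
df = pd.DataFrame(
    {
        "2nd": ["2,645", "2,118", "2,470"],
        "CTR": ["5.27%", "5.89%", "5.38%"],
    },
    dtype=object,
)

# regex=False treats ',' and '%' as literal characters, not as patterns.
df["2nd"] = df["2nd"].str.replace(",", "", regex=False).astype("int64")
df["CTR"] = df["CTR"].str.replace("%", "", regex=False).astype("float64")

print(df.dtypes)
```

Note that CTR still holds percentage points after this (5.27 rather than 0.0527); divide by 100 if a fraction is needed.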
UPDATE
Since version 0.17.0 convert_objects is deprecated and there isn't a top-level function to do this, so you need to do:
df.apply(lambda col:pd.to_numeric(col, errors='coerce'))
See the docs and this related question: pandas: to_numeric for multiple columns
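One thing to watch with errors='coerce' is that every value that cannot be parsed becomes NaN, so applying it to the whole frame also wipes out string columns such as Date. A minimal sketch of the difference, assuming a small made-up frame (the "n/a" entry and the numeric_cols list are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["2013/4/6", "2013/4/7"],
        "T1": ["407", "257"],
        "T4": ["368", "n/a"],  # "n/a" is a made-up bad value
    },
    dtype=object,
)

# Applied to every column: the Date strings cannot be parsed and become NaN.
everything = df.apply(lambda col: pd.to_numeric(col, errors="coerce"))

# Applied only to the columns that are expected to be numeric.
numeric_cols = ["T1", "T4"]
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(everything["Date"].isna().all())
print(df.dtypes)
```

Here T4 ends up as a float column because the unparseable entry is replaced by NaN, while Date in df keeps its original strings.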
Answer by Nirali Khoda
You can try this:
df['2nd'] = pd.to_numeric(df['2nd'].str.replace(',', ''))
df['CTR'] = pd.to_numeric(df['CTR'].str.replace('%', ''))
Answer by Amir
Or, for the general case of this issue, you can use a regular expression to strip several characters at once:
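# regex=True is required on recent pandas, where str.replace matches literally by default
df['2nd'] = pd.to_numeric(df['2nd'].str.replace(r'[,%]', '', regex=True))
# strip everything except digits and the decimal point
df['CTR'] = pd.to_numeric(df['CTR'].str.replace(r'[^\d.]', '', regex=True))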
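As a rough sketch of the regular-expression approach on mixed input, the snippet below keeps only digits, the decimal point, and a minus sign. The sample strings are made up, and the pattern assumes '.' is the decimal separator and ',' the thousands separator; locales that swap the two would need a different pattern.

```python
import pandas as pd

# Made-up sample values covering separators, a percent sign, padding, and a sign.
raw = pd.Series(["2,645", "5.27%", " 1,234.5 ", "-42"], dtype=object)

# Drop every character that is not a digit, a decimal point, or a minus sign.
cleaned = raw.str.replace(r"[^\d.\-]", "", regex=True)

# errors="coerce" turns anything still unparseable into NaN instead of raising.
values = pd.to_numeric(cleaned, errors="coerce")

print(cleaned.tolist())
print(values.dtype)
```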
Answer by Sesquipedalism
convert_objects is deprecated.
For pandas >= 0.17.0, use pd.to_numeric:
df["2nd"] = pd.to_numeric(df["2nd"])
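With values like those in the question, a bare pd.to_numeric call is likely to fail, because strings such as "2,645" contain a thousands separator. A small sketch of that behaviour, using two sample values from the question (the variable names are made up):

```python
import pandas as pd

s = pd.Series(["2,645", "2,118"], dtype=object)

try:
    pd.to_numeric(s)
    parsed_directly = True
except ValueError:
    # The thousands separator makes the strings unparseable as numbers.
    parsed_directly = False

# Removing the separator first lets the conversion succeed.
cleaned = pd.to_numeric(s.str.replace(",", "", regex=False))

print(parsed_directly)
print(cleaned.tolist())
```

So the one-liner above works as-is only for columns whose strings are already plain numbers.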
Answer by S. Jessen
I had this problem in a DataFrame (df) created from an Excel sheet with several internal header rows.
After cleaning out the internal header rows from df, the columns' values were of "non-null object" type (DataFrame.info()).
This code converted all numerical values of multiple columns to int64 and float64 in one go:
for i in range(0, len(df.columns)):
    df.iloc[:, i] = pd.to_numeric(df.iloc[:, i], errors='ignore')
    # errors='ignore' lets strings remain as 'non-null objects'
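Recent pandas releases deprecate errors='ignore', so the same idea may be easier to maintain with an explicit try/except per column. The following is a sketch under that assumption, not the answer author's code; the frame and its column names are invented for illustration.

```python
import pandas as pd

# Made-up frame: one text column and two numeric columns stored as strings.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "count": ["1", "2"],
        "ratio": ["0.5", "1.5"],
    },
    dtype=object,
)

for col in df.columns:
    try:
        # Replace the column only if every value parses as a number.
        df[col] = pd.to_numeric(df[col])
    except (ValueError, TypeError):
        # Not fully numeric: leave the column as object.
        pass

print(df.dtypes)
```

Assigning by column name rather than through iloc replaces the whole column, so the new dtype is kept.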