Python: Convert a column in pandas dataframe from String to Float
Warning: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/36874246/
Question by Kevin
I've already read about various solutions, and tried the solution stated here: Pandas: Converting to numeric, creating NaNs when necessary
But it didn't really solve my problem.
I have a dataframe that contains multiple columns, where one column, ['PricePerSeat_Outdoor'],
contains some float values, some empty values, and some '-' values:
print type(df_raw['PricePerSeat_Outdoor'][99])
print df_raw['PricePerSeat_Outdoor'][95:101]
df_raw['PricePerSeat_Outdoor'] = df_raw['PricePerSeat_Outdoor'].apply(pd.to_numeric, errors='coerce')
print type(df_raw['PricePerSeat_Outdoor'][99])
Then I got:
<type 'str'>
95 17.21
96 17.24
97 -
98 -
99 17.2
100 17.24
Name: PricePerSeat_Outdoor, dtype: object
<type 'str'>
Values at rows #98 and #99 didn't get converted. Again, I've already tried multiple methods, including the following, but none of them worked. Any hints would be much appreciated.
df_raw['PricePerSeat_Outdoor'] = df_raw['PricePerSeat_Outdoor'].apply(pd.to_numeric, errors='coerce')
Also, how can I convert multiple columns to numeric at once? Thanks.
Answer by MaxU
Try this:
df_raw['PricePerSeat_Outdoor'] = pd.to_numeric(df_raw['PricePerSeat_Outdoor'], errors='coerce')
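A minimal, self-contained sketch of the same call, assuming a toy `df_raw` built from the sample rows printed in the question (the extra empty-string row at index 101 is hypothetical). It also contrasts the default `errors='raise'`, which stops at the first unparseable string, with `errors='coerce'`, which turns both `'-'` and `''` into `NaN`:

```python
import pandas as pd

# Toy frame built from the rows printed in the question (index 95-100);
# row 101 is a hypothetical extra row holding an empty string.
df_raw = pd.DataFrame(
    {"PricePerSeat_Outdoor": ["17.21", "17.24", "-", "-", "17.2", "17.24", ""]},
    index=range(95, 102),
)

# With the default errors='raise', the first unparseable string ('-') raises ValueError.
try:
    pd.to_numeric(df_raw["PricePerSeat_Outdoor"])
    raised = False
except ValueError:
    raised = True

# With errors='coerce', every unparseable entry ('-' and '') becomes NaN,
# and the float result replaces the original string column.
df_raw["PricePerSeat_Outdoor"] = pd.to_numeric(
    df_raw["PricePerSeat_Outdoor"], errors="coerce"
)

print("raised with errors='raise':", raised)
print(df_raw["PricePerSeat_Outdoor"].dtype)
print(df_raw["PricePerSeat_Outdoor"])
```

The key difference from the code in the question is that `pd.to_numeric` receives the whole Series at once rather than being called element by element through `.apply`.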
Here is an example:
In [97]: a = pd.Series(['17.21','17.34','15.23','-','-','','12.34'])
In [98]: b = pd.Series(['0.21','0.34','0.23','-','','-','0.34'])
In [99]: df = pd.DataFrame({'a':a, 'b':b})
In [100]: df['c'] = np.random.choice(['a','b','b'], len(df))
In [101]: df
Out[101]:
       a     b  c
0  17.21  0.21  a
1  17.34  0.34  b
2  15.23  0.23  b
3      -     -  b
4      -        b
5            -  b
6  12.34  0.34  b
In [102]: cols_to_convert = ['a','b']
In [103]: cols_to_convert
Out[103]: ['a', 'b']
In [104]: for col in cols_to_convert:
   .....:     df[col] = pd.to_numeric(df[col], errors='coerce')
   .....:
In [105]: df
Out[105]:
       a     b  c
0  17.21  0.21  a
1  17.34  0.34  b
2  15.23  0.23  b
3    NaN   NaN  b
4    NaN   NaN  b
5    NaN   NaN  b
6  12.34  0.34  b
Check:
In [106]: df.dtypes
Out[106]:
a    float64
b    float64
c     object
dtype: object
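The loop above converts one column at a time. To handle the "multiple columns at once" part of the question in a single statement, a common idiom (not part of the original answer) is to select the columns and use `DataFrame.apply`, which calls `pd.to_numeric` on each selected column and forwards `errors='coerce'` to it. A minimal sketch, rebuilding the example frame with a fixed column `c` (an assumption, since the original draws it at random):

```python
import pandas as pd

# Same strings as in the example above; column 'c' is fixed here instead of
# drawn with np.random.choice so that the result is reproducible.
df = pd.DataFrame({
    'a': ['17.21', '17.34', '15.23', '-', '-', '', '12.34'],
    'b': ['0.21', '0.34', '0.23', '-', '', '-', '0.34'],
    'c': ['a', 'b', 'b', 'b', 'b', 'b', 'b'],
})

cols_to_convert = ['a', 'b']

# DataFrame.apply hands each selected column (a Series) to pd.to_numeric
# and passes the keyword argument errors='coerce' through to it.
df[cols_to_convert] = df[cols_to_convert].apply(pd.to_numeric, errors='coerce')

print(df)
print(df.dtypes)
```

Both the loop and this `apply` call end up calling `pd.to_numeric` once per column; the `apply` form simply keeps it to one line, and column `c` is left untouched because it is not in `cols_to_convert`.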