When to apply(pd.to_numeric) and when to astype(np.float64) in Python?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/40095712/
Asked by d8aninja
I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.
In[]: xiv['Volume'].head(5)
Out[]:
0 252000
1 484000
2 62000
3 168000
4 232000
Name: Volume, dtype: int64
I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:
In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
Or...
In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
I've also tried making a separate pandas Series, using the methods listed above on that Series, and reassigning the result to the xiv['Volume'] object, which is a pandas.core.series.Series object.
I have, however, found a solution to this problem using the numpy package's float64 type. This works, but I don't know why it's different.
In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)
In[]: xiv['Volume'].dtypes
Out[]:
dtype('float64')
Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class; that is, convert the column in the xiv DataFrame to float64 in place?
Answered by MaxU
If you already have numeric dtypes (int8|16|32|64, float64, boolean), you can convert them to another "numeric" dtype using the pandas .astype() method.
Demo:
In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)
In [91]: df
Out[91]:
a b c
0 9059440 9590567 2076918
1 5861102 4566089 1947323
2 6636568 162770 2487991
3 6794572 5236903 5628779
4 470121 4044395 4546794
In [92]: df.dtypes
Out[92]:
a int64
b int64
c int64
dtype: object
In [93]: df['a'] = df['a'].astype(float)
In [94]: df.dtypes
Out[94]:
a float64
b int64
c int64
dtype: object
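The asker's observation can be reproduced with a minimal sketch (the Series `volume` and its values are made up for illustration): `pd.to_numeric` leaves a column that is already numeric unchanged, so an int64 column stays int64, while `astype` converts to whatever dtype is requested.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for xiv['Volume']: an int64 column.
volume = pd.Series([252000, 484000, 62000], dtype=np.int64)

# Already numeric, so to_numeric has nothing to parse and keeps int64.
kept = pd.to_numeric(volume)

# astype forces the requested dtype.
as_float = volume.astype(np.float64)

print(kept.dtype, as_float.dtype)
```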
It won't work for object (string) dtypes that can't be converted to numbers:
In [95]: df.loc[1, 'b'] = 'XXXXXX'
In [96]: df
Out[96]:
a b c
0 9059440.0 9590567 2076918
1 5861102.0 XXXXXX 1947323
2 6636568.0 162770 2487991
3 6794572.0 5236903 5628779
4 470121.0 4044395 4546794
In [97]: df.dtypes
Out[97]:
a float64
b object
c int64
dtype: object
In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'
So here we want to use the pd.to_numeric() function:
In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')
In [100]: df
Out[100]:
a b c
0 9059440.0 9590567.0 2076918
1 5861102.0 NaN 1947323
2 6636568.0 162770.0 2487991
3 6794572.0 5236903.0 5628779
4 470121.0 4044395.0 4546794
In [101]: df.dtypes
Out[101]:
a float64
b float64
c int64
dtype: object
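The difference between the two error modes can be sketched on a small object column (the Series `b` is hypothetical): the default `errors='raise'` fails on the unparseable string, much like `astype(float)`, while `errors='coerce'` replaces it with NaN, which forces a float64 result.

```python
import pandas as pd

# Hypothetical object column with one value that is not a number.
b = pd.Series([9590567, 'XXXXXX', 162770], dtype=object)

# Default errors='raise': the bad string aborts the conversion.
try:
    pd.to_numeric(b)
    raised = False
except ValueError:
    raised = True

# errors='coerce': unparseable values become NaN, so the result is float64.
coerced = pd.to_numeric(b, errors='coerce')

print(raised)
print(coerced)
```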
Answered by Shobhit Sharma
I observed that I was able to convert object (str) to float first, and then float to int64.
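import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10**5, 10**7, (5, 3)), columns=list('abc'),
                  dtype=np.int64)
df['a'] = df['a'].astype('str')    # int64 -> object (str)
df.dtypes
df['a'] = df['a'].astype('float')  # object (str) -> float64
df['a'] = df['a'].astype('int64')  # float64 -> int64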
Worked fine.
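As an alternative to the two-step cast, `pd.to_numeric` picks the result dtype from the parsed values. A small sketch, assuming object columns of numeric strings (both Series are made up for illustration):

```python
import pandas as pd

# Hypothetical object columns holding numbers as strings.
whole = pd.Series(['10', '20', '30'], dtype=object)
mixed = pd.Series(['10', '20.5', '30'], dtype=object)

# All values parse as integers, so the result is int64.
whole_num = pd.to_numeric(whole)

# One value has a fractional part, so the result is float64.
mixed_num = pd.to_numeric(mixed)

print(whole_num.dtype, mixed_num.dtype)
```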
Answered by reevesnmortimer
I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':
In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])
In [11]: pd.to_numeric(df.value)
Traceback (most recent call last):
File "<ipython-input-11-98729d13e45c>", line 1, in <module>
pd.to_numeric(df.value)
File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
coerce_numeric=coerce_numeric)
File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "nan" at position 0
whereas astype(float) does not:
In [12]: df.value.astype(float)
Out[12]:
0 NaN
Name: value, dtype: float64
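Whether `pd.to_numeric` parses the string 'nan' may depend on the pandas version, so the traceback above might not reproduce everywhere. A sketch that sidesteps the question (the Series `value` is hypothetical): with `errors='coerce'`, the entry ends up as NaN whether it is parsed or rejected.

```python
import pandas as pd

# Hypothetical object column containing the literal string 'nan'.
value = pd.Series(['nan', '1.5'], dtype=object)

# errors='coerce' yields NaN for 'nan' either way: parsed as NaN or coerced to NaN.
coerced = pd.to_numeric(value, errors='coerce')

# astype(float) accepts the string 'nan', as shown in the session above.
cast = value.astype(float)

print(coerced.tolist(), cast.tolist())
```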
Answered by Mohd Waseem
You can use this:
pd.to_numeric(df.value, errors='coerce').fillna(0, downcast='infer')
It will use zero in place of NaN.
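A sketch of the same idea without the `downcast` argument, which newer pandas releases reportedly deprecate for `fillna` (treat that as an assumption to check against your version). The Series `value` is made up for illustration, and the final `astype('int64')` is an optional extra step that would truncate any fractional values.

```python
import pandas as pd

# Hypothetical object column with one unparseable entry.
value = pd.Series(['1', 'x', '3'], dtype=object)

# Coerce bad entries to NaN, then replace NaN with zero (still float64).
filled = pd.to_numeric(value, errors='coerce').fillna(0)

# Optional: cast back to integers once no NaN remains.
as_int = filled.astype('int64')

print(filled.tolist(), as_int.tolist())
```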