When to apply(pd.to_numeric) and when to astype(np.float64) in python?

Disclaimer: This page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/40095712/

Time: 2020-08-19 23:09:48  Source: igfitidea

Tags: python, pandas, numpy, dataframe, types

Question by d8aninja

I have a pandas DataFrame object named xiv, which has a column of int64 Volume measurements.

In[]: xiv['Volume'].head(5)
Out[]: 

0    252000
1    484000
2     62000
3    168000
4    232000
Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

Or...

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')
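
A likely explanation, sketched below under the assumption that the column holds plain integers like the head() output above: pd.to_numeric() only guarantees a numeric dtype, so a column that is already int64 comes back unchanged (it downcasts only if you pass its downcast argument). The Series here is a hypothetical stand-in for xiv['Volume'].

```python
import pandas as pd

# Hypothetical stand-in for xiv['Volume'], using the values from head() above
volume = pd.Series([252000, 484000, 62000, 168000, 232000], name="Volume", dtype="int64")

# to_numeric() converts *to a numeric dtype*; int64 already is one,
# so the data comes back with the same dtype
converted = pd.to_numeric(volume)
print(converted.dtype)  # int64

# apply(pd.to_numeric) calls the function on each scalar element,
# which also leaves integers as integers
applied = volume.apply(pd.to_numeric)
print(applied.dtype)  # int64
```

In other words, nothing failed: the conversion was a no-op because the target ("some numeric dtype") was already satisfied.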

I've also tried making a separate pandas Series, using the methods listed above on that Series, and reassigning it to the xiv['Volume'] object, which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type. This works, but I don't know why it's different.

In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('float64') 

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class; that is, convert the column in the xiv DataFrame to float64 in place?
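
Within pandas itself, Series.astype() is the usual tool for numeric-to-numeric conversion, and it accepts np.float64, the builtin float, or the string 'float64' interchangeably. The sketch below (again with a hypothetical stand-in for xiv) also illustrates that astype() returns a new object: the change only sticks once the result is assigned back, which is what the question's working line does.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's xiv DataFrame
xiv = pd.DataFrame({"Volume": [252000, 484000, 62000]}, dtype="int64")

# All three spellings request the same 64-bit float dtype
a = xiv["Volume"].astype(np.float64)
b = xiv["Volume"].astype(float)
c = xiv["Volume"].astype("float64")
print(a.dtype, b.dtype, c.dtype)  # float64 float64 float64

# astype() does not modify the original column...
print(xiv["Volume"].dtype)  # int64

# ...so assign the result back to convert the column
xiv["Volume"] = xiv["Volume"].astype("float64")
print(xiv["Volume"].dtype)  # float64
```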

Answer by MaxU

If you already have numeric dtypes (int8|16|32|64, float64, boolean), you can convert them to another "numeric" dtype using the pandas .astype() method.

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object
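
When several columns need different target dtypes, DataFrame.astype() also accepts a dict that maps column names to dtypes. A small sketch, using a fixed hypothetical frame instead of the random one above:

```python
import pandas as pd

# Hypothetical frame with three int64 columns
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, dtype="int64")

# Convert only the listed columns; 'b' keeps its int64 dtype
df = df.astype({"a": "float64", "c": "float32"})
print(df.dtypes)
```

Columns missing from the dict are left as they are, so this avoids touching each column with a separate assignment.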

It won't work for object (string) dtypes that can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() method:

In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

In [100]: df
Out[100]:
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
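
The errors argument decides what happens to unparseable entries: the default 'raise' propagates a ValueError, while 'coerce' replaces them with NaN, which forces a float result because NaN is a float. The sketch below uses made-up data similar to column 'b'; it also shows that clean numeric strings parse without extra arguments, and one way (an assumption, not from the answer) to list the entries that failed before deciding how to handle them.

```python
import pandas as pd

clean = pd.Series(["1", "2.5", "40"])
dirty = pd.Series([9590567, "XXXXXX", 162770, None], dtype=object)

# Clean numeric strings parse without extra arguments
parsed = pd.to_numeric(clean)
print(parsed.tolist())  # [1.0, 2.5, 40.0]

# errors="raise" (the default) propagates a ValueError
raised_error = None
try:
    pd.to_numeric(dirty, errors="raise")
except ValueError as exc:
    raised_error = exc

# errors="coerce" turns unparseable entries into NaN, so the result is float64
coerced = pd.to_numeric(dirty, errors="coerce")
print(coerced.dtype)  # float64

# Entries that are NaN only after coercion are the ones that failed to parse;
# the notna() check keeps genuinely missing values (None) out of the list
failed = coerced.isna() & dirty.notna()
print(dirty[failed].tolist())  # ['XXXXXX']
```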

Answer by Shobhit Sharma

I observed that I was able to convert object (str) to float first, and then float to int64.

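import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10**5, 10**7, (5, 3)), columns=list('abc'), dtype=np.int64)
df['a'] = df['a'].astype('str')    # column 'a' now holds strings (object dtype)
print(df.dtypes)

df['a'] = df['a'].astype('float')  # str -> float64
df['a'] = df['a'].astype('int64')  # float64 -> int64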

Worked fine.
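
The second step in that chain only works when every value is finite: NumPy's int64 has no representation for NaN, so float64 to int64 raises a ValueError if any value is missing. pandas' nullable 'Int64' extension dtype (capital I) can hold missing values instead. A sketch with made-up data:

```python
import pandas as pd

s = pd.Series(["100", "250", "nan"])  # made-up object (str) column with a missing value

as_float = s.astype("float")  # the string "nan" becomes NaN

# float64 -> int64 fails because NaN has no int64 representation
int_error = None
try:
    as_float.astype("int64")
except ValueError as exc:
    int_error = exc

# The nullable Int64 dtype stores the missing value as <NA> instead
as_nullable = as_float.astype("Int64")
print(as_nullable.dtype)  # Int64
```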

Answer by reevesnmortimer

I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':

In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])

In [11]: pd.to_numeric(df.value)

Traceback (most recent call last):

  File "<ipython-input-11-98729d13e45c>", line 1, in <module>
    pd.to_numeric(df.value)

  File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
    coerce_numeric=coerce_numeric)

  File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric

ValueError: Unable to parse string "nan" at position 0

whereas astype(float) does not:

In [12]: df.value.astype(float)
Out[12]: 
0   NaN
Name: value, dtype: float64
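
One plausible reason, which the answer does not confirm: astype(float) on a column of strings converts each value much like Python's built-in float(), which accepts the spelling 'nan'. If you would rather stay with pd.to_numeric(), passing errors='coerce' sidesteps the question, because 'nan' ends up as NaN either way: parsed as NaN, or coerced to NaN as unparseable. A sketch with made-up data:

```python
import math
import pandas as pd

s = pd.Series(["nan", "1.5"])  # made-up strings, including a literal "nan"

# astype(float) turns the string "nan" into NaN, as in the answer above
via_astype = s.astype(float)

# errors="coerce" yields NaN for "nan" whether it is parsed or coerced
via_coerce = pd.to_numeric(s, errors="coerce")

print(math.isnan(via_astype[0]), via_astype[1])  # True 1.5
print(math.isnan(via_coerce[0]), via_coerce[1])  # True 1.5
```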

Answer by Mohd Waseem

You can use this:

pd.to_numeric(df.value, errors='coerce').fillna(0, downcast='infer')

This replaces NaN with zero, and downcast='infer' lets pandas convert the result back to an integer dtype where possible.
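
The same idea can be written without the downcast argument, which recent pandas releases deprecate for fillna(): coerce, fill, then cast explicitly. A sketch with made-up data; the explicit astype('int64') assumes every remaining value is a whole number.

```python
import pandas as pd

s = pd.Series(["1", "abc", "3"])  # made-up column with one unparseable entry

# "abc" -> NaN (coerce), NaN -> 0 (fillna), then float64 -> int64 (astype)
result = pd.to_numeric(s, errors="coerce").fillna(0).astype("int64")
print(result.tolist())  # [1, 0, 3]
```

Casting explicitly also makes the final dtype visible in the code instead of depending on what inference decides.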
