When to apply(pd.to_numeric) and when to astype(np.float64) in Python?

Question

Asked by d8aninja

I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.

In[]: xiv['Volume'].head(5)
Out[]: 

0    252000
1    484000
2     62000
3    168000
4    232000
Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

Or...

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

I've also tried making a separate pandas Series, using the methods listed above on that Series, and reassigning it to the xiv['Volume'] object, which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type; this works, but I don't know why it's different.

In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('float64')

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class; that is, convert the column in the xiv DataFrame to float64 in place?
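A minimal, self-contained reproduction of the behavior described above may help. It rebuilds a stand-in for xiv from only the five values in the head(5) output (the rest of the real DataFrame is not shown, so this is an assumption) and shows that pd.to_numeric leaves an already-numeric int64 column alone, while .astype('float64') performs the cast without importing numpy.

```python
import pandas as pd

# Stand-in for the asker's DataFrame: only the five Volume values shown above.
xiv = pd.DataFrame({'Volume': [252000, 484000, 62000, 168000, 232000]}, dtype='int64')

# pd.to_numeric only promises *some* numeric dtype. int64 already is one,
# so the column comes back unchanged.
xiv['Volume'] = pd.to_numeric(xiv['Volume'])
print(xiv['Volume'].dtype)  # int64

# astype performs an explicit cast; the string 'float64' (or the builtin float)
# works without importing numpy.
xiv['Volume'] = xiv['Volume'].astype('float64')
print(xiv['Volume'].dtype)  # float64
```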

Answer 1

Answer by MaxU

If you already have numeric dtypes (int8|16|32|64, float64, boolean), you can convert them to another "numeric" dtype using the pandas .astype() method.

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object

It won't work for object (string) dtypes that can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() function:

In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

In [100]: df
Out[100]:
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
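The rule of thumb in this answer (astype for data that is already numeric, pd.to_numeric for strings) could be wrapped in a small helper. This is only a sketch: the name to_float64 is hypothetical, and it assumes you want unparseable strings to become NaN rather than raise. It relies on pandas.api.types.is_numeric_dtype to pick the route.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_float64(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return s as float64, picking astype or pd.to_numeric."""
    if is_numeric_dtype(s):
        # int/float/bool data: a plain dtype cast is enough.
        return s.astype('float64')
    # object/string data: parse each value; anything unparseable becomes NaN.
    parsed = pd.to_numeric(s, errors='coerce')
    # to_numeric may return int64 if every value was an integer, so cast explicitly.
    return parsed.astype('float64')


print(to_float64(pd.Series([1, 2, 3])).dtype)  # float64
print(to_float64(pd.Series(['4', 'XXXXXX', '6.5'], dtype=object)).tolist())
```

The final astype('float64') in the string branch is deliberate: without it, a column of integer strings such as '1', '2' would come back from pd.to_numeric as int64.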

Answer 2

Answer by Shobhit Sharma

I observed that I was able to convert object (str) to float first, and then float to int64:

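import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10**5, 10**7, (5, 3)), columns=list('abc'),
                  dtype=np.int64)
df['a'] = df['a'].astype('str')    # int64 -> string column
df.dtypes

df['a'] = df['a'].astype('float')  # string -> float64
df['a'] = df['a'].astype('int64')  # float64 -> int64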

Worked fine.
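The intermediate float step matters mainly when the strings are written with a decimal point: plain Python's int('20.0') raises ValueError, whereas float('20.0') succeeds, so routing through float64 first avoids that parse. A minimal sketch with made-up values, assumed to be integer-valued and complete:

```python
import pandas as pd

# Made-up integer-valued data stored as decimal strings.
s = pd.Series(['1.0', '20.0', '300.0'], dtype=object)

# Step 1: str -> float64. float() accepts '20.0'; int() would not.
as_float = s.astype('float')
# Step 2: float64 -> int64. This truncates any fractional part and raises
# if the column contains NaN, so it only suits integer-valued, complete data.
as_int = as_float.astype('int64')

print(as_int.dtype)     # int64
print(as_int.tolist())  # [1, 20, 300]
```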

Answer 3

Answer by reevesnmortimer

I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':

In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])

In [11]: pd.to_numeric(df.value)

Traceback (most recent call last):

  File "<ipython-input-11-98729d13e45c>", line 1, in <module>
    pd.to_numeric(df.value)

  File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
    coerce_numeric=coerce_numeric)

  File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric

ValueError: Unable to parse string "nan" at position 0

whereas astype(float) does not:

In [12]: df.value.astype(float)
Out[12]: 
0   NaN
Name: value, dtype: float64
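If the goal is simply to get NaN for such entries, passing errors='coerce' to pd.to_numeric sidesteps the question of whether the literal string 'nan' is accepted, because every value it cannot parse becomes NaN. A small sketch with made-up values (object dtype set explicitly to match the DataFrame above):

```python
import pandas as pd

s = pd.Series(['1.5', 'nan', 'abc'], dtype=object)

# errors='coerce': anything to_numeric cannot parse ('abc', and 'nan' on
# versions that reject it) becomes NaN instead of raising.
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.tolist())  # [1.5, nan, nan]

# astype(float) accepts 'nan' but would raise ValueError on 'abc',
# so it is only applied to the first two values here.
print(s.iloc[:2].astype(float).tolist())  # [1.5, nan]
```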

Answer 4

Answer by Mohd Waseem

You can use this:

pd.to_numeric(df.value, errors='coerce').fillna(0, downcast='infer')

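It will use zero in place of NaN: values that errors='coerce' cannot parse become NaN, fillna(0) turns them into 0, and downcast='infer' tries to cast the float result back to an integer dtype.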
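On recent pandas releases the downcast argument of fillna is deprecated (it emits a FutureWarning from pandas 2.1 onward, as far as I know), so the same idea with an explicit cast may age better. A sketch, assuming the parsed values are integer-valued; the sample data is made up:

```python
import pandas as pd

s = pd.Series(['10', 'abc', '30'], dtype=object)

# Unparseable entries become NaN, which forces a float64 result.
parsed = pd.to_numeric(s, errors='coerce')
# Replace NaN with 0, then cast back to an integer dtype explicitly
# instead of relying on fillna(..., downcast='infer').
result = parsed.fillna(0).astype('int64')

print(result.tolist())  # [10, 0, 30]
```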
