Python: Convert float64 column to int64 in Pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43956335/
Asked by MCG Code
I tried to convert a column from data type float64 to int64 using:
df['column name'].astype(int64)
but got an error:
NameError: name 'int64' is not defined
The column holds a number of people but was formatted as 7500000.0. Any idea how I can simply change this float64 into int64?
Answered by jezrael
Solution for pandas 0.24+ for converting numeric columns with missing values:
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
print (df['column name'])
0 7500000.0
1 7500000.0
2 NaN
Name: column name, dtype: float64
df['column name'] = df['column name'].astype(np.int64)
ValueError: Cannot convert non-finite values (NA or inf) to integer
#http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
df['column name'] = df['column name'].astype('Int64')
print (df['column name'])
0 7500000
1 7500000
2 NaN
Name: column name, dtype: Int64
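As a minimal sketch of the nullable-integer conversion above (assuming pandas 0.24 or later; the variable name `converted` is only illustrative), the following checks that the missing value survives while the other entries become true integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name': [7500000.0, 7500000.0, np.nan]})

# Capital-I 'Int64' is the pandas nullable integer dtype,
# not the same thing as NumPy's lowercase 'int64'.
converted = df['column name'].astype('Int64')

print(converted.dtype)              # Int64
print(converted.isna().sum())       # 1
print(converted.dropna().tolist())  # [7500000, 7500000]
```

Note that the dtype string is case-sensitive: 'int64' would raise the ValueError shown above for this column.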
I think you need to cast to numpy.int64:
df['column name'].astype(np.int64)
Sample:
df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
print (df['column name'])
0 7500000.0
1 7500000.0
Name: column name, dtype: float64
df['column name'] = df['column name'].astype(np.int64)
# equivalent in older pandas only (pd.np was deprecated in pandas 1.0 and later removed):
# df['column name'] = df['column name'].astype(pd.np.int64)
print (df['column name'])
0 7500000
1 7500000
Name: column name, dtype: int64
If the column contains NaNs, first replace them with some int (e.g. 0) using fillna, because the type of NaN is float:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})
df['column name'] = df['column name'].fillna(0).astype(np.int64)
print (df['column name'])
0 7500000
1 0
Name: column name, dtype: int64
Also check the documentation: missing data casting rules
EDIT:
Converting values with NaNs is buggy:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})
df['column name'] = df['column name'].values.astype(np.int64)
print (df['column name'])
0 7500000
1 -9223372036854775808
Name: column name, dtype: int64
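One way to avoid silent corruption like the result above is to validate the column before casting. The helper below is a hypothetical sketch, not part of pandas: the name `safe_to_int64` and its error messages are assumptions made for illustration. It refuses to cast when the column holds NaN/inf or fractional values, since astype would otherwise fail or truncate.

```python
import numpy as np
import pandas as pd

def safe_to_int64(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to int64 only when no information is lost."""
    values = series.to_numpy(dtype="float64")
    # NaN and +/-inf have no integer representation.
    if not np.isfinite(values).all():
        raise ValueError("column contains NaN or inf; fill or drop them first")
    # astype(np.int64) would truncate 1.9 to 1, so reject fractional values.
    if not (values == np.trunc(values)).all():
        raise ValueError("column contains fractional values")
    return series.astype(np.int64)

df = pd.DataFrame({'column name': [7500000.0, 2.0]})
df['column name'] = safe_to_int64(df['column name'])
print(df['column name'].dtype)  # int64
```

Whether to raise, fill, or switch to the nullable 'Int64' dtype when the check fails depends on what a missing head count should mean in your data.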
Answered by MSeifert
You need to pass in the string 'int64':
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]}) # some test dataframe
>>> df['a'].astype('int64')
0 1
1 2
Name: a, dtype: int64
There are some alternative ways to specify 64-bit integers:
>>> df['a'].astype('i8') # integer with 8 bytes (64 bit)
0 1
1 2
Name: a, dtype: int64
>>> import numpy as np
>>> df['a'].astype(np.int64) # native numpy 64 bit integer
0 1
1 2
Name: a, dtype: int64
Or use np.int64 directly on your column (but it returns a NumPy array):
>>> np.int64(df['a'])
array([1, 2], dtype=int64)
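The spellings above all name the same NumPy dtype. A small sketch to confirm this (the list name `specs` is just illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0])

# String name, type code, NumPy scalar type, and dtype object.
specs = ['int64', 'i8', np.int64, np.dtype('int64')]
dtypes = [s.astype(spec).dtype for spec in specs]

print(all(dt == np.dtype('int64') for dt in dtypes))  # True

# The original error came from the bare name int64, which is not a
# Python builtin; it must be quoted or qualified as np.int64.
```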
Answered by sparrow
This seems to be a little buggy in Pandas 0.23.4?
If there are np.nan values then this will throw an error as expected:
df['col'] = df['col'].astype(np.int64)
But if errors='ignore' is used, it doesn't change any values from float to int as I would expect:
df['col'] = df['col'].astype(np.int64,errors='ignore')
It worked if I first converted np.nan:
df['col'] = df['col'].fillna(0).astype(np.int64)
df['col'] = df['col'].astype(np.int64)
Now I can't figure out how to get null values back in place of the zeroes, since this will convert everything back to float again:
df['col'] = df['col'].replace(0,np.nan)
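A possible way around this, assuming pandas 0.24 or later, is the nullable 'Int64' dtype, which can hold integers and missing values side by side. The sketch below uses illustrative names (`nullable`, `filled`, `restored`) and also shows how to restore the nulls after a fillna(0) without turning genuine zeroes into missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [3.0, np.nan, 0.0]})

# Direct route: cast straight to the nullable integer dtype.
nullable = df['col'].astype('Int64')
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, True, False]

# If the column was already filled with 0, re-mask only the rows that
# were originally missing, so the real 0 in the last row is preserved.
filled = df['col'].fillna(0).astype(np.int64)
restored = filled.astype('Int64').mask(df['col'].isna())
print(restored.isna().tolist())  # [False, True, False]
```

Using replace(0, np.nan) instead cannot tell a real zero from a filled one, which is why the mask is built from the original column.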
Answered by Muhammad Bin Ali
Consider using
df['column name'].astype('Int64')
nan will be changed to NaN (displayed as <NA> in pandas 1.0 and later).