What are all the dtypes that pandas recognizes?
Original question: http://stackoverflow.com/questions/29245848/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by uday
For pandas, would anyone know whether any datatype apart from

(i) float64, int64 (and other variants of np.number like float32, int8, etc.)
(ii) bool
(iii) datetime64, timedelta64

such as string columns, always has a dtype of object?

Alternatively, I want to know whether there is any datatype, apart from (i), (ii) and (iii) in the list above, for which pandas does not make its dtype object.
Accepted answer by Alexander
EDIT Feb 2020, following the pandas 1.0.0 release
Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtype, data types, and data type objects. In addition, the answer provided by @lcameron05 gives an excellent description of the NumPy dtypes. Furthermore, the pandas docs on dtypes have a lot of additional information.
The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32.
By default, integer types are int64 and float types are float64, regardless of platform (32-bit or 64-bit). NumPy, however, chooses platform-dependent types when creating arrays, so an integer array created by NumPy comes out as int32 on a 32-bit platform.
One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA to represent scalar missing values (rather than the previous values of np.nan, pd.NaT or None, depending on usage).
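As a rough illustration of the defaults described above, the sketch below builds a few objects from plain Python values and from a NumPy array. The variable names are made up for the example, and what NumPy picks for the array depends on the platform and NumPy version, so no fixed output is claimed for that part.

```python
import numpy as np
import pandas as pd

# Values given as plain Python numbers default to 64-bit dtypes in pandas.
int_series = pd.Series([1, 2, 3])
float_series = pd.Series([1.0, 2.5])
frame = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

print(int_series.dtype)    # int64
print(float_series.dtype)  # float64
print(frame.dtypes)

# NumPy chooses its own default integer type when it creates the array,
# and pandas keeps whatever dtype the array already has.
arr = np.array([1, 2, 3])
from_array = pd.Series(arr)
print(arr.dtype, from_array.dtype)
```

The last line prints the same dtype twice: pandas does not upcast an existing NumPy array, which is why the platform-dependent NumPy default can show up in a Series or DataFrame.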
Pandas extends NumPy's type system and also allows users to write their own extension types. The following lists all of pandas' extension types.
Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes)
Data type: DatetimeTZDtype
Scalar: Timestamp
Array: arrays.DatetimeArray
String aliases: 'datetime64[ns, <tz>]'

Kind of data: Categorical
Data type: CategoricalDtype
Scalar: (none)
Array: Categorical
String aliases: 'category'

Kind of data: period (time spans)
Data type: PeriodDtype
Scalar: Period
Array: arrays.PeriodArray
String aliases: 'period[<freq>]', 'Period[<freq>]'

Kind of data: sparse
Data type: SparseDtype
Scalar: (none)
Array: arrays.SparseArray
String aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'

Kind of data: intervals
Data type: IntervalDtype
Scalar: Interval
Array: arrays.IntervalArray
String aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'

Kind of data: nullable integer
Data type: Int64Dtype, ...
Scalar: (none)
Array: arrays.IntegerArray
String aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'

Kind of data: Strings
Data type: StringDtype
Scalar: str
Array: arrays.StringArray
String aliases: 'string'

Kind of data: Boolean (with NA), i.e. Boolean data with missing values
Data type: BooleanDtype
Scalar: bool
Array: arrays.BooleanArray
String aliases: 'boolean'
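To make the list above more concrete, here is a minimal sketch that requests several of these extension types through their string aliases. It assumes pandas 1.0 or later; the sample values and variable names are invented for illustration.

```python
import pandas as pd

# Nullable integer: the missing entry is stored as pd.NA instead of forcing float64.
nullable_int = pd.Series([1, None, 3], dtype="Int64")

# Nullable boolean.
nullable_bool = pd.Series([True, None, False], dtype="boolean")

# Dedicated string dtype.
text = pd.Series(["a", None, "c"], dtype="string")

# Categorical.
cat = pd.Series(["x", "y", "x"], dtype="category")

# Timezone-aware datetimes (NumPy itself has no equivalent).
stamps = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02"])).dt.tz_localize("UTC")

for s in (nullable_int, nullable_bool, text, cat, stamps):
    print(type(s.dtype).__name__, "->", s.dtype)

# The missing value in the nullable integer column is the pd.NA singleton.
print(nullable_int[1] is pd.NA)
```

None of these columns ends up with the generic object dtype: each one reports its own extension dtype class.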
Answer by lcameron05
pandas borrows its dtypes from numpy. For a demonstration of this, see the following:
import pandas as pd
df = pd.DataFrame({'A': [1,'C',2.]})
df['A'].dtype
>>> dtype('O')
type(df['A'].dtype)
>>> numpy.dtype
You can find the list of valid numpy.dtypes in the documentation:
'?' boolean
'b' (signed) byte
'B' unsigned byte
'i' (signed) integer
'u' unsigned integer
'f' floating-point
'c' complex-floating point
'm' timedelta
'M' datetime
'O' (Python) objects
'S', 'a' zero-terminated bytes (not recommended)
'U' Unicode string
'V' raw data (void)
pandas should support these types. Calling the astype method of a pandas.Series object with any of the above options as the input argument will result in pandas trying to convert the Series to that type (or at the very least falling back to object type); 'u' is the only one that I see pandas not understanding at all:
df['A'].astype('u')
>>> TypeError: data type "u" not understood
This is a numpy error. It occurs because 'u' needs to be followed by a number specifying the number of bytes per item (and that number needs to be valid):
import numpy as np
np.dtype('u')
>>> TypeError: data type "u" not understood
np.dtype('u1')
>>> dtype('uint8')
np.dtype('u2')
>>> dtype('uint16')
np.dtype('u4')
>>> dtype('uint32')
np.dtype('u8')
>>> dtype('uint64')
# testing another invalid argument
np.dtype('u3')
>>> TypeError: data type "u3" not understood
To summarise, the astype methods of pandas objects will try to do something sensible with any argument that is valid for numpy.dtype. Note that numpy.dtype('f') is the same as numpy.dtype('float32'), numpy.dtype('f8') is the same as numpy.dtype('float64'), and so on. The same goes for passing these arguments to pandas astype methods.
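The summary above can be sketched with a small Series. This is only an illustration with made-up values; it relies on astype accepting the same type-code strings that numpy.dtype accepts.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Type codes with an explicit byte size map onto concrete NumPy dtypes.
as_u1 = s.astype("u1")  # same as "uint8"
as_f = s.astype("f")    # same as "float32"
as_f8 = s.astype("f8")  # same as "float64"

print(as_u1.dtype, as_f.dtype, as_f8.dtype)

# A bare "u" carries no size, so NumPy cannot resolve it; as in the
# traceback shown earlier, this surfaces as a TypeError.
try:
    s.astype("u")
except TypeError as exc:
    print("TypeError:", exc)
```

Spelling out the full name ("uint8", "float32") is usually easier to read than the short codes, but both forms reach the same dtype.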
To locate the respective data type classes in NumPy, the pandas docs recommend this:
import numpy as np

def subdtypes(dtype):
    # Recursively collect a NumPy scalar type and all of its subclasses.
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]

subdtypes(np.generic)
Output:
[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int64,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint64,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]
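Pandas accepts these classes as valid types. For example, dtype={'A': np.float64}. (Older examples use np.float here, but that alias was deprecated in NumPy 1.20 and removed in NumPy 1.24.)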
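As a hedged sketch of what passing such a class looks like in practice, the example below hands NumPy scalar types to the dtype argument of pd.read_csv and pd.Series. The CSV text and column names are made up for the example.

```python
import io

import numpy as np
import pandas as pd

csv_text = "A,B\n1,x\n2,y\n"  # hypothetical sample data

# Column A would be read as int64 by default; the mapping forces float64.
df = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.float64})
print(df.dtypes)

# The same classes work when constructing a Series directly.
small = pd.Series([1, 2, 3], dtype=np.int8)
print(small.dtype)  # int8
```

Columns that are not named in the mapping (B here) are still inferred by pandas as usual.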
The NumPy docs contain more details and a chart.
Answer by jeffhale
Building on other answers, pandas also includes a number of its own dtypes.
Pandas and third-party libraries extend NumPy's type system in a few places. This section describes the extensions pandas has made internally. See Extension types for how to write your own extension that works with pandas. See Extension data types for a list of third-party libraries that have implemented an extension.
The following table lists all of pandas' extension types. See the respective documentation:
https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dtypes
Also, pandas 1.0 adds a dedicated string dtype.