What are all the dtypes that pandas recognizes?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29245848/

Date: 2020-08-19 04:16:42. Source: igfitidea

Tags: python, python-3.x, pandas

Asked by uday

For pandas, does anyone know whether any datatype apart from

(i) float64, int64 (and other variants of np.number like float32, int8, etc.)

(ii) bool

(iii) datetime64, timedelta64

such as string columns, always has a dtype of object?

Alternatively, I want to know whether there is any datatype, apart from (i), (ii) and (iii) in the list above, for which pandas does not make its dtype an object.

Accepted answer by Alexander

EDIT Feb 2020, following the pandas 1.0.0 release

Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtype, data types, and data type objects. In addition, the answer by @lcameron05 gives an excellent description of the NumPy dtypes. Furthermore, the pandas docs on dtypes have a lot of additional information.

The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32.

By default, integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit). Constructing a pandas object from plain Python integers therefore results in an int64 dtype.

NumPy, however, will choose platform-dependent types when creating arrays, which WILL result in int32 on a 32-bit platform.

One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA to represent scalar missing values (rather than the previous values of np.nan, pd.NaT or None, depending on usage).
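As a minimal sketch of the defaults described above, the snippet below builds a few small frames and inspects their dtypes. The variable names are illustrative only, and the nullable "Int64" alias assumes pandas 1.0 or later.

```python
import numpy as np
import pandas as pd

# Three ways of building a frame from plain Python integers.
from_list = pd.DataFrame([1, 2], columns=["a"])
from_dict = pd.DataFrame({"a": [1, 2]})
from_scalar = pd.DataFrame({"a": 1}, index=list(range(2)))

for frame in (from_list, from_dict, from_scalar):
    print(frame.dtypes["a"])

# NumPy picks its own default integer type for the array,
# and the frame built from that array keeps whatever NumPy chose.
arr = np.array([1, 2])
from_array = pd.DataFrame(arr)
print(arr.dtype, from_array.dtypes.iloc[0])

# With a nullable extension dtype, the missing value is stored as pd.NA.
nullable = pd.Series([1, None], dtype="Int64")
print(nullable.iloc[1] is pd.NA)
```

The array case is the one where the result can differ between platforms, since the dtype is decided by NumPy before pandas sees the data.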

Pandas extends NumPy's type system and also allows users to write their own extension types. The following lists all of the pandas extension types.

1) Time zone handling

Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes).

Data type: DatetimeTZDtype

Scalar: Timestamp

Array: arrays.DatetimeArray

String Aliases: 'datetime64[ns, <tz>]'

2) Categorical data

Kind of data: Categorical

Data type: CategoricalDtype

Scalar: (none)

Array: Categorical

String Aliases: 'category'

3) Time span representation

Kind of data: period (time spans)

Data type: PeriodDtype

Scalar: Period

Array: arrays.PeriodArray

String Aliases: 'period[<freq>]', 'Period[<freq>]'

4) Sparse data structures

Kind of data: sparse

Data type: SparseDtype

Scalar: (none)

Array: arrays.SparseArray

String Aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'

5) IntervalIndex

Kind of data: intervals

Data type: IntervalDtype

Scalar: Interval

Array: arrays.IntervalArray

String Aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'

6) Nullable integer data type

Kind of data: nullable integer

Data type: Int64Dtype, ...

Scalar: (none)

Array: arrays.IntegerArray

String Aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'

7) Working with text data

Kind of data: Strings

Data type: StringDtype

Scalar: str

Array: arrays.StringArray

String Aliases: 'string'

8) Boolean data with missing values

Kind of data: Boolean (with NA)

Data type: BooleanDtype

Scalar: bool

Array: arrays.BooleanArray

String Aliases: 'boolean'
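The extension types listed above can be explored with a short sketch like the following. It assumes pandas 1.0 or later, and the dictionary keys and sample values are made up for illustration; it simply builds one small Series per kind of data and prints the resulting dtype and its class.

```python
import pandas as pd

# One small Series per extension type; the labels are illustrative only.
examples = {
    "tz-aware datetime": pd.Series(pd.date_range("2020-01-01", periods=2, tz="UTC")),
    "categorical": pd.Series(["a", "b", "a"], dtype="category"),
    "period": pd.Series(pd.period_range("2020-01", periods=2, freq="M")),
    "sparse": pd.Series(pd.arrays.SparseArray([0, 0, 1])),
    "interval": pd.Series(pd.interval_range(start=0, end=2)),
    "nullable integer": pd.Series([1, None], dtype="Int64"),
    "string": pd.Series(["x", None], dtype="string"),
    "nullable boolean": pd.Series([True, None], dtype="boolean"),
}

for kind, series in examples.items():
    # Print the dtype as pandas displays it, plus the dtype's class name.
    print(f"{kind:20s} {series.dtype!s:30s} {type(series.dtype).__name__}")
```

None of these columns end up with the object dtype, which is the point of the question: pandas has dedicated dtypes beyond the NumPy ones.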

Answer by lcameron05

pandas borrows its dtypes from NumPy. For a demonstration of this, see the following:

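import pandas as pd

# A column mixing int, str and float values falls back to the object dtype.
df = pd.DataFrame({'A': [1, 'C', 2.]})
df['A'].dtype
# Output: dtype('O')

type(df['A'].dtype)
# Output: numpy.dtype (newer NumPy versions may print a more specific dtype class)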

You can find the list of valid numpy.dtype values in the documentation:

'?' boolean

'b' (signed) byte

'B' unsigned byte

'i' (signed) integer

'u' unsigned integer

'f' floating-point

'c' complex-floating point

'm' timedelta

'M' datetime

'O' (Python) objects

'S', 'a' zero-terminated bytes (not recommended)

'U' Unicode string

'V' raw data (void)
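As a rough illustration of how these codes are used, the sketch below passes a few of them to np.dtype. Several codes are given with an explicit size, unit or length (for example 'i4' or 'm8[ns]'), since a bare letter is not always accepted or does not always mean what the list suggests; the particular sizes are arbitrary choices for the example.

```python
import numpy as np

# Each pair is (type string, the NumPy scalar type it is expected to match).
pairs = [
    ("?", np.bool_),
    ("b", np.int8),
    ("B", np.uint8),
    ("i4", np.int32),
    ("u2", np.uint16),
    ("f8", np.float64),
    ("c16", np.complex128),
    ("O", np.object_),
]
for code, scalar_type in pairs:
    print(code, np.dtype(code), np.dtype(code) == np.dtype(scalar_type))

# Time-like codes carry a unit; string-like and void codes carry a length.
print(np.dtype("m8[ns]"), np.dtype("M8[ns]"))
print(np.dtype("U5"), np.dtype("S3"), np.dtype("V4"))
```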

pandas should support these types. Using the astype method of a pandas.Series object with any of the above options as the input argument will result in pandas trying to convert the Series to that type (or at the very least falling back to object type); 'u' is the only one that I see pandas not understanding at all:

df['A'].astype('u')
# TypeError: data type "u" not understood

This is a NumPy error that results because the 'u' needs to be followed by a number specifying the number of bytes per item (which needs to be valid):

import numpy as np

np.dtype('u')
# TypeError: data type "u" not understood

np.dtype('u1')
# Output: dtype('uint8')

np.dtype('u2')
# Output: dtype('uint16')

np.dtype('u4')
# Output: dtype('uint32')

np.dtype('u8')
# Output: dtype('uint64')

# testing another invalid argument
np.dtype('u3')
# TypeError: data type "u3" not understood

To summarise, the astype methods of pandas objects will try to do something sensible with any argument that is valid for numpy.dtype. Note that numpy.dtype('f') is the same as numpy.dtype('float32'), numpy.dtype('f8') is the same as numpy.dtype('float64'), and so on. The same goes for passing these arguments to pandas astype methods.
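The equivalence summarised above can be checked with a small sketch; the Series contents and variable names here are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# The short code and the long name should give the same result.
as_f = s.astype("f")
as_float32 = s.astype("float32")

# The same holds for a sized code and the NumPy scalar type.
as_f8 = s.astype("f8")
as_np_float64 = s.astype(np.float64)

print(as_f.dtype, as_float32.dtype)
print(as_f8.dtype, as_np_float64.dtype)
```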

To locate the respective data type classes in NumPy, the pandas docs recommend this:

def subdtypes(dtype):
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]

subdtypes(np.generic)

Output:

[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int64,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint64,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]
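One way this hierarchy is typically put to use is with np.issubdtype and DataFrame.select_dtypes. The following is only a sketch with made-up column names, not something taken from the answer itself.

```python
import numpy as np
import pandas as pd

# Membership in the hierarchy can be tested directly.
print(np.issubdtype(np.int64, np.integer))
print(np.issubdtype(np.float32, np.number))
print(np.issubdtype(np.bool_, np.number))

# select_dtypes accepts the abstract classes, so np.number picks up
# the integer and float columns but not the bool or string columns.
df = pd.DataFrame({
    "i": [1, 2],
    "f": [1.5, 2.5],
    "b": [True, False],
    "s": ["x", "y"],
})
numeric = df.select_dtypes(include=[np.number])
print(list(numeric.columns))
```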

Pandas accepts these classes as valid types. For example, dtype={'A': np.float64} (the np.float alias used in older code was removed in NumPy 1.24).
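A minimal sketch of passing such a class as a dtype, assuming a small in-memory CSV invented for the example:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content used only for this illustration.
csv_text = "A,B\n1,2\n3,4\n"

# Force column A to float64 while letting pandas infer column B.
df = pd.read_csv(io.StringIO(csv_text), dtype={"A": np.float64})
print(df.dtypes)

# The same kind of mapping works with astype on an existing frame.
converted = df.astype({"B": np.int8})
print(converted.dtypes)
```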

The NumPy docs on dtypes contain more details and a chart.

Answer by jeffhale

Building on other answers, pandas also includes a number of its own dtypes.

Pandas and third-party libraries extend NumPy's type system in a few places. This section describes the extensions pandas has made internally. See Extension types for how to write your own extension that works with pandas. See Extension data types for a list of third-party libraries that have implemented an extension.

The following table lists all of pandas extension types. See the respective document

https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dtypes

Also, pandas 1.0 will have a string dtype.
