Python: What is dtype('O')?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37561991/

Date: 2020-08-19 19:34:11  Source: igfitidea

What is dtype('O')?

Tags: python, pandas, numpy, dataframe, types

Asked by quant

I have a dataframe in pandas and I'm trying to figure out what the types of its values are. I am unsure what the type of the column 'Test' is. However, when I run myFrame['Test'].dtype, I get:

dtype('O')

What does this mean?

Accepted answer by prosti

When you see dtype('O') inside a dataframe, it means the pandas object dtype, which is how pandas usually stores strings.

What is dtype?

Is it something that belongs to pandas or NumPy, or both, or something else? If we examine some pandas code:

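import pandas as pd

# One column per common dtype
df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'datetime': [pd.Timestamp('20180310')],
                   'string': ['foo']})
print(df)
print('---')
print(df['float'].dtype, df['int'].dtype, df['datetime'].dtype, df['string'].dtype)
print('---')
# repr() shows what an interactive session displays for a bare expression
print(repr(df['string'].dtype))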

It will output like this:

   float  int   datetime string    
0    1.0    1 2018-03-10    foo
---
float64 int64 datetime64[ns] object
---
dtype('O')

You can read the last one as pandas dtype('O'), that is, the pandas object dtype. Here it holds Python str values, and this corresponds to the NumPy string_ or unicode_ types (named bytes_ and str_ in current NumPy).

Pandas dtype    Python type     NumPy type          Usage
object          str             string_, unicode_   Text
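
The table lists str as the Python type behind object, but an object column can hold any Python object. A minimal sketch for checking what an object column actually contains follows; the series names are made up for illustration, and dtype=object is passed explicitly as an assumption so the result does not depend on how a given pandas version infers string columns:

```python
import pandas as pd

# Hypothetical example data; dtype=object is requested explicitly.
text = pd.Series(['foo', 'bar'], dtype=object)
mixed = pd.Series(['foo', 1, 2.5], dtype=object)

# Both report the object dtype, although their contents differ.
print(text.dtype, mixed.dtype)

# infer_dtype inspects the values rather than the declared dtype.
print(pd.api.types.infer_dtype(text))

# Mapping type() over the values shows the Python type of each element.
print(mixed.map(type).tolist())
```

So object by itself only says "Python objects"; whether they are all strings has to be checked on the values.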

Like Don Quixote on his ass, pandas sits on NumPy, and NumPy understands the underlying architecture of your system and uses the class numpy.dtype for that.

A data type object is an instance of the numpy.dtype class that describes the data type more precisely, including:

  • Type of the data (integer, float, Python object, etc.)
  • Size of the data (how many bytes are in e.g. the integer)
  • Byte order of the data (little-endian or big-endian)
  • If the data type is structured, an aggregate of other data types (e.g., describing an array item consisting of an integer and a float):
      • What are the names of the "fields" of the structure
      • What is the data-type of each field
      • Which part of the memory block each field takes
  • If the data type is a sub-array, what is its shape and data type
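
The properties in the list above can be read directly off a numpy.dtype instance. The following is a small sketch; the field names 'count' and 'value' are invented for illustration:

```python
import numpy as np

# A scalar dtype: kind of data and size in bytes.
int_dt = np.dtype('<i4')
print(int_dt.kind, int_dt.itemsize)

# A structured dtype: an aggregate of an integer and a float.
rec_dt = np.dtype([('count', '<i4'), ('value', '<f8')])
print(rec_dt.names)            # names of the fields
print(rec_dt.fields['value'])  # (dtype of the field, byte offset in the record)
print(rec_dt.itemsize)         # total size of one record in bytes

# A sub-array dtype: a 2x3 block of 32-bit floats.
sub_dt = np.dtype(('<f4', (2, 3)))
print(sub_dt.shape, sub_dt.base)

# The object dtype discussed in this question.
obj_dt = np.dtype('O')
print(obj_dt.kind, obj_dt.name)
```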


In the context of this question, dtype belongs to both pandas and NumPy, and dtype('O') in particular means we can expect strings (or, more generally, arbitrary Python objects).



Here is some code for testing, with explanation. Suppose we have the dataset as a dictionary:

import pandas as pd
import numpy as np
from pandas import Timestamp

data={'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'date': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2018-12-12 00:00:00'), 2: Timestamp('2018-12-12 00:00:00'), 3: Timestamp('2018-12-12 00:00:00'), 4: Timestamp('2018-12-12 00:00:00')}, 'role': {0: 'Support', 1: 'Marketing', 2: 'Business Development', 3: 'Sales', 4: 'Engineering'}, 'num': {0: 123, 1: 234, 2: 345, 3: 456, 4: 567}, 'fnum': {0: 3.14, 1: 2.14, 2: -0.14, 3: 41.3, 4: 3.14}}
df = pd.DataFrame.from_dict(data) #now we have a dataframe

print(df)
print(df.dtypes)

The last two lines print the dataframe and its dtypes. Note the output:

   id       date                  role  num   fnum
0   1 2018-12-12               Support  123   3.14
1   2 2018-12-12             Marketing  234   2.14
2   3 2018-12-12  Business Development  345  -0.14
3   4 2018-12-12                 Sales  456  41.30
4   5 2018-12-12           Engineering  567   3.14
id               int64
date    datetime64[ns]
role            object
num              int64
fnum           float64
dtype: object

All kinds of different dtypes. Now set two rows to missing values:

df.iloc[1,:] = np.nan
df.iloc[2,:] = None

Setting np.nan or None in some rows leaves the datetime64[ns], object and float64 columns as they were. The integer columns, however, are upcast to float64, because NaN cannot be stored in an int64 column. The output will be like this:

print(df)
print(df.dtypes)

    id       date         role    num   fnum
0  1.0 2018-12-12      Support  123.0   3.14
1  NaN        NaT          NaN    NaN    NaN
2  NaN        NaT         None    NaN    NaN
3  4.0 2018-12-12        Sales  456.0  41.30
4  5.0 2018-12-12  Engineering  567.0   3.14
id             float64
date    datetime64[ns]
role            object
num            float64
fnum           float64
dtype: object

So np.nan or None will not change a column's dtype (apart from integer columns, which become float64), unless we set all of the column's rows to np.nan or None. In that case the column will become float64 or object respectively.
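
The effect of missing values on dtypes can be seen in isolation with small series. This is a sketch with made-up data; it builds each series with the constructor instead of assigning through df.iloc as the answer does, since assignment behaviour varies between pandas versions:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])                            # integer dtype
with_nan = pd.Series([1, np.nan, 3])                   # NaN forces float64
with_none = pd.Series([1, None, 3])                    # None becomes NaN here too
all_nan = pd.Series([np.nan, np.nan])                  # nothing but NaN: float64
text_with_none = pd.Series(['a', None], dtype=object)  # object stays object

for name, s in [('ints', ints), ('with_nan', with_nan),
                ('with_none', with_none), ('all_nan', all_nan),
                ('text_with_none', text_with_none)]:
    print(f"{name:<15} {s.dtype}")
```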

You can also try setting single rows:

df.iloc[3,:] = 0 # will convert datetime to object only
df.iloc[4,:] = '' # will convert all columns to object

Note that if we set a string inside a non-string column, the column becomes string or object dtype.

Answer by jezrael

It means:

'O'     (Python) objects

Source.

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:

'b'       boolean
'i'       (signed) integer
'u'       unsigned integer
'f'       floating-point
'c'       complex-floating point
'O'       (Python) objects
'S', 'a'  (byte-)string
'U'       Unicode
'V'       raw data (void)
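
As a quick illustration of the kind codes in the table, the sketch below builds a dtype from several code strings and prints the kind and item size of each. The particular codes chosen are just examples:

```python
import numpy as np

# Each code is a kind character followed by a size.
# Sizes are in bytes, except for 'U', where the size counts characters.
codes = ['b1', 'i4', 'u2', 'f8', 'c16', 'O', 'S5', 'U5', 'V8']
for code in codes:
    dt = np.dtype(code)
    print(f"{code!r:>6} -> {dt!r:<20} kind={dt.kind!r} itemsize={dt.itemsize}")

# A size that matches no existing type is rejected.
try:
    np.dtype('i3')
except TypeError as exc:
    print('rejected:', exc)
```

Note that 'U5' reports an item size of 20 bytes, since NumPy stores each Unicode character in 4 bytes.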

Another answer helps if you need to check types.

Answer by shx2

It means "a python object", i.e. not one of the builtin scalar types supported by numpy.

import numpy as np

np.array([object()]).dtype  # => dtype('O')
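
To make the contrast with the builtin scalar types concrete, here is a small sketch (the variable names are illustrative) showing when NumPy picks a native dtype and when it falls back to object:

```python
import numpy as np

homogeneous = np.array([1, 2, 3])           # native integer dtype
text = np.array(['foo', 'bar'])             # fixed-width Unicode dtype
mixed = np.array(['foo', None])             # no common native type: object
forced = np.array([1, 2, 3], dtype=object)  # Python ints stored as objects

print(homogeneous.dtype.kind, text.dtype.kind, mixed.dtype, forced.dtype)

# Elements of an object array are the original Python objects.
print(type(forced[0]))
```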

Answer by Jeru Luke

'O' stands for object.

# Loading a csv file as a dataframe
import pandas as pd
train_df = pd.read_csv('train.csv')
col_name = 'Name of Employee'

# Checking the datatype of the column (bare expression, as in an interactive session)
train_df[col_name].dtype

# Instead, try printing the same thing
print(train_df[col_name].dtype)

The bare expression, evaluated in an interactive session, returns: dtype('O')

The line with the print call outputs: object
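
The two spellings are the repr and str forms of the same dtype. A short sketch that needs no CSV file:

```python
import numpy as np

dt = np.dtype('O')

print(repr(dt))  # what an interactive session shows for a bare expression
print(str(dt))   # what print() shows

# Both compare equal to Python's object type and to np.object_.
print(dt == object, dt == np.object_)
```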