Python: What is dtype('O')?
Original URL: http://stackoverflow.com/questions/37561991/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is dtype('O')?
Question by quant
I have a dataframe in pandas and I'm trying to figure out what the types of its values are. I am unsure what the type is of column 'Test'. However, when I run myFrame['Test'].dtype, I get:
dtype('O')
What does this mean?
Accepted answer by prosti
When you see dtype('O') inside a dataframe, it means the column has the object dtype, which is how pandas stores strings by default.
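A minimal sketch of what dtype('O') actually is, assuming pandas and NumPy are installed (the Series contents are made up for illustration). 'O' is NumPy's generic "Python object" dtype, the same as np.dtype(object); strings are the common case, but the dtype itself is not string-specific, and pandas' dedicated 'string' dtype reports something different.

```python
import numpy as np
import pandas as pd

# 'O' is the type code for NumPy's generic "Python object" dtype
print(np.dtype('O') == np.dtype(object))  # True

# A column of strings stored with the object dtype reports dtype('O')
names = pd.Series(['foo', 'bar'], dtype=object)
print(repr(names.dtype))  # dtype('O')

# The same object dtype also accepts non-string values
mixed = pd.Series(['foo', 1, 2.5], dtype=object)
print(repr(mixed.dtype))  # dtype('O')

# pandas' dedicated string dtype is a separate dtype, not 'O'
text = pd.Series(['foo', 'bar'], dtype='string')
print(isinstance(text.dtype, pd.StringDtype))      # True
print(pd.api.types.is_object_dtype(text.dtype))   # False
```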
What is dtype?
Is it something that belongs to pandas or NumPy, or both, or something else? If we examine some pandas code:
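import pandas as pd

df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'datetime': [pd.Timestamp('20180310')],
                   'string': ['foo']})
print(df)
print('---')
print(df['float'].dtype, df['int'].dtype, df['datetime'].dtype, df['string'].dtype)
print('---')
print(repr(df['string'].dtype))  # what a notebook echoes for a bare df['string'].dtype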
It will output like this:
float int datetime string
0 1.0 1 2018-03-10 foo
---
float64 int64 datetime64[ns] object
---
dtype('O')
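You can interpret the last line as the pandas object dtype, dtype('O'): here the column holds Python str values, which corresponds roughly to the NumPy string_ or unicode_ types (those aliases were removed in NumPy 2.0 in favour of np.bytes_ and np.str_).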
Pandas dtype | Python type | NumPy type        | Usage
object       | str         | string_, unicode_ | Text
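The table maps object to NumPy's string types, but NumPy's own string types are fixed-width, while the object dtype stores references to ordinary Python str objects. A small sketch contrasting the three (array contents are arbitrary examples):

```python
import numpy as np

# NumPy's own string types are fixed-width: 'U' stores Unicode text, 'S' stores bytes
unicode_arr = np.array(['foo', 'spam'])
bytes_arr = np.array([b'foo', b'spam'])
print(unicode_arr.dtype)  # <U4 on little-endian machines
print(bytes_arr.dtype)    # |S4

# Asking for dtype=object keeps references to ordinary Python str objects instead
object_arr = np.array(['foo', 'spam'], dtype=object)
print(object_arr.dtype)   # object

print(unicode_arr.dtype.kind, bytes_arr.dtype.kind, object_arr.dtype.kind)  # U S O
print(type(unicode_arr[0]) is np.str_)  # True: a NumPy scalar
print(type(object_arr[0]) is str)       # True: the plain Python str
```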
Like Don Quixote on his steed, pandas rides on NumPy, and NumPy understands the underlying architecture of your system and uses the class numpy.dtype to describe it.
A data type object is an instance of the numpy.dtype class, which describes the data more precisely, including:
- Type of the data (integer, float, Python object, etc.)
- Size of the data (how many bytes are in, e.g., the integer)
- Byte order of the data (little-endian or big-endian)
- If the data type is structured, an aggregate of other data types (e.g., describing an array item consisting of an integer and a float):
  - What are the names of the "fields" of the structure
  - What is the data type of each field
  - Which part of the memory block each field takes
- If the data type is a sub-array, what is its shape and data type
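Each item in the list above maps onto an attribute of numpy.dtype. A short sketch inspecting them (the dtypes here are arbitrary examples, not taken from the question):

```python
import numpy as np

# Kind, size and byte order; .str spells the byte order out explicitly ('>' = big-endian)
big_endian_int = np.dtype('>i4')
print(big_endian_int.kind, big_endian_int.itemsize, big_endian_int.str)  # i 4 >i4

# A structured dtype: one integer field and one float field per array item
point = np.dtype([('x', np.int32), ('y', np.float64)])
print(point.names)        # ('x', 'y')
print(point.fields['y'])  # (dtype('float64'), 4): field type and byte offset
print(point.itemsize)     # 12

# A sub-array dtype: every item is a 2x3 block of float64 values
block = np.dtype((np.float64, (2, 3)))
print(block.shape, block.base)  # (2, 3) float64

# The object dtype: each item is a reference to an arbitrary Python object
obj = np.dtype('O')
print(obj.kind, obj.hasobject)  # O True
```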
In the context of this question, dtype belongs to both pandas and NumPy, and in particular dtype('O') means we expect strings (stored as generic Python objects).
Here is some code for testing, with explanation. Suppose we have the dataset as a dictionary:
import pandas as pd
import numpy as np
from pandas import Timestamp
data={'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'date': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2018-12-12 00:00:00'), 2: Timestamp('2018-12-12 00:00:00'), 3: Timestamp('2018-12-12 00:00:00'), 4: Timestamp('2018-12-12 00:00:00')}, 'role': {0: 'Support', 1: 'Marketing', 2: 'Business Development', 3: 'Sales', 4: 'Engineering'}, 'num': {0: 123, 1: 234, 2: 345, 3: 456, 4: 567}, 'fnum': {0: 3.14, 1: 2.14, 2: -0.14, 3: 41.3, 4: 3.14}}
df = pd.DataFrame.from_dict(data) #now we have a dataframe
print(df)
print(df.dtypes)
The last two lines print the dataframe and its column dtypes; note the output:
id date role num fnum
0 1 2018-12-12 Support 123 3.14
1 2 2018-12-12 Marketing 234 2.14
2 3 2018-12-12 Business Development 345 -0.14
3 4 2018-12-12 Sales 456 41.30
4 5 2018-12-12 Engineering 567 3.14
id int64
date datetime64[ns]
role object
num int64
fnum float64
dtype: object
All kinds of different dtypes.
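To pick out the object columns from a frame like the one above, one option is to filter df.dtypes or use select_dtypes. A sketch on a hypothetical, smaller frame with the same kinds of columns; dtype=object is passed explicitly for 'role' (an assumption made so the result does not depend on how a given pandas version infers strings):

```python
import pandas as pd

# Hypothetical two-row frame mirroring the columns above
df = pd.DataFrame({
    'id': [1, 2],
    'date': pd.to_datetime(['2018-12-12', '2018-12-12']),
    'role': pd.Series(['Support', 'Sales'], dtype=object),
    'fnum': [3.14, 2.14],
})

# df.dtypes is a Series mapping column name -> dtype; keep the ones equal to object
object_cols = df.dtypes[df.dtypes == object].index.tolist()
print(object_cols)  # ['role']

# select_dtypes performs the same filtering and returns the matching columns
print(df.select_dtypes(include='object').columns.tolist())  # ['role']
```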
df.iloc[1,:] = np.nan
df.iloc[2,:] = None
If we set rows to np.nan or None, the columns are not turned into object; only the integer columns id and num are upcast to float64, since int64 cannot represent NaN. The output will be like this:
print(df)
print(df.dtypes)
id date role num fnum
0 1.0 2018-12-12 Support 123.0 3.14
1 NaN NaT NaN NaN NaN
2 NaN NaT None NaN NaN
3 4.0 2018-12-12 Sales 456.0 41.30
4 5.0 2018-12-12 Engineering 567.0 3.14
id float64
date datetime64[ns]
role object
num float64
fnum float64
dtype: object
So np.nan or None will not change a column's dtype (beyond the int-to-float upcast), unless we set all rows of the column to np.nan or None. In that case the column will become float64 or object, respectively.
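A small sketch of both rules using standalone Series (not the dataframe above), so the dtypes can be checked in isolation:

```python
import numpy as np
import pandas as pd

# int64 cannot represent NaN, so a missing value forces an integer column to float64
print(pd.Series([1, 2, 3]).dtype)       # int64 (on typical platforms)
print(pd.Series([1, np.nan, 3]).dtype)  # float64
print(pd.Series([1, None, 3]).dtype)    # float64 (None is converted to NaN)

# A column made only of missing values: np.nan gives float64, None gives object
print(pd.Series([np.nan, np.nan]).dtype)  # float64
print(pd.Series([None, None]).dtype)      # object
```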
You may also try setting single rows:
df.iloc[3,:] = 0 # will convert datetime to object only
df.iloc[4,:] = '' # will convert all columns to object
Note also that if we set a string inside a non-string column, the column will become object dtype (the dtype pandas uses for strings by default).
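The same fallback happens when a column is built from mixed values: with no common NumPy scalar type, pandas uses the object dtype and keeps each value as the original Python object. A sketch with made-up values:

```python
import pandas as pd

# Numbers and a string in one column: no common scalar type, so pandas falls back to object
mixed = pd.Series([1, 2.5, 'three'])
print(mixed.dtype)  # object

# Each element keeps its own Python type
print([type(v).__name__ for v in mixed])  # ['int', 'float', 'str']
```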
Answer by jezrael
It means:
'O' (Python) objects
(Source: NumPy documentation on data type objects.)
The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:
'b' boolean
'i' (signed) integer
'u' unsigned integer
'f' floating-point
'c' complex-floating point
'O' (Python) objects
'S', 'a' (byte-)string
'U' Unicode
'V' raw data (void)
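The kind and size parts of these type strings can be read back from dtype.kind and dtype.itemsize. A sketch with arbitrary example codes; note that for 'U' the number counts characters, so 'U5' takes 20 bytes (4 bytes per character), and 'b1' is used for boolean because a bare 'b' means a signed byte:

```python
import numpy as np

# Each type string is "<kind><size>"; dtype.kind and dtype.itemsize recover the two parts
for code in ['b1', 'i8', 'u2', 'f4', 'c16', 'O', 'S5', 'U5', 'V4']:
    dt = np.dtype(code)
    print(f"{code!r:6} kind={dt.kind!r} itemsize={dt.itemsize}")
```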
Another answer helps if you need to check types.
Answer by shx2
It means "a python object", i.e. not one of the builtin scalar types supported by numpy.
import numpy as np
np.array([object()]).dtype
# => dtype('O')
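A sketch extending that one-liner: NumPy falls back to dtype('O') whenever no builtin scalar type fits all elements, and an object array stores references to the original Python objects (the values below are arbitrary):

```python
import numpy as np

# None has no builtin NumPy scalar type, so the whole array falls back to object
arr = np.array([1, 'a', None])
print(arr.dtype)  # object

# An object array stores references to the original Python objects, not copies
marker = object()
holder = np.array([marker])
print(holder[0] is marker)  # True

# Each element slot is therefore pointer-sized
print(np.dtype('O').itemsize == np.dtype(np.intp).itemsize)  # True
```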
Answer by Jeru Luke
'O' stands for object.
# Loading a csv file as a dataframe
import pandas as pd
train_df = pd.read_csv('train.csv')
col_name = 'Name of Employee'
# Checking the datatype of the column
train_df[col_name].dtype
# Instead, try printing the same thing
print(train_df[col_name].dtype)
In an interactive session, the bare expression train_df[col_name].dtype returns: dtype('O')
The line with the print() call outputs the following: object
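The difference comes from repr() versus str() on the same numpy.dtype object: an interactive session echoes the repr, while print() uses str, which is the dtype's name. A minimal sketch:

```python
import numpy as np

dt = np.dtype('O')
# repr() is what an interactive session echoes; str()/print() show the dtype's name
print(repr(dt))  # dtype('O')
print(str(dt))   # object
print(dt.name, dt.char, dt.kind)  # object O O
```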