Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the CC BY-SA license, link to the original and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/19900202/

Date: 2020-08-19 14:56:04  Source: igfitidea

How to determine whether a column/variable is numeric or not in Pandas/NumPy?

Tags: python, pandas, numpy

Question by user2808117

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not?

I have a self-defined dictionary with dtypes as keys and numeric / not as values.

Answer by Jeff

_get_numeric_data() is a pseudo-internal method that returns only the numeric-type columns:

In [26]: import numpy as np; from pandas import DataFrame, Timestamp

In [27]: df = DataFrame(dict(A = np.arange(3),
                             B = np.random.randn(3),
                             C = ['foo','bar','bah'],
                             D = Timestamp('20130101')))

In [28]: df
Out[28]: 
   A         B    C                   D
0  0 -0.667672  foo 2013-01-01 00:00:00
1  1  0.811300  bar 2013-01-01 00:00:00
2  2  2.020402  bah 2013-01-01 00:00:00

In [29]: df.dtypes
Out[29]: 
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [30]: df._get_numeric_data()
Out[30]: 
   A         B
0  0 -0.667672
1  1  0.811300
2  2  2.020402
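Because _get_numeric_data() is underscore-prefixed (private), it may change between pandas releases. A minimal sketch of a public alternative, assuming a reasonably recent pandas, is select_dtypes with include="number" on the same kind of frame:

```python
import numpy as np
import pandas as pd

# Same shape of data as the session above: int, float, string and timestamp columns
df = pd.DataFrame({
    "A": np.arange(3),
    "B": np.random.randn(3),
    "C": ["foo", "bar", "bah"],
    "D": pd.Timestamp("20130101"),  # scalar is broadcast to every row
})

# Public API: keep only the columns whose dtype is a NumPy numeric type
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'B']
```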

Answer by danodonovan

Based on @jaime's answer in the comments, you need to check .dtype.kind for the column of interest. For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'biufc'
True
>>> df['not_numeric'].dtype.kind in 'biufc'
False

NB: the kind codes in biufc mean b bool, i int (signed), u unsigned int, f float, c complex. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind
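To see which one-letter code NumPy assigns to common dtypes (and why datetimes or strings fail the 'biufc' test), a small sketch that simply prints dtype.kind for a few representative types:

```python
import numpy as np

# A few representative dtypes and the one-letter kind code NumPy assigns to each
examples = [np.bool_, np.int32, np.uint8, np.float64, np.complex128,
            np.object_, "datetime64[ns]", "timedelta64[ns]"]

kinds = {str(np.dtype(t)): np.dtype(t).kind for t in examples}
for name, kind in kinds.items():
    # Only b, i, u, f and c pass the 'biufc' membership test
    print(f"{name:16} {kind}  numeric={kind in 'biufc'}")
```

Datetimes report 'M', timedeltas 'm' and object columns 'O', so all three are treated as non-numeric by this check.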

Answer by ayhan

You can use np.issubdtype to check whether the dtype is a sub-dtype of np.number. Examples:

np.issubdtype(arr.dtype, np.number)  # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for NumPy dtypes but fails for pandas-specific types such as pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative to np.issubdtype.
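A minimal sketch of the categorical case, assuming a pandas version that ships pandas.api.types.is_numeric_dtype. The exact error np.issubdtype raises on a CategoricalDtype may vary by version, so it is only caught and printed here:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

s_int = pd.Series([1, 2, 3])
s_cat = pd.Series([1, 2, 3], dtype="category")

# pandas' own predicate understands extension dtypes such as CategoricalDtype
print(is_numeric_dtype(s_int))  # True
print(is_numeric_dtype(s_cat))  # False: the column dtype is categorical

# If you care whether the categories themselves are numbers, check them directly
print(is_numeric_dtype(s_cat.cat.categories))  # True

# np.issubdtype needs something NumPy can interpret as a dtype
try:
    print(np.issubdtype(s_cat.dtype, np.number))
except TypeError as exc:
    print("np.issubdtype failed:", exc)
```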

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
df
Out: 
   A    B   C  D
0  1  1.0  1j  a
1  2  2.0  2j  b
2  3  3.0  3j  c

df.dtypes
Out: 
A         int64
B       float64
C    complex128
D        object
dtype: object


np.issubdtype(df['A'].dtype, np.number)
Out: True

np.issubdtype(df['B'].dtype, np.number)
Out: True

np.issubdtype(df['C'].dtype, np.number)
Out: True

np.issubdtype(df['D'].dtype, np.number)
Out: False

For multiple columns you can use np.vectorize:

is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
is_number(df.dtypes)
Out: array([ True,  True,  True, False], dtype=bool)
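If you would rather keep the column names next to each flag, a sketch of the same np.issubdtype test applied through df.dtypes.apply, which returns a boolean Series indexed by column name instead of a bare array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})

# Same test as np.vectorize above, but the result is labelled by column
is_number = df.dtypes.apply(lambda dt: np.issubdtype(dt, np.number))
print(is_number)

# Boolean-mask the Series to get just the numeric column names
numeric_cols = is_number[is_number].index.tolist()
print(numeric_cols)  # ['A', 'B', 'C']
```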

And for selection, pandas now has select_dtypes:

df.select_dtypes(include=[np.number])
Out: 
   A    B   C
0  1  1.0  1j
1  2  2.0  2j
2  3  3.0  3j
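select_dtypes also accepts exclude, so the complement (every non-numeric column) is one call away. A short sketch on the same frame, also showing how to get plain column names to loop over:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})

# The complement: every column whose dtype is NOT a subtype of np.number
non_numeric = df.select_dtypes(exclude=[np.number])
print(non_numeric.columns.tolist())  # ['D']

# Just the names of the numeric columns (complex counts as np.number)
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
print(numeric_cols)  # ['A', 'B', 'C']
```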

Answer by paulwasit

You can also try:

df_dtypes = np.array(df.dtypes)
df_numericDtypes = [x.kind in 'biufc' for x in df_dtypes]  # 'u' is needed for unsigned ints

It returns a list of booleans: True if numeric, False if not.
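A sketch that pairs each flag with its column name, using a hypothetical frame with an unsigned column to show why the kind string needs 'u' as well as 'bifc':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "signed": [-1, 0, 1],
    "unsigned": np.array([1, 2, 3], dtype=np.uint16),
    "text": ["x", "y", "z"],
})

# Pair each flag with its column name instead of relying on list order
numeric_by_col = {col: dt.kind in "biufc" for col, dt in df.dtypes.items()}
print(numeric_by_col)  # {'signed': True, 'unsigned': True, 'text': False}

# Without 'u', the uint16 column is wrongly reported as non-numeric
print(df.dtypes["unsigned"].kind in "bifc")  # False
```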

Answer by danthelion

In pandas 0.20.2 you can do:

pandas 0.20.2你可以这样做:

import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

is_string_dtype(df['A'])
True

is_numeric_dtype(df['B'])
True
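To apply is_numeric_dtype to a whole frame, a sketch that collects the numeric column names. The extra "C" column is a hypothetical addition using pandas' nullable "Int64" extension dtype, which is_numeric_dtype also reports as numeric:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [1.0, 2.0, 3.0],
    "C": pd.array([1, None, 3], dtype="Int64"),  # nullable integer extension dtype
})

# Apply the predicate to every column and keep the names that pass
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_cols)  # ['B', 'C']
```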

Answer by Punit S

How about just checking type for one of the values in the column? We've always had something like this:

isinstance(x, (int, float, complex))  # Python 2 also needed long

When I check the data types of the columns in the DataFrame below, I get 'object' rather than the numeric type I'm expecting:

import pandas as pd
from datetime import datetime, timedelta

df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [datetime.now() - timedelta(hours=i*1000), i*10, i*100]
df.dtypes

time     datetime64[ns]
test1            object
test2            object
dtype: object

When I do the following, it seems to give me the correct result:

isinstance(df['test1'][len(df['test1'])-1], (int, float, complex))

returns

True
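Checking a single value only tells you about that value, and NumPy integer scalars are not subclasses of int in Python 3. A sketch of two alternatives, assuming the object column holds only numbers: let pandas re-infer the column dtype with infer_objects, or test values against numbers.Number (note that bool values also pass, since bool subclasses int):

```python
import numbers

import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# An object-dtype column holding numbers, similar to test1/test2 built row by row above
df = pd.DataFrame({"test1": pd.Series([0, 10, 20], dtype=object)})
print(df["test1"].dtype)  # object

# Option 1: let pandas re-infer a proper dtype for object columns
fixed = df.infer_objects()
print(fixed["test1"].dtype, is_numeric_dtype(fixed["test1"]))

# Option 2: per-value check; numbers.Number also covers NumPy scalars,
# which a plain (int, float, complex) tuple misses for np.int64
print(isinstance(np.int64(5), int), isinstance(np.int64(5), numbers.Number))  # False True
```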

Answer by Beta

To add to the other answers, you can also use df.info() to see the data type of each column.

Answer by farshad madani

Pandas has the select_dtypes function. You can easily filter your columns on int64 and float64 like this:

df.select_dtypes(include=['int64','float64'])
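Listing exact dtype names only matches those widths. A sketch with a hypothetical int32 column showing that include=['int64', 'float64'] skips it, while include='number' picks up every NumPy numeric width:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "i32": np.array([1, 2, 3], dtype=np.int32),
    "f64": [0.5, 1.5, 2.5],
    "s": ["a", "b", "c"],
})

# Exact dtype names silently drop other widths such as int32
exact = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
print(exact)  # ['f64']

# 'number' selects every NumPy numeric dtype regardless of width
any_width = df.select_dtypes(include="number").columns.tolist()
print(any_width)  # ['i32', 'f64']
```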