How to determine whether a column/variable is numeric or not in Pandas/NumPy?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19900202/
Asked by user2808117
Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not?
I have a self-defined dictionary with dtypes as keys and numeric / not numeric as values.
Answered by Jeff
This is a pseudo-internal method that returns only the numeric-type data:
In [26]: import numpy as np
    ...: from pandas import DataFrame, Timestamp
In [27]: df = DataFrame(dict(A = np.arange(3),
                             B = np.random.randn(3),
                             C = ['foo','bar','bah'],
                             D = Timestamp('20130101')))
In [28]: df
Out[28]:
   A         B    C                   D
0  0 -0.667672  foo 2013-01-01 00:00:00
1  1  0.811300  bar 2013-01-01 00:00:00
2  2  2.020402  bah 2013-01-01 00:00:00
In [29]: df.dtypes
Out[29]:
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object
In [30]: df._get_numeric_data()
Out[30]:
   A         B
0  0 -0.667672
1  1  0.811300
2  2  2.020402
Answered by danodonovan
Based on @jaime's answer in the comments, you need to check .dtype.kind for the column of interest. For example:
>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'biufc'
True
>>> df['not_numeric'].dtype.kind in 'biufc'
False
NB: the meaning of biufc is: b = bool, i = int (signed), u = unsigned int, f = float, c = complex. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind
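The kind check above can be wrapped in a small helper so it works the same way for a pandas Series and a NumPy array. This is only a sketch: the names NUMERIC_KINDS and has_numeric_kind are made up for illustration and are not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# Kind codes treated as numeric: bool, signed int, unsigned int, float, complex
NUMERIC_KINDS = "biufc"

def has_numeric_kind(values):
    """Return True if a Series or ndarray has a numeric dtype kind."""
    return values.dtype.kind in NUMERIC_KINDS

df = pd.DataFrame({"numeric": [1, 2, 3], "not_numeric": ["A", "B", "C"]})

# Map each column name to the result of the kind check
kinds = {col: has_numeric_kind(df[col]) for col in df.columns}

# The same helper works on a plain NumPy array
arr_is_numeric = has_numeric_kind(np.array([1.5, 2.5]))
```

Whether bool should count as numeric depends on the use case; dropping 'b' from the string excludes it.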
Answered by ayhan
You can use np.issubdtype to check if the dtype is a sub dtype of np.number. Examples:
np.issubdtype(arr.dtype, np.number) # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number) # where df['X'] is a pandas Series
This works for NumPy's dtypes but fails for pandas-specific types like pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
df
Out:
   A    B   C  D
0  1  1.0  1j  a
1  2  2.0  2j  b
2  3  3.0  3j  c
df.dtypes
Out:
A         int64
B       float64
C    complex128
D        object
dtype: object
np.issubdtype(df['A'].dtype, np.number)
Out: True
np.issubdtype(df['B'].dtype, np.number)
Out: True
np.issubdtype(df['C'].dtype, np.number)
Out: True
np.issubdtype(df['D'].dtype, np.number)
Out: False
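As a rough illustration of the categorical caveat mentioned in this answer, here is a minimal sketch using is_numeric_dtype from pandas.api.types. The column names and values are invented for the example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame mixing NumPy-backed columns with a pandas categorical
df = pd.DataFrame({
    "ints": [1, 2, 3],
    "floats": [1.0, 2.0, 3.0],
    "labels": pd.Categorical(["a", "b", "a"]),
})

# is_numeric_dtype accepts a Series (or a dtype) and handles pandas-specific dtypes
results = {col: is_numeric_dtype(df[col]) for col in df.columns}
```

Here the categorical column is reported as non-numeric without any special handling.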
For multiple columns you can use np.vectorize:
is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
is_number(df.dtypes)
Out: array([ True, True, True, False], dtype=bool)
And for selection, pandas now has select_dtypes:
df.select_dtypes(include=[np.number])
Out:
   A    B   C
0  1  1.0  1j
1  2  2.0  2j
2  3  3.0  3j
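If only the column names are needed rather than the filtered frame, select_dtypes can be combined with .columns. A minimal sketch, assuming a small example frame; the string 'number' is used here as the selector in place of np.number:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0], "D": ["a", "b", "c"]})

# Names of the numeric columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Names of everything else, via the exclude parameter
other_cols = df.select_dtypes(exclude="number").columns.tolist()
```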
Answered by paulwasit
You can also try:
df_dtypes = np.array(df.dtypes)
df_numericDtypes= [x.kind in 'bifc' for x in df_dtypes]
It returns a list of booleans: True if numeric, False if not. Note that 'bifc' omits 'u', so unsigned-integer columns are reported as False.
Answered by danthelion
In pandas 0.20.2 you can do:
import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})
is_string_dtype(df['A'])
# True
is_numeric_dtype(df['B'])
# True
Answered by Punit S
How about just checking the type of one of the values in the column? We've always had something like this:
isinstance(x, (int, float, complex))  # on Python 2, add long to the tuple
When I check the dtypes of the columns in the dataframe below, I get 'object' rather than the numeric type I'm expecting:
from datetime import datetime, timedelta
import pandas as pd
df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [datetime.now() - timedelta(hours=i*1000), i*10, i*100]
df.dtypes
time     datetime64[ns]
test1            object
test2            object
dtype: object
When I do the following, it seems to give an accurate result:
isinstance(df['test1'][len(df['test1'])-1], (int, float, complex))  # on Python 2, add long to the tuple
returns
True
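If the goal is a numeric dtype rather than a per-value check, one option (not part of the original answer) is to convert the object column with pd.to_numeric first. A minimal sketch, assuming the column holds only plain Python numbers; the Series below is a hypothetical stand-in for df['test1']:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical object-dtype column holding plain Python ints
s = pd.Series([0, 10, 20], dtype=object)

# The dtype check fails while the column is still object
before = is_numeric_dtype(s)

# Convert to a proper numeric dtype, then check again
converted = pd.to_numeric(s)
after = is_numeric_dtype(converted)
```

After the conversion, dtype-based checks such as is_numeric_dtype or select_dtypes can be applied to the column directly.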
Answered by Beta
Just to add to the other answers, you can also use df.info() to see the data type of each column.
Answered by farshad madani
Pandas has a select_dtypes function. You can easily filter your columns on int64 and float64 like this:
df.select_dtypes(include=['int64','float64'])