Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/28596493/

Time: 2020-09-13 22:57:41  Source: igfitidea

Asserting column(s) data type in Pandas

Tags: python, pandas, dataframe, assert

Asked by nfmcclure

I'm trying to find a better way to assert the column data type in Python/Pandas of a given dataframe.

For example:

import pandas as pd
t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer']})

I would like to assert that specific columns in the data frame are numeric. Here's what I have:

numeric_cols = ['a', 'b']  # These will be given
# all(...) is needed here: asserting a non-empty list always passes
assert all(x in ['int64', 'float'] for x in [t[y].dtype for y in numeric_cols])

This last assert line doesn't feel very pythonic. Maybe it is, and I'm just cramming it all into one hard-to-read line. Is there a better way? I would like to write something like:

assert t[numeric_cols].dtype.isnumeric()

I can't seem to find something like that though.

Answered by unutbu

You could use ptypes.is_numeric_dtype to identify numeric columns, ptypes.is_string_dtype to identify string-like columns, and ptypes.is_datetime64_any_dtype to identify datetime64 columns:

import pandas as pd
import pandas.api.types as ptypes

t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer'],
              'd':pd.date_range('2000-1-1', periods=3)})
cols_to_check = ['a', 'b']

assert all(ptypes.is_numeric_dtype(t[col]) for col in cols_to_check)
# True
assert ptypes.is_string_dtype(t['c'])
# True
assert ptypes.is_datetime64_any_dtype(t['d'])
# True
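
If a failing assertion should say which column is wrong, the same check can be wrapped in a small helper. This is only a sketch: the name assert_numeric_columns and the message format are hypothetical, not part of pandas.

```python
import pandas as pd
import pandas.api.types as ptypes


def assert_numeric_columns(df, cols):
    """Hypothetical helper: fail with the names and dtypes of non-numeric columns."""
    # Collect every offending column so the error reports all of them at once.
    bad = {col: str(df[col].dtype) for col in cols
           if not ptypes.is_numeric_dtype(df[col])}
    assert not bad, f"non-numeric columns: {bad}"


t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer']})

assert_numeric_columns(t, ['a', 'b'])  # passes silently

try:
    assert_numeric_columns(t, ['a', 'c'])
except AssertionError as exc:
    message = str(exc)  # mentions column 'c' and its dtype, but not 'a'
```

Collecting the offenders into a dict before asserting keeps the check to one pass and makes the failure message name the actual culprit.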


The pandas.api.types module (which I aliased to ptypes) has both an is_datetime64_any_dtype and an is_datetime64_dtype function. The difference is in how they treat timezone-aware array-likes:

In [239]: ptypes.is_datetime64_any_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
Out[239]: True

In [240]: ptypes.is_datetime64_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
Out[240]: False
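
The same difference shows up on DataFrame columns, which is the case the question is about. A minimal sketch, assuming a naive datetime Series and a copy localized to UTC (the variable names are illustrative):

```python
import pandas as pd
import pandas.api.types as ptypes

# One timezone-naive column and one timezone-aware copy of it.
naive = pd.Series(pd.date_range('2000-01-01', periods=3))
aware = naive.dt.tz_localize('UTC')

# is_datetime64_any_dtype accepts both naive and tz-aware data.
naive_any = ptypes.is_datetime64_any_dtype(naive)
aware_any = ptypes.is_datetime64_any_dtype(aware)

# is_datetime64_dtype accepts only the naive datetime64 dtype.
naive_strict = ptypes.is_datetime64_dtype(naive)
aware_strict = ptypes.is_datetime64_dtype(aware)
```

So an assertion meant to accept any datetime column would probably use is_datetime64_any_dtype, while is_datetime64_dtype suits checks that must reject timezone-aware data.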

Answered by ely

You can do this:

import numpy as np
numeric_dtypes = [np.dtype('int64'), np.dtype('float64')]
# or whatever types you want

assert t[numeric_cols].apply(lambda c: c.dtype).isin(numeric_dtypes).all()
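
An explicit dtype list like the one above matches only the exact dtypes it names, so an int32 or float32 column would fail the check. If any numeric width should pass, one possible alternative (a sketch, not part of the original answer) is DataFrame.select_dtypes with include='number':

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer']})
numeric_cols = ['a', 'b']

# Columns that pandas classifies as numeric (any integer or float width).
numeric_found = set(t.select_dtypes(include='number').columns)

# Requested columns that are not numeric; empty when the check passes.
missing = set(numeric_cols) - numeric_found
assert not missing, f"non-numeric columns: {sorted(missing)}"
```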