Original question: http://stackoverflow.com/questions/43214204/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do I tell if a column in a pandas dataframe is of type datetime? How do I tell if a column is numerical?
Asked by Charlie Haley
I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.
import pandas as pd
df = pd.DataFrame([['Feb-2017', 1, 2],
                   ['Mar-2017', 1, 2],
                   ['Apr-2017', 1, 2],
                   ['May-2017', 1, 2]],
                  columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes
Out:
date_str            object
col1                 int64
col2                 int64
date_col    datetime64[ns]
dtype: object
Accepted answer by Charlie Haley
pandas has a handy method called select_dtypes, which takes include, exclude, or both as parameters and filters the DataFrame's columns by dtype. In this case, you want to include columns of dtype np.datetime64. To filter by integers, use [np.int64, np.int32, np.int16]; for floats, [np.float64, np.float32, np.float16]; for all numeric columns, [np.number]. Avoid the bare aliases np.int and np.float, which were removed in NumPy 1.24.
In:
import numpy as np
df.select_dtypes(include=[np.datetime64])
Out:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In:
df.select_dtypes(include=[np.number])
Out:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
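As a rough sketch of the same idea, select_dtypes also accepts string aliases, which avoids importing NumPy. Per the pandas documentation, 'datetime' selects naive datetime columns, 'datetimetz' selects timezone-aware ones, and 'number' selects numeric columns. The frame below and the names date_utc, date_cols, numeric_cols and non_numeric are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({
    'date_str': ['Feb-2017', 'Mar-2017'],
    'col1': [1, 2],
    'col2': [0.5, 1.5],
    'date_col': pd.to_datetime(['2017-02-01', '2017-03-01']),
    'date_utc': pd.to_datetime(['2017-02-01', '2017-03-01'], utc=True),
})

# Naive and timezone-aware datetime columns, in DataFrame column order.
date_cols = df.select_dtypes(include=['datetime', 'datetimetz']).columns.tolist()

# Integer and float columns.
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Everything that is not numeric.
non_numeric = df.select_dtypes(exclude='number').columns.tolist()

print(date_cols)
print(numeric_cols)
print(non_numeric)
```

Asking for the column names rather than the filtered frame makes it easy to reuse the selection later, for example as df[date_cols].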
Answer by jsignell
I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].
df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype) # datetime64[ns, UTC]
You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.
In:
from pandas.api.types import is_datetime64_any_dtype as is_datetime
df[[column for column in df.columns if is_datetime(df[column])]]
Out:
date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
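The same pattern covers the numeric half of the question: pandas.api.types also provides is_numeric_dtype. Here is a minimal sketch, where the helper columns_matching and the sample frame are hypothetical names introduced only for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def columns_matching(frame, predicate):
    """Return the names of the columns whose Series satisfies `predicate`."""
    return [name for name in frame.columns if predicate(frame[name])]


# Hypothetical sample data with naive and UTC datetimes, numbers, and text.
frame = pd.DataFrame({
    'label': ['a', 'b'],
    'count': [1, 2],
    'ratio': [0.5, 1.5],
    'naive': pd.to_datetime(['2017-02-01', '2017-03-01']),
    'aware': pd.to_datetime(['2017-02-01', '2017-03-01'], utc=True),
})

datetime_cols = columns_matching(frame, is_datetime64_any_dtype)
numeric_cols = columns_matching(frame, is_numeric_dtype)

print(datetime_cols)
print(numeric_cols)
```

One thing to watch for: is_numeric_dtype treats boolean columns as numeric, so add an explicit check if booleans should be excluded.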
Answer by MaxU
A slightly uglier NumPy alternative:
In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
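One caveat with the NumPy approach: np.issubdtype expects NumPy dtypes, while pandas extension dtypes such as datetime64[ns, UTC] are not NumPy dtypes and typically make it raise a TypeError. A possible workaround, sketched below, is to test only columns that actually hold a NumPy dtype. The helper numpy_dtype_mask and the sample frame are hypothetical, and this variant deliberately skips timezone-aware columns.

```python
import numpy as np
import pandas as pd


def numpy_dtype_mask(frame, kind):
    """One bool per column: True if the column has a NumPy dtype that is a subtype of `kind`."""
    return [
        isinstance(t, np.dtype) and np.issubdtype(t, kind)
        for t in frame.dtypes
    ]


# Hypothetical sample data, including a timezone-aware column.
data = pd.DataFrame({
    'date_str': ['Feb-2017', 'Mar-2017'],
    'col1': [1, 2],
    'col2': [0.5, 1.5],
    'date_col': pd.to_datetime(['2017-02-01', '2017-03-01']),
    'date_utc': pd.to_datetime(['2017-02-01', '2017-03-01'], utc=True),
})

# Only the naive datetime column is selected; 'date_utc' is skipped by the guard.
naive_dates = data.loc[:, numpy_dtype_mask(data, np.datetime64)]
numbers = data.loc[:, numpy_dtype_mask(data, np.number)]

print(list(naive_dates.columns))
print(list(numbers.columns))
```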
Answer by Bhagwat Chate
This code automatically identifies the date columns and changes their dtype from object to datetime64[ns]. Once a column has a datetime dtype, you can easily perform other operations on it.
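for col in data.columns:
    # Only try to parse columns stored as generic Python objects (usually strings).
    if data[col].dtype == 'object':
        try:
            data[col] = pd.to_datetime(data[col])
        except ValueError:
            # Not parseable as dates; leave the column unchanged.
            pass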
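Without a format, pd.to_datetime has to guess, and it can end up converting columns that were never meant to be dates. A more conservative sketch of the same loop passes an explicit format and works on a copy. The function parse_date_columns, its fmt parameter, and the sample frame are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def parse_date_columns(data, fmt):
    """Return a copy of `data` where text columns fully matching `fmt` become datetimes."""
    out = data.copy()
    for col in out.columns:
        # Skip columns that are already numeric or already datetimes.
        if is_numeric_dtype(out[col]) or is_datetime64_any_dtype(out[col]):
            continue
        try:
            out[col] = pd.to_datetime(out[col], format=fmt)
        except (ValueError, TypeError):
            # At least one value does not match `fmt`; keep the column as-is.
            pass
    return out


# Hypothetical sample data.
raw = pd.DataFrame({
    'day': ['2017-02-01', '2017-03-01'],
    'label': ['a', 'b'],
    'count': [1, 2],
})
parsed = parse_date_columns(raw, '%Y-%m-%d')
print(parsed.dtypes)
```

Returning a copy leaves the original frame untouched, which makes it easier to compare dtypes before and after the conversion.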