Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/43214204/

Time: 2020-09-14 03:20:25  Source: igfitidea

How do I tell if a column in a pandas dataframe is of type datetime? How do I tell if a column is numerical?

Tags: python, pandas, numpy, dataframe

Asked by Charlie Haley

I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.

import pandas as pd
df = pd.DataFrame([['Feb-2017', 1, 2],
                   ['Mar-2017', 1, 2],
                   ['Apr-2017', 1, 2],
                   ['May-2017', 1, 2]], 
                  columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes

Out:

date_str            object
col1                 int64
col2                 int64
date_col    datetime64[ns]
dtype: object

Accepted answer by Charlie Haley

Pandas has a handy method called select_dtypes, which takes include, exclude, or both as parameters and filters the dataframe's columns by dtype. In this case, you would include columns of dtype np.datetime64. To filter by integers, use [np.int64, np.int32, np.int16]; for floats, use [np.float64, np.float32, np.float16]; for numerical columns in general, use [np.number]. Older code sometimes also lists np.int and np.float, but those aliases were removed in NumPy 1.24.

import numpy as np

df.select_dtypes(include=[np.datetime64])

Out:

    date_col
0   2017-02-01
1   2017-03-01
2   2017-04-01
3   2017-05-01

In:

df.select_dtypes(include=[np.number])

Out:

    col1    col2
0   1       2
1   1       2
2   1       2
3   1       2
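
As a rough sketch of the include/exclude behaviour described above, select_dtypes also accepts string aliases in place of NumPy types. The frame below is a hypothetical variant of the question's data with an extra timezone-aware column (`date_utc`) added purely for illustration; the `format` argument is likewise an assumption to make parsing explicit.

```python
import pandas as pd

# Hypothetical frame mirroring the question, plus a tz-aware column.
df = pd.DataFrame({
    "date_str": ["Feb-2017", "Mar-2017"],
    "col1": [1, 1],
    "col2": [2.5, 3.5],
})
df["date_col"] = pd.to_datetime(df["date_str"], format="%b-%Y")
df["date_utc"] = df["date_col"].dt.tz_localize("UTC")

# "datetime" selects timezone-naive datetime64 columns only.
naive_dates = df.select_dtypes(include=["datetime"])

# Adding "datetimetz" also picks up timezone-aware columns.
all_dates = df.select_dtypes(include=["datetime", "datetimetz"])

# "number" covers both integer and float columns.
numeric = df.select_dtypes(include=["number"])

# exclude works the other way round: keep everything that is not numeric.
non_numeric = df.select_dtypes(exclude=["number"])

print(list(naive_dates.columns))
print(list(all_dates.columns))
print(list(numeric.columns))
print(list(non_numeric.columns))
```

The string aliases avoid importing NumPy just to name a dtype, which may be convenient when the selection is driven by configuration.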

Answered by jsignell

I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].

df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype)  # datetime64[ns, UTC]

You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.

In:

from pandas.api.types import is_datetime64_any_dtype as is_datetime

df[[column for column in df.columns if is_datetime(df[column])]]

Out:

                   date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
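
The same approach can cover the "is it numerical?" half of the question with is_numeric_dtype from pandas.api.types. The helper below is only a sketch: `columns_by_kind` and the sample frame are hypothetical names introduced here, and it assumes column labels are unique.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def columns_by_kind(df):
    """Return (datetime_columns, numeric_columns) as lists of column labels."""
    datetime_cols = [c for c in df.columns if is_datetime64_any_dtype(df[c])]
    numeric_cols = [c for c in df.columns if is_numeric_dtype(df[c])]
    return datetime_cols, numeric_cols


# Hypothetical sample data: one naive and one tz-aware datetime column.
sample = pd.DataFrame({
    "date_str": ["Feb-2017", "Mar-2017"],
    "col1": [1, 1],
    "col2": [2.5, 3.5],
})
sample["date_col"] = pd.to_datetime(sample["date_str"], format="%b-%Y")
sample["date_utc"] = sample["date_col"].dt.tz_localize("UTC")

datetime_cols, numeric_cols = columns_by_kind(sample)
print(datetime_cols)
print(numeric_cols)
```

Note that is_numeric_dtype also treats boolean columns as numeric, so filter those out separately if that is not what you want.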

Answered by MaxU

A slightly uglier NumPy alternative:

In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
    date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01

In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
   col1  col2
0     1     2
1     1     2
2     1     2
3     1     2
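
One caveat with the NumPy route: np.issubdtype expects NumPy dtypes, and as far as I can tell it can raise a TypeError when handed a pandas extension dtype such as datetime64[ns, UTC]. A defensive variant, sketched here with a hypothetical helper name (`numpy_dtype_mask`) and sample frame, simply treats any non-NumPy dtype as a non-match.

```python
import numpy as np
import pandas as pd


def numpy_dtype_mask(df, kind):
    """Boolean mask over columns; pandas extension dtypes count as False."""
    return [
        isinstance(t, np.dtype) and np.issubdtype(t, kind)
        for t in df.dtypes
    ]


# Hypothetical sample data with a tz-aware column (an extension dtype).
sample = pd.DataFrame({
    "date_str": ["Feb-2017", "Mar-2017"],
    "col1": [1, 1],
})
sample["date_col"] = pd.to_datetime(sample["date_str"], format="%b-%Y")
sample["date_utc"] = sample["date_col"].dt.tz_localize("UTC")

date_mask = numpy_dtype_mask(sample, np.datetime64)
number_mask = numpy_dtype_mask(sample, np.number)

print(sample.loc[:, date_mask])
print(sample.loc[:, number_mask])
```

With this guard the timezone-aware column is skipped rather than matched, so is_datetime64_any_dtype remains the better fit when such columns should be included.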

Answered by Bhagwat Chate

This code automatically identifies the date columns and changes their datatype from object to 'datetime64[ns]'. Once you have a date datatype, you can easily perform other operations.

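# `data` is the DataFrame whose object columns should be parsed as dates
for col in data.columns:
    if data[col].dtype == 'object':
        try:
            data[col] = pd.to_datetime(data[col])
        except ValueError:
            # not parseable as dates; leave the column unchanged
            pass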
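
A variant of the loop above that leaves the original frame untouched might look like the following. This is a sketch under a few assumptions: `parse_date_like_columns` and the sample frame are hypothetical, and the dtype check is widened to string dtypes because newer pandas versions may not store text columns as object.

```python
import pandas as pd
from pandas.api.types import (
    is_datetime64_any_dtype,
    is_object_dtype,
    is_string_dtype,
)


def parse_date_like_columns(data):
    """Return a copy of `data` with parseable text columns converted to datetime."""
    converted = data.copy()
    for col in converted.columns:
        series = converted[col]
        # Only attempt text-like columns; numeric and datetime columns are skipped.
        if not (is_object_dtype(series) or is_string_dtype(series)):
            continue
        try:
            converted[col] = pd.to_datetime(series)
        except (ValueError, TypeError):
            # Not parseable as dates; keep the column as it was.
            pass
    return converted


# Hypothetical sample data: one date-like text column, one plain text column.
frame = pd.DataFrame({
    "date_str": ["Feb-2017", "Mar-2017"],
    "label": ["widget", "gadget"],
    "col1": [1, 2],
})
out = parse_date_like_columns(frame)
print(out.dtypes)
```

Because to_datetime is fairly permissive about what it accepts, it is worth spot-checking the converted columns before relying on them.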