pandas: How to tell whether a column in a pandas DataFrame is of datetime type? How to check whether a column is numeric?

Question

Asked by Charlie Haley

I am trying to filter the columns in a pandas dataframe based on whether they are of type date or not. I can figure out which ones are, but then would have to parse that output or manually select columns. I want to select date columns automatically. Here's what I have so far as an example - I'd want to only select the 'date_col' column in this case.

import pandas as pd
df = pd.DataFrame([['Feb-2017', 1, 2],
                   ['Mar-2017', 1, 2],
                   ['Apr-2017', 1, 2],
                   ['May-2017', 1, 2]], 
                  columns=['date_str', 'col1', 'col2'])
df['date_col'] = pd.to_datetime(df['date_str'])
df.dtypes

Out:

date_str            object
col1                 int64
col2                 int64
date_col    datetime64[ns]
dtype: object

Answer 1

Accepted answer by Charlie Haley

Pandas has a handy function called select_dtypes, which takes include, exclude, or both as parameters and filters the DataFrame by dtype. In this case, you want to include columns of dtype np.datetime64. To filter for integers, use [np.int64, np.int32, np.int16]; for floats, use [np.float64, np.float32, np.float16]; to keep numeric columns of any kind, use [np.number]. Older versions of this answer also listed np.int and np.float, but those aliases were deprecated in NumPy 1.20 and removed in NumPy 1.24, so leave them out.

import numpy as np

df.select_dtypes(include=[np.datetime64])

Out:

    date_col
0   2017-02-01
1   2017-03-01
2   2017-04-01
3   2017-05-01

In:

df.select_dtypes(include=[np.number])

Out:

    col1    col2
0   1       2
1   1       2
2   1       2
3   1       2
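
As a side note to the answer above, select_dtypes also accepts string aliases such as 'datetime', 'datetimetz', and 'number'. The following is only a minimal sketch on made-up data; the column names naive and aware and the ISO-formatted sample dates are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Illustrative data: one string column, one int column, one float column.
df = pd.DataFrame({
    "date_str": ["2017-02-01", "2017-03-01"],
    "col1": [1, 1],
    "col2": [2.5, 3.5],
})
df["naive"] = pd.to_datetime(df["date_str"])            # timezone-naive datetimes
df["aware"] = pd.to_datetime(df["date_str"], utc=True)  # timezone-aware datetimes

# 'datetime' targets timezone-naive datetime columns.
naive_only = df.select_dtypes(include=["datetime"])

# Adding 'datetimetz' also picks up timezone-aware columns.
any_datetime = df.select_dtypes(include=["datetime", "datetimetz"])

# 'number' covers integer and float columns alike.
numeric = df.select_dtypes(include=["number"])

print(list(any_datetime.columns))
print(list(numeric.columns))
```

Listing both 'datetime' and 'datetimetz' is one way to avoid missing timezone-aware columns while staying with select_dtypes.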

Answer 2

Answered by jsignell

I just ran into this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match the timezone-aware dtype datetime64[ns, UTC].

df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype)  # datetime64[ns, UTC]

You could extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.

In:

from pandas.api.types import is_datetime64_any_dtype as is_datetime

df[[column for column in df.columns if is_datetime(df[column])]]

Out:

                   date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
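
The same pandas.api.types module also provides is_numeric_dtype, which covers the "is this column numeric?" half of the question. Here is a rough sketch on invented data; the column names and values are assumptions made for the example.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Illustrative frame with a string, an int, a float, and a tz-aware datetime column.
df = pd.DataFrame({
    "label": ["a", "b"],
    "col1": [1, 2],
    "col2": [0.5, 1.5],
})
df["date_col"] = pd.to_datetime(["2017-02-01", "2017-03-01"], utc=True)

# Check each column's dtype with the pandas type-introspection helpers.
datetime_cols = [c for c in df.columns if is_datetime64_any_dtype(df[c])]
numeric_cols = [c for c in df.columns if is_numeric_dtype(df[c])]

print(datetime_cols)
print(numeric_cols)
```

Collecting the column names first, rather than slicing right away, makes it easy to reuse the lists later, for example df[numeric_cols].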

Answer 3

Answered by MaxU

A slightly uglier NumPy alternative:

In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
    date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01

In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
   col1  col2
0     1     2
1     1     2
2     1     2
3     1     2

Answer 4

Answered by Bhagwat Chate

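This code tries to identify date columns automatically: it attempts to convert every object column of the DataFrame data to 'datetime64[ns]' and leaves the column alone if the conversion fails. Once a column has a datetime dtype, you can select it and perform other date operations on it easily.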

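for col in data.columns:
    # Only object (string-like) columns are candidates for conversion.
    if data[col].dtype == 'object':
        try:
            # Raises ValueError if the values cannot be parsed as dates.
            data[col] = pd.to_datetime(data[col])
        except ValueError:
            # Not a date column; leave it unchanged.
            pass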
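
The loop above relies on pandas guessing the date format. A more conservative variant, sketched below, assumes the format is known and converts a column only when every value parses. The helper name convert_date_columns, the sample data, and the format string are illustrative assumptions; for the data in the question the format would be '%b-%Y'.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_string_dtype


def convert_date_columns(frame, fmt):
    """Return a copy in which string columns that fully match `fmt` become datetimes."""
    out = frame.copy()
    for col in out.columns:
        # Skip anything that is not a string column (numbers, datetimes, ...).
        if not is_string_dtype(out[col]):
            continue
        # errors="coerce" turns unparseable values into NaT instead of raising.
        parsed = pd.to_datetime(out[col], format=fmt, errors="coerce")
        # Convert only if no value failed to parse.
        if parsed.notna().all():
            out[col] = parsed
    return out


data = pd.DataFrame({
    "date_str": ["2017-02-01", "2017-03-01"],
    "name": ["alice", "bob"],
    "col1": [1, 2],
})

converted = convert_date_columns(data, "%Y-%m-%d")
date_columns = [c for c in converted.columns if is_datetime64_any_dtype(converted[c])]
print(date_columns)
```

Working on a copy keeps the original DataFrame untouched, which makes it easier to compare dtypes before and after the conversion.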
