Split a datetime64 column into date and time columns in a Pandas DataFrame

Question

Asked by azuric

If I have a DataFrame whose first column is a datetime64 column, how do I split that column into two new columns, a date column and a time column? Here is my data and code so far:

DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
20131108 13:30:00,200000,125000,163000
20131022 12:30:00,163000,180000,193000
20130906 12:30:00,193000,180000,104000
20130802 12:30:00,104000,184000,188000
20130705 12:30:00,188000,165000,176000
20130607 12:30:00,176000,170000,165000
20130503 12:30:00,165000,145000,138000
20130405 12:30:00,138000,200000,268000
...


import pandas as pd
nfp = pd.read_csv("NFP.csv", parse_dates=[0])
nfp

Gives:

Out[10]: <class 'pandas.core.frame.DataFrame'>
         Int64Index: 83 entries, 0 to 82
         Data columns (total 4 columns):
         DateTime     82  non-null values
         Actual       82  non-null values
         Consensus    82  non-null values
         Previous     82  non-null values
         dtypes: datetime64[ns](1), float64(3)

All good, but I am not sure what to do from here.

Two points specifically I am unsure about:

1. Is it possible to do this when I read the CSV file in the first place? If so, how?
2. Can anyone show me how to do the split once I have called read_csv?

Also is there anywhere I can look up this kind of information?

I am having a hard time finding a detailed reference for the class libraries. Thanks!

Answer 1

Answered by unutbu

How to parse the CSV directly into the desired DataFrame:

Pass a dict of functions to pandas.read_csv's converters keyword argument:

import pandas as pd
import datetime as DT
nfp = pd.read_csv("NFP.csv",
                  sep=r'[\s,]',              # 1
                  engine='python',           # regex separators need the Python parsing engine
                  header=None, skiprows=1,
                  converters={               # 2
                      0: lambda x: DT.datetime.strptime(x, '%Y%m%d'),
                      1: lambda x: DT.time(*map(int, x.split(':')))},
                  names=['Date', 'Time', 'Actual', 'Consensus', 'Previous'])

print(nfp)

yields

        Date      Time  Actual  Consensus  Previous
0 2014-01-10  13:30:00   74000     196000    241000
1 2013-12-06  13:30:00  241000     180000    200000
2 2013-11-08  13:30:00  200000     125000    163000
3 2013-10-22  12:30:00  163000     180000    193000
4 2013-09-06  12:30:00  193000     180000    104000
5 2013-08-02  12:30:00  104000     184000    188000
6 2013-07-05  12:30:00  188000     165000    176000
7 2013-06-07  12:30:00  176000     170000    165000
8 2013-05-03  12:30:00  165000     145000    138000
9 2013-04-05  12:30:00  138000     200000    268000

1. sep=r'[\s,]' tells read_csv to split each line of the CSV on the regex pattern r'[\s,]', that is, on a whitespace character or a comma.
2. The converters parameter tells read_csv to apply the given functions to certain columns. The keys (e.g. 0 and 1) refer to the column index, and the values are the functions to be applied.
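
To try the two points above without having NFP.csv on disk, here is a minimal sketch that feeds two rows of the question's data through io.StringIO. The names SAMPLE_CSV, parse_date, parse_time and nfp_demo are made up for illustration. Unlike the lambdas above, parse_date returns a plain datetime.date rather than a datetime, which is an assumption about what you want in the Date column.

```python
import datetime as DT
import io

import pandas as pd

# In-memory stand-in for NFP.csv (first two rows of the question's data).
SAMPLE_CSV = """\
DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
"""


def parse_date(text):
    """Convert 'YYYYMMDD' to a datetime.date."""
    return DT.datetime.strptime(text, "%Y%m%d").date()


def parse_time(text):
    """Convert 'HH:MM:SS' to a datetime.time."""
    return DT.time(*map(int, text.split(":")))


nfp_demo = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    sep=r"[\s,]",      # split on a whitespace character or a comma
    engine="python",   # regex separators are handled by the Python engine
    header=None,
    skiprows=1,        # skip the original four-column header line
    converters={0: parse_date, 1: parse_time},  # keys are column positions
    names=["Date", "Time", "Actual", "Consensus", "Previous"],
)
print(nfp_demo)
```

Because the converters return Python objects, the Date and Time columns end up with object dtype rather than datetime64.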

How to split the DataFrame after calling read_csv

import pandas as pd
nfp = pd.read_csv("NFP.csv", parse_dates=[0], infer_datetime_format=True)
temp = pd.DatetimeIndex(nfp['DateTime'])
nfp['Date'] = temp.date
nfp['Time'] = temp.time
del nfp['DateTime']

print(nfp)
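
As a variation on the snippet above, the same split can be sketched with pd.to_datetime and the Series .dt accessor instead of building a DatetimeIndex. This assumes a reasonably recent pandas; as far as I know infer_datetime_format is deprecated as of pandas 2.0, so the sketch passes an explicit format string instead. SAMPLE_CSV, df and stamps are illustrative names, not part of the original code.

```python
import io

import pandas as pd

# In-memory stand-in for NFP.csv (first two rows of the question's data).
SAMPLE_CSV = """\
DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Remove the string column and parse it with an explicit format,
# rather than relying on format inference.
stamps = pd.to_datetime(df.pop("DateTime"), format="%Y%m%d %H:%M:%S")

# .dt.date and .dt.time give object-dtype columns of datetime.date / datetime.time.
df.insert(0, "Date", stamps.dt.date)
df.insert(1, "Time", stamps.dt.time)
print(df)
```

Using insert keeps Date and Time as the first two columns; plain assignment, as in the snippet above, appends them at the end.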

Which is faster?

It depends on the size of the CSV. (Thanks to Jeff for pointing this out.)

For tiny CSVs, parsing the CSV into the desired form directly is faster than using a DatetimeIndex after parsing with parse_dates=[0]:

def using_converter():
    nfp = pd.read_csv("NFP.csv", sep=r'[\s,]', engine='python',
                      header=None, skiprows=1,
                      converters={
                          0: lambda x: DT.datetime.strptime(x, '%Y%m%d'),
                          1: lambda x: DT.time(*map(int, x.split(':')))},
                      names=['Date', 'Time', 'Actual', 'Consensus', 'Previous'])
    return nfp

def using_index():
    nfp = pd.read_csv("NFP.csv", parse_dates=[0], infer_datetime_format=True)
    temp = pd.DatetimeIndex(nfp['DateTime'])
    nfp['Date'] = temp.date
    nfp['Time'] = temp.time
    del nfp['DateTime']
    return nfp

In [114]: %timeit using_index()
100 loops, best of 3: 1.71 ms per loop

In [115]: %timeit using_converter()
1000 loops, best of 3: 914 μs per loop

However, for CSVs of just a few hundred lines or more, using a DatetimeIndex is faster.

# using_converter() and using_index() read "NFP.csv", so the test data is written there
filename = 'NFP.csv'
content = '''\
DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
20131108 13:30:00,200000,125000,163000
20131022 12:30:00,163000,180000,193000
20130906 12:30:00,193000,180000,104000
20130802 12:30:00,104000,184000,188000
20130705 12:30:00,188000,165000,176000
20130607 12:30:00,176000,170000,165000
20130503 12:30:00,165000,145000,138000
20130405 12:30:00,138000,200000,268000'''

def setup(n):
    header, remainder = content.split('\n', 1)
    with open(filename, 'w') as f:
        f.write('\n'.join([header]+[remainder]*n))

In [304]: setup(50)

In [305]: %timeit using_converter()
100 loops, best of 3: 9.78 ms per loop

In [306]: %timeit using_index()
100 loops, best of 3: 9.3 ms per loop
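
If you want to repeat this comparison outside IPython, a rough sketch with the standard timeit module might look like the following. It parses in-memory text instead of a file, and make_csv, with_converters and with_datetime_index are hypothetical helpers written for this sketch, not the functions timed above. Absolute numbers, and even which approach wins at a given size, will depend on your machine and pandas version.

```python
import datetime as DT
import io
import timeit

import pandas as pd

HEADER = "DateTime,Actual,Consensus,Previous"
ROW = "20140110 13:30:00,74000,196000,241000"
NAMES = ["Date", "Time", "Actual", "Consensus", "Previous"]


def make_csv(n_rows):
    """Build CSV text with n_rows identical data rows."""
    return "\n".join([HEADER] + [ROW] * n_rows)


def with_converters(text):
    """Split date and time while parsing, via converters."""
    return pd.read_csv(
        io.StringIO(text), sep=r"[\s,]", engine="python",
        header=None, skiprows=1, names=NAMES,
        converters={
            0: lambda x: DT.datetime.strptime(x, "%Y%m%d"),
            1: lambda x: DT.time(*map(int, x.split(":"))),
        },
    )


def with_datetime_index(text):
    """Parse first, then split using a DatetimeIndex."""
    df = pd.read_csv(io.StringIO(text))
    idx = pd.DatetimeIndex(
        pd.to_datetime(df.pop("DateTime"), format="%Y%m%d %H:%M:%S"))
    df["Date"] = idx.date
    df["Time"] = idx.time
    return df


CALLS = 5  # calls per timing run
for n_rows in (10, 500):
    text = make_csv(n_rows)
    for func in (with_converters, with_datetime_index):
        # Best of 3 runs, reported as seconds per call.
        best = min(timeit.repeat(lambda: func(text), number=CALLS, repeat=3))
        print(f"{n_rows:>4} rows  {func.__name__:<20} {best / CALLS:.6f} s")
```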

Where can I look up this kind of information?

Sometimes you can find examples in the Pandas Cookbook.
Sometimes a web search or a search on Stack Overflow suffices.
Spending a weekend snowed in with nothing to do but read the pandas documentation will surely help too.
Install IPython. It has tab completion, and if you type a ? after a function, it gives you the function's docstring. Those two features really help you introspect Python objects quickly. It also tells you in what file the function is defined (if defined in pure Python), which leads me to...
Reading the source code.
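
If IPython is not at hand, the standard inspect module gives a rough equivalent of what `?` shows. This is only a sketch of the idea, using pd.read_csv as the example; the variable names are illustrative.

```python
import inspect

import pandas as pd

# Roughly what `pd.read_csv?` shows in IPython: the signature and the docstring.
print(inspect.signature(pd.read_csv))
first_doc_line = inspect.getdoc(pd.read_csv).splitlines()[0]
print(first_doc_line)

# Where the function is defined. inspect.unwrap steps past any decorator
# wrappers so the path points at the function's own source file.
source_path = inspect.getsourcefile(inspect.unwrap(pd.read_csv))
print(source_path)
```

Opening the printed path in an editor is one way to get started on reading the source.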

Just keep at it. The more you know the easier it gets.

If you give it your best shot and still can't find the answer, post a question on Stackoverflow. You'll hopefully get an answer quickly, and help others searching for the same thing.
