Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14673394/

Time: 2020-08-18 12:07:20  Source: igfitidea

python pandas extract unique dates from time series

python datetime dataframe pandas time-series

Asked by tesla1060

I have a DataFrame that contains a lot of intraday data. It covers several days, and the dates are not contiguous.

 2012-10-08 07:12:22            0.0    0          0  2315.6    0     0.0    0
 2012-10-08 09:14:00         2306.4   20  326586240  2306.4  472  2306.8    4
 2012-10-08 09:15:00         2306.8   34  249805440  2306.8  361  2308.0   26
 2012-10-08 09:15:01         2308.0    1   53309040  2307.4   77  2308.6    9
 2012-10-08 09:15:01.500000  2308.2    1  124630140  2307.0  180  2308.4    1
 2012-10-08 09:15:02         2307.0    5   85846260  2308.2  124  2308.0    9
 2012-10-08 09:15:02.500000  2307.0    3  128073540  2307.0  185  2307.6   11
 ......
 2012-10-10 07:19:30            0.0    0          0  2276.6    0     0.0    0
 2012-10-10 09:14:00         2283.2   80   98634240  2283.2  144  2283.4    1
 2012-10-10 09:15:00         2285.2   18  126814260  2285.2  185  2285.6    3
 2012-10-10 09:15:01         2285.8    6   98719560  2286.8  144  2287.0   25
 2012-10-10 09:15:01.500000  2287.0   36  144759420  2288.8  211  2289.0    4
 2012-10-10 09:15:02         2287.4    6  109829280  2287.4  160  2288.6    5
 ......

How can I extract the unique dates, in datetime format, from the above DataFrame? The result should look like [2012-10-08, 2012-10-10].

Accepted answer by DSM

If you have a Series like:

In [116]: df["Date"]
Out[116]: 
0           2012-10-08 07:12:22
1           2012-10-08 09:14:00
2           2012-10-08 09:15:00
3           2012-10-08 09:15:01
4    2012-10-08 09:15:01.500000
5           2012-10-08 09:15:02
6    2012-10-08 09:15:02.500000
7           2012-10-10 07:19:30
8           2012-10-10 09:14:00
9           2012-10-10 09:15:00
10          2012-10-10 09:15:01
11   2012-10-10 09:15:01.500000
12          2012-10-10 09:15:02
Name: Date

where each object is a Timestamp:

In [117]: df["Date"][0]
Out[117]: <Timestamp: 2012-10-08 07:12:22>

you can get just the date by calling .date():

In [118]: df["Date"][0].date()
Out[118]: datetime.date(2012, 10, 8)

and a Series has a .unique() method. So you can use map and a lambda:

In [126]: df["Date"].map(lambda t: t.date()).unique()
Out[126]: array([2012-10-08, 2012-10-10], dtype=object)

or use the Timestamp.date method:

In [127]: df["Date"].map(pd.Timestamp.date).unique()
Out[127]: array([2012-10-08, 2012-10-10], dtype=object)
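For readers who want to try the steps above end to end, here is a minimal sketch. The sample values are hypothetical stand-ins for the question's data, and the column name "Date" is taken from the answer; recent pandas versions may display the Timestamp and the resulting array differently from the transcripts above.

```python
import datetime
import pandas as pd

# Hypothetical sample data standing in for the question's "Date" column.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-10-08 07:12:22",
        "2012-10-08 09:14:00",
        "2012-10-10 07:19:30",
        "2012-10-10 09:15:02",
    ])
})

# Map each Timestamp to its calendar date, then drop duplicates.
unique_dates = df["Date"].map(lambda t: t.date()).unique()

# The elements are plain datetime.date objects, in order of first appearance.
print(list(unique_dates))
```

The result is a NumPy object array, so wrap it in list() if a plain Python list is needed.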

Answer by iTayb

Using regex:

(\d{4}-\d{2}-\d{2})

Run it with the re.findall function to get all matches:

import re
result = re.findall(r"(\d{4}-\d{2}-\d{2})", subject)
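Note that re.findall returns every match, including repeats, and the matches are strings rather than date objects. A minimal sketch of the regex approach with de-duplication, assuming the timestamps are available as one block of text (the `subject` value below is hypothetical):

```python
import re

# Hypothetical input: the timestamps rendered as one block of text.
subject = """2012-10-08 07:12:22
2012-10-08 09:14:00
2012-10-10 07:19:30
2012-10-10 09:15:02"""

# findall returns one string per timestamp, so dates repeat.
matches = re.findall(r"(\d{4}-\d{2}-\d{2})", subject)

# dict.fromkeys drops duplicates while preserving first-seen order.
unique_dates = list(dict.fromkeys(matches))
print(unique_dates)
```

If date objects are required, each string can then be parsed with datetime.date.fromisoformat.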

Answer by Nicolás Trejo

Just to give an alternative to @DSM's answer, look at this other answer from @Psidom.

It would be something like:

pd.to_datetime(df['DateTime']).dt.date.unique()

It seems to me that it performs slightly better.
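A self-contained sketch of the .dt accessor approach follows. The frame, its "price" column, and its values are hypothetical; the column name "DateTime" comes from the answer. The second half is an assumption about the question's layout: if the timestamps are the DataFrame's index rather than a column, DatetimeIndex.normalize() followed by unique() gives the distinct days as midnight Timestamps.

```python
import datetime
import pandas as pd

# Hypothetical frame: timestamps stored as strings in a "DateTime" column.
df = pd.DataFrame({
    "DateTime": [
        "2012-10-08 07:12:22",
        "2012-10-08 09:14:00",
        "2012-10-10 07:19:30",
        "2012-10-10 09:15:01",
    ],
    "price": [0.0, 2306.4, 0.0, 2285.8],
})

# Column case: .dt.date yields datetime.date objects; unique() drops repeats.
dates_from_column = pd.to_datetime(df["DateTime"]).dt.date.unique()

# Index case (assumed layout): build a DatetimeIndex, truncate each entry
# to midnight with normalize(), then keep the distinct values.
idx = pd.DatetimeIndex(df["DateTime"])
dates_from_index = idx.normalize().unique()

print(list(dates_from_column))
print(dates_from_index)
```

The column variant returns datetime.date objects in an object array, while the index variant stays in pandas' datetime64 dtype, which is usually more convenient for further filtering or grouping.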