Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/51395590/

Regex pattern to match datetime in Python

python, regex, python-3.x, pandas, datetime

Asked by pyd

I have a string that contains datetimes, and I am trying to split it at each datetime occurrence:

data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

What I am doing:

out=re.split('^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$',data)

What I get:

["2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"]

What I want:

["2018-03-14 06:08:18, he went on","2018-03-15 06:08:18, lets play"]

Accepted answer by Wiktor Stribiżew

You want to split on at least one whitespace character that is followed by a date-like pattern, so you may use

re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', s)

See the regex demo.

Details

  • \s+ - one or more whitespace characters
  • (?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b) - a positive lookahead that makes sure that, immediately to the right of the current location, there are
    • \d{2}(?:\d{2})? - 2 or 4 digits
    • - - a hyphen
    • \d{1,2} - 1 or 2 digits
    • -\d{1,2} - again a hyphen and 1 or 2 digits
    • \b - a word boundary (if it is not necessary, remove it, or replace it with (?!\d) in case your dates may be glued to letters or other text)
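The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming the same regex compiled with re.VERBOSE so that each part carries its own comment; the name DATE_SPLIT and the labels year, month and day are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE.
# DATE_SPLIT is a hypothetical name chosen for this sketch.
DATE_SPLIT = re.compile(
    r"""
    \s+                  # one or more whitespace characters (consumed by the split)
    (?=                  # positive lookahead: the text to the right must look like a date
        \d{2}(?:\d{2})?  #   2 or 4 digits (year)
        -\d{1,2}         #   a hyphen and 1 or 2 digits (month)
        -\d{1,2}         #   a hyphen and 1 or 2 digits (day)
        \b               #   a word boundary after the day
    )
    """,
    re.VERBOSE,
)

s = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
parts = DATE_SPLIT.split(s)
print(parts)
```

Because the date sits inside a lookahead, only the whitespace is consumed, so each timestamp stays attached to the text that follows it.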

Python demo:

import re
rex = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
s = "2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"
print(re.split(rex, s))
# => ['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']

NOTE: If there may be no whitespace before the date, in Python 3.7 and newer you may use r"\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" (note the * quantifier in \s*, which allows zero-length matches). Zero-length matches can leave empty strings in the result, and the optional two-digit year can then match inside a four-digit year, so consider filtering the output and adding a (?<!\d) guard. For older versions, you will need to use a solution like the one @blhsing suggests, or install the PyPI regex module and use r"(?V1)\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" with regex.split.
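A minimal sketch of the zero-whitespace case, assuming Python 3.7 or newer. The (?<!\d) guard and the filtering of empty items are additions made for this sketch, not part of the original answer: without the guard, the two-digit-year branch could also match at "18-03-14" inside "2018-03-14". The helper name split_before_dates and the sample strings are hypothetical.

```python
import re

# \s* allows a split with no whitespace before the date (Python 3.7+).
# Assumption: the (?<!\d) guard is added here so that a split cannot
# happen in the middle of a run of digits such as "2018".
DATE_AHEAD = r"\s*(?<!\d)(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"

def split_before_dates(text):
    """Split text before each date-like token and drop empty pieces."""
    return [part for part in re.split(DATE_AHEAD, text) if part]

# Hypothetical inputs: one with the date glued to the text, one with whitespace.
glued = "2018-03-14 06:08:18, he went on2018-03-15 06:08:18, lets play"
spaced = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

print(split_before_dates(glued))
print(split_before_dates(spaced))
```

The filter is needed because a zero-length match at the very start of the string, or right after consumed whitespace, produces an empty string in the raw re.split output.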

Answer by blhsing

re.split is meant for cases where you have a certain delimiter pattern. Use re.findall with a lookahead pattern instead:

import re
data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
d = r'\d{4}-\d?\d-\d?\d (?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]'
print(re.findall(r'{0}.*?(?=\s*{0}|$)'.format(d), data, re.DOTALL))

This outputs:

['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']
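If the timestamp and the text are needed separately, the same lookahead idea can be combined with named groups. This is only a sketch under stated assumptions: the group names stamp and text are hypothetical, the optional comma reflects the separator in the sample data, and parsing with datetime.strptime is an addition that the original answer does not include.

```python
import re
from datetime import datetime

data = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

# Same timestamp pattern as in the answer above.
d = r'\d{4}-\d?\d-\d?\d (?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]'

# Hypothetical group names "stamp" and "text"; the optional comma is an
# assumption about the separator used in the sample data.
entry = re.compile(
    r'(?P<stamp>{0}),?\s*(?P<text>.*?)(?=\s*{0}|\s*$)'.format(d),
    re.DOTALL,
)

# Pair each parsed timestamp with the text that follows it.
entries = [
    (datetime.strptime(m.group('stamp'), '%Y-%m-%d %H:%M:%S'), m.group('text'))
    for m in entry.finditer(data)
]
print(entries)
```

A list of (datetime, text) pairs like this can be passed on to other tools as rows, which may be convenient when the timestamps need to be sorted or compared.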

Answer by Chris

A similar, but alternative, solution using a group instead:

import re

data="2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"

print(re.findall(r'(.*?\D{2,})', data))

Which gives:

['2018-03-14 06:08:18, he went on ', '2018-03-15 06:08:18, lets play']
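The \D{2,} pattern ends each item at the first run of two or more non-digit characters, so it appears to rely on the message text containing no digits. The sketch below uses a hypothetical input with a digit in the message to compare it with the lookahead split from the accepted answer.

```python
import re

# Hypothetical input: the first message contains a digit ("2 cats").
data = "2018-03-14 06:08:18, he has 2 cats 2018-03-15 06:08:18, lets play"

# Group-based pattern from the answer above: each item ends with a run of 2+ non-digits.
by_non_digits = re.findall(r'(.*?\D{2,})', data)

# Lookahead split from the accepted answer, shown for comparison.
by_lookahead = re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', data)

print(by_non_digits)  # three items: the digit "2" starts a new one
print(by_lookahead)   # two items, one per timestamp
```

For input like the sample in the question, where the messages contain no digits, both approaches produce one item per timestamp.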