Original question: http://stackoverflow.com/questions/51395590/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
regex pattern to match datetime in python
Asked by pyd
I have a string that contains datetimes, and I am trying to split it at each datetime occurrence:
data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
What I am doing:
out=re.split('^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$',data)
What I get:
["2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"]
What I want:
["2018-03-14 06:08:18, he went on","2018-03-15 06:08:18, lets play"]
Accepted answer by Wiktor Stribiżew
You want to split on at least one whitespace character that is followed by a date-like pattern, so you may use
re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', s)
Details
- \s+ : 1+ whitespace chars
- (?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b) : a positive lookahead that makes sure that, immediately to the right of the current location, there are:
  - \d{2}(?:\d{2})? : 2 or 4 digits
  - - : a hyphen
  - \d{1,2} : 1 or 2 digits
  - -\d{1,2} : again a hyphen and 1 or 2 digits
  - \b : a word boundary (if it is not necessary, remove it, or replace it with (?!\d) in case your dates may be glued to letters or other text)
import re
rex = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
s = "2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"
print(re.split(rex, s))
# => ['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']
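To make the contrast with the question's attempt concrete, here is a small sketch that runs both patterns on the original data (the one containing " \n"). The variable names time_only and rex are just labels chosen for this illustration. The question's pattern is anchored with ^ and $, so it can only match when the whole string is a bare time, whereas the lookahead pattern consumes the whitespace and merely peeks at the date.

```python
import re

data = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

# Pattern from the question: ^ and $ anchor it to the whole string,
# so nothing matches and re.split returns the input unsplit.
time_only = r'^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$'
unsplit = re.split(time_only, data)
print(unsplit)

# Lookahead pattern: \s+ is consumed as the delimiter (here " \n"),
# while the date after it is only checked, not consumed.
rex = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
parts = re.split(rex, data)
print(parts)
```

Because the date sits inside a lookahead, it stays at the start of the second piece instead of being swallowed as part of the delimiter.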
NOTE: If the whitespace before the date may be missing, in Python 3.7 and newer you may use r"\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" (note the * quantifier in \s*, which allows zero-length matches). For older versions, you will need to use a solution like the one @blhsing suggests, or install the PyPI regex module and use r"(?V1)\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" with regex.split.
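One caveat with the zero-length variant that seems worth checking on your own data: with a four-digit year, the lookahead can also succeed two characters into the year ("18-03-14" looks like a two-digit-year date), so an empty match may split "20" away from the rest. The sketch below compares the pattern from the note with a guarded variant that adds (?<!\d). The guard is an assumption of this example, not part of the original answer, and the input string is hypothetical.

```python
import re

# Pattern from the note above (Python 3.7+ splits on empty matches).
note_rex = r"\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
# Assumed variant, not from the original answer: (?<!\d) rejects split
# points that sit directly after a digit, i.e. inside the year.
guarded_rex = r"\s*(?<!\d)(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"

# Hypothetical input: the second date is glued to the preceding text.
s = "2018-03-14 06:08:18, he went on.2018-03-15 06:08:18, lets play"

# Filter out the empty string produced by the zero-width match at position 0.
naive_parts = [p for p in re.split(note_rex, s) if p]
guarded_parts = [p for p in re.split(guarded_rex, s) if p]

print(naive_parts)    # the year is split into "20" and "18-..."
print(guarded_parts)  # one piece per dated entry
```

If your years are always four digits, replacing \d{2}(?:\d{2})? with \d{4} would presumably avoid the problem as well.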
Answer by blhsing
re.split is meant for cases where you have a certain delimiter pattern. Use re.findall with a lookahead pattern instead:
import re
data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
d = r'\d{4}-\d?\d-\d?\d (?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]'
print(re.findall(r'{0}.*?(?=\s*{0}|$)'.format(d), data, re.DOTALL))
This outputs:
['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']
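If you also want the timestamp separated from its text, the same findall idea can be extended with two capturing groups. This is only a sketch under the assumption that every entry has the form "timestamp, text"; the names pattern and pairs are made up for the illustration.

```python
import re

data = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
d = r'\d{4}-\d?\d-\d?\d (?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]'

# Group 1 captures the timestamp, group 2 the text up to the next
# timestamp (or the end of the string). d itself has no capturing groups.
pattern = re.compile(r'({0}),\s*(.*?)(?=\s*{0}|$)'.format(d), re.DOTALL)
pairs = pattern.findall(data)
print(pairs)
```

With more than one capturing group, re.findall returns a list of tuples, one (timestamp, text) pair per entry.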
Answer by Chris
A similar, alternative solution using a group instead:
import re
data="2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"
print(re.findall(r'(.*?\D{2,})', data))
Which gives:
['2018-03-14 06:08:18, he went on ', '2018-03-15 06:08:18, lets play']
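Note that \D{2,} treats any run of two or more non-digits as the end of an entry, so this approach appears to depend on the message text containing no digits. A quick check with a hypothetical input that has a number in the message, compared against the lookahead split from the accepted answer:

```python
import re

# Hypothetical input where the message itself contains a number.
data = "2018-03-14 06:08:18, he scored 10 points 2018-03-15 06:08:18, lets play"

# Group-based findall: the "10" starts a new, unwanted piece.
by_group = re.findall(r'(.*?\D{2,})', data)
print(by_group)

# Lookahead split: only whitespace followed by a date-like pattern splits.
by_lookahead = re.split(r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)", data)
print(by_lookahead)
```

For free-form messages, the lookahead-based patterns are likely the safer choice.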