pandas: regex pattern to match datetimes in Python

Question

Asked by pyd

I have a string that contains datetimes, and I am trying to split it at each datetime occurrence:

data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

What I am doing:

out=re.split('^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$',data)

What I get:

["2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"]

What I want:

["2018-03-14 06:08:18, he went on","2018-03-15 06:08:18, lets play"]

Answer 1

Accepted answer by Wiktor Stribiżew

You want to split on one or more whitespace characters that are followed by a date-like pattern, so you may use

re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', s)

See the regex demo

Details

\s+ - one or more whitespace characters
(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b) - a positive lookahead that makes sure that, immediately to the right of the current location, there are
- \d{2}(?:\d{2})? - 2 or 4 digits
- - - a hyphen
- \d{1,2} - 1 or 2 digits
- -\d{1,2} - again a hyphen and 1 or 2 digits
- \b - a word boundary (if it is not needed, remove it, or replace it with (?!\d) in case your dates may be glued to letters or other text)

Because a lookahead consumes no text, the split removes only the whitespace, and the date stays at the start of the next item.
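
The trade-off between \b and (?!\d) described in the last bullet can be seen in a small sketch. The sample string, in which the date is glued to a letter (an ISO-style T separator), is made up for illustration and does not come from the question.

```python
import re

# Hypothetical input: the date is immediately followed by a letter ("T").
s = "he went on 2018-03-15T06:08:18 lets play"

with_boundary = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
with_no_digit = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}(?!\d))"

# "5" and "T" are both word characters, so \b fails and nothing is split.
parts_boundary = re.split(with_boundary, s)

# (?!\d) only requires that no digit follows the day, so the split happens.
parts_no_digit = re.split(with_no_digit, s)

print(parts_boundary)
print(parts_no_digit)
```

With \b the whole string comes back as a single item, while (?!\d) splits it before the date.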

Python demo:

import re
rex = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
s = "2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"
print(re.split(rex, s))
# => ['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']

NOTE: If there may be no whitespace before the date, in Python 3.7 and newer you may use r"\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" (note the * quantifier in \s*, which allows zero-length matches). For older versions, you will need to use a solution like the one @blhsing suggests, or install the PyPI regex module and use r"(?V1)\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" with regex.split.
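
A minimal sketch of the zero-length variant, assuming Python 3.7 or newer. Two things here are additions for illustration and are not part of the answer: the helper name split_on_dates, and the (?<!\d) lookbehind. The lookbehind is there because, once empty matches are allowed, it appears that a match could also succeed inside a four-digit year (before the 18-03-14 part of 2018-03-14, since two digits are enough for the year).

```python
import re

# Assumption: (?<!\d) is an extra guard added for this sketch; it keeps a
# zero-width match from starting in the middle of a run of digits.
DATE_START = re.compile(r"\s*(?<!\d)(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)")


def split_on_dates(text):
    """Split text before each date-like token, with or without whitespace before it."""
    # Zero-width matches (for example at position 0) can produce empty items; drop them.
    return [part for part in DATE_START.split(text) if part]


# Hypothetical input: the second date is glued to the preceding word.
s = "2018-03-14 06:08:18, he went on2018-03-15 06:08:18, lets play"
parts = split_on_dates(s)
print(parts)
```

Filtering out empty strings is a design choice of this sketch; drop the filter if empty items carry meaning in your data.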

Answer 2

Answer by blhsing

re.split is meant for cases where you have a certain delimiter pattern. Use re.findall with a lookahead pattern instead:

import re
data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
d = r'\d{4}-\d?\d-\d?\d (?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]'
print(re.findall(r'{0}.*?(?=\s*{0}|$)'.format(d), data, re.DOTALL))

This outputs:

['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']
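
To make the findall pattern easier to read, the same expression can be written with re.VERBOSE and a comment per part. This is only a restatement of the pattern above; the names TIMESTAMP and RECORD are introduced here for illustration.

```python
import re

# Timestamp such as "2018-03-14 06:08:18". [ ] is a literal space, because
# re.VERBOSE ignores unescaped whitespace outside character classes.
TIMESTAMP = r"\d{4}-\d?\d-\d?\d[ ](?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]"

RECORD = re.compile(
    rf"""
    {TIMESTAMP}          # a record starts with a timestamp
    .*?                  # then as little text as possible (DOTALL: newlines too)
    (?=                  # stop right before ...
        \s*{TIMESTAMP}   # ... optional whitespace and the next timestamp
        | $              # ... or the end of the string
    )
    """,
    re.VERBOSE | re.DOTALL,
)

data = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
records = RECORD.findall(data)
print(records)
```

The pattern contains only non-capturing groups, so findall returns the full text of each record.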

Answer 3

Answer by Chris

A similar alternative that uses a group instead:

import re

data="2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"

print(re.findall(r'(.*?\D{2,})', data))

Which gives:

['2018-03-14 06:08:18, he went on ', '2018-03-15 06:08:18, lets play']
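
One caveat worth checking before relying on this pattern: it ends each item after a run of two or more non-digit characters, so it assumes the message text itself contains no digits. The sketch below uses a made-up input with a number in the first message and compares the result with the date-based split from the accepted answer.

```python
import re

# Hypothetical input: the first message contains a number ("42").
data = "2018-03-14 06:08:18, room 42 is free 2018-03-15 06:08:18, lets play"

# Each match ends after a run of two or more non-digits, so "42" starts a new item.
by_non_digits = re.findall(r'(.*?\D{2,})', data)

# Splitting before date-like tokens keeps the first message in one piece.
by_date = re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', data)

print(by_non_digits)
print(by_date)
```

Here the \D{2,} approach yields three items, while the date-based split yields the two expected records.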
