Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3276180/

Date: 2020-08-18 09:22:06  Source: igfitidea

Extracting date from a string in Python

Tags: python, string, date

Asked by dmpop

How can I extract the date from a string like "monkey 2010-07-10 love banana"? Thanks!

Accepted answer by lunaryorn

If the date always appears in a fixed format, you can use a regular expression to extract it and datetime.datetime.strptime to parse it:

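import re
from datetime import datetime

text = "monkey 2010-07-10 love banana"
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
if match:  # re.search returns None when nothing matches
    date = datetime.strptime(match.group(), '%Y-%m-%d').date()
    print(date)  # 2010-07-10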
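
A minimal sketch extending the same idea to every date in the string, assuming all dates are written as YYYY-MM-DD. The regex only checks the shape of the text, so each candidate is passed through strptime and impossible dates such as 2010-07-32 are skipped. extract_iso_dates is a hypothetical helper name, not part of any library.

```python
import re
from datetime import datetime

# Candidate YYYY-MM-DD substrings; \b stops matches inside longer digit runs.
ISO_DATE = re.compile(r'\b\d{4}-\d{2}-\d{2}\b')


def extract_iso_dates(text):
    """Return every valid YYYY-MM-DD date in text, in order of appearance."""
    dates = []
    for match in ISO_DATE.finditer(text):
        try:
            dates.append(datetime.strptime(match.group(), '%Y-%m-%d').date())
        except ValueError:
            # The regex only checks the shape of the text; strptime rejects
            # impossible dates such as 2010-07-32 or 2010-13-01.
            continue
    return dates


print(extract_iso_dates("monkey 2010-07-10 love banana 2010-07-32 2011-01-05"))
# [datetime.date(2010, 7, 10), datetime.date(2011, 1, 5)]
```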

Otherwise, if the date is given in an arbitrary form, you can't extract it easily.

Answer by unutbu

Using python-dateutil:

In [1]: import dateutil.parser as dparser

In [18]: dparser.parse("monkey 2010-07-10 love banana",fuzzy=True)
Out[18]: datetime.datetime(2010, 7, 10, 0, 0)

Invalid dates raise a ValueError:

In [19]: dparser.parse("monkey 2010-07-32 love banana",fuzzy=True)
# ValueError: day is out of range for month

It can recognize dates in many formats:

In [20]: dparser.parse("monkey 20/01/1980 love banana",fuzzy=True)
Out[20]: datetime.datetime(1980, 1, 20, 0, 0)

Note that it has to guess when the date is ambiguous; by default it reads the first number as the month:

In [23]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True)
Out[23]: datetime.datetime(1980, 10, 1, 0, 0)

But the way it parses ambiguous dates is customizable:

In [21]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True, dayfirst=True)
Out[21]: datetime.datetime(1980, 1, 10, 0, 0)
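
The day-first versus month-first choice that dayfirst controls can be illustrated with the standard library alone. This sketch is not how dateutil works internally: parse_slash_date is a hypothetical helper that only handles NN/NN/YYYY strings, tries the preferred order first, and falls back to the other order when the first reading is impossible (as with 20/01/1980, where 20 cannot be a month).

```python
from datetime import datetime


def parse_slash_date(s, dayfirst=False):
    """Parse an NN/NN/YYYY string, preferring month-first unless dayfirst=True."""
    month_first, day_first = '%m/%d/%Y', '%d/%m/%Y'
    formats = (day_first, month_first) if dayfirst else (month_first, day_first)
    for fmt in formats:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # e.g. "20" cannot be a month, so try the other order
    raise ValueError(f'not a valid slash-separated date: {s!r}')


print(parse_slash_date('10/01/1980'))                 # 1980-10-01 00:00:00
print(parse_slash_date('10/01/1980', dayfirst=True))  # 1980-01-10 00:00:00
print(parse_slash_date('20/01/1980'))                 # 1980-01-20 00:00:00
```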

Answer by Finny Abraham

To extract a date from a string in Python, the best module available is the datefinder module.

You can use it in your Python project by following the easy steps given below.

Step 1: Install datefinder Package

pip install datefinder

Step 2: Use It In Your Project

import datefinder

input_string = "monkey 2010-07-10 love banana"
# datefinder.find_dates returns a generator; here it is converted to a list
# (see the note of caution below)
matches = list(datefinder.find_dates(input_string))

if matches:
    # each match is a datetime.datetime object; only the first one is used here
    date = matches[0]
    print(date)
else:
    print('No dates found')

Note: if you expect a large number of matches, converting the generator to a list is not recommended, because it builds every match in memory before you use any of them; iterate over the generator instead.
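
Following the note above, the generator can be consumed lazily instead of being converted to a list. Since datefinder is a third-party package, this sketch uses a hypothetical stdlib stand-in, find_iso_dates, that only understands YYYY-MM-DD. The part meant to carry over is the next(generator, None) idiom, which should work the same way on the generator returned by datefinder.find_dates.

```python
import re
from datetime import datetime


def find_iso_dates(text):
    """Yield datetime objects for YYYY-MM-DD substrings, one at a time.

    Hypothetical stand-in for datefinder.find_dates, which also returns a
    generator but recognises far more formats.
    """
    for match in re.finditer(r'\d{4}-\d{2}-\d{2}', text):
        try:
            yield datetime.strptime(match.group(), '%Y-%m-%d')
        except ValueError:
            pass  # skip impossible dates such as 2010-07-32


# next() pulls only the first date; finditer is lazy too, so scanning
# stops as soon as one valid date has been found.
first = next(find_iso_dates("monkey 2010-07-10 love banana"), None)
print(first if first is not None else 'No dates found')  # 2010-07-10 00:00:00

# To process every date without building a list, loop over the generator:
for found in find_iso_dates("2010-07-10 and 2011-01-05"):
    print(found.date())
```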

Answer by Aubrey Lavigne

Using Pygrok, you can extend regular expression syntax with named, reusable patterns.

The custom patterns can be included in your regex in the format %{PATTERN_NAME}.

You can also label a pattern by adding a colon and a name: %{PATTERN_NAME:matched_string}. If the pattern matches, the captured value is returned as part of the resulting dictionary (e.g. result.get('matched_string')).

For example:

from pygrok import Grok

input_string = 'monkey 2010-07-10 love banana'
date_pattern = '%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}'

grok = Grok(date_pattern)
print(grok.match(input_string))

The resulting value will be a dictionary:

{'month': '07', 'day': '10', 'year': '2010'}

If date_pattern does not match anywhere in input_string, the return value will be None. By contrast, if your pattern matches but has no labels, it will return an empty dictionary {}.
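
To show what the %{NAME:label} syntax boils down to, here is a rough stdlib sketch that expands each token into an ordinary regular-expression group: labelled tokens become named groups and unlabelled ones become non-capturing groups. This is not pygrok's implementation; PATTERNS, expand and grok_match are hypothetical names, and the three pattern definitions are simplified stand-ins for the real grok ones.

```python
import re

# Simplified, hypothetical stand-ins for grok's YEAR/MONTHNUM/MONTHDAY;
# the real pygrok definitions are more elaborate.
PATTERNS = {
    'YEAR': r'\d{4}',
    'MONTHNUM': r'1[0-2]|0?[1-9]',
    'MONTHDAY': r'3[01]|[12]\d|0?[1-9]',
}

# Matches %{NAME} or %{NAME:label}
TOKEN = re.compile(r'%\{(\w+)(?::(\w+))?\}')


def expand(pattern):
    """Turn %{NAME} into (?:regex) and %{NAME:label} into (?P<label>regex)."""
    def repl(m):
        name, label = m.group(1), m.group(2)
        body = PATTERNS[name]
        return f'(?P<{label}>{body})' if label else f'(?:{body})'
    return TOKEN.sub(repl, pattern)


def grok_match(pattern, text):
    """Return a dict of labelled captures, or None if the pattern is absent."""
    m = re.search(expand(pattern), text)
    return m.groupdict() if m else None


print(grok_match('%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}',
                 'monkey 2010-07-10 love banana'))
# {'year': '2010', 'month': '07', 'day': '10'}
```

The alternatives in each definition are ordered longest-first on purpose: with 0?[1-9] listed first, the day 10 would be captured as just 1.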

Answer by dsod

If you know the position of the date in the string (for example, in a log file), you can use .split()[index] to extract it without fully knowing its format.

For example:

>>> string = 'monkey 2010-07-10 love banana'
>>> date = string.split()[1]
>>> date
'2010-07-10'
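
split() hands back a plain string. If you also know the format of the fields and want a datetime object, the split pieces can go straight into strptime. This sketch assumes a hypothetical log line whose first two whitespace-separated fields are always a date and a time.

```python
from datetime import datetime

# Hypothetical log line: assumes the first two fields are a date and a time.
line = '2010-07-10 14:03:22 INFO monkey loves banana'

fields = line.split()
date_str = fields[0]  # '2010-07-10', still a plain string
stamp = datetime.strptime(' '.join(fields[:2]), '%Y-%m-%d %H:%M:%S')

print(date_str)      # 2010-07-10
print(stamp)         # 2010-07-10 14:03:22
print(stamp.date())  # 2010-07-10
```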

Answer by adbar

You could also try the dateparser module, which may be slower than datefinder on free text but should cover more potential cases and date formats, as well as a significant number of languages.
