Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/962953/

Time: 2020-08-11 21:38:02  Source: igfitidea

Determine if a String is a valid date before parsing

java date parsing

Asked by Morgul Master

I have this situation where I am reading about 130K records containing dates stored as String fields. Some records contain blanks (nulls), some contain strings like this: 'dd-MMM-yy' and some contain this 'dd/MM/yyyy'.

I have written a method like this:

public Date parsedate(String date) {

   if (date != null) {
      try {
         // First attempt: the 'dd-MMM-yy' pattern
         return new SimpleDateFormat("dd-MMM-yy").parse(date);
      } catch (ParseException e) {
         try {
            // Second attempt: the 'dd/MM/yyyy' pattern
            return new SimpleDateFormat("dd/MM/yyyy").parse(date);
         } catch (ParseException e2) {
            // Neither pattern matched
            return null;
         }
      }
   } else {
      return null;
   }

}

So you may have already spotted the problem. I am using the try .. catch as part of my logic. It would be better if I could determine beforehand that the String actually contains a parseable date in some format, and then attempt to parse it.

So, is there some API or library that can help with this? I do not mind writing several different parser classes to handle the different formats and then creating a factory to select the correct one, but how do I determine which one?

Thanks.

Accepted answer by Apocalisp

See Lazy Error Handling in Java for an overview of how to eliminate try/catch blocks using an Option type.

Functional Java is your friend.

In essence, what you want to do is to wrap the date parsing in a function that doesn't throw anything, but indicates in its return type whether parsing was successful or not. For example:

import fj.F;
import fj.F2;
import fj.data.Option;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import static fj.Function.curry;
import static fj.data.Option.some;
import static fj.data.Option.none;
...

F<String, F<String, Option<Date>>> parseDate =
  curry(new F2<String, String, Option<Date>>() {
    public Option<Date> f(String pattern, String s) {
      try {
        return some(new SimpleDateFormat(pattern).parse(s));
      }
      catch (ParseException e) {
        return none();
      }
    }
  });

OK, now you've a reusable date parser that doesn't throw anything, but indicates failure by returning a value of type Option.None. Here's how you use it:

import fj.data.Stream;
import static fj.data.Stream.stream;
import static fj.data.Option.isSome_;
import static fj.data.Option.join;
....
public Option<Date> parseWithPatterns(String s, Stream<String> patterns) {
  // find() yields Option<Option<Date>>; join flattens it to Option<Date>.
  return join(stream(s).apply(patterns.map(parseDate)).find(isSome_()));
}

That will give you the date parsed with the first pattern that matches, or a value of type Option.None, which is type-safe whereas null isn't.

If you're wondering what Stream is... it's a lazy list. This ensures that you ignore patterns after the first successful one. No need to do too much work.

Call your function like this:

for (Date d: parseWithPatterns(someString, stream("dd/MM/yyyy", "dd-MM-yyyy"))) {
  // Do something with the date here.
}

Or...

Option<Date> d = parseWithPatterns(someString,
                                   stream("dd/MM/yyyy", "dd-MM-yyyy"));
if (d.isNone()) {
  // Handle the case where neither pattern matches.
} 
else {
  // Do something with d.some()
}

Answer by Colin Burnett

It looks like you have three options if there are only two known formats:

  • check for the presence of - or / first and start with the parsing for that format.
  • check the length, since "dd-MMM-yy" and "dd/MM/yyyy" are different
  • use precompiled regular expressions

The latter seems unnecessary.

Answer by van

Use regular expressions to parse your string. Make sure that you keep both regexes pre-compiled (do not create new ones on every method call; store them as constants), and measure whether this is actually faster than the try-catch you use.
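
As a rough illustration of the pre-compiled approach, here is a minimal sketch. The class name DateShapeCheck and the method detectPattern are hypothetical, and the two regexes assume the data always has leading zeros and Latin-letter month abbreviations. Note that the regexes only check the shape of the string, so something like 99/99/9999 would still pass and has to be rejected by the actual parse.

```java
import java.util.regex.Pattern;

public class DateShapeCheck {
    // Compiled once and stored as constants; Pattern instances are immutable and thread-safe.
    private static final Pattern DASHED = Pattern.compile("\\d{2}-[A-Za-z]{3}-\\d{2}");
    private static final Pattern SLASHED = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    /** Returns the SimpleDateFormat pattern whose shape matches s, or null if neither does. */
    public static String detectPattern(String s) {
        if (s == null) {
            return null;
        }
        if (DASHED.matcher(s).matches()) {
            return "dd-MMM-yy";
        }
        if (SLASHED.matcher(s).matches()) {
            return "dd/MM/yyyy";
        }
        return null;
    }
}
```

The returned pattern string can then be handed to a SimpleDateFormat, so the try-catch is only reached for strings that already look like one of the two formats.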

I still find it strange that your method returns null if both versions fail rather than throwing an exception.

Answer by cletus

Don't be too hard on yourself about using try-catch in logic: this is one of those situations where Java forces you to, so there's not a lot you can do about it.

But in this case you could instead use DateFormat.parse(String, ParsePosition).
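
A minimal sketch of how that overload might be used, since it returns null on failure instead of throwing. The class name PositionalDateParser, the method parseOrNull, the Locale.ENGLISH choice, and the strict (non-lenient) setting are assumptions made for illustration, not part of the original answer.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class PositionalDateParser {
    // The two patterns from the question, tried in order.
    private static final String[] PATTERNS = {"dd-MMM-yy", "dd/MM/yyyy"};

    /** Returns the parsed date, or null if no pattern consumes the whole string. */
    public static Date parseOrNull(String s) {
        if (s == null) {
            return null;
        }
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // assumption: reject dates such as 31/02/1999
            ParsePosition pos = new ParsePosition(0);
            // parse(String, ParsePosition) returns null on failure; it does not throw.
            Date parsed = format.parse(s, pos);
            // Accept only if parsing succeeded and the entire input was consumed.
            if (parsed != null && pos.getIndex() == s.length()) {
                return parsed;
            }
        }
        return null;
    }
}
```

Checking pos.getIndex() against the string length matters because this overload stops at the first character it cannot use and would otherwise silently accept trailing garbage.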

Answer by van

If your formats are exact (June 7th 1999 would be either 07-Jun-99 or 07/06/1999, i.e. you are sure that you have leading zeros), then you could just check the length of the string before trying to parse.

Be careful with the short month name in the first version, because Jun may not be June in another language.
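
The length check and the locale concern can be sketched together as follows. The class name LengthBasedDateParser and the method parseOrNull are hypothetical, and Locale.ENGLISH is an assumption about the data (English month abbreviations such as Jun); adjust it if the records were written in another language.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LengthBasedDateParser {
    /** Picks the pattern by string length, then parses; returns null on any mismatch. */
    public static Date parseOrNull(String s) {
        if (s == null) {
            return null;
        }
        String pattern;
        switch (s.length()) {
            case 9:
                pattern = "dd-MMM-yy";   // e.g. 07-Jun-99
                break;
            case 10:
                pattern = "dd/MM/yyyy";  // e.g. 07/06/1999
                break;
            default:
                return null;             // blank or unexpected length: skip parsing entirely
        }
        try {
            // Pinning the locale keeps "Jun" parseable regardless of the JVM default locale.
            return new SimpleDateFormat(pattern, Locale.ENGLISH).parse(s);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

The try-catch is still there, but it is now only a safety net for malformed records rather than the mechanism that selects the format.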

But if your data is coming from one database, then I would just convert all dates to the common format (it is one-off, but then you control the data and its format).

Answer by John Fisher

You can take advantage of regular expressions to determine which format the string is in, and whether it matches any valid format. Something like this (not tested):

(Oops, I wrote this in C# before checking to see what language you were using.)

Regex test = new Regex(@"^(?:(?<formatA>\d{2}-[a-zA-Z]{3}-\d{2})|(?<formatB>\d{2}/\d{2}/\d{4}))$", RegexOptions.Compiled);
Match match = test.Match(yourString);
if (match.Success)
{
    if (match.Groups["formatA"].Success)
    {
        // Use format A.
    }
    else if (match.Groups["formatB"].Success)
    {
        // Use format B.
    }
    ...
}

Answer by objects

You could use split to determine which format to use:

String[] parts = date.split("-");
df = (parts.length==3 ? format1 : format2);

That assumes they are all in one format or the other; you could improve the checking if need be.

Answer by Arelius

In this limited situation, the best (and fastest) method is certainly to parse out the day, then, based on whether the next char is '/' or '-', try to parse out the rest. If at any point there is unexpected data, return null.

Answer by Eddie

Assuming the patterns you gave are the only likely choices, I would look at the String passed in to see which format to apply.

public Date parseDate(final String date) {
  if (date == null) {
    return null;
  }

  SimpleDateFormat format = (date.charAt(2) == '/') ? new SimpleDateFormat("dd/MM/yyyy")
                                                   : new SimpleDateFormat("dd-MMM-yy");
  try {
    return format.parse(date);
  } catch (ParseException e) {
    // Log a complaint and include date in the complaint
  }
  return null;
}

As others have mentioned, if you can guarantee that you will never access the DateFormats in a multi-threaded manner, you can make class-level or static instances.

Answer by akf

An alternative to creating a SimpleDateFormat (or two) per iteration would be to lazily populate a ThreadLocal container for these formats. This will solve both Thread safety concerns and concerns around object creation performance.
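
A minimal sketch of the ThreadLocal idea, assuming Java 8 or later for ThreadLocal.withInitial. The class name ThreadLocalFormats, the method parseOrNull, and the Locale.ENGLISH choice are illustrative assumptions, and the separator check borrows the charAt(2) test from the answer above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ThreadLocalFormats {
    // Each thread lazily creates its own formatter on first use, so the
    // non-thread-safe SimpleDateFormat instances are never shared.
    private static final ThreadLocal<SimpleDateFormat> DASHED =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH));
    private static final ThreadLocal<SimpleDateFormat> SLASHED =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("dd/MM/yyyy", Locale.ENGLISH));

    /** Picks the formatter by the separator at index 2; returns null if parsing fails. */
    public static Date parseOrNull(String s) {
        if (s == null || s.length() < 3) {
            return null;
        }
        SimpleDateFormat format = (s.charAt(2) == '/') ? SLASHED.get() : DASHED.get();
        try {
            return format.parse(s);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

With 130K records this creates at most two formatters per thread instead of one or two per record.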
