Java: get the last modified date of a URL

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/7999258/

Time: 2020-10-30 22:37:35  Source: igfitidea

Get the Last Modified date of a URL

java

Asked by arsenal

I have three pieces of code. The first one reads the metadata of a URL, and that metadata includes a Last-Modified date. When I run this class, I get the following last modified date for the URL:

key:- Last-Modified
value:- 2011-10-21T03:18:28Z

First one

import java.io.IOException;
import java.io.Reader;
import java.net.URL;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

public class App {

    public static void main(String[] args) {
        Tika t = new Tika();
        Metadata md = new Metadata();
        try {
            URL u = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");

            // Extract and print the document text.
            String content1 = t.parseToString(u);
            System.out.println("hello" + content1);

            // Parse again, this time collecting the document's metadata in md.
            try (Reader r = t.parse(u.openStream(), md)) {
                for (String name : md.names()) {
                    System.out.println("key:- " + name);
                    System.out.println("value:- " + md.get(name));
                }
            }
        } catch (IOException | TikaException e) {
            // MalformedURLException is a subclass of IOException.
            e.printStackTrace();
        }
    }
}

With the second example below, run against the same URL, I get a different Last-Modified date. How can I tell which one is right? I tried opening the PDF in the browser, but it opens in Adobe PDF on the computer instead of in the browser, so I cannot check the headers with Firebug.

Second Way--

import java.net.URL;
import java.net.URLConnection;

public class LastMod {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
        System.out.println("URL:- " + url);

        // Print the raw Last-Modified header exactly as the server sent it.
        URLConnection connection = url.openConnection();
        System.out.println(connection.getHeaderField("Last-Modified"));
    }
}

For the above one I get the Last-Modified date as:

Thu, 03 Nov 2011 16:59:41 +0000

Third Way--

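import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;

public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
        HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();

        // getLastModified() returns epoch milliseconds, or 0 if the header is absent.
        long date = httpCon.getLastModified();
        if (date == 0) {
            System.out.println("No last-modified information.");
        } else {
            // Date.toString() renders in the JVM default time zone.
            System.out.println("Last-Modified: " + new Date(date));
        }
    }
}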

And with the third method I get this:

Last-Modified: Thu Nov 03 09:59:41 PDT 2011

I am confused about which one is right. I think the first one is. Any suggestions will be appreciated.

Accepted answer by Andreas Veithen

The first piece of code extracts the date from the metadata of the PDF file, while the two other ones get the information from the HTTP headers returned by the Web server. The first one is probably more accurate if you want to know when the document was created/modified.

Answer by Bozho

The best option is the third one, connection.getLastModified(), because it is the easiest to use and has the highest level of abstraction. The others work at lower levels of abstraction: the first reads the raw response, and the second reads the raw header. The third reads the header and converts it to a long.

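The difference between the outputs is due to the time zone. A new Date(date) is printed in the VM default time zone, which is why the third output shows 09:59:41 PDT while the raw header shows 16:59:41 +0000. Prefer Calendar or, best of all, joda-time's DateTime, which support custom time zones.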
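To illustrate the time zone point, here is a minimal sketch that renders one instant in two zones. It uses java.time (Java 8+) rather than Calendar or joda-time, which is a substitution on my part, and the class name SameInstantDemo and the helper render are made up for the example. The instant is the one from the question's header output.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class SameInstantDemo {

    // Day, date, time and numeric offset; Locale.ENGLISH keeps the names stable.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss xx", Locale.ENGLISH);

    // Renders epoch milliseconds in an explicit zone instead of the JVM default.
    static String render(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).format(FORMAT);
    }

    public static void main(String[] args) {
        // Stand-in for the value HttpURLConnection.getLastModified() would return.
        long lastModified = Instant.parse("2011-11-03T16:59:41Z").toEpochMilli();

        System.out.println(render(lastModified, "UTC"));
        System.out.println(render(lastModified, "America/Los_Angeles"));
    }
}
```

Both lines describe the same moment; only the zone used for display differs, which matches the second and third outputs in the question.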

Answer by Ondrej Bozek

The last modified date should be in GMT (RFC 2822), so you should get it like this:

// url is the java.net.URL being checked; getLastModified() returns 0 if the header is missing.
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
long dateTime = connection.getLastModified();
connection.disconnect();
ZonedDateTime urlLastModified = ZonedDateTime.ofInstant(Instant.ofEpochMilli(dateTime), ZoneId.of("GMT"));
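
As a rough, network-free sketch of the same conversion, the example below turns both an epoch-millisecond value and a raw header string into a GMT ZonedDateTime. The class LastModifiedParsing and its two helpers are hypothetical names, and using DateTimeFormatter.RFC_1123_DATE_TIME to parse the header is my assumption about a suitable format; it handles the header value shown in the question.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class LastModifiedParsing {

    private static final ZoneId GMT = ZoneId.of("GMT");

    // Converts epoch milliseconds (as returned by getLastModified()) to GMT.
    static ZonedDateTime fromEpochMillis(long millis) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(millis), GMT);
    }

    // Parses a raw header value (as returned by getHeaderField("Last-Modified")).
    static ZonedDateTime fromHeader(String headerValue) {
        return ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
                .withZoneSameInstant(GMT);
    }

    public static void main(String[] args) {
        ZonedDateTime parsed = fromHeader("Thu, 03 Nov 2011 16:59:41 +0000");
        ZonedDateTime converted = fromEpochMillis(parsed.toInstant().toEpochMilli());

        System.out.println(parsed);
        System.out.println(parsed.equals(converted));
    }
}
```

Callers would still need to check for a return value of 0 from getLastModified() before converting, since fromEpochMillis(0) would otherwise yield 1 January 1970.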