Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/22350037/

Time: 2020-08-13 15:08:15  Source: igfitidea

Behavior of using \Z vs \z as Scanner delimiter

java character-encoding eof

Asked by letowianka

[Edit] I found the answer, but I can't answer the question due to restrictions on new users. Either way, this is a known bug in Java.

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8028387

I'm trying to read a file into a string in Java 6 on 64-bit Ubuntu. Java is giving me a very strange result: with "\\Z" it reads the entire file, but with "\\z" it reads only the first 1024 characters. I've read the Java 6 API docs for all the classes involved and I am at a loss.

Description of \Z and \z can be found at:

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#lt

What could be causing this strange behavior?

// Note the doubled backslashes: "\z" and "\Z" are not valid Java string escapes.
String fileString = new Scanner(new File(fileName)).useDelimiter("\\z").next();
String fileString2 = new Scanner(new File(fileName)).useDelimiter("\\Z").next();
System.out.println("using Z : " + fileString2.length());
System.out.println("Using z : " + fileString.length());

Output: using Z : 9720 Using z : 1024

Thanks!

Details about the file/java-version:

Running Ubuntu with java-6-openjdk-amd64 (also tested with Oracle Java 6). The file is a simple UTF-8 encoded text file.

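The comparison in the question can also be tried without a file on disk. The following is a minimal sketch, assuming a generated in-memory string stands in for the file; the class and method names (`DelimiterRepro`, `makeInput`, `firstToken`) are made up for illustration and are not part of the original post. On a JVM affected by the linked bug, the second number may come out as 1024 rather than the full length; on other JVMs both numbers may be equal.

```java
import java.util.Scanner;

public class DelimiterRepro {
    // Builds a string of the given length that contains no line terminators.
    public static String makeInput(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + i % 26));
        }
        return sb.toString();
    }

    // Returns the first token of the input when split by the given delimiter regex.
    public static String firstToken(String input, String delimiterRegex) {
        Scanner scanner = new Scanner(input);
        try {
            return scanner.useDelimiter(delimiterRegex).next();
        } finally {
            scanner.close();
        }
    }

    public static void main(String[] args) {
        // 9720 mirrors the file length reported in the question (an assumption for illustration).
        String input = makeInput(9720);
        System.out.println("using Z : " + firstToken(input, "\\Z").length());
        System.out.println("using z : " + firstToken(input, "\\z").length());
    }
}
```

Keeping the two reads in one helper makes it easy to see that the delimiter is the only thing that differs between them.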

Answered by Pshemo

As the Pattern documentation states:

  • \z  The end of the input
  • \Z  The end of the input but for the final terminator, if any
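
The difference between the two anchors is easiest to see on a short string that ends with a line terminator. This is a small sketch using only `java.util.regex`; the class and helper names (`EndAnchorDemo`, `firstMatchIndex`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndAnchorDemo {
    // Returns the index of the first position where the regex matches, or -1 if none.
    public static int firstMatchIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "abc\n"; // four characters, the last one is a line terminator

        // \Z can match just before the final line terminator.
        System.out.println(firstMatchIndex("\\Z", input)); // prints 3

        // \z only matches at the very end of the input.
        System.out.println(firstMatchIndex("\\z", input)); // prints 4
    }
}
```

When the input has no trailing line terminator, both anchors match at the same position, the end of the input.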

I suspect that since Scanner's buffer size is set to 1024,

private static final int BUFFER_SIZE = 1024; // change to 1024;

Scanner reads this many characters and uses them as its current input, so \z can be used here to represent the end of that buffer, while \Z can't because it is not the "final terminator" (there is more of the input still to read).

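If Scanner's internal buffering is indeed what truncates the result, one way to sidestep it is to not depend on a delimiter at all. Below is a minimal sketch, assuming the file is UTF-8 text as described in the question and using only classes available in Java 6; `ReadWholeFile` and `readFully` are hypothetical names, and this is an alternative approach rather than what the answer itself proposes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReadWholeFile {
    // Reads the entire file as UTF-8 text by draining a Reader chunk by chunk.
    public static String readFully(File file) throws IOException {
        Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");
        try {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[1024]; // chunk size only affects how many reads are needed
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadWholeFile <file>");
            return;
        }
        System.out.println(readFully(new File(args[0])).length());
    }
}
```

Because the loop runs until `read` returns -1, the result does not depend on how the input happens to be split across buffer fills.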