How to identify/handle text file newlines in Java?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/3022407/
Asked by rafrafUk
I get files in different formats coming from different systems that I need to import into our database. Part of the import process is to check the line length to make sure the format is correct. We seem to be having issues with files coming from UNIX systems, where one extra character is added. I suspect this is due to the carriage return being encoded differently on UNIX and Windows platforms.
Is there a way to detect on which file system a file was created, other than checking the last character on the line? Or maybe a way of reading the files as text and not binary which I suspect is the issue?
Thanks, guys!
Answered by ThiefMaster
Unix systems use \n line endings, while Windows uses \r\n and classic Mac OS (before OS X) uses \r.
You cannot detect the file system, since it doesn't matter at all. For example, I can use \n on Windows if my editor supports it. These are just the conventions on those operating systems, not requirements.
The proper way, assuming you don't have a function that tokenizes lines correctly regardless of the line ending the file uses, is to search for a \n or a \r, end the current line there, and strip any \r or \n characters that follow before beginning the next line. However, this will cause problems if you have blank lines and need to keep them. In that case you have to handle line breaks more carefully:
- when reading a \n, end the current line and start the next line
- when reading a \r, end the current line and, if the next char is \n, skip it; then start the next line
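The two rules above can be sketched in Java roughly as follows. This is only an illustration: the class name LineSplitter and the method splitLines are made up for this example, and it assumes the whole text is already in memory as a String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
public class LineSplitter {
    // Splits text on \n, \r, or \r\n and keeps blank lines.
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n') {
                // \n always ends the current line.
                lines.add(current.toString());
                current.setLength(0);
            } else if (c == '\r') {
                // \r ends the current line; a directly following \n
                // belongs to the same terminator and is skipped.
                lines.add(current.toString());
                current.setLength(0);
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        // Emit trailing text that was not followed by a terminator.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed Windows, old Mac, and Unix terminators, with one blank line.
        System.out.println(splitLines("a\r\n\r\nb\rc\nd")); // prints [a, , b, c, d]
    }
}
```

Because \r\n is consumed as one terminator, the blank line between "a" and "b" survives as an empty string instead of being swallowed.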
 
Answered by Craig Trader
Most of the time Java will handle differing types of line endings automatically, silently parsing \n (Unix), \r\n (Windows) and \r (Mac) without bothering you, as long as you read lines through a character stream. See the docs for java.io.FileReader and friends. A character stream also decodes the file's bytes into characters for you, but only correctly if it is given the right encoding: FileReader uses the platform default charset unless you specify one, so for files coming from other systems it is safer to name the charset explicitly, for example by wrapping a FileInputStream in an InputStreamReader.
If you want to read the line separators explicitly, you'll need to read the file as a byte stream (or read a character stream one character at a time with read() rather than readLine()). See the docs for java.io.DataInputStream and friends.
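As a rough sketch of reading the separators explicitly, the following counts each kind of line terminator in a byte stream. It uses a plain InputStream rather than DataInputStream, the class name LineEndingCounter is invented for this example, and it assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where the bytes 0x0A and 0x0D only ever mean \n and \r. It would miscount UTF-16 files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class LineEndingCounter {
    // Returns the counts as {LF, CRLF, CR}.
    public static int[] count(InputStream in) throws IOException {
        int lf = 0, crlf = 0, cr = 0;
        boolean prevWasCr = false;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                // A \n directly after \r completes a Windows terminator.
                if (prevWasCr) crlf++; else lf++;
                prevWasCr = false;
            } else {
                // The pending \r was not followed by \n, so it stood alone.
                if (prevWasCr) cr++;
                prevWasCr = (b == '\r');
            }
        }
        if (prevWasCr) cr++; // the stream ended with a lone \r
        return new int[] {lf, crlf, cr};
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "a\r\nb\nc\rd\r\n".getBytes(StandardCharsets.US_ASCII);
        int[] c = count(new ByteArrayInputStream(sample));
        System.out.println("LF=" + c[0] + " CRLF=" + c[1] + " CR=" + c[2]); // prints LF=1 CRLF=2 CR=1
    }
}
```

For a real file you would pass a buffered FileInputStream instead of the in-memory sample. A file that reports a mix of terminator kinds is a sign that it was edited or concatenated on more than one system.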
Answered by Stephen C
Is there a way to detect on which file system a file was created, other than checking the last character on the line?
No. And even checking the line termination sequence is only a hint. We can easily create files with DOS line termination on UNIX, and vice versa.
Or maybe a way of reading the files as text and not binary which I suspect is the issue?
Yes. Open the file using a file reader, wrap it in a buffered reader, and use the readLine() method to read the file a line at a time. This method recognizes "\n", "\r" or "\r\n" as a line separator and returns the line without it, and hence works for DOS, UNIX and Mac files.
Here's some typical code:
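    import java.io.BufferedReader;
    import java.io.FileReader;

    // try-with-resources closes the reader (and the underlying file) automatically
    try (BufferedReader br = new BufferedReader(new FileReader("somefile"))) {
        String line;
        // readLine() returns each line without its terminator, or null at end of file
        while ((line = br.readLine()) != null) {
            // process line
        }
    }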
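Applied to the line-length check from the question, the same readLine() loop might look like the sketch below. The class name LineLengthCheck, the method badLines, and the expected length of 4 are all made up for illustration; the real record length depends on your file format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
public class LineLengthCheck {
    // Returns the 1-based numbers of lines whose length differs from expectedLength.
    public static List<Integer> badLines(BufferedReader reader, int expectedLength)
            throws IOException {
        List<Integer> bad = new ArrayList<>();
        String line;
        int lineNumber = 0;
        // readLine() strips \n, \r, or \r\n, so the terminator never counts toward the length.
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (line.length() != expectedLength) {
                bad.add(lineNumber);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample mixing Windows, Unix, and old Mac terminators.
        BufferedReader r = new BufferedReader(new StringReader("ABCD\r\nEFGH\nIJ\rKLMN"));
        System.out.println(badLines(r, 4)); // prints [3]
    }
}
```

For a real file, one option is Files.newBufferedReader(Paths.get("somefile"), StandardCharsets.UTF_8), where UTF-8 is only an assumed encoding. Naming the charset explicitly keeps the character count independent of the platform default.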

