How do I create a Java string from the contents of a file?
Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/326390/
Asked by OscarRyz
I've been using the idiom below for some time now, and it seems to be the most widespread, at least on the sites I've visited.
Is there a better/different way to read a file into a string in Java?
private String readFile(String file) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    String line = null;
    StringBuilder stringBuilder = new StringBuilder();
    String ls = System.getProperty("line.separator");
    try {
        while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
            stringBuilder.append(ls);
        }
        return stringBuilder.toString();
    } finally {
        reader.close();
    }
}
Answered by Willi aus Rohr
If you're willing to use an external library, check out Apache Commons IO (200KB JAR). It contains an org.apache.commons.io.FileUtils.readFileToString() method that allows you to read an entire File into a String with one line of code.
Example:
import java.io.*;
import java.nio.charset.*;
import org.apache.commons.io.*;

public String readFile() throws IOException {
    File file = new File("data.txt");
    return FileUtils.readFileToString(file, StandardCharsets.UTF_8);
}
Answered by Claudiu
Java attempts to be extremely general and flexible in all it does. As a result, something which is relatively simple in a scripting language (your code would be replaced with "open(file).read()" in Python) is a lot more complicated. There doesn't seem to be any shorter way of doing it, except using an external library (like Willi aus Rohr mentioned). Your options:
- Use an external library.
- Copy this code into all your projects.
- Create your own mini-library which contains functions you use often.
Your best bet is probably the 2nd one, as it has the fewest dependencies.
Answered by erickson
Read all text from a file
Java 11 added the readString() method to read small files as a String, preserving line terminators:
String content = Files.readString(path, StandardCharsets.US_ASCII);
For versions between Java 7 and 11, here's a compact, robust idiom, wrapped up in a utility method:
static String readFile(String path, Charset encoding)
    throws IOException
{
    byte[] encoded = Files.readAllBytes(Paths.get(path));
    return new String(encoded, encoding);
}
Read lines of text from a file
Java 7 added a convenience method to read a file as lines of text, represented as a List<String>. This approach is "lossy" because the line separators are stripped from the end of each line.
List<String> lines = Files.readAllLines(Paths.get(path), encoding);
Java 8 added the Files.lines() method to produce a Stream<String>. Again, this method is lossy because line separators are stripped. If an IOException is encountered while reading the file, it is wrapped in an UncheckedIOException, since Stream doesn't accept lambdas that throw checked exceptions.
try (Stream<String> lines = Files.lines(path, encoding)) {
    lines.forEach(System.out::println);
}
This Stream does need a close() call; this is poorly documented on the API, and I suspect many people don't even notice Stream has a close() method. Be sure to use an ARM block (a try-with-resources statement) as shown.
If you are working with a source other than a file, you can use the lines() method in BufferedReader instead.
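As a rough illustration of reading from a source other than a file, the sketch below wraps an arbitrary InputStream in a BufferedReader and joins the result of lines(). The class name StreamLines, the method name readAll, and the choice to rejoin lines with "\n" are assumptions made for this example, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class StreamLines {
    // Hypothetical helper: reads all text from any InputStream.
    // Like every line-based approach, it is lossy: the original
    // separators are dropped and replaced with "\n" when joining.
    static String readAll(InputStream in, Charset cs) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a socket, a classpath resource, etc.
        InputStream in = new ByteArrayInputStream(
                "first\r\nsecond".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

Closing the BufferedReader also closes the wrapped stream, so this helper suits cases where the caller hands over ownership of the stream.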
Memory utilization
The first method, that preserves line breaks, can temporarily require memory several times the size of the file, because for a short time the raw file contents (a byte array), and the decoded characters (each of which is 16 bits even if encoded as 8 bits in the file) reside in memory at once. It is safest to apply to files that you know to be small relative to the available memory.
The second method, reading lines, is usually more memory efficient, because the input byte buffer for decoding doesn't need to contain the entire file. However, it's still not suitable for files that are very large relative to available memory.
For reading large files, you need a different design for your program, one that reads a chunk of text from a stream, processes it, and then moves on to the next, reusing the same fixed-sized memory block. Here, "large" depends on the computer specs. Nowadays, this threshold might be many gigabytes of RAM. The third method, using a Stream<String>, is one way to do this, if your input "records" happen to be individual lines. (Using the readLine() method of BufferedReader is the procedural equivalent to this approach.)
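The chunked design described above might look roughly like the following sketch, which counts newline characters while holding only one fixed-size buffer in memory. The class name ChunkedCount, the 8192-character buffer size, and the newline-counting task are all assumptions chosen for illustration; real processing would replace the inner loop.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ChunkedCount {
    // Hypothetical example task: count '\n' characters in a file of any size.
    static long countNewlines(Path path, Charset cs) throws IOException {
        char[] buffer = new char[8192]; // fixed-size block, reused for every chunk
        long count = 0;
        try (Reader reader = Files.newBufferedReader(path, cs)) {
            int read;
            // read() returns the number of chars placed in the buffer, or -1 at end of input
            while ((read = reader.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".txt");
        try {
            Files.write(tmp, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(countNewlines(tmp, StandardCharsets.UTF_8));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Memory use stays constant regardless of file size because no String of the whole content is ever built.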
Character encoding
One thing that is missing from the sample in the original post is the character encoding. There are some special cases where the platform default is what you want, but they are rare, and you should be able to justify your choice.
The StandardCharsets class defines some constants for the encodings required of all Java runtimes:
String content = readFile("test.txt", StandardCharsets.UTF_8);
The platform default is available from the Charset class itself:
String content = readFile("test.txt", Charset.defaultCharset());
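To see why the encoding should be chosen deliberately, here is a small sketch that decodes the same bytes with two different charsets. The class name EncodingDemo and the sample character are assumptions for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Thin wrapper so the decoding step is explicit
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(bytes, StandardCharsets.UTF_8);      // one character
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1); // two characters

        System.out.println(right.length());
        System.out.println(wrong.length());
    }
}
```

Decoding with the wrong charset does not throw here; it silently produces different text, which is why relying on an unexamined platform default can be risky.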
Note: This answer largely replaces my Java 6 version. The utility of Java 7 safely simplifies the code, and the old answer, which used a mapped byte buffer, prevented the file that was read from being deleted until the mapped buffer was garbage collected. You can view the old version via the "edited" link on this answer.
Answered by Dónal
If you're looking for an alternative that doesn't involve a third-party library (e.g. Commons I/O), you can use the Scanner class:
private String readFile(String pathname) throws IOException {
    File file = new File(pathname);
    StringBuilder fileContents = new StringBuilder((int)file.length());
    try (Scanner scanner = new Scanner(file)) {
        while (scanner.hasNextLine()) {
            fileContents.append(scanner.nextLine() + System.lineSeparator());
        }
        return fileContents.toString();
    }
}
Answered by Jon Skeet
That code will normalize line breaks, which may or may not be what you really want to do.
Here's an alternative which doesn't do that, and which is (IMO) simpler to understand than the NIO code (although it still uses java.nio.charset.Charset):
public static String readFile(String file, String csName)
        throws IOException {
    Charset cs = Charset.forName(csName);
    return readFile(file, cs);
}

public static String readFile(String file, Charset cs)
        throws IOException {
    // No real need to close the BufferedReader/InputStreamReader
    // as they're only wrapping the stream
    FileInputStream stream = new FileInputStream(file);
    try {
        Reader reader = new BufferedReader(new InputStreamReader(stream, cs));
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[8192];
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
            builder.append(buffer, 0, read);
        }
        return builder.toString();
    } finally {
        // Potential issue here: if this throws an IOException,
        // it will mask any others. Normally I'd use a utility
        // method which would log exceptions and swallow them
        stream.close();
    }
}
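The line-break normalization mentioned at the start of this answer can be observed with a small sketch. It applies the question's readLine() idiom to an in-memory string; the class name LineBreakDemo and the sample input are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineBreakDemo {
    // The line-by-line idiom from the question, applied to an in-memory source
    static String readByLines(String source) throws IOException {
        StringBuilder sb = new StringBuilder();
        String ls = System.lineSeparator();
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Whatever terminator the source used is replaced by the platform one,
                // and a terminator is appended even after the last line
                sb.append(line).append(ls);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "a\r\nb\rc"; // mixed terminators, no trailing line break
        String result = readByLines(original);
        System.out.println(result.equals(original));
    }
}
```

The result differs from the input on every platform, because the mixed terminators collapse to one kind and a trailing separator is added. The char-buffer loop above avoids both effects.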
Answered by Dan Dyer
There is a variation on the same theme that uses a for loop, instead of a while loop, to limit the scope of the line variable. Whether it's "better" is a matter of personal taste.
for (String line = reader.readLine(); line != null; line = reader.readLine()) {
    stringBuilder.append(line);
    stringBuilder.append(ls);
}
Answered by Scott S. McCoy
public static String slurp(final File file)
        throws IOException {
    StringBuilder result = new StringBuilder();
    BufferedReader reader = new BufferedReader(new FileReader(file));
    try {
        char[] buf = new char[1024];
        int r = 0;
        while ((r = reader.read(buf)) != -1) {
            result.append(buf, 0, r);
        }
    }
    finally {
        reader.close();
    }
    return result.toString();
}
Answered by finnw
Guava has a method similar to the one from Commons IOUtils that Willi aus Rohr mentioned:
import com.google.common.base.Charsets;
import com.google.common.io.Files;
// ...
String text = Files.toString(new File(path), Charsets.UTF_8);
EDIT by PiggyPiglet: Files#toString is deprecated, and due for removal October 2019. Instead use
Files.asCharSource(new File(path), StandardCharsets.UTF_8).read();
EDIT by Oscar Reyes
This is the (simplified) underlying code on the cited library:
InputStream in = new FileInputStream(file);
// File.length() returns a long, so it must be narrowed to size the array
byte[] b = new byte[(int) file.length()];
int len = b.length;
int total = 0;
while (total < len) {
    int result = in.read(b, total, len - total);
    if (result == -1) {
        break;
    }
    total += result;
}
return new String(b, Charsets.UTF_8);
Edit (by Jonik): The above doesn't match the source code of recent Guava versions. For the current source, see the classes Files, CharStreams, ByteSource and CharSource in the com.google.common.io package.
Answered by Peter Lawrey
To read a File as binary and convert at the end
public static String readFileAsString(String filePath) throws IOException {
    DataInputStream dis = new DataInputStream(new FileInputStream(filePath));
    try {
        long len = new File(filePath).length();
        if (len > Integer.MAX_VALUE) throw new IOException("File "+filePath+" too large, was "+len+" bytes.");
        byte[] bytes = new byte[(int) len];
        dis.readFully(bytes);
        return new String(bytes, "UTF-8");
    } finally {
        dis.close();
    }
}
Answered by Pablo Grisafi
A very lean solution based on Scanner:
Scanner scanner = new Scanner( new File("poem.txt") );
// "\\A" is the regex for beginning of input, so the whole file is one token
String text = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block
Or, if you want to set the charset:
Scanner scanner = new Scanner( new File("poem.txt"), "UTF-8" );
String text = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block
Or, with a try-with-resources block, which will call scanner.close() for you:
try (Scanner scanner = new Scanner( new File("poem.txt"), "UTF-8" )) {
    String text = scanner.useDelimiter("\\A").next();
}
Remember that the Scanner constructor can throw an IOException. And don't forget to import java.io and java.util.
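One edge case worth guarding against: on an empty file the scanner has no token at all, so next() throws NoSuchElementException. The sketch below wraps the idiom in a helper that checks hasNext() first; the class name ScannerSlurp, the method name readAll, and the decision to return an empty string are assumptions for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class ScannerSlurp {
    // Hypothetical helper: reads the whole file as a single token,
    // returning "" instead of throwing when the file is empty
    static String readAll(File file, String charsetName) throws IOException {
        try (Scanner scanner = new Scanner(file, charsetName)) {
            scanner.useDelimiter("\\A"); // beginning-of-input anchor
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("empty", ".txt");
        try {
            System.out.println(readAll(empty, "UTF-8").isEmpty());
        } finally {
            empty.delete();
        }
    }
}
```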
Source: Pat Niemeyer's blog