原文地址: http://stackoverflow.com/questions/1494772/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Bad to use very large strings? (Java)
Asked by Lchi
Are there any negatives to creating huge strings? For instance, if we're reading in text from a potentially huge text file:
    while (scanner.hasNext()) {
        someString += scanner.next();
    }
    // do something cool with someString
Would processing the file line by line be (generally) a better solution, and why?
Thanks
Accepted answer by Jon Skeet
Streaming vs not
When you can stream, you can handle files of any size (assuming you really can forget all the data you've already seen). You end up with a naturally O(n) complexity, which is a very good thing. You don't break by running out of memory.
Streaming is lovely... but doesn't work in every scenario.
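As a rough illustration of the streaming approach, here is a minimal sketch that gathers statistics from text while holding only one line in memory at a time. The class and method names (`LineStats`, `countLinesAndChars`) are made up for this example, and a `StringReader` stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineStats {
    // Counts lines and characters; only the current line is ever held in memory.
    // Returns { lineCount, charCount }. Line terminators are not counted.
    static long[] countLinesAndChars(Reader source) {
        long lines = 0;
        long chars = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                chars += line.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { lines, chars };
    }

    public static void main(String[] args) {
        // Hypothetical input; for a file on disk, a reader from
        // java.nio.file.Files.newBufferedReader(path) could be passed instead.
        long[] result = countLinesAndChars(new StringReader("alpha\nbeta\ngamma\n"));
        System.out.println(result[0] + " lines, " + result[1] + " chars");
    }
}
```

Memory use here depends on the longest line, not on the total size of the input.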
StringBuilder
As it seems there's been a certain amount of controversy over the StringBuilder advice, here's a benchmark to show the effects. I had to reduce the size of the benchmark in order to get the slow version to even finish in a reasonable time.
Results first, then code. This is a very rough and ready benchmark, but the results are dramatic enough to make the point...
c:\Users\Jon\Test>java Test slow
Building a string of length 120000 without StringBuilder took 21763ms
c:\Users\Jon\Test>java Test fast
Building a string of length 120000 with StringBuilder took 7ms
And the code...
    // Stand-in for java.util.Scanner: hands out the same line a fixed number of times.
    class FakeScanner
    {
        private int linesLeft;
        private final String line;

        public FakeScanner(String line, int count)
        {
            linesLeft = count;
            this.line = line;
        }

        public boolean hasNext()
        {
            return linesLeft > 0;
        }

        public String next()
        {
            linesLeft--;
            return line;
        }
    }

    public class Test
    {
        public static void main(String[] args)
        {
            FakeScanner scanner = new FakeScanner("test", 30000);
            // Pass "fast" to use StringBuilder; anything else (or nothing) runs the slow path
            boolean useStringBuilder = args.length > 0 && "fast".equals(args[0]);

            // Accurate enough for this test
            long start = System.currentTimeMillis();
            String someString;
            if (useStringBuilder)
            {
                StringBuilder builder = new StringBuilder();
                while (scanner.hasNext())
                {
                    builder.append(scanner.next());
                }
                someString = builder.toString();
            }
            else
            {
                someString = "";
                while (scanner.hasNext())
                {
                    // Allocates a new String and copies the old contents every time
                    someString += scanner.next();
                }
            }
            long end = System.currentTimeMillis();

            System.out.println("Building a string of length "
                + someString.length()
                + (useStringBuilder ? " with" : " without")
                + " StringBuilder took " + (end - start) + "ms");
        }
    }
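To see why the gap is so large, it can help to count how many characters get copied. The sketch below uses a simplified cost model, which is an assumption of this example rather than a measurement: each `+=` is taken to copy the whole new string exactly once (a real JVM copies at least that much). The name `CopyCost` is made up for illustration.

```java
class CopyCost {
    // Characters copied when the string is rebuilt on each of n appends of
    // pieceLength characters: pieceLength * (1 + 2 + ... + n).
    static long repeatedConcatCopies(long n, long pieceLength) {
        return pieceLength * n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long n = 30000;   // iterations used by the benchmark
        long piece = 4;   // "test".length()
        System.out.println("final length:       " + n * piece);
        System.out.println("chars copied by +=: " + repeatedConcatCopies(n, piece));
    }
}
```

Under this model the slow version copies about 1.8 billion characters to build a 120,000-character string, which is quadratic growth. A StringBuilder grows its buffer geometrically, so its total copying stays proportional to the final length.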
Answer by Keith Adler
Use StringBuilder. Your approach creates potentially thousands of throwaway objects. Strings are immutable, meaning that once you create one you can't change it; you can only create a new String and assign its reference to your variable. StringBuilder will be hundreds if not thousands of times more efficient in speed and memory.
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
Most Java compilers will now optimize simple concatenation for you (although a += inside a loop still builds a new String on every iteration), so it's good practice to write it correctly up front.
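The immutability point can be checked directly. This small sketch (the class name `ImmutabilityDemo` is made up for the example) shows that `+=` leaves the original String untouched and rebinds the variable to a new object, whereas `StringBuilder.append` changes a single buffer in place.

```java
class ImmutabilityDemo {
    public static void main(String[] args) {
        String original = "abc";
        String alias = original;   // second reference to the same String object
        original += "def";         // builds a new String and rebinds 'original'

        System.out.println(alias);              // abc (the old object is unchanged)
        System.out.println(original);           // abcdef
        System.out.println(alias == original);  // false (two distinct objects)

        StringBuilder builder = new StringBuilder("abc");
        StringBuilder sameBuilder = builder;    // second reference to the same builder
        builder.append("def");                  // mutates the one buffer in place
        System.out.println(sameBuilder);        // abcdef
    }
}
```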
Answer by hrnt
I believe that creates a new String object every time you do a +=. Use StringBuilder instead.
Answer by aperkins
As Jon Skeet said, streaming is a more robust way of handling data. Also, Strings have a finite maximum length of Integer.MAX_VALUE characters, so if your files are likely to be larger than that, you should consider streaming the data if at all possible.
Answer by Michael Aaron Safyan
What if the input is larger than the system's memory (e.g. the input is being generated by another computer over an HTTP connection)? If you process one line at a time, you are always making progress, and you will eventually process the entire input, assuming that the input is finite. However, if you wait to see the entire input, before performing any processing, you will run out of memory and break.
In general, it is good to process data in a streaming manner. This also applies to processing with iterators rather than random access, when possible. It allows your program to scale up to very large input sizes, and it also allows your program to be pipelined (i.e. another program can start processing your program's output while your program is still in the middle of processing its own input). In this day and age of large media transmissions between many different computers, this is almost always a good thing to support.
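The pipelining idea can be sketched as a filter that writes each line as soon as it has read it, so a downstream consumer never waits for the whole input. The class `UppercaseFilter` and its `transform` method are hypothetical names for this example, and upper-casing is just a placeholder for real per-line work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Locale;

class UppercaseFilter {
    // Reads one line, writes one line, and flushes, so output becomes
    // available while the rest of the input is still arriving.
    static void transform(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            PrintWriter writer = new PrintWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(line.toUpperCase(Locale.ROOT));
                writer.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Uses the platform default charset for standard input.
        transform(new InputStreamReader(System.in), new PrintWriter(System.out));
    }
}
```

On Java 11 or later this could be placed between two other programs in a shell pipeline, for example `producer | java UppercaseFilter.java | consumer`, where `producer` and `consumer` are placeholders for whatever sits on either side.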
Answer by Adamski
A couple of extra points:
- If you read a very large amount of data into a StringBuilder and then call toString(), the JVM will temporarily require double the amount of char[] storage space during the conversion. If you can process the data as a CharSequence (StringBuilder implements CharSequence) you can avoid this.
- Another thing you can try if you do need to read all data into memory is to represent the String as a list of words (i.e. List<String>) and call intern() on each word. If the data contains large numbers of repeated words this will represent a significant saving in memory.
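Both points can be sketched in a few lines. The class and method names below (`MemoryTricks`, `countOccurrences`, `internedWords`) are invented for this example. The first method accepts any CharSequence, so a StringBuilder can be scanned without calling toString(); the second interns each word so that repeated words share one object.

```java
import java.util.ArrayList;
import java.util.List;

class MemoryTricks {
    // Works on any CharSequence, so a StringBuilder can be scanned directly
    // without first copying its contents into a String.
    static int countOccurrences(CharSequence text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Splits on whitespace and interns each word, so equal words end up
    // as references to a single shared String object.
    static List<String> internedWords(String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word.intern());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("to be or not to be");
        System.out.println(countOccurrences(builder, 'o'));   // 4

        List<String> words = internedWords("to be or not to be");
        System.out.println(words.size());                     // 6
        System.out.println(words.get(0) == words.get(4));     // true (same object)
    }
}
```

For simplicity this demo splits a String that is already in memory. When reading a large input, the words would more likely be interned one at a time as they are read.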