Notice: this page reproduces a popular StackOverflow question. Original: http://stackoverflow.com/questions/2474486/. The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors.

Time: 2020-08-13 08:09:58  Source: igfitidea

Create A Java Variable (String) of a specific size (MB's)

java

Asked by Bernie Perez

I am trying to benchmark some code. I am sending a String msg over sockets. I want to send 100KB, 2MB, and 10MB String variables. Is there an easy way to create a variable of these sizes?

Currently I am doing this.

private static String createDataSize(int msgSize) {
    String data = "a";
    while(data.length() < (msgSize*1024)-6) {
        data += "a";
    }
    return data;
}

But this takes a very long time. Is there a better way?

UPDATE: Thanks, I am doing this now.

/**
 * Creates a message of size @msgSize in KB.
 */
private static String createDataSize(int msgSize) {
    // Java chars are 2 bytes
    msgSize = msgSize/2;
    msgSize = msgSize * 1024;
    StringBuilder sb = new StringBuilder(msgSize);
    for (int i=0; i<msgSize; i++) {
        sb.append('a');
    }
    return sb.toString();
}

Accepted answer by cletus

Java chars are 2 bytes (16 bits unsigned) in size. So if you want 2MB you need one million characters. There are two obvious issues with your code:

  1. Repeatedly calling length() is unnecessary. Add any character to a Java String and its length goes up by 1, regardless of what the character is. Perhaps you're confusing this with the size in bytes. It doesn't mean that; and
  2. You have huge memory fragmentation issues with that code.

To further explain (2), the String concatenation operator (+) in Java causes a new String to be created because Java Strings are immutable. So:

String a = "a";
a += "b";

actually means:

String a = "a";
a = a + "b"; // allocates a new String and rebinds a to it

This sometimes confuses former C++ programmers as strings work differently in C++.

So your code is actually allocating a million strings for a message size of one million. Only the last one is kept. The others are garbage that will be cleaned up but there is no need for it.
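
The allocation behavior described here can be checked directly. The following is a minimal sketch; the class and method names are hypothetical and chosen only for illustration. It keeps a second reference to the original string and confirms that += leaves that object untouched and binds the variable to a different one.

```java
final class ConcatDemo {
    /** Returns true if += produced a new String object instead of modifying the old one. */
    static boolean concatCreatesNewObject() {
        String a = "a";
        String before = a;   // second reference to the original object
        a += "b";            // a now refers to a newly allocated "ab"
        // Identity comparison on purpose: we are asking whether it is the same object.
        return before != a && before.equals("a") && a.equals("ab");
    }
}
```

In the question's loop this happens once per appended character, which is where the discarded intermediate strings come from.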

A better version is:

private static String createDataSize(int msgSize) {
  StringBuilder sb = new StringBuilder(msgSize);
  for (int i=0; i<msgSize; i++) {
    sb.append('a');
  }
  return sb.toString();
}

The key differences are:

  1. A StringBuilder is mutable so doesn't need to be reallocated with each change; and
  2. The StringBuilder is preallocated to the right size in this code sample.
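
To illustrate the preallocation point, here is a small sketch (the class and method names are made up for this example). It fills a builder created with an initial capacity of n and reports the capacity afterwards; if the capacity is still n, the internal buffer never had to grow.

```java
final class CapacityDemo {
    /** Appends n copies of 'a' to a builder preallocated for n chars and returns its final capacity. */
    static int capacityAfterFill(int n) {
        StringBuilder sb = new StringBuilder(n); // room for n chars up front
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.capacity();
    }
}
```

A builder created with the no-argument constructor starts with a capacity of 16 and has to enlarge its buffer repeatedly to reach the same length.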

Note: the astute may have noticed I've written:

sb.append('a');

rather than:

sb.append("a");

'a' of course is a single character, while "a" is a String. You could use either in this case.

However, it's not that simple because it depends on how the bytes are encoded. Typically, unless you specify otherwise, it'll use UTF-8, which has variable-width characters. So one million characters might be anywhere from 1MB to 4MB in size depending on how you end up encoding it, and your question doesn't contain details of that.
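
As a rough illustration of how the byte count depends on the encoding, the sketch below measures the encoded length of a string under a given charset. The class and method names are hypothetical; String.getBytes(Charset) and StandardCharsets are standard library APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedSize {
    /** Returns how many bytes s occupies once encoded with the given charset. */
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }
}
```

For example, EncodedSize.encodedLength("aaaa", StandardCharsets.UTF_8) is 4, while the same string in StandardCharsets.UTF_16BE is 8. A benchmark payload made only of the letter a therefore has one byte per character in UTF-8, but that stops being true as soon as non-ASCII characters are involved.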

If you need data of a specific size and the content of that data doesn't matter, my advice would be to simply use a byte array of the right size.
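
A minimal sketch of that suggestion, assuming the size is given in KB as in the question (the class and method names are invented for this example):

```java
import java.util.Arrays;

final class Payload {
    /** Returns exactly sizeInKb * 1024 bytes, each set to the ASCII code for 'a'. */
    static byte[] ofKilobytes(int sizeInKb) {
        byte[] data = new byte[sizeInKb * 1024];
        Arrays.fill(data, (byte) 'a'); // the content is arbitrary; only the length matters
        return data;
    }
}
```

Payload.ofKilobytes(100) yields 102,400 bytes that can be written to the socket's OutputStream as-is, so no character encoding step is involved.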

Answer by Hyman

Yes, there is: use a buffered string object.

StringBuilder stringB = new StringBuilder(2000000); // for the 2MB one
String paddingString = "abcdefghijklmnopqrs";

// append whole padding chunks while they still fit under the target length
while (stringB.length() + paddingString.length() < 2000000) {
    stringB.append(paddingString);
}

// use it
String result = stringB.toString();

Answer by Chris Jester-Young

You can simply create a large character array.

char[] data = new char[1000000];

If you need to make a real String object, you can:

String str = new String(data);

Don't use += to build strings in a loop. That has O(n²) memory and time usage, as String objects are immutable (so each time you use +=, a new String object has to be made, copying the entire contents of the old string in the process).
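
The quadratic cost can be made concrete by counting characters. The following is a simplified model rather than a measurement: it assumes every += allocates a new string and writes each character of the result, and the class and method names are made up for illustration.

```java
final class ConcatCost {
    /** Counts the chars written when a string of length n is built one char at a time with +=. */
    static long charsCopied(int n) {
        long total = 0;
        for (int len = 1; len <= n; len++) {
            total += len; // this step allocates a new string of length len and fills it
        }
        return total;     // equals n * (n + 1) / 2
    }
}
```

Under this model, building a 1,000-character string writes 500,500 characters in total, and the total grows roughly fourfold each time n doubles.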

Answer by Thilo

Use a char[] either directly, or to build the String.

char[] chars = new char[size];
Arrays.fill(chars, 'a');

String str = new String(chars);

Also note that one char uses up two bytes internally. How long the String will be over the wire depends on the encoding (the letter a should be just one byte, though).
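
One way to see the length over the wire is to write the string through an OutputStreamWriter with an explicit charset and count the bytes that come out. In this sketch a ByteArrayOutputStream stands in for the socket's output stream, which is an assumption made so the example runs without a network; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

final class WireSize {
    /** Builds a string of charCount 'a' chars and returns the bytes produced when writing it with cs. */
    static int bytesOnWire(int charCount, Charset cs) {
        char[] chars = new char[charCount];
        Arrays.fill(chars, 'a');
        String msg = new String(chars);

        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the socket stream
        try (Writer w = new OutputStreamWriter(sink, cs)) {
            w.write(msg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size(); // read after the writer is closed, so everything is flushed
    }
}
```

With StandardCharsets.US_ASCII, 1,024 characters come out as 1,024 bytes; with StandardCharsets.UTF_16LE the same message takes 2,048 bytes.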
