Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/5212928/

Time: 2020-10-30 09:59:18  Source: igfitidea

How to trim a java stringbuilder?

Tags: java, android, optimization, string, stringbuilder

Asked by CodeFusionMobile

I have a StringBuilder object that needs to be trimmed (i.e. all whitespace chars \u0020 and below removed from either end).

I can't seem to find a method in string builder that would do this.

Here's what I'm doing now:

String trimmedStr = strBuilder.toString().trim();

This gives exactly the desired output, but it requires two Strings to be allocated instead of one. Is there a more efficient way to trim the string while it's still in the StringBuilder?

Answered by Zaven Nahapetyan

You should not use the deleteCharAt approach.

As Boris pointed out, the deleteCharAt method copies the array over every time. The Java 5 code that does this looks like this:

public AbstractStringBuilder deleteCharAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    System.arraycopy(value, index+1, value, index, count-index-1);
    count--;
    return this;
}
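
To make that cost concrete, here is a rough sketch that counts how many characters would be shifted when n leading characters are removed one deleteCharAt(0) at a time, compared with a single range delete. It is only a model based on the arraycopy call shown above, not a measurement. The class and method names are made up for illustration, and the single-range figure assumes the remaining tail is copied exactly once.

```java
// Hypothetical helper, not part of the JDK: models the arraycopy cost shown above.
class ShiftCost {

    // Characters moved when `leading` chars are deleted one at a time from index 0
    // of a builder that initially holds `length` chars.
    static long oneAtATime(int length, int leading) {
        long moved = 0;
        int count = length;
        for (int i = 0; i < leading; i++) {
            moved += count - 1; // arraycopy length is count - index - 1, with index = 0
            count--;
        }
        return moved;
    }

    // Characters moved when the same prefix is removed in one step
    // (assumption: the remaining tail is shifted exactly once).
    static long singleRange(int length, int leading) {
        return length - leading;
    }

    public static void main(String[] args) {
        System.out.println("one at a time: " + oneAtATime(1_000_000, 10_000));
        System.out.println("single range:  " + singleRange(1_000_000, 10_000));
    }
}
```

The first figure grows with the product of the builder length and the number of leading characters, while the second is bounded by the builder length.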

Of course, speculation alone is not enough to choose one method of optimization over another, so I decided to time the 3 approaches in this thread: the original, the delete approach, and the substring approach.

Here is the code I tested for the original:

public static String trimOriginal(StringBuilder sb) {
    return sb.toString().trim();
}

The delete approach:

public static String trimDelete(StringBuilder sb) {
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0))) {
        sb.deleteCharAt(0);
    }
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1))) {
        sb.deleteCharAt(sb.length() - 1);
    }
    return sb.toString();
}

And the substring approach:

public static String trimSubstring(StringBuilder sb) {
    int first, last;

    for (first=0; first<sb.length(); first++)
        if (!Character.isWhitespace(sb.charAt(first)))
            break;

    for (last=sb.length(); last>first; last--)
        if (!Character.isWhitespace(sb.charAt(last-1)))
            break;

    return sb.substring(first, last);
}
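
One caveat: the delete and substring versions test Character.isWhitespace, whereas String.trim() strips every char at or below U+0020, which is the rule the question asks for. The two rules differ for a few characters. A minimal sketch of a variant that follows trim()'s rule, with a hypothetical class and method name, might look like this:

```java
// Hypothetical helper: applies String.trim()'s rule (chars <= U+0020) to a StringBuilder.
class TrimRule {

    static String trimLikeString(StringBuilder sb) {
        int first = 0;
        int last = sb.length();
        // Skip leading chars at or below the space character.
        while (first < last && sb.charAt(first) <= ' ') {
            first++;
        }
        // Skip trailing chars at or below the space character.
        while (last > first && sb.charAt(last - 1) <= ' ') {
            last--;
        }
        return sb.substring(first, last);
    }

    public static void main(String[] args) {
        // U+0000 is stripped by trim() but is not whitespace for Character.isWhitespace.
        System.out.println(Character.isWhitespace('\u0000'));
        // U+2003 (EM SPACE) is whitespace for Character.isWhitespace but is kept by trim().
        System.out.println(Character.isWhitespace('\u2003'));
        System.out.println("[" + trimLikeString(new StringBuilder("\t hello \n")) + "]");
    }
}
```

For input that contains only ordinary spaces, tabs, and line breaks, both rules give the same result.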

I performed 100 tests, each time generating a million-character StringBuilder with ten thousand leading and ten thousand trailing spaces. The testing itself is very basic, but it gives a good idea of how long the methods take.

Here is the code to time the 3 approaches:

public static void main(String[] args) {

    long originalTime = 0;
    long deleteTime = 0;
    long substringTime = 0;

    for (int i=0; i<100; i++) {

        StringBuilder sb1 = new StringBuilder();
        StringBuilder sb2 = new StringBuilder();
        StringBuilder sb3 = new StringBuilder();

        for (int j=0; j<10000; j++) {
            sb1.append(" ");
            sb2.append(" ");
            sb3.append(" ");
        }
        for (int j=0; j<980000; j++) {
            sb1.append("a");
            sb2.append("a");
            sb3.append("a");
        }
        for (int j=0; j<10000; j++) {
            sb1.append(" ");
            sb2.append(" ");
            sb3.append(" ");
        }

        long timer1 = System.currentTimeMillis();
        trimOriginal(sb1);
        originalTime += System.currentTimeMillis() - timer1;

        long timer2 = System.currentTimeMillis();
        trimDelete(sb2);
        deleteTime += System.currentTimeMillis() - timer2;

        long timer3 = System.currentTimeMillis();
        trimSubstring(sb3);
        substringTime += System.currentTimeMillis() - timer3;
    }

    System.out.println("original:  " + originalTime + " ms");
    System.out.println("delete:    " + deleteTime + " ms");
    System.out.println("substring: " + substringTime + " ms");
}

I got the following output:

original:  176 ms
delete:    179242 ms
substring: 154 ms

As we see, the substring approach provides a very slight optimization over the original "two String" approach. However, the delete approach is extremely slow and should be avoided.

So to answer your question: you are fine trimming your StringBuilder the way you suggested in the question. The very slight optimization that the substring method offers probably does not justify the excess code.
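
Since the timing code is very basic, anyone who wants to repeat the comparison could tighten it a little. The sketch below is one possible variant, not the code that produced the numbers quoted here. It uses System.nanoTime(), runs a few warm-up rounds first, and checks each result so the work cannot be skipped. The class name, the helper names, and the warm-up count are assumptions for illustration, and it is still not a rigorous benchmark.

```java
import java.util.function.Function;

// Hypothetical timing helper; a sketch, not a rigorous benchmark.
class TrimTiming {

    static String trimOriginal(StringBuilder sb) {
        return sb.toString().trim();
    }

    static String trimSubstring(StringBuilder sb) {
        int first = 0;
        int last = sb.length();
        while (first < last && Character.isWhitespace(sb.charAt(first))) first++;
        while (last > first && Character.isWhitespace(sb.charAt(last - 1))) last--;
        return sb.substring(first, last);
    }

    // Builds `pad` spaces, then `body` letters, then `pad` spaces.
    static StringBuilder padded(int pad, int body) {
        StringBuilder sb = new StringBuilder(2 * pad + body);
        for (int i = 0; i < pad; i++) sb.append(' ');
        for (int i = 0; i < body; i++) sb.append('a');
        for (int i = 0; i < pad; i++) sb.append(' ');
        return sb;
    }

    // Returns the total time in nanoseconds spent inside `trim` over `runs` runs.
    static long time(Function<StringBuilder, String> trim, int runs, int pad, int body) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            StringBuilder sb = padded(pad, body);
            long start = System.nanoTime();
            String result = trim.apply(sb);
            total += System.nanoTime() - start;
            // Use the result so the call cannot be optimized away unnoticed.
            if (result.length() != body) {
                throw new IllegalStateException("unexpected trim result");
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Warm-up rounds (the count of 20 is an arbitrary assumption).
        time(TrimTiming::trimOriginal, 20, 10_000, 980_000);
        time(TrimTiming::trimSubstring, 20, 10_000, 980_000);

        long original = time(TrimTiming::trimOriginal, 100, 10_000, 980_000);
        long substring = time(TrimTiming::trimSubstring, 100, 10_000, 980_000);
        System.out.println("original:  " + original / 1_000_000 + " ms");
        System.out.println("substring: " + substring / 1_000_000 + " ms");
    }
}
```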

Answered by shams

I've used Zaven's analysis approach with StringBuilder's delete(start, end) method, which performs far better than the deleteCharAt(index) approach but slightly worse than the substring() approach. This method also uses the array copy, but array copy is called far fewer times (only twice in the worst case). In addition, it avoids creating multiple instances of intermediate Strings in case trim() is called repeatedly on the same StringBuilder object.

public class Main {

    public static String trimOriginal(StringBuilder sb) {
        return sb.toString().trim();
    }

    public static String trimDeleteRange(StringBuilder sb) {
        int first, last;

        for (first = 0; first < sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;

        for (last = sb.length(); last > first; last--)
            if (!Character.isWhitespace(sb.charAt(last - 1)))
                break;

        if (first == last) {
            sb.delete(0, sb.length());
        } else {
           if (last < sb.length()) {
              sb.delete(last, sb.length());
           }
           if (first > 0) {
              sb.delete(0, first);
           }
        }
        return sb.toString();
    }


    public static String trimSubstring(StringBuilder sb) {
        int first, last;

        for (first = 0; first < sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;

        for (last = sb.length(); last > first; last--)
            if (!Character.isWhitespace(sb.charAt(last - 1)))
                break;

        return sb.substring(first, last);
    }

    public static void main(String[] args) {
        runAnalysis(1000);
        runAnalysis(10000);
        runAnalysis(100000);
        runAnalysis(200000);
        runAnalysis(500000);
        runAnalysis(1000000);
    }

    private static void runAnalysis(int stringLength) {
        System.out.println("Main:runAnalysis(string-length=" + stringLength + ")");

        long originalTime = 0;
        long deleteTime = 0;
        long substringTime = 0;

        for (int i = 0; i < 200; i++) {

            StringBuilder temp = new StringBuilder();
            char[] options = {' ', ' ', ' ', ' ', 'a', 'b', 'c', 'd'};
            for (int j = 0; j < stringLength; j++) {
                temp.append(options[(int) ((Math.random() * 1000)) % options.length]);
            }
            String testStr = temp.toString();

            StringBuilder sb1 = new StringBuilder(testStr);
            StringBuilder sb2 = new StringBuilder(testStr);
            StringBuilder sb3 = new StringBuilder(testStr);

            long timer1 = System.currentTimeMillis();
            trimOriginal(sb1);
            originalTime += System.currentTimeMillis() - timer1;

            long timer2 = System.currentTimeMillis();
            trimDeleteRange(sb2);
            deleteTime += System.currentTimeMillis() - timer2;

            long timer3 = System.currentTimeMillis();
            trimSubstring(sb3);
            substringTime += System.currentTimeMillis() - timer3;
        }

        System.out.println("  original:     " + originalTime + " ms");
        System.out.println("  delete-range: " + deleteTime + " ms");
        System.out.println("  substring:    " + substringTime + " ms");
    }

}

Output:

Main:runAnalysis(string-length=1000)
  original:     0 ms
  delete-range: 4 ms
  substring:    0 ms
Main:runAnalysis(string-length=10000)
  original:     4 ms
  delete-range: 9 ms
  substring:    4 ms
Main:runAnalysis(string-length=100000)
  original:     22 ms
  delete-range: 33 ms
  substring:    43 ms
Main:runAnalysis(string-length=200000)
  original:     57 ms
  delete-range: 93 ms
  substring:    110 ms
Main:runAnalysis(string-length=500000)
  original:     266 ms
  delete-range: 220 ms
  substring:    191 ms
Main:runAnalysis(string-length=1000000)
  original:     479 ms
  delete-range: 467 ms
  substring:    426 ms

Answered by Bozho

Don't worry about having two strings. It's a microoptimization.

If you really have detected a bottleneck, you can have nearly-constant-time trimming: just iterate over the first N chars for as long as Character.isWhitespace(c) returns true for them.
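
One way this could look in practice is sketched below. It scans from both ends, truncates the tail with setLength, and removes the leading run with a single delete call. The class and method names are hypothetical, and this is just one possible reading of the suggestion, not code from the answer.

```java
// Hypothetical helper: trims a StringBuilder in place and returns the same instance.
class InPlaceTrim {

    static StringBuilder trimInPlace(StringBuilder sb) {
        // Find the end of the non-whitespace content.
        int last = sb.length();
        while (last > 0 && Character.isWhitespace(sb.charAt(last - 1))) {
            last--;
        }
        sb.setLength(last); // truncate the trailing whitespace

        // Find the start of the non-whitespace content.
        int first = 0;
        while (first < last && Character.isWhitespace(sb.charAt(first))) {
            first++;
        }
        sb.delete(0, first); // remove the whole leading run in one call
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("   some text  ");
        System.out.println("[" + trimInPlace(sb) + "]");
    }
}
```

The scanning work is proportional to the amount of whitespace at the ends, although delete(0, first) still has to shift the remaining characters once.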

Answered by Kevin Liu

I had exactly your question at first. However, after five minutes of second thoughts, I realized that you never actually need to trim the StringBuffer! You only need to trim the strings you append to the StringBuffer.

If you want to trim an initial StringBuffer, you can do this:

StringBuffer sb = new StringBuffer(initialStr.trim());

If you want to trim StringBuffer on-the-fly, you can do this during append:

sb.append(addOnStr.trim());

Answered by bob

Only one of you has taken into account that when you convert the StringBuilder to a String and then trim it, you create two immutable objects that have to be garbage collected, so the total allocation is:

  1. StringBuilder object
  2. immutable String of the SB object
  3. immutable String object of the string that has been trimmed

So whilst it may "appear" that the trim is faster, in the real world and with a loaded memory scheme it will in fact be worse.

Answered by Erran Morad

I made some code. It works and the test cases are there for you to see. Let me know if this is okay.

Main code -

public static StringBuilder trimStringBuilderSpaces(StringBuilder sb) {

    int len = sb.length();

    if (len > 0) {

            int start = 0;
            int end = 1;
            char space = ' ';
            int i = 0;

            // Remove spaces at start
            for (i = 0; i < len; i++) {
                if (sb.charAt(i) != space) {
                    break;
                }
            }

            end = i;
            //System.out.println("s = " + start + ", e = " + end);
            sb.delete(start, end);

            // Remove the ending spaces
            len = sb.length();

            if (len > 1) {

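                // Scan down to index 0 so that a single character followed by
                // spaces (e.g. "1 ") is not deleted along with the spaces.
                for (i = len - 1; i >= 0; i--) {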
                    if (sb.charAt(i) != space) {
                        i = i + 1;
                        break;
                    }
                }

                start = i;
                end = len;// or len + any positive number !

                //System.out.println("s = " + start + ", e = " + end);
                sb.delete(start, end);

            }

    }

    return sb;
}

The full code with test -

package source;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;

public class StringBuilderTrim {

    public static void main(String[] args) {
        testCode();
    }

    public static void testCode() {

        StringBuilder s1 = new StringBuilder("");
        StringBuilder s2 = new StringBuilder(" ");
        StringBuilder s3 = new StringBuilder("  ");
        StringBuilder s4 = new StringBuilder(" 123");
        StringBuilder s5 = new StringBuilder("  123");
        StringBuilder s6 = new StringBuilder("1");
        StringBuilder s7 = new StringBuilder("123 ");
        StringBuilder s8 = new StringBuilder("123  ");
        StringBuilder s9 = new StringBuilder(" 123 ");
        StringBuilder s10 = new StringBuilder("  123  ");

        /*
         * Using a rough form of TDD here. Initially, only one test input
         * "test case" was added and rest were commented. Write no code for the
         * method being tested. So, the test will fail. Write just enough code
         * to make it pass. Then, enable the next test. Repeat !!!
         */
        ArrayList<StringBuilder> ins = new ArrayList<StringBuilder>();
        ins.add(s1);
        ins.add(s2);
        ins.add(s3);
        ins.add(s4);
        ins.add(s5);
        ins.add(s6);
        ins.add(s7);
        ins.add(s8);
        ins.add(s9);
        ins.add(s10);

        // Run test
        for (StringBuilder sb : ins) {
            System.out
                    .println("\n\n---------------------------------------------");
            String expected = sb.toString().trim();
            String result = trimStringBuilderSpaces(sb).toString();
            System.out.println("In [" + sb + "]" + ", Expected [" + expected
                    + "]" + ", Out [" + result + "]");
            if (result.equals(expected)) {
                System.out.println("Success!");
            } else {
                System.out.println("FAILED!");
            }
            System.out.println("---------------------------------------------");
        }

    }

    public static StringBuilder trimStringBuilderSpaces(StringBuilder inputSb) {

        StringBuilder sb = new StringBuilder(inputSb);
        int len = sb.length();

        if (len > 0) {

            try {

                int start = 0;
                int end = 1;
                char space = ' ';
                int i = 0;

                // Remove spaces at start
                for (i = 0; i < len; i++) {
                    if (sb.charAt(i) != space) {
                        break;
                    }
                }

                end = i;
                //System.out.println("s = " + start + ", e = " + end);
                sb.delete(start, end);

                // Remove the ending spaces
                len = sb.length();

                if (len > 1) {

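                    // Scan down to index 0 so that a single character followed by
                    // spaces (e.g. "1 ") is not deleted along with the spaces.
                    for (i = len - 1; i >= 0; i--) {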
                        if (sb.charAt(i) != space) {
                            i = i + 1;
                            break;
                        }
                    }

                    start = i;
                    end = len;// or len + any positive number !

                    //System.out.println("s = " + start + ", e = " + end);
                    sb.delete(start, end);

                }

            } catch (Exception ex) {

                StringWriter sw = new StringWriter();
                PrintWriter pw = new PrintWriter(sw);
                ex.printStackTrace(pw);
                sw.toString(); // stack trace as a string

                sb = new StringBuilder("\nNo Out due to error:\n" + "\n" + sw);
                return sb;
            }

        }

        return sb;
    }
}

Answered by Vincent Cijmns

strBuilder.replace(0,strBuilder.length(),strBuilder.toString().trim());
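
For illustration, here is that one-liner wrapped in a small helper with a usage example. The class and method names are made up. Note that it still goes through toString().trim(), so it keeps the builder itself trimmed but does not avoid the intermediate Strings discussed in the question.

```java
// Hypothetical wrapper around the replace-based one-liner.
class ReplaceTrim {

    // Replaces the builder's whole content with its trimmed form and returns the builder.
    static StringBuilder trimViaReplace(StringBuilder strBuilder) {
        return strBuilder.replace(0, strBuilder.length(), strBuilder.toString().trim());
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("  x y ");
        trimViaReplace(sb);
        System.out.println("[" + sb + "]");
    }
}
```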

Answered by Axel

You get two strings, but I'd expect the data to be allocated only once. Since Strings in Java are immutable, I'd expect the trim implementation to give you an object that shares the same character data, but with different start and end indices. At least that's what the substring method does. So anything you try in order to optimise this will most certainly have the opposite effect, since you add overhead that is not needed.

Just step through the trim() method with your debugger.
