如何有效地(性能)从 Java 中的 List 中删除许多项目?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2043783/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 02:56:34  来源:igfitidea点击:

How to efficiently (performance) remove many items from List in Java?

javaperformancelistcollections

提问by WildWezyr

I have quite large List named items (>= 1,000,000 items) and some condition denoted by <cond> that selects items to be deleted and <cond> is true for many (maybe half) of items on my list.

我有相当大的 List 命名项目(>= 1,000,000 个项目)和一些由 <cond> 表示的条件,用于选择要删除的项目,并且 <cond> 对于我列表中的许多(可能是一半)项目来说是真的。

My goal is to efficiently remove items selected by <cond> and retain all other items, source list may be modified, new list may be created - best way to do it should be chosen considering performance.

我的目标是有效地删除由 <cond> 选择的项目并保留所有其他项目,可以修改源列表,可以创建新列表 - 应该考虑性能选择最好的方法。

Here is my testing code:

这是我的测试代码:

    System.out.println("preparing items");
    List<Integer> items = new ArrayList<Integer>(); // Integer is for demo
    for (int i = 0; i < 1000000; i++) {
        items.add(i * 3); // just for demo
    }

    System.out.println("deleting items");
    long startMillis = System.currentTimeMillis();
    items = removeMany(items);
    long endMillis = System.currentTimeMillis();

    System.out.println("after remove: items.size=" + items.size() + 
            " and it took " + (endMillis - startMillis) + " milli(s)");

and naive implementation:

和幼稚的实现:

public static <T> List<T> removeMany(List<T> items) {
    int i = 0;
    Iterator<T> iter = items.iterator();
    while (iter.hasNext()) {
        T item = iter.next();
        // <cond> goes here
        if (/*<cond>: */i % 2 == 0) {
            iter.remove();
        }
        i++;
    }
    return items;
}

As you can see I used item index modulo 2 == 0 as remove condition (<cond>) - just for demonstation purposes.

如您所见,我使用项目索引模 2 == 0 作为删除条件 (<cond>) - 仅用于演示目的。

What better version of removeManymay be provided and why this better version is actually better?

removeMany可以提供什么更好的版本,为什么这个更好的版本实际上更好?

采纳答案by WildWezyr

OK, it's time for test results of proposed approaches. Here what approaches I have tested (name of each approach is also class name in my sources):

好的,现在是测试建议方法的时候了。这是我测试过的方法(每种方法的名称也是我的来源中的类名):

  1. NaiveRemoveManyPerformer- ArrayListwith iterator and remove - first and naive implementation given in my question.
  2. BetterNaiveRemoveManyPerformer- ArrayListwith backward iteration and removal from end to front.
  3. LinkedRemoveManyPerformer- naive iterator and remove but working on LinkedList. Disadventage: works only for LinkedList.
  4. CreateNewRemoveManyPerformer- ArrayListis made as a copy (only retained elements are added), iterator is used to traverse input ArrayList.
  5. SmartCreateNewRemoveManyPerformer- better CreateNewRemoveManyPerformer- initial size (capacity) of result ArrayListis set to final list size. Disadvantage: you must know final size of list when starting.
  6. FasterSmartCreateNewRemoveManyPerformer- even better (?) SmartCreateNewRemoveManyPerformer- use item index (items.get(idx)) instead of iterator.
  7. MagicRemoveManyPerformer- works in place (no list copy) for ArrayListand compacts holes (removed items) from beginning with items from end of the list. Disadventage: this approach changes order of items in list.
  8. ForwardInPlaceRemoveManyPerformer- works in place for ArrayList- move retaining items to fill holes, finally subList is returned (no final removal or clear).
  9. GuavaArrayListRemoveManyPerformer- Google Guava Iterables.removeIfused for ArrayList- almost the same as ForwardInPlaceRemoveManyPerformerbut does final removal of items at the end of list.
  1. NaiveRemoveManyPerformer-ArrayList使用迭代器和删除 - 我的问题中给出的第一个和天真的实现。
  2. BetterNaiveRemoveManyPerformer-ArrayList向后迭代并从头到尾移除。
  3. LinkedRemoveManyPerformer- 天真的迭代器并删除但正在处理LinkedList. 缺点:仅适用于LinkedList.
  4. CreateNewRemoveManyPerformer-ArrayList作为副本制作(仅添加保留元素),迭代器用于遍历 input ArrayList
  5. SmartCreateNewRemoveManyPerformer- 更好CreateNewRemoveManyPerformer- 结果的初始大小(容量)ArrayList设置为最终列表大小。缺点:启动时必须知道列表的最终大小。
  6. FasterSmartCreateNewRemoveManyPerformer- 更好 (?) SmartCreateNewRemoveManyPerformer- 使用项目索引 ( items.get(idx)) 而不是迭代器。
  7. MagicRemoveManyPerformer-ArrayList从列表末尾的项目开始,就地工作(无列表副本)并压缩漏洞(已删除的项目)。缺点:这种方法改变了列表中项目的顺序。
  8. ForwardInPlaceRemoveManyPerformer- 适用于ArrayList- 移动保留项目以填充孔洞,最后返回 subList(没有最终删除或清除)。
  9. GuavaArrayListRemoveManyPerformer- 谷歌番石榴Iterables.removeIf用于ArrayList- 几乎相同,ForwardInPlaceRemoveManyPerformer但最终删除列表末尾的项目。

Full source code is given at the end of this answer.

本答案末尾提供了完整的源代码。

Tests where performed with different list sizes (from 10,000 items to 10,000,000 items) and different remove factors (specifying how many items must be removed from list).

使用不同的列表大小(从 10,000 项到 10,000,000 项)和不同的删除因子(指定必须从列表中删除多少项)执行测试。

As I posted here in comments for other answers - I have thought that copying items from ArrayListto second ArrayListwill be faster than iterating LinkedListand just removing items. Sun's Java Documentation says that constant factor of ArrayListis low compared to that for the LinkedListimplementation, but surprisingly this is not the case in my problem.

正如我在其他答案的评论中发布的那样 - 我认为将项目从ArrayList第二个复制到第二个ArrayList将比迭代LinkedList和删除项目更快。Sun 的 Java 文档说,ArrayListLinkedList实现相比,常数因子 的系数较低,但令人惊讶的是,我的问题并非如此。

In practice LinkedListwith simple iteration and removal has best performance in most cases (this approach is implemented in LinkedRemoveManyPerformer). Usually only MagicRemoveManyPerformerperformance is comparable to LinkedRemoveManyPerformer, other approaches are significantly slower. Google Guava GuavaArrayListRemoveManyPerformeris slower than hand made similar code (because my code does not remove unnecessary items at end of list).

实际上LinkedList,在大多数情况下,简单的迭代和删除具有最佳性能(此方法在 中实现LinkedRemoveManyPerformer)。通常只有MagicRemoveManyPerformer性能与 相媲美LinkedRemoveManyPerformer,其他方法要慢得多。Google GuavaGuavaArrayListRemoveManyPerformer比手工制作的类似代码慢(因为我的代码不会删除列表末尾不必要的项目)。

Example results for removing 500,000 items from 1,000,000 source items:

从 1,000,000 个源项目中删除 500,000 个项目的示例结果:

  1. NaiveRemoveManyPerformer: test not performed - I'm not that patient, but it performs worse than BetterNaiveRemoveManyPerformer.
  2. BetterNaiveRemoveManyPerformer: 226080 milli(s)
  3. LinkedRemoveManyPerformer: 69 milli(s)
  4. CreateNewRemoveManyPerformer: 246 milli(s)
  5. SmartCreateNewRemoveManyPerformer: 112 milli(s)
  6. FasterSmartCreateNewRemoveManyPerformer: 202 milli(s)
  7. MagicRemoveManyPerformer: 74 milli(s)
  8. ForwardInPlaceRemoveManyPerformer: 69 milli(s)
  9. GuavaArrayListRemoveManyPerformer: 118 milli(s)
  1. NaiveRemoveManyPerformer: 未执行测试 - 我不是那么有耐心,但它的表现比BetterNaiveRemoveManyPerformer.
  2. BetterNaiveRemoveManyPerformer: 226080 毫(s)
  3. LinkedRemoveManyPerformer: 69 毫(s)
  4. CreateNewRemoveManyPerformer: 246 毫(秒)
  5. SmartCreateNewRemoveManyPerformer: 112 毫(秒)
  6. FasterSmartCreateNewRemoveManyPerformer: 202 毫(s)
  7. MagicRemoveManyPerformer: 74 毫(秒)
  8. ForwardInPlaceRemoveManyPerformer: 69 毫(s)
  9. GuavaArrayListRemoveManyPerformer: 118 毫(秒)

Example results for removing 1 item from 1,000,000 source items (first item is removed):

从 1,000,000 个源项目中删除 1 个项目的示例结果(删除第一个项目):

  1. BetterNaiveRemoveManyPerformer: 34 milli(s)
  2. LinkedRemoveManyPerformer: 41 milli(s)
  3. CreateNewRemoveManyPerformer: 253 milli(s)
  4. SmartCreateNewRemoveManyPerformer: 108 milli(s)
  5. FasterSmartCreateNewRemoveManyPerformer: 71 milli(s)
  6. MagicRemoveManyPerformer: 43 milli(s)
  7. ForwardInPlaceRemoveManyPerformer: 73 milli(s)
  8. GuavaArrayListRemoveManyPerformer: 78 milli(s)
  1. BetterNaiveRemoveManyPerformer:34 毫秒
  2. LinkedRemoveManyPerformer:41 毫秒
  3. CreateNewRemoveManyPerformer:253 毫秒
  4. SmartCreateNewRemoveManyPerformer:108 毫秒
  5. FasterSmartCreateNewRemoveManyPerformer:71 毫秒
  6. MagicRemoveManyPerformer:43 毫秒
  7. ForwardInPlaceRemoveManyPerformer:73 毫秒
  8. GuavaArrayListRemoveManyPerformer:78 毫秒

Example results for removing 333,334 items from 1,000,000 source items:

从 1,000,000 个源项目中删除 333,334 个项目的示例结果:

  1. BetterNaiveRemoveManyPerformer: 253206 milli(s)
  2. LinkedRemoveManyPerformer: 69 milli(s)
  3. CreateNewRemoveManyPerformer: 245 milli(s)
  4. SmartCreateNewRemoveManyPerformer: 111 milli(s)
  5. FasterSmartCreateNewRemoveManyPerformer: 203 milli(s)
  6. MagicRemoveManyPerformer: 69 milli(s)
  7. ForwardInPlaceRemoveManyPerformer: 72 milli(s)
  8. GuavaArrayListRemoveManyPerformer: 102 milli(s)
  1. BetterNaiveRemoveManyPerformer:253206 毫秒
  2. LinkedRemoveManyPerformer:69 毫秒
  3. CreateNewRemoveManyPerformer:245 毫秒
  4. SmartCreateNewRemoveManyPerformer:111 毫秒
  5. FasterSmartCreateNewRemoveManyPerformer:203 毫秒
  6. MagicRemoveManyPerformer:69 毫秒
  7. ForwardInPlaceRemoveManyPerformer:72 毫秒
  8. GuavaArrayListRemoveManyPerformer:102 毫秒

Example results for removing 1,000,000 (all) items from 1,000,000 source items (all items are removed but with one-by-one processing, if you know a priori that all items are to be removed, list should be simply cleared):

从 1,000,000 个源项目中删除 1,000,000(全部)个项目的示例结果(所有项目都被删除,但进行一一处理,如果您事先知道要删除所有项目,则应简单地清除列表):

  1. BetterNaiveRemoveManyPerformer: 58 milli(s)
  2. LinkedRemoveManyPerformer: 88 milli(s)
  3. CreateNewRemoveManyPerformer: 95 milli(s)
  4. SmartCreateNewRemoveManyPerformer: 91 milli(s)
  5. FasterSmartCreateNewRemoveManyPerformer: 48 milli(s)
  6. MagicRemoveManyPerformer: 61 milli(s)
  7. ForwardInPlaceRemoveManyPerformer: 49 milli(s)
  8. GuavaArrayListRemoveManyPerformer: 133 milli(s)
  1. BetterNaiveRemoveManyPerformer:58 毫秒
  2. LinkedRemoveManyPerformer:88 毫秒
  3. CreateNewRemoveManyPerformer:95 毫秒
  4. SmartCreateNewRemoveManyPerformer:91 毫秒
  5. FasterSmartCreateNewRemoveManyPerformer:48 毫秒
  6. MagicRemoveManyPerformer:61 毫秒
  7. ForwardInPlaceRemoveManyPerformer:49 毫秒
  8. GuavaArrayListRemoveManyPerformer:133 毫秒

My final conclusions: use hybrid approach - if dealing with LinkedList - simple iteration and removal is best, if dealing with ArrayList - it depends if item order is important - use ForwardInPlaceRemoveManyPerformer then, if item order may be changed - best choice is MagicRemoveManyPerformer. If remove factor is known a priori (you know how many items will be removed vs retained) then some more conditionals may be put to select approach performing even better in particular situation. But known remove factor is not usual case... Google Guava Iterables.removeIfis such a hybrid solution but with slightly different assumption (original list must be changed, new one cannot be created and item order always matters) - these are most common assumptions so removeIfis best choice in most real-life cases.

我的最终结论:使用混合方法 - 如果处理 LinkedList - 简单的迭代和删除是最好的,如果处理 ArrayList - 这取决于项目顺序是否重要 - 然后使用 ForwardInPlaceRemoveManyPerformer,如果项目顺序可能会改变 - 最佳选择是 MagicRemoveManyPerformer。如果删除因子是先验已知的(你知道有多少项目将被删除和保留),那么可能会放置更多的条件来选择在特定情况下表现更好的方法。但是已知的删除因素不是通常的情况......谷歌番石榴Iterables.removeIf是一种混合解决方案,但假设略有不同(必须更改原始列表,不能创建新列表并且项目顺序总是很重要) - 这些是最常见的假设,所以removeIf最好大多数现实生活中的选择。

Notice also that all good approaches (naive is not good!) are good enough - any one of them shold do just fine in real application, but naive approach must be avoided.

还请注意,所有好的方法(幼稚并不好!)都足够好——它们中的任何一种在实际应用中都可以做得很好,但必须避免幼稚的方法。

At last - my source code for testing.

最后 - 我的测试源代码。

package WildWezyrListRemovalTesting;

import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class RemoveManyFromList {

    public static abstract class BaseRemoveManyPerformer {

        protected String performerName() {
            return getClass().getSimpleName();
        }

        protected void info(String msg) {
            System.out.println(performerName() + ": " + msg);
        }

        protected void populateList(List<Integer> items, int itemCnt) {
            for (int i = 0; i < itemCnt; i++) {
                items.add(i);
            }
        }

        protected boolean mustRemoveItem(Integer itemVal, int itemIdx, int removeFactor) {
            if (removeFactor == 0) {
                return false;
            }
            return itemIdx % removeFactor == 0;
        }

        protected abstract List<Integer> removeItems(List<Integer> items, int removeFactor);

        protected abstract List<Integer> createInitialList();

        public void testMe(int itemCnt, int removeFactor) {
            List<Integer> items = createInitialList();
            populateList(items, itemCnt);
            long startMillis = System.currentTimeMillis();
            items = removeItems(items, removeFactor);
            long endMillis = System.currentTimeMillis();
            int chksum = 0;
            for (Integer item : items) {
                chksum += item;
            }
            info("removing took " + (endMillis - startMillis)
                    + " milli(s), itemCnt=" + itemCnt
                    + ", removed items: " + (itemCnt - items.size())
                    + ", remaining items: " + items.size()
                    + ", checksum: " + chksum);
        }
    }
    private List<BaseRemoveManyPerformer> rmps =
            new ArrayList<BaseRemoveManyPerformer>();

    public void addPerformer(BaseRemoveManyPerformer rmp) {
        rmps.add(rmp);
    }
    private Runtime runtime = Runtime.getRuntime();

    private void runGc() {
        for (int i = 0; i < 5; i++) {
            runtime.gc();
        }
    }

    public void testAll(int itemCnt, int removeFactor) {
        runGc();
        for (BaseRemoveManyPerformer rmp : rmps) {
            rmp.testMe(itemCnt, removeFactor);
        }
        runGc();
        System.out.println("\n--------------------------\n");
    }

    public static class NaiveRemoveManyPerformer
            extends BaseRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            if (items.size() > 300000 && items instanceof ArrayList) {
                info("this removeItems is too slow, returning without processing");
                return items;
            }
            int i = 0;
            Iterator<Integer> iter = items.iterator();
            while (iter.hasNext()) {
                Integer item = iter.next();
                if (mustRemoveItem(item, i, removeFactor)) {
                    iter.remove();
                }
                i++;
            }
            return items;
        }

        @Override
        public List<Integer> createInitialList() {
            return new ArrayList<Integer>();
        }
    }

    public static class BetterNaiveRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
//            if (items.size() > 300000 && items instanceof ArrayList) {
//                info("this removeItems is too slow, returning without processing");
//                return items;
//            }

            for (int i = items.size(); --i >= 0;) {
                Integer item = items.get(i);
                if (mustRemoveItem(item, i, removeFactor)) {
                    items.remove(i);
                }
            }
            return items;
        }
    }

    public static class LinkedRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> createInitialList() {
            return new LinkedList<Integer>();
        }
    }

    public static class CreateNewRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            List<Integer> res = createResultList(items, removeFactor);
            int i = 0;

            for (Integer item : items) {
                if (mustRemoveItem(item, i, removeFactor)) {
                    // no-op
                } else {
                    res.add(item);
                }
                i++;
            }

            return res;
        }

        protected List<Integer> createResultList(List<Integer> items, int removeFactor) {
            return new ArrayList<Integer>();
        }
    }

    public static class SmartCreateNewRemoveManyPerformer
            extends CreateNewRemoveManyPerformer {

        @Override
        protected List<Integer> createResultList(List<Integer> items, int removeFactor) {
            int newCapacity = removeFactor == 0 ? items.size()
                    : (int) (items.size() * (removeFactor - 1L) / removeFactor + 1);
            //System.out.println("newCapacity=" + newCapacity);
            return new ArrayList<Integer>(newCapacity);
        }
    }

    public static class FasterSmartCreateNewRemoveManyPerformer
            extends SmartCreateNewRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            List<Integer> res = createResultList(items, removeFactor);

            for (int i = 0; i < items.size(); i++) {
                Integer item = items.get(i);
                if (mustRemoveItem(item, i, removeFactor)) {
                    // no-op
                } else {
                    res.add(item);
                }
            }

            return res;
        }
    }

    public static class ForwardInPlaceRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            int j = 0; // destination idx
            for (int i = 0; i < items.size(); i++) {
                Integer item = items.get(i);
                if (mustRemoveItem(item, i, removeFactor)) {
                    // no-op
                } else {
                    if (j < i) {
                        items.set(j, item);
                    }
                    j++;
                }
            }

            return items.subList(0, j);
        }
    }

    public static class MagicRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            for (int i = 0; i < items.size(); i++) {
                if (mustRemoveItem(items.get(i), i, removeFactor)) {
                    Integer retainedItem = removeSomeFromEnd(items, removeFactor, i);
                    if (retainedItem == null) {
                        items.remove(i);
                        break;
                    }
                    items.set(i, retainedItem);
                }
            }

            return items;
        }

        private Integer removeSomeFromEnd(List<Integer> items, int removeFactor, int lowerBound) {
            for (int i = items.size(); --i > lowerBound;) {
                Integer item = items.get(i);
                items.remove(i);
                if (!mustRemoveItem(item, i, removeFactor)) {
                    return item;
                }
            }
            return null;
        }
    }

    public static class GuavaArrayListRemoveManyPerformer
            extends BaseRemoveManyPerformer {

        @Override
        protected List<Integer> removeItems(List<Integer> items, final int removeFactor) {
            Iterables.removeIf(items, new Predicate<Integer>() {

                public boolean apply(Integer input) {
                    return mustRemoveItem(input, input, removeFactor);
                }
            });

            return items;
        }

        @Override
        protected List<Integer> createInitialList() {
            return new ArrayList<Integer>();
        }
    }

    public void testForOneItemCnt(int itemCnt) {
        testAll(itemCnt, 0);
        testAll(itemCnt, itemCnt);
        testAll(itemCnt, itemCnt - 1);
        testAll(itemCnt, 3);
        testAll(itemCnt, 2);
        testAll(itemCnt, 1);
    }

    public static void main(String[] args) {
        RemoveManyFromList t = new RemoveManyFromList();
        t.addPerformer(new NaiveRemoveManyPerformer());
        t.addPerformer(new BetterNaiveRemoveManyPerformer());
        t.addPerformer(new LinkedRemoveManyPerformer());
        t.addPerformer(new CreateNewRemoveManyPerformer());
        t.addPerformer(new SmartCreateNewRemoveManyPerformer());
        t.addPerformer(new FasterSmartCreateNewRemoveManyPerformer());
        t.addPerformer(new MagicRemoveManyPerformer());
        t.addPerformer(new ForwardInPlaceRemoveManyPerformer());
        t.addPerformer(new GuavaArrayListRemoveManyPerformer());

        t.testForOneItemCnt(1000);
        t.testForOneItemCnt(10000);
        t.testForOneItemCnt(100000);
        t.testForOneItemCnt(200000);
        t.testForOneItemCnt(300000);
        t.testForOneItemCnt(500000);
        t.testForOneItemCnt(1000000);
        t.testForOneItemCnt(10000000);
    }
}

回答by LBushkin

I would imagine that building a new list, rather than modifying the existing list, would be more performant - especially when the number of items is as large as you indicate. This assumes, your list is an ArrayList, not a LinkedList. For a non-circular LinkedList, insertion is O(n), but removal at an existing iterator position is O(1); in which case your naive algorithm should be sufficiently performant.

我认为构建一个新列表而不是修改现有列表会更高效 - 特别是当项目数量与您指示的一样大时。这假设,您的列表是ArrayList,而不是LinkedList。对于 non-circular LinkedList,插入是 O(n),但在现有迭代器位置移除是 O(1);在这种情况下,您的幼稚算法应该具有足够的性能。

Unless the list is a LinkedList, the cost of shifting the list each time you call remove()is likely one of the most expensive parts of the implementation. For array lists, I would consider using:

除非列表是LinkedList,否则每次调用时移动列表的成本remove()可能是实现中最昂贵的部分之一。对于数组列表,我会考虑使用:

public static <T> List<T> removeMany(List<T> items) {
    List<T> newList = new ArrayList<T>(items.size());
    Iterator<T> iter = items.iterator();
    while (iter.hasNext()) {
        T item = iter.next();
        // <cond> goes here
        if (/*<cond>: */i++ % 2 != 0) {
            newList.add(item);
        }
    }
    return newList;
}

回答by Fabian Steeg

One thing you could try is to use a LinkedListinstead of an ArrayList, as with an ArrayListall other elements need to be copied if elements are removed from within the list.

您可以尝试的一件事是使用 aLinkedList而不是 an ArrayList,因为ArrayList如果从列表中删除元素,则需要复制所有其他元素。

回答by notnoop

Removing a lot of elements from an ArrayListis an O(n^2)operation. I would recommend simply using a LinkedListthat's more optimized for insertion and removal (but not for random access). LinkedList has a bit of a memory overhead.

从 an 中删除很多元素ArrayList是一个O(n^2)操作。我建议简单地使用一个LinkedList更适合插入和删除(但不是随机访问)的方法。LinkedList 有一点内存开销。

If you do need to keep ArrayList, then you are better off creating a new list.

如果您确实需要保留ArrayList,那么最好创建一个新列表。

Update: Comparing with creating a new list:

更新:与创建新列表相比:

Reusing the same list, the main cost is coming from deleting the node and updating the appropriate pointers in LinkedList. This is a constant operation for any node.

重用相同的列表,主要成本来自删除节点和更新 LinkedList 中的适当指针。这对于任何节点都是一个持续的操作。

When constructing a new list, the main cost is coming from creating the list, and initializing array entries. Both are cheap operations. You might incurre the cost of resizing the new list backend array as well; assuming that the final array is larger than half of the incoming array.

构建新列表时,主要成本来自创建列表和初始化数组条目。两者都是廉价的操作。您也可能会产生调整新列表后端数组大小的成本;假设最终数组大于传入数组的一半。

So if you were to remove only one element, then LinkedListapproach is probably faster. If you were to delete all nodes except for one, probably the new list approach is faster.

所以如果你只删除一个元素,那么LinkedList方法可能会更快。如果您要删除除一个之外的所有节点,则新列表方法可能会更快。

There are more complications when you bring memory management and GC. I'd like to leave these out.

当你带来内存管理和 GC 时,会有更多的复杂性。我想把这些排除在外。

The best option is to implement the alternatives yourself and benchmark the results when running your typical load.

最好的选择是自己实施替代方案,并在运行典型负载时对结果进行基准测试。

回答by Chinmay Kanchi

I would make a new Listto add the items to, since removing an item from the middle of a List is quite expensive.

我会做一个新List的添加项目,因为从列表中间删除一个项目是非常昂贵的。

public static List<T> removeMany(List<T> items) {
    List<T> tempList = new ArrayList<T>(items.size()/2); //if about half the elements are going to be removed
    Iterator<T> iter = items.iterator();
    while (item : items) {
        // <cond> goes here
        if (/*<cond>: */i % 2 != 0) {
            tempList.add(item);
        }
    }
    return tempList;
}

EDIT: I haven't tested this, so there may very well be small syntax errors.

编辑:我没有测试过这个,所以很可能会有小的语法错误。

Second EDIT: Using a LinkedListis better when you don't need random access but fast add times.

第二次编辑:LinkedList当您不需要随机访问但需要快速添加时间时,使用 a会更好。

BUT...

但...

The constant factor for ArrayListis smaller than that for LinkedList(Ref). Since you can make a reasonable guess of how many elements will be removed (you said "about half" in your question), adding an element to the end of an ArrayListis O(1) as long as you don't have to re-allocate it. So, if you canmake a reasonable guess, I would expect the ArrayListto be marginally faster than the LinkedListin most cases. (This applies to the code I have posted. In your naive implementatation, I think LinkedListwill be faster).

的常数因子ArrayList小于LinkedList( Ref)的常数因子。由于您可以合理猜测将删除多少元素(您在问题中说“大约一半”),因此在 an 的末尾添加一个元素ArrayList是 O(1),只要您不必重新 -分配它。所以,如果你做出合理的猜测,我希望在大多数情况下ArrayList比 快一点LinkedList。(这适用于我发布的代码。在你天真的实现中,我认为LinkedList会更快)。

回答by AlastairDewar

Try implementing recursion into your algorithm.

尝试在您的算法中实现递归。

回答by Arne Deutsch

Maybe a List isn't the optimal data structure for you? Can you change it? Perhaps you can use a tree where the items are sorted into in a way that deleting one node removes all items that meet the condition? Or that at least speed up your operations?

也许 List 不是最适合您的数据结构?你能改变吗?也许您可以使用一棵树,在其中以删除一个节点将删除所有满足条件的项目的方式对项目进行排序?或者至少可以加快您的操作?

In your simplistic example using two lists (one with the items where i % 2 != 0 is true and one with the items where i % 2 != 0 is false) could serve well. But this is of course very domain dependent.

在您使用两个列表的简单示例中(一个包含 i % 2 != 0 为真的项目,一个包含 i % 2 != 0 为假的项目)可以很好地服务。但这当然是非常依赖领域的。

回答by Kevin Bourrillion

As others have said, your first inclination should be to just build up a second list.

正如其他人所说,您的第一个倾向应该是建立第二个列表。

But, if you'd like to also try out editing the list in-place, the efficient way to do that is to use Iterables.removeIf()from Guava. If its argument is a list, it coalesces the retained elements toward the front then simply chops off the end -- much faster than removing() interior elements one by one.

但是,如果您还想尝试就地编辑列表,那么有效的方法是使用Iterables.removeIf()Guava。如果它的参数是一个列表,它会将保留的元素合并到前面,然后简单地砍掉结尾——比一个一个地删除()内部元素要快得多。

回答by Jherico

Use Apache Commons Collections. Specifically this function. This is implemented in essentially the same way that people are suggesting that you implement it (i.e. create a new list and then add to it).

使用Apache Commons 集合。具体这个功能。这基本上与人们建议您实施它的方式相同(即创建一个新列表,然后添加到其中)。

回答by atk

Since speed is the most important metric, there's the possibility of using more memory and doing less recreation of lists (as mentioned in my comment). Actual performance impact would be fully dependent on how the functionality is used, though.

由于速度是最重要的指标,因此有可能使用更多内存并减少重新创建列表(如我的评论中所述)。不过,实际的性能影响将完全取决于功能的使用方式。

The algorithm assumes that at least one of the following is true:

该算法假设以下至少一项为真:

  • all elements of the original list do not need to be tested. This could happen if we're really looking for the first N elements that match our condition, rather than all elements that match our condition.
  • it's more expensive to copy the list into new memory. This could happen if the original list uses more than 50% of allocated memory, so working in-place could be better or if memory operations turn out to be slower (that would be an unexpected result).
  • the speed penalty of removing elements from the list is too large to accept all at once, but spreading that penalty across multiple operations is acceptable, even if the overall penalty is larger than taking it all at once. This is like taking out a $200K mortgage: paying $1000 per month for 30 years is affordable on a monthly basis and has the benefits of owning a home and equity, even though the overall payment is 360K over the life of the loan.
  • 不需要测试原始列表的所有元素。如果我们真的在寻找与我们的条件匹配的前 N ​​个元素,而不是所有与我们的条件匹配的元素,就会发生这种情况。
  • 将列表复制到新内存中的成本更高。如果原始列表使用超过 50% 的已分配内存,则可能会发生这种情况,因此就地工作可能会更好,或者如果内存操作变得更慢(这将是意外结果)。
  • 从列表中删除元素的速度损失太大而无法一次全部接受,但将这种损失分散到多个操作中是可以接受的,即使总体损失大于一次全部接受。这就像申请 20 万美元的抵押贷款:每月支付 1000 美元,为期 30 年,每月支付 1000 美元是负担得起的,并且拥有拥有房屋和股权的好处,即使在整个贷款期限内总还款额为 36 万美元。

Disclaimer: There's prolly syntax errors - I didn't try compiling anything.

免责声明:有很多语法错误 - 我没有尝试编译任何东西。

First, subclass the ArrayList

首先,继承ArrayList

public class ConditionalArrayList extends ArrayList {

  public Iterator iterator(Condition condition)
  { 
    return listIterator(condition);
  }

  public ListIterator listIterator(Condition condition)
  {
    return new ConditionalArrayListIterator(this.iterator(),condition); 
  }

  public ListIterator listIterator(){ return iterator(); }
  public iterator(){ 
    throw new InvalidArgumentException("You must specify a condition for the iterator"); 
  }
}

Then we need the helper classes:

然后我们需要帮助类:

public class ConditionalArrayListIterator implements ListIterator
{
  private ListIterator listIterator;
  Condition condition;

  // the two following flags are used as a quick optimization so that 
  // we don't repeat tests on known-good elements unnecessarially.
  boolean nextKnownGood = false;
  boolean prevKnownGood = false;

  public ConditionalArrayListIterator(ListIterator listIterator, Condition condition)
  {
    this.listIterator = listIterator;
    this.condition = condition;
  }

  public void add(Object o){ listIterator.add(o); }

  /**
   * Note that this it is extremely inefficient to 
   * call hasNext() and hasPrev() alternatively when
   * there's a bunch of non-matching elements between
   * two matching elements.
   */
  public boolean hasNext()
  { 
     if( nextKnownGood ) return true;

     /* find the next object in the list that 
      * matches our condition, if any.
      */
     while( ! listIterator.hasNext() )
     {
       Object next = listIterator.next();
       if( condition.matches(next) ) {
         listIterator.set(next);
         nextKnownGood = true;
         return true;
       }
     }

     nextKnownGood = false;
     // no matching element was found.
     return false;
  }

  /**
   *  See hasPrevious for efficiency notes.
   *  Copy & paste of hasNext().
   */
  public boolean hasPrevious()
  { 
     if( prevKnownGood ) return true;

     /* find the next object in the list that 
      * matches our condition, if any.
      */
     while( ! listIterator.hasPrevious() )
     {
       Object prev = listIterator.next();
       if( condition.matches(prev) ) {
         prevKnownGood = true;
         listIterator.set(prev);
         return true;
       }
     }

     // no matching element was found.
     prevKnwonGood = false;
     return false;
  }

  /** see hasNext() for efficiency note **/
  public Object next()
  {
     if( nextKnownGood || hasNext() ) 
     { 
       prevKnownGood = nextKnownGood;
       nextKnownGood = false;
       return listIterator.next();
     }

     throw NoSuchElementException("No more matching elements");
  }

  /** see hasNext() for efficiency note; copy & paste of next() **/
  public Object previous()
  {
     if( prevKnownGood || hasPrevious() ) 
     { 
       nextKnownGood = prevKnownGood;
       prevKnownGood = false;
       return listIterator.previous();                        
     }
     throw NoSuchElementException("No more matching elements");
  }

  /** 
   * Note that nextIndex() and previousIndex() return the array index
   * of the value, not the number of results that this class has returned.
   * if this isn't good for you, just maintain your own current index and
   * increment or decriment in next() and previous()
   */
  public int nextIndex(){ return listIterator.previousIndex(); }
  public int previousIndex(){ return listIterator.previousIndex(); }

  public remove(){ listIterator.remove(); }
  public set(Object o) { listIterator.set(o); }
}

and, of course, we need the condition interface:

当然,我们需要条件接口:

/** much like a comparator... **/
public interface Condition
{
  public boolean matches(Object obj);
}

And a condition with which to test

以及测试条件

public class IsEvenCondition {
{
  public boolean matches(Object obj){ return (Number(obj)).intValue() % 2 == 0;
}

and we're finally ready for some test code

我们终于准备好了一些测试代码

    Condition condition = new IsEvenCondition();

    System.out.println("preparing items");
    startMillis = System.currentTimeMillis();
    List<Integer> items = new ArrayList<Integer>(); // Integer is for demo
    for (int i = 0; i < 1000000; i++) {
        items.add(i * 3); // just for demo
    }
    endMillis = System.currentTimeMillis();
    System.out.println("It took " + (endmillis-startmillis) + " to prepare the list. ");

    System.out.println("deleting items");
    startMillis = System.currentTimeMillis();
    // we don't actually ever remove from this list, so 
    // removeMany is effectively "instantaneous"
    // items = removeMany(items);
    endMillis = System.currentTimeMillis();
    System.out.println("after remove: items.size=" + items.size() + 
            " and it took " + (endMillis - startMillis) + " milli(s)");
    System.out.println("--> NOTE: Nothing is actually removed.  This algorithm uses extra"
                       + " memory to avoid modifying or duplicating the original list.");

    System.out.println("About to iterate through the list");
    startMillis = System.currentTimeMillis();
    int count = iterate(items, condition);
    endMillis = System.currentTimeMillis();
    System.out.println("after iteration: items.size=" + items.size() + 
            " count=" + count + " and it took " + (endMillis - startMillis) + " milli(s)");
    System.out.println("--> NOTE: this should be somewhat inefficient."
                       + " mostly due to overhead of multiple classes."
                       + " This algorithm is designed (hoped) to be faster than "
                       + " an algorithm where all elements of the list are used.");

    System.out.println("About to iterate through the list");
    startMillis = System.currentTimeMillis();
    int total = addFirst(30, items, condition);
    endMillis = System.currentTimeMillis();
    System.out.println("after totalling first 30 elements: total=" + total + 
            " and it took " + (endMillis - startMillis) + " milli(s)");

...

private int iterate(List<Integer> items, Condition condition)
{
  // the i++ and return value are really to prevent JVM optimization
  // - just to be safe.
  Iterator iter = items.listIterator(condition);
  for( int i=0; iter.hasNext()); i++){ iter.next(); }
  return i;
}

private int addFirst(int n, List<Integer> items, Condition condition)
{
  int total = 0;
  Iterator iter = items.listIterator(condition);
  for(int i=0; i<n;i++)
  {
    total += ((Integer)iter.next()).intValue();
  }
}

回答by fullreset

I'm sorry, but all these answers are missing the point, I think: You probably don't have to, and probably shouldn't, use a List.

对不起,但所有这些答案都没有抓住重点,我认为:您可能不必,也可能不应该使用 List。

If this kind of "query" is common, why not build an ordered data structure that eliminates the need to traverse all the data nodes? You don't tell us enough about the problem, but given the example you provide a simple tree could do the trick. There's an insertion overhead per item, but you can very quickly find the subtree containing nodes that match , and you therefore avoid most of the comparisons you're doing now.

如果这种“查询”很常见,为什么不构建一个无需遍历所有数据节点的有序数据结构呢?您没有告诉我们足够的问题,但鉴于您提供的示例,一个简单的树可以解决问题。每个项目都有一个插入开销,但是您可以非常快速地找到包含匹配节点的子树,因此您可以避免现在进行的大多数比较。

Furthermore:

此外:

  • Depending on the exact problem, and the exact data structure you set up, you can speed up deletion -- if the nodes you want to kill do reduce to a subtree or something of the sort, you just dropthat subtree, rather than updating a whole slew of list nodes.

  • Each time you remove a list item, you are updating pointers -- eg lastNode.nextand nextNode.prevor something -- but if it turns out you also want to remove the nextNode, then the pointer update you just caused is thrown away by a new update.)

  • 根据确切的问题和您设置的确切数据结构,您可以加快删除速度——如果您想删除的节点确实减少到子树或类似的东西,您只需删除该子树,而不是更新整个列表节点。

  • 每次删除列表项时,您都在更新指针——例如lastNode.nextnextNode.prev或其他内容——但如果事实证明您还想删除nextNode,那么您刚刚引起的指针更新将被丢弃一个新的更新。)