Java: Partition a Set into smaller subsets and process them as a batch

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/19423326/

Date: 2020-08-12 17:01:40. Source: igfitidea.

Partition a Set into smaller Subsets and process as batch

Tags: java, multithreading

Asked by Pawan

I have a continuously running thread in my application, which holds a HashSet that stores all the symbols in the application. As per the design at the time it was written, inside the thread's while (true) loop it continuously iterates over the HashSet and updates the database for every symbol it contains.

The maximum number of symbols in the HashSet will be around 6000. I don't want to update the DB with all 6000 symbols at once. Instead, I want to divide this HashSet into subsets of 500 each (12 sets), process each subset individually, and have the thread sleep for 15 minutes after each subset, so that I can reduce the pressure on the database.

My code (a sample snippet) is shown below.

How can I partition a Set into smaller subsets and process them? (I have seen examples for partitioning an ArrayList and a TreeSet, but didn't find any example for a HashSet.)

package com.ubsc.rewji.threads;

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;

public class TaskerThread extends Thread {
    private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
    String symbols[] = new String[] { "One", "Two", "Three", "Four" };
    Set<String> allSymbolsSet = Collections
            .synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));

    public void addsymbols(String commaDelimSymbolsList) {
        if (commaDelimSymbolsList != null) {
            String[] symAr = commaDelimSymbolsList.split(",");
            for (int i = 0; i < symAr.length; i++) {
                priorityBlocking.add(symAr[i]);
            }
        }
    }

    public void run() {
        while (true) {
            try {
                while (priorityBlocking.peek() != null) {
                    String symbol = priorityBlocking.poll();
                    allSymbolsSet.add(symbol);
                }
                Iterator<String> ite = allSymbolsSet.iterator();
                System.out.println("=======================");
                while (ite.hasNext()) {
                    String symbol = ite.next();
                    if (symbol != null && symbol.trim().length() > 0) {
                        try {
                            updateDB(symbol);

                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
                Thread.sleep(2000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public void updateDB(String symbol) {
        System.out.println("THE SYMBOL BEING UPDATED IS" + "  " + symbol);
    }

    public static void main(String args[]) {
        TaskerThread taskThread = new TaskerThread();
        taskThread.start();

        String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
        taskThread.addsymbols(commaDelimSymbolsList);

    }

}

Accepted answer by Amir Pashazadeh

Do something like this:

private static final int PARTITIONS_COUNT = 12;

List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
    theSets.add(new HashSet<Type>());
}

int index = 0;
for (Type object : originalSet) {
    theSets.get(index++ % PARTITIONS_COUNT).add(object);
}

Now you have partitioned the originalSet into 12 other HashSets.
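
The round-robin idea above can be wrapped into a small generic helper. What follows is only a sketch: the class name `RoundRobinPartition`, the method name `partition`, and the use of `Integer` values as stand-ins for symbols are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named for this example only.
class RoundRobinPartition {

    // Distributes the elements of `source` over `count` new HashSets in round-robin order.
    static <T> List<Set<T>> partition(Set<T> source, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        List<Set<T>> parts = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            parts.add(new HashSet<>());
        }
        int index = 0;
        for (T element : source) {
            // Element number `index` goes into subset `index % count`.
            parts.get(index % count).add(element);
            index++;
        }
        return parts;
    }

    public static void main(String[] args) {
        // 6000 distinct integers stand in for the 6000 symbols of the question.
        Set<Integer> symbols = new HashSet<>();
        for (int i = 0; i < 6000; i++) {
            symbols.add(i);
        }
        for (Set<Integer> part : partition(symbols, 12)) {
            System.out.println(part.size());
        }
    }
}
```

With 6000 elements and 12 partitions, every subset ends up with 500 elements. Note that this approach fixes the number of subsets rather than their size: if the set holds fewer than 6000 symbols, each subset is correspondingly smaller.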

Answer by TwoThe

A very simple way for your actual problem would be to change your code as follows:

Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
int i = 500; // process at most 500 symbols per pass
while ((i-- > 0) && ite.hasNext()) {

A general method would be to use the iterator to take the elements out one by one in a simple loop:

int i = 500; // move at most 500 elements into the sublist
while ((i-- > 0) && ite.hasNext()) {
  sublist.add(ite.next());
  ite.remove();
}
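
Applied to the question's goal (batches of 500 with a pause in between), the iterator loop above could be extended roughly as follows. This is a sketch under assumptions: `BatchedUpdate`, `processInBatches`, and the `handler` callback are hypothetical names, the pause is passed in milliseconds so the demo finishes quickly, and unlike the snippet above the elements are not removed from the set. It also assumes that no other thread modifies the set during the walk; iterating over a copy is one way to guarantee that.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper class, named for this example only.
class BatchedUpdate {

    // Walks the set once and hands its elements to `handler` in batches of at most
    // `batchSize`, sleeping `pauseMillis` between batches (not after the last one).
    // Returns the number of batches handed to the handler.
    static <T> int processInBatches(Set<T> source, int batchSize, long pauseMillis,
                                    Consumer<List<T>> handler) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        Iterator<T> it = source.iterator();
        int batches = 0;
        while (it.hasNext()) {
            List<T> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && it.hasNext()) {
                batch.add(it.next());
            }
            handler.accept(batch);
            batches++;
            if (it.hasNext() && pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        Set<String> symbols = new HashSet<>();
        for (int i = 0; i < 1200; i++) {
            symbols.add("SYM" + i);
        }
        // Iterate over a copy so later changes to `symbols` cannot disturb the walk.
        // The question would use 15 minutes here; 10 ms keeps the demo short.
        processInBatches(new HashSet<>(symbols), 500, 10L,
                batch -> System.out.println("batch of " + batch.size()));
    }
}
```

For 1200 symbols and a batch size of 500, the handler is called three times, with 500, 500, and 200 elements.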

Answer by Andrey Chaschev

With Guava:

for (List<String> partition : Iterables.partition(yourSet, 500)) {
    // ... handle partition ...
}
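
If Guava is not available, a comparable result (consecutive chunks of a fixed maximum size) can be obtained with the JDK alone by copying the set into a list and slicing it with `List.subList`. This is a different technique from the Guava call above, shown only as a sketch; `SubListPartition` and `partition` are hypothetical names, and a `LinkedHashSet` is used in the demo so that the printed order is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named for this example only.
class SubListPartition {

    // Copies the set into a list once, then returns consecutive chunks
    // of at most `size` elements each.
    static <T> List<List<T>> partition(Set<T> source, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<T> snapshot = new ArrayList<>(source);
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < snapshot.size(); from += size) {
            int to = Math.min(from + size, snapshot.size());
            // Copy the subList view so each chunk is independent of the snapshot.
            chunks.add(new ArrayList<>(snapshot.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(partition(set, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

Because the set is copied first, the chunks are a snapshot: symbols added to the set afterwards do not appear in them.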

Answer by PipoTells

We can use the following approach to divide a Set.

We will get the output [a, b] [c, d] [e].

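private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
    // A non-positive size would otherwise loop forever, adding empty sets.
    if (partitionSize < 1)
    {
        throw new IllegalArgumentException("partitionSize must be at least 1");
    }

    List<Set<String>> list = new ArrayList<>();

    Iterator<String> iterator = set.iterator();

    // Each pass of the outer loop fills one subset with up to partitionSize elements.
    while(iterator.hasNext())
    {
        Set<String> newSet = new HashSet<>();
        for(int j = 0; j < partitionSize && iterator.hasNext(); j++)
        {
            newSet.add(iterator.next());
        }
        list.add(newSet);
    }
    return list;
}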

public static void main(String[] args)
{
    Set<String> set = new HashSet<>();
    set.add("a");
    set.add("b");
    set.add("c");
    set.add("d");
    set.add("e");

    int size = 2;
    List<Set<String>> list = partitionSet(set, size);

    for(int i = 0; i < list.size(); i++)
    {
        Set<String> s = list.get(i);
        System.out.println(s);
    }
}

Answer by Aman

The Guava solution from @Andrey_chaschev seems the best, but if it is not possible to use it, I believe the following would help:

public static List<Set<String>> partition(Set<String> set, int chunk) {
    if (set == null || set.isEmpty() || chunk < 1)
        return new ArrayList<>();

    List<Set<String>> partitionedList = new ArrayList<>();
    double loopsize = Math.ceil((double) set.size() / (double) chunk);

    for (int i = 0; i < loopsize; i++) {
        partitionedList.add(set.stream().skip((long) i * chunk).limit(chunk).collect(Collectors.toSet()));
    }

    return partitionedList;
}