Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/7759156/

Time: 2020-10-30 21:18:26  Source: igfitidea

Grid Size in Spring batch

java spring spring-batch

Asked by nobody

I have a batch job which reads data from bulk files, processes it, and inserts it into the DB.

I'm using Spring's partitioning features with the default partition handler.

    <bean class="org.spr...TaskExecutorPartitionHandler">
          <property name="taskExecutor" ref="taskExecutor"/>
          <property name="step" ref="readFromFile" />
          <property name="gridSize" value="10" />
    </bean>

What is the significance of gridSize here? I have configured it so that it is equal to the concurrency in the taskExecutor.

Answered by tolitius

gridSize specifies the number of data blocks to create, which are then processed by (usually) the same number of workers. Think of it as the number of mapped data blocks in a map/reduce.

Using a StepExecutionSplitter, given the data, the PartitionHandler "partitions" / splits the data into gridSize parts, and sends each part to an independent worker => a thread in your case.
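
To make the fan-out concrete, here is a minimal sketch in plain Java rather than Spring Batch itself. The class `GridSizeFanOutSketch`, its `run` method, and the `poolSize` parameter are hypothetical names chosen for illustration; a plain `ExecutorService` stands in for the configured taskExecutor. It assumes one task is created per partition, so gridSize sets the number of tasks and the pool size sets how many of them can run at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not Spring Batch code.
class GridSizeFanOutSketch {

    // Creates gridSize tasks (one per partition) and runs them on a pool of
    // poolSize threads. Returns each task's result, in partition order.
    static List<String> run(int gridSize, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < gridSize; i++) {
                final int partition = i;
                // Stand-in for "process this partition's share of the data".
                futures.add(pool.submit(() -> "partition" + partition + " done"));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that partition has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 10 partitions on 10 threads: every partition can run at the same time.
        System.out.println(run(10, 10).size());
        // 10 partitions on 4 threads: still 10 partitions, at most 4 at a time.
        System.out.println(run(10, 4).size());
    }
}
```

In this sketch a pool smaller than gridSize does not lose any partitions; the extra ones simply wait for a free thread. Matching gridSize to the executor's concurrency, as in the question, lets all partitions start together.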

For example, say you have 10 rows in the DB that need to be processed. If you set gridSize to 5 and use straightforward partitioning logic, you'd end up with 10 / 5 = 2 rows per thread => 5 threads working concurrently on 2 rows each.
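
The arithmetic above can be sketched as a simple range split. This is only an illustration in plain Java: `RangePartitionSketch` and its `partition(totalRows, gridSize)` method are hypothetical and assume the rows have contiguous ids starting at 1. In Spring Batch itself this logic would normally sit behind the `Partitioner` interface, whose `partition(int gridSize)` method returns a map of named `ExecutionContext` objects; an `int[]` range stands in for each context here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring Batch: splits row ids 1..totalRows
// into gridSize contiguous ranges, as a straightforward partitioner might.
class RangePartitionSketch {

    // Returns partition name -> {minId, maxId}, both inclusive.
    static Map<String, int[]> partition(int totalRows, int gridSize) {
        Map<String, int[]> ranges = new LinkedHashMap<>();
        int base = totalRows / gridSize;  // rows that every partition gets
        int extra = totalRows % gridSize; // the first 'extra' partitions get one more
        int start = 1;
        for (int i = 0; i < gridSize; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges.put("partition" + i, new int[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 rows, gridSize 5: five partitions of two rows each.
        partition(10, 5).forEach((name, r) ->
                System.out.println(name + ": rows " + r[0] + ".." + r[1]));
    }
}
```

With 10 rows and a gridSize of 5 this prints five ranges, from `partition0: rows 1..2` to `partition4: rows 9..10`. When the row count does not divide evenly, the sketch gives the leftover rows to the first partitions; that is one reasonable choice among several.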

Answered by Jim Kiley

Per the API,

Passed to the StepExecutionSplitter in the handle(StepExecutionSplitter, StepExecution) method, instructing it how many StepExecution instances are required, ideally. The StepExecutionSplitter is allowed to ignore the grid size in the case of a restart, since the input data partitions must be preserved.

Answered by Koushal

Grid size is nothing but the set of tasks (think of it as a sack of bags) that a single partitioned step will pick up for processing. After finishing all the tasks it has taken (its sack of bags), it comes back for the next set of tasks (the next sack of bags).
