Original source: http://stackoverflow.com/questions/7759156/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Grid Size in Spring batch
Asked by nobody
I have a batch job that reads data from bulk files, processes it, and inserts it into a DB.
I'm using Spring's partitioning features with the default partition handler.
<bean class="org.spr...TaskExecutorPartitionHandler">
    <property name="taskExecutor" ref="taskExecutor" />
    <property name="step" ref="readFromFile" />
    <property name="gridSize" value="10" />
</bean>
What is the significance of gridSize here? I have configured it so that it is equal to the concurrency of the taskExecutor.
Answered by tolitius
gridSize specifies the number of data blocks to create, to be processed by (usually) the same number of workers. Think of it as the number of mapped data blocks in a map/reduce.
Using a StepExecutionSplitter, given the data, the PartitionHandler "partitions" (splits) the data into gridSize parts and sends each part to an independent worker, which is a thread in your case.
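To make the "one part per worker" idea concrete, here is a minimal plain-Java sketch. It deliberately uses no Spring Batch classes: the class name `GridSizeFanOutSketch`, the method `runPartitions`, and the `poolSize` parameter are all made up for illustration, and a fixed thread pool stands in for the configured taskExecutor. It only mimics the fan-out; it is not how the framework is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one task per partition, executed on a fixed pool of worker threads.
class GridSizeFanOutSketch {

    // Submits gridSize tasks to a pool of poolSize threads and waits for all of them.
    static List<String> runPartitions(int gridSize, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < gridSize; i++) {
                final int partition = i;
                // Stand-in for "run the worker step on this partition".
                futures.add(CompletableFuture.supplyAsync(
                        () -> "partition" + partition + " ran on " + Thread.currentThread().getName(),
                        pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join()); // block until this partition has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // gridSize 10 with only 4 threads: all 10 partitions run, at most 4 at a time.
        runPartitions(10, 4).forEach(System.out::println);
    }
}
```

In this sketch, when `poolSize` equals `gridSize` every partition can run at once; when the pool is smaller, the extra partitions simply wait in the pool's queue until a thread frees up.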
For example, say you have 10 rows in the DB that need to be processed. If you set gridSize to 5 and use straightforward partition logic, you'd end up with 10 / 5 = 2 rows per thread, that is, 5 threads working concurrently on 2 rows each.
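The arithmetic above can be sketched in plain Java. This is only an illustration of "straightforward partition logic", not the answerer's code: the class `RangePartitionSketch` and its `partition` method are hypothetical names, and spreading a remainder over the first partitions is an assumption for the case where the row count does not divide evenly. In a real job, logic like this would typically live in a `Partitioner` implementation, which receives gridSize as the argument of its `partition(int gridSize)` method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split rows [0, totalRows) into gridSize contiguous ranges.
class RangePartitionSketch {

    // Returns partition name -> {fromInclusive, toExclusive}.
    static Map<String, int[]> partition(int totalRows, int gridSize) {
        Map<String, int[]> partitions = new LinkedHashMap<>();
        int base = totalRows / gridSize;      // rows that every partition gets
        int remainder = totalRows % gridSize; // the first `remainder` partitions get one extra row
        int from = 0;
        for (int i = 0; i < gridSize; i++) {
            int size = base + (i < remainder ? 1 : 0);
            partitions.put("partition" + i, new int[] {from, from + size});
            from += size;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // 10 rows with gridSize 5: five partitions of two rows each.
        partition(10, 5).forEach((name, range) ->
                System.out.println(name + ": rows " + range[0] + ".." + (range[1] - 1)));
    }
}
```

With 10 rows and a gridSize of 5, the first partition covers rows 0..1 and the last covers rows 8..9, matching the 2-rows-per-thread figure in the example.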
Answered by Jim Kiley
Per the API,
Passed to the StepExecutionSplitter in the handle(StepExecutionSplitter, StepExecution) method, instructing it how many StepExecution instances are required, ideally. The StepExecutionSplitter is allowed to ignore the grid size in the case of a restart, since the input data partitions must be preserved.
Answered by Koushal
Grid size is nothing but the set of tasks (think of it as a sack of bags) that a single partitioned step picks up for processing. After it finishes all the tasks it took (its sack of bags), it comes back for the next set of tasks (the next sack of bags).