Original source: http://stackoverflow.com/questions/46452527/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Difference between chunk size and page size Spring Batch
Asked by Andrew
I have a Spring Batch job in which I set the chunk size to 1000, and the reader in that job is a JpaPagingItemReader.
In the reader I set the page size to 20. Does this mean that for every chunk of 1000 items read, the reader must fetch 20 items from the database?
If not, what is the difference between them?
Answered by Sabir Khan
With your current configuration, if every item you read makes it to the writer (i.e. none are filtered out in the processor), you will need 1000 / 20 = 50 database reads to fill one chunk, which is the point at which the writer is actually called.
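The arithmetic above can be written as a tiny helper. This is only a sketch: PageQueryMath and pageQueriesPerChunk are hypothetical names, not part of Spring Batch, and the result holds for a chunk that starts on a page boundary with no items filtered out.

```java
// Hypothetical helper for illustration; not a Spring Batch class.
final class PageQueryMath {

    // Page queries needed to collect chunkSize items when each query
    // returns pageSize rows (ceiling division).
    static int pageQueriesPerChunk(int chunkSize, int pageSize) {
        if (chunkSize <= 0 || pageSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (chunkSize + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageQueriesPerChunk(1000, 20));   // 50
        System.out.println(pageQueriesPerChunk(1000, 1000)); // 1
    }
}
```

With the values from the question this gives 50 queries per chunk; with the page size raised to 1000 it drops to 1.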
Spring Batch holds processed items in memory until the chunk size is reached, and holding items costs memory.
Your current configuration holds data in memory and makes unnecessary database calls, whereas we want to reduce both of those things.
So your configuration should be the reverse of what you have now: increase the reader page size so that it is at least equal to the chunk size (commit interval). That way, data that has been read is processed in small chunks before you go back and read the database again.
As you may have noticed by now, these are conceptually unrelated settings. The reader page size is there to minimize database calls (it is not a Spring Batch concept but a reader-specific one: if the reader is not a paging reader, it does not come into the picture), while the chunk size is about committing processed data in small chunks to reduce the memory footprint.
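To make the distinction concrete, here is a rough plain-Java simulation. Everything in it is an assumption for illustration: FakePagingReader and run are hypothetical stand-ins, not Spring Batch classes, and the fake reader knows the row count up front, so it does not model any extra empty-page query a real paging reader may issue at the end.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkVersusPageDemo {

    /** Hypothetical stand-in for a paging reader: one "query" fetches pageSize rows. */
    static final class FakePagingReader {
        private final int totalRows;
        private final int pageSize;
        private final List<Integer> page = new ArrayList<>();
        private int nextRow = 0;      // next row id to fetch from the pretend database
        private int indexInPage = 0;  // position inside the current page
        int queries = 0;              // number of pretend database round trips

        FakePagingReader(int totalRows, int pageSize) {
            this.totalRows = totalRows;
            this.pageSize = pageSize;
        }

        /** Returns the next item, or null once all rows have been read. */
        Integer read() {
            if (indexInPage >= page.size()) {
                if (nextRow >= totalRows) {
                    return null;
                }
                page.clear();
                int end = Math.min(nextRow + pageSize, totalRows);
                for (int id = nextRow; id < end; id++) {
                    page.add(id);
                }
                nextRow = end;
                indexInPage = 0;
                queries++;
            }
            return page.get(indexInPage++);
        }
    }

    /** Simplified chunk loop; returns {queries, writerCalls, maxBuffered}. */
    static int[] run(int totalRows, int pageSize, int chunkSize) {
        FakePagingReader reader = new FakePagingReader(totalRows, pageSize);
        List<Integer> chunk = new ArrayList<>();
        int writerCalls = 0;
        int maxBuffered = 0;
        Integer item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            maxBuffered = Math.max(maxBuffered, chunk.size());
            if (chunk.size() == chunkSize) {
                writerCalls++;  // the writer runs once per full chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writerCalls++;      // final partial chunk
        }
        return new int[] {reader.queries, writerCalls, maxBuffered};
    }

    public static void main(String[] args) {
        int[] a = run(5000, 20, 1000);
        int[] b = run(5000, 1000, 1000);
        System.out.printf("page=20:   queries=%d writerCalls=%d maxBuffered=%d%n", a[0], a[1], a[2]);
        System.out.printf("page=1000: queries=%d writerCalls=%d maxBuffered=%d%n", b[0], b[1], b[2]);
    }
}
```

For 5000 rows and a chunk size of 1000, both runs call the writer 5 times and buffer at most 1000 items, but the page size of 20 costs 250 queries while the page size of 1000 costs 5. Changing the page size only moves the query count; changing the chunk size only moves the writer calls and the buffered item count.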
Answered by hira waseem
Yes, the commit interval determines how many records are processed in a chunk.
The database page size determines how many records are fetched from the database in one go. It is more of an optimization setting: a trade-off between how big a buffer you would like to have and the number of trips the driver has to make to fetch the data from the database.