Laravel 游标和 Laravel 块方法有什么区别?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/45464676/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 16:26:17  来源:igfitidea点击:

What is the difference between laravel cursor and laravel chunk method?

phpdatabaselaravelcursorlarge-data

提问by Suraj

I would like to know what is the difference between laravel chunk and laravel cursor method. Which method is more suitable to use? What will be the use cases for both of them? I know that you should use cursor to save memory but how it actually works in the backend?

我想知道laravel chunk 和laravel cursor 方法有什么区别。哪种方法更适合使用?两者的用例是什么?我知道您应该使用游标来节省内存,但它在后端实际上是如何工作的?

A detailed explanation with example would be useful because I have searched on stackoverflow and other sites but I didn't found much information.

带有示例的详细解释会很有用,因为我已经在 stackoverflow 和其他站点上进行了搜索,但没有找到太多信息。

Here is the code snippet's from the laravel documentation.

这是来自 laravel 文档的代码片段。

Chunking Results

分块结果

Flight::chunk(200, function ($flights) {
    foreach ($flights as $flight) {
        //
    }
});

Using Cursors

使用光标

foreach (Flight::where('foo', 'bar')->cursor() as $flight) {
    //
}

回答by Oluwatobi Samuel Omisakin

Indeed This question might attract some opinionated answer, however the simple answer is here in Laravel Docs

事实上,这个问题可能会吸引一些固执的答案,但简单的答案在Laravel Docs 中

Just for reference:

仅供参考:

This is chunk:

这是块:

If you need to process thousands of Eloquent records, use the chunkcommand. The chunkmethod will retrieve a "chunk" of Eloquent models, feeding them to a given Closurefor processing. Using the chunkmethod will conserve memory when working with large result sets:

如果您需要处理数千条 Eloquent 记录,请使用该chunk命令。该chunk方法将检索 Eloquent 模型的“块”,将它们提供给给定Closure的处理。使用该chunk方法将在处理大型结果集时节省内存:

This is Cursor:

这是光标:

The cursormethod allows you to iterate through your database records using a cursor, which will only execute a single query. When processing large amounts of data, the cursormethod may be used to greatly reduce your memory usage:

cursor方法允许您使用游标遍历数据库记录,该游标只会执行一个查询。在处理大量数据时,cursor可以使用该方法来大大减少你的内存使用:

Chunk retrieves the records from the database, and load it into memory while setting a cursor on the last record retrieved so there is no clash.

Chunk 从数据库中检索记录,并将其加载到内存中,同时在检索到的最后一条记录上设置游标,以免发生冲突。

So the advantage here is if you want to reformat the largerecord before they are sent out, or you want to perform an operation on an nth number of records per time then this is useful. An example is if you are building a view out/excel sheet, so you can take the record in counts till they are done so that all of them are not loaded into the memory at once and thereby hitting the memory limit.

所以这里的优点是如果你想在发送之前重新格式化记录,或者你想每次对第 n 个记录执行操作,那么这很有用。一个例子是,如果您正在构建一个视图/excel 表,那么您可以以计数方式记录记录,直到它们完成,这样所有的记录都不会一次加载到内存中,从而达到内存限制。

Cursor uses PHP Generators, you can check the php generatorspage however here is an interesting caption:

Cursor 使用 PHP 生成器,您可以查看php 生成器页面,但这里有一个有趣的标题:

A generator allows you to write code that uses foreachto iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yieldas many times as it needs to in order to provide the values to be iterated over.

生成器允许您编写使用foreach迭代一组数据的代码,而无需在内存中构建数组,这可能会导致您超出内存限制,或者需要大量的处理时间来生成。相反,你可以写一个发电机的功能,这是同作为一个正常的功能,只是不用返回荷兰国际集团一次,发电机可以产生,因为它需要以提供要遍历值多次。

While I cannot guarantee that I understand fully the concept of Cursor, but for Chunk, chunk runs the query at every record size, retrieving it, and passing it into the closure for further works on the records.

虽然我不能保证我完全理解 Cursor 的概念,但对于 Chunk,chunk 会在每个记录大小上运行查询,检索它,并将其传递到闭包中以进一步处理记录。

Hope this is useful.

希望这是有用的。

回答by mohammad asghari

We have a comparison: chunk() vs cursor()

我们有一个比较: chunk() vs cursor()

  • cursor(): High Speed
  • chunk(): Constant Memory Usage
  • 游标():高速
  • chunk():常量内存使用

10,000 records:

10,000 条记录

+-------------+-----------+------------+
|             | Time(sec) | Memory(MB) |
+-------------+-----------+------------+
| get()       |      0.17 |         22 |
| chunk(100)  |      0.38 |         10 |
| chunk(1000) |      0.17 |         12 |
| cursor()    |      0.16 |         14 |
+-------------+-----------+------------+

100,000 records:

100,000 条记录

+--------------+------------+------------+
|              | Time(sec)  | Memory(MB) |
+--------------+------------+------------+
| get()        |        0.8 |     132    |
| chunk(100)   |       19.9 |      10    |
| chunk(1000)  |        2.3 |      12    |
| chunk(10000) |        1.1 |      34    |
| cursor()     |        0.5 |      45    |
+--------------+------------+------------+
  • TestData: users table of Laravel default migration
  • Homestead 0.5.0
  • PHP 7.0.12
  • MySQL 5.7.16
  • Laravel 5.3.22
  • TestData:Laravel 默认迁移的用户表
  • 家园 0.5.0
  • PHP 7.0.12
  • MySQL 5.7.16
  • Laravel 5.3.22

回答by oraoto

chunkis based on pagination, it maintains a page number, and do the looping for you.

chunk基于分页,它维护一个页码,并为你做循环。

For example, DB::table('users')->select('*')->chunk(100, function($e) {})will do multiple queries until the result set is smaller than the chunk size(100):

例如,DB::table('users')->select('*')->chunk(100, function($e) {})将执行多次查询,直到结果集小于块大小(100):

select * from `users` limit 100 offset 0;
select * from `users` limit 100 offset 100;
select * from `users` limit 100 offset 200;
select * from `users` limit 100 offset 300;
select * from `users` limit 100 offset 400;
...

cursoris based on PDOStatement::fetchand Generator.

cursor是基于PDOStatement::fetch和生成器的。

$cursor = DB::table('users')->select('*')->cursor()
foreach ($cursor as $e) { }

will do issue a single query:

将发出单个查询:

select * from `users`

But the driver doesn't fetch the result set at once.

但是驱动程序不会立即获取结果集。

回答by u8329054

Cursor()

Cursor()

  • only single query
  • fetch result by call PDOStatement::fetch()
  • by default buffered query is used and fetch all result at once.
  • turned only current row into eloquent model
  • 只有一个查询
  • 通过调用获取结果 PDOStatement::fetch()
  • 默认情况下使用缓冲查询并一次获取所有结果。
  • 仅将当前行转换为 eloquent 模型

Pros

优点

  • minimize eloquent model memory overhead
  • easy to manipulate
  • 最小化 eloquent 模型内存开销
  • 易于操作

Cons

缺点

  • huge result leads to out of memory
  • buffered or unbuffered is a trade-off
  • 巨大的结果导致内存不足
  • 缓冲或无缓冲是一种权衡

Chunk()

Chunk()

  • chunk query in to queries with limit and offset
  • fetch result by call PDOStatement::fetchAll
  • turned results into eloquent models batchly
  • 块查询到具有限制和偏移量的查询
  • 通过调用获取结果 PDOStatement::fetchAll
  • 将结果批量转化为雄辩的模型

Pros

优点

  • controllable used memory size
  • 可控的已用内存大小

Cons

缺点

  • turned results in to eloquent models batchly may cause some memory overhead
  • queries and memory usage is a traid-off
  • 将结果批量转化为 eloquent 模型可能会导致一些内存开销
  • 查询和内存使用是一种权衡

TL;DR

TL; 博士

I used to think cursor()will do query each time and only keep one row result in memory. So when I saw @mohammad-asghari's comparison table I got really confused. It must be some bufferbehind the scenes.

我曾经认为cursor()每次都会做查询并且只在内存中保留一行结果。所以当我看到@mohammad-asghari 的比较表时,我真的很困惑。它一定是幕后的一些缓冲区

By tracking Laravel Code as below

通过跟踪 Laravel 代码如下

/**
 * Run a select statement against the database and returns a generator.
 *
 * @param  string  $query
 * @param  array  $bindings
 * @param  bool  $useReadPdo
 * @return \Generator
 */
public function cursor($query, $bindings = [], $useReadPdo = true)
{
    $statement = $this->run($query, $bindings, function ($query, $bindings) use ($useReadPdo) {
        if ($this->pretending()) {
            return [];
        }

        // First we will create a statement for the query. Then, we will set the fetch
        // mode and prepare the bindings for the query. Once that's done we will be
        // ready to execute the query against the database and return the cursor.
        $statement = $this->prepared($this->getPdoForSelect($useReadPdo)
                          ->prepare($query));

        $this->bindValues(
            $statement, $this->prepareBindings($bindings)
        );

        // Next, we'll execute the query against the database and return the statement
        // so we can return the cursor. The cursor will use a PHP generator to give
        // back one row at a time without using a bunch of memory to render them.
        $statement->execute();

        return $statement;
    });

    while ($record = $statement->fetch()) {
        yield $record;
    }
}

I understood Laravel build this feature by wrap PDOStatement::fetch(). And by search buffer PDO fetchand MySQL, I found this document.

我了解 Laravel 通过包装PDOStatement::fetch()来构建此功能。通过搜索缓冲区 PDO fetchMySQL,我找到了这个文档。

https://www.php.net/manual/en/mysqlinfo.concepts.buffering.php

https://www.php.net/manual/en/mysqlinfo.concepts.buffering.php

Queries are using the buffered mode by default. This means that query results are immediately transferred from the MySQL Server to PHP and then are kept in the memory of the PHP process.

查询默认使用缓冲模式。这意味着查询结果会立即从 MySQL 服务器传输到 PHP,然后保存在 PHP 进程的内存中。

so by doing PDOStatement::execute() we actually fetch whole result rowsat ones and stored in the memory, not only one row. So if the result is too huge, this will lead to out of memoryexception.

因此,通过执行 PDOStatement::execute() 我们实际上以一个为单位获取整个结果行存储在内存中,而不仅仅是一行。所以如果结果太大,就会导致内存不足异常。

Though the Document shown we could use $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);to get rid of buffered query. But the drawback should be caution.

虽然显示的文档我们可以$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);用来摆脱缓冲查询。但缺点是要谨慎。

Unbuffered MySQL queries execute the query and then return a resource while the data is still waiting on the MySQL server for being fetched. This uses less memory on the PHP-side, but can increase the load on the server. Unless the full result set was fetched from the server no further queries can be sent over the same connection. Unbuffered queries can also be referred to as "use result".

无缓冲的 MySQL 查询执行查询,然后在数据仍在 MySQL 服务器上等待获取时返回资源。这在 PHP 端使用较少的内存,但会增加服务器的负载。除非从服务器获取完整的结果集,否则无法通过同一连接发送进一步的查询。无缓冲查询也可以称为“使用结果”。

回答by hendra1

I made some benchmark using cursor and where

我使用游标和 where 做了一些基准测试

foreach (\App\Models\Category::where('type','child')->get() as $res){

}

foreach (\App\Models\Category::where('type', 'child')->cursor() as $flight) {
    //
}

return view('welcome');

here is the result: chunk is faster thank using where

结果如下: 块更快,感谢使用 where