MySQL: retrieve a large SELECT by chunks

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/34180448/


MySQL : retrieve a large select by chunks

Tags: mysql, select, save

Asked by Toren

I have a SELECT with more than 70 million rows.

I'd like to save the selected data into one large CSV file on Windows Server 2012 R2.

Q: How do I retrieve the data from MySQL in chunks for better performance? When I try to save the whole large SELECT at once, I get out-of-memory errors.

Answered by Ed Gibbs

You could try using the LIMIT feature. If you do this:

SELECT * FROM MyTable ORDER BY whatever LIMIT 0,1000

You'll get the first 1,000 rows. The first LIMIT value (0) defines the starting row in the result set. It's zero-indexed, so 0 means "the first row". The second LIMIT value is the maximum number of rows to retrieve. To get the next few sets of 1,000, do this:

SELECT * FROM MyTable ORDER BY whatever LIMIT 1000,1000 -- rows 1,001 - 2,000
SELECT * FROM MyTable ORDER BY whatever LIMIT 2000,1000 -- rows 2,001 - 3,000

And so on. When the SELECT returns no rows, you're done.

This isn't enough on its own, though, because any changes made to the table while you're processing your 1K rows at a time will throw off the order. To freeze the results in time, start by querying the results into a temporary table:

CREATE TEMPORARY TABLE MyChunkedResult AS (
  SELECT *
  FROM MyTable
  ORDER BY whatever
);

Side note: it's a good idea to make sure the temporary table doesn't exist beforehand:

DROP TEMPORARY TABLE IF EXISTS MyChunkedResult;

At any rate, once the temporary table is in place, pull the row chunks from there:

SELECT * FROM MyChunkedResult LIMIT 0, 1000;
SELECT * FROM MyChunkedResult LIMIT 1000,1000;
SELECT * FROM MyChunkedResult LIMIT 2000,1000;
.. and so on.

I'll leave it to you to create the logic that calculates the LIMIT value after each chunk and checks for the end of the results. I'd also recommend much larger chunks than 1,000 records; that's just a number I picked out of the air.
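
For illustration, a minimal Python sketch of that loop might look like the following. It assumes the mysql-connector-python package and that MyChunkedResult was created on this same connection (temporary tables are only visible to the session that created them); the connection details, output path, and 10,000-row chunk size are placeholders.

import csv
import mysql.connector

CHUNK_SIZE = 10000  # pick something much larger than 1,000 if it fits in memory

conn = mysql.connector.connect(
    host="localhost", user="myuser", password="secret", database="mydb"
)
cursor = conn.cursor()

with open(r"C:\export\mytable.csv", "w", newline="") as f:
    writer = csv.writer(f)
    offset = 0
    while True:
        # Pull the next chunk from the frozen temporary table.
        cursor.execute(
            "SELECT * FROM MyChunkedResult LIMIT %s, %s", (offset, CHUNK_SIZE)
        )
        rows = cursor.fetchall()
        if not rows:              # an empty chunk means the end of the results
            break
        writer.writerows(rows)
        offset += CHUNK_SIZE      # the next chunk starts where this one ended

cursor.close()
conn.close()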

Finally, it's good form to drop the temporary table when you're done:

DROP TEMPORARY TABLE MyChunkedResult;

Answered by prafi

The LIMIT OFFSET approach slows the query down when the size of the data is very large. Another approach is to use something called keyset pagination. It requires a unique id in your query, which you can use as a bookmark to point to the last row of the previous page. The next page is fetched using the last bookmark. For instance:

SELECT user_id, name, date_created
FROM users
WHERE user_id > 0
ORDER BY user_id ASC
LIMIT 10000;

If the result set above returns 12345 as the last user_id, you can use it to fetch the next page as follows:

SELECT user_id, name, date_created
FROM users
WHERE user_id > 12345
ORDER BY user_id ASC
LIMIT 10000;
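
A minimal Python sketch of that bookmark loop, again assuming the mysql-connector-python package; the connection details, output path, and batch size are placeholders:

import csv
import mysql.connector

BATCH_SIZE = 10000

conn = mysql.connector.connect(
    host="localhost", user="myuser", password="secret", database="mydb"
)
cursor = conn.cursor()

last_id = 0  # bookmark: user_id of the last row of the previous page
with open(r"C:\export\users.csv", "w", newline="") as f:
    writer = csv.writer(f)
    while True:
        cursor.execute(
            "SELECT user_id, name, date_created FROM users"
            " WHERE user_id > %s ORDER BY user_id ASC LIMIT %s",
            (last_id, BATCH_SIZE),
        )
        rows = cursor.fetchall()
        if not rows:               # nothing left beyond the bookmark
            break
        writer.writerows(rows)
        last_id = rows[-1][0]      # advance the bookmark to the last user_id seen

cursor.close()
conn.close()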

For more details, you may take a look at this page.

Answered by Peter

Another approach for such a large dataset, to avoid the need to chunk the output, would be to query the relevant data into its own new table (not a temporary table) containing just the data you need, and then use mysqldump to handle the export to a file.
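
As a rough sketch, that export step could be driven from a script as follows; the database name, export table name, credentials, and output directory are placeholders:

import subprocess

# --tab makes the MySQL server itself write a delimited data file into the
# given directory (this requires the FILE privilege); without --tab,
# mysqldump emits SQL INSERT statements rather than delimited data.
subprocess.run(
    [
        "mysqldump",
        "--user=myuser",
        "--password=secret",
        "--tab=C:/export",             # directory where the server writes the data file
        "--fields-terminated-by=,",    # comma-separated fields instead of tabs
        "--no-create-info",            # data only, skip the CREATE TABLE statement
        "mydb",                        # database holding the new export table
        "MyExportTable",               # the new (non-temporary) table with just the needed data
    ],
    check=True,
)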