MySQL: retrieve a large select by chunks
Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/34180448/
MySQL: retrieve a large select by chunks
Asked by Toren
I have a select with more than 70 million rows.

I'd like to save the selected data into one large CSV file on Win2012 R2.

Q: How can I retrieve the data from MySQL in chunks for better performance? When I try to save the whole large select at once, I get out-of-memory errors.
Answer by Ed Gibbs
You could try using the LIMIT feature. If you do this:
SELECT * FROM MyTable ORDER BY whatever LIMIT 0,1000
You'll get the first 1,000 rows. The first LIMIT value (0) defines the starting row in the result set. It's zero-indexed, so 0 means "the first row". The second LIMIT value is the maximum number of rows to retrieve. To get the next few sets of 1,000, do this:
SELECT * FROM MyTable ORDER BY whatever LIMIT 1000,1000 -- rows 1,001 - 2,000
SELECT * FROM MyTable ORDER BY whatever LIMIT 2000,1000 -- rows 2,001 - 3,000
And so on. When the SELECT returns no rows, you're done.
This isn't enough on its own though, because any changes done to the table while you're processing your 1K rows at a time will throw off the order. To freeze the results in time, start by querying the results into a temporary table:
CREATE TEMPORARY TABLE MyChunkedResult AS (
SELECT *
FROM MyTable
ORDER BY whatever
);
Side note: it's a good idea to make sure the temporary table doesn't exist beforehand:
DROP TEMPORARY TABLE IF EXISTS MyChunkedResult;
At any rate, once the temporary table is in place, pull the row chunks from there:
SELECT * FROM MyChunkedResult LIMIT 0, 1000;
SELECT * FROM MyChunkedResult LIMIT 1000,1000;
SELECT * FROM MyChunkedResult LIMIT 2000,1000;
... and so on.
I'll leave it to you to create the logic that will calculate the limit value after each chunk and check for the end of results. I'd also recommend much larger chunks than 1,000 records; it's just a number I picked out of the air.
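As a rough illustration of that chunking loop, here is a minimal sketch in Python (assuming the mysql-connector-python package; the connection details, output path, and chunk size are placeholders, not part of the original answer):

import csv
import mysql.connector

CHUNK_SIZE = 50000  # pick something much larger than 1,000

# A temporary table is only visible to the connection that created it,
# so create it and read it back over the same connection.
conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="mydb")
cursor = conn.cursor()
cursor.execute("DROP TEMPORARY TABLE IF EXISTS MyChunkedResult")
cursor.execute("CREATE TEMPORARY TABLE MyChunkedResult AS "
               "(SELECT * FROM MyTable ORDER BY whatever)")

offset = 0
with open("mytable.csv", "w", newline="") as f:
    writer = csv.writer(f)
    while True:
        cursor.execute("SELECT * FROM MyChunkedResult LIMIT %s, %s",
                       (offset, CHUNK_SIZE))
        rows = cursor.fetchall()
        if not rows:          # empty chunk: end of results
            break
        writer.writerows(rows)
        offset += CHUNK_SIZE  # advance to the next chunk

cursor.close()
conn.close()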
Finally, it's good form to drop the temporary table when you're done:
DROP TEMPORARY TABLE MyChunkedResult;
Answer by prafi
The LIMIT OFFSET approach slows the query down as the data set gets very large. Another approach is to use something called keyset pagination. It requires a unique id in your query, which you can use as a bookmark to point to the last row of the previous page. The next page is fetched using that bookmark. For instance:
SELECT user_id, name, date_created
FROM users
WHERE user_id > 0
ORDER BY user_id ASC
LIMIT 10000;
If the result set above returns the last row with user_id as 12345, you can use it to fetch the next page as follows:
SELECT user_id, name, date_created
FROM users
WHERE user_id > 12345
ORDER BY user_id ASC
LIMIT 10000;
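In client code, the bookmark logic is just a loop that remembers the last user_id seen. A minimal sketch in Python (again assuming mysql-connector-python; the connection details and output file are illustrative):

import csv
import mysql.connector

PAGE_SIZE = 10000

conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="mydb")
cursor = conn.cursor()

last_id = 0  # bookmark: the last user_id of the previous page
with open("users.csv", "w", newline="") as f:
    writer = csv.writer(f)
    while True:
        cursor.execute("SELECT user_id, name, date_created FROM users "
                       "WHERE user_id > %s ORDER BY user_id ASC LIMIT %s",
                       (last_id, PAGE_SIZE))
        rows = cursor.fetchall()
        if not rows:
            break
        writer.writerows(rows)
        last_id = rows[-1][0]  # first column is user_id; advance the bookmark

cursor.close()
conn.close()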
For more details, you may take a look at this page.
Answer by Peter
Another approach for such a large dataset, to avoid the need to chunk the output, would be to query the relevant data into its own new table (not a temporary table) containing just the data you need, and then use mysqldump to handle the export to file.
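A rough sketch of that approach, assuming Python to drive it and mysqldump's --tab mode (which has the server write a delimited data file alongside a .sql schema file); ExportTable, the credentials, and the output directory are placeholders, and the directory must be on the server host, writable by MySQL, and permitted by secure_file_priv:

import subprocess
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="mydb")
cursor = conn.cursor()
# Materialize just the rows you need into a regular (non-temporary) table.
cursor.execute("CREATE TABLE ExportTable AS SELECT * FROM MyTable")
cursor.close()
conn.close()

# --tab writes ExportTable.txt (data) into the given directory;
# --fields-terminated-by=, makes the data file comma-delimited.
subprocess.run(["mysqldump", "-u", "me", "-psecret",
                "--tab=C:/export", "--fields-terminated-by=,",
                "mydb", "ExportTable"],
               check=True)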