SQL: Using a database table as a queue

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2177880/

Time: 2020-09-01 05:14:58  Source: igfitidea

Using a database table as a queue

Tags: sql, database, sql-server-2008, queue

Asked by Shayan

I want to use a database table as a queue. I want to insert in it and take elements from it in the inserted order (FIFO). My main consideration is performance because I have thousands of these transactions each second. So I want to use a SQL query that gives me the first element without searching the whole table. I do not remove a row when I read it. Does SELECT TOP 1 ..... help here? Should I use any special indexes?

Accepted answer by AdaTheDev

I'd use an IDENTITY field as the primary key to provide the uniquely incrementing ID for each queued item, and stick a clustered index on it. This would represent the order in which the items were queued.

To keep the items in the queue table while you process them, you'd need a "status" field to indicate the current status of a particular item (e.g. 0=waiting, 1=being processed, 2=processed). This is needed to prevent an item from being processed twice.
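
A minimal sketch of such a table, written in Python against SQLite so that it runs without a server. This is an assumption made for illustration: `AUTOINCREMENT` stands in for SQL Server's IDENTITY, the `Payload` column and the `create_queue_table`/`enqueue` helpers are hypothetical, and only `MyQueueTable`, `ID` and `Status` come from the answer.

```python
import sqlite3


def create_queue_table(conn: sqlite3.Connection) -> None:
    """Create a queue table with an auto-incrementing key and a status column."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS MyQueueTable (
            ID      INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for IDENTITY
            Payload TEXT    NOT NULL,                   -- hypothetical column
            Status  INTEGER NOT NULL DEFAULT 0          -- 0=waiting, 1=being processed, 2=processed
        )
        """
    )


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Append an item to the queue and return its ID."""
    cur = conn.execute("INSERT INTO MyQueueTable (Payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
create_queue_table(conn)
first_id = enqueue(conn, "job A")
second_id = enqueue(conn, "job B")
```

Because the key only ever increases, ordering by `ID` reproduces the order in which items were queued.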

When processing items in the queue, you'd need to find the next item in the table that is NOT currently being processed. This needs to be done in a way that prevents multiple processes from picking up the same item at the same time, as demonstrated below. Note the table hints UPDLOCK and READPAST, which you should be aware of when implementing queues.

e.g. within a sproc, something like this:

DECLARE @NextID INTEGER

BEGIN TRANSACTION

-- Find the next queued item that is waiting to be processed.
-- UPDLOCK locks the selected row; READPAST skips rows locked by other sessions.
SELECT TOP 1 @NextID = ID
FROM MyQueueTable WITH (UPDLOCK, READPAST)
WHERE Status = 0
ORDER BY ID ASC

-- If we've found one, mark it as being processed
IF @NextID IS NOT NULL
    UPDATE MyQueueTable SET Status = 1 WHERE ID = @NextID

COMMIT TRANSACTION

-- If we've got an item from the queue, return it to whatever is going to process it
IF @NextID IS NOT NULL
    SELECT * FROM MyQueueTable WHERE ID = @NextID
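
For readers who want to try the claim-then-mark pattern without SQL Server, here is a rough Python/SQLite sketch. SQLite has no UPDLOCK or READPAST hints, so this swaps in a different technique: `BEGIN IMMEDIATE` takes the database write lock up front, which serializes consumers instead of letting them skip locked rows. The `Payload` column and the `make_queue`/`claim_next` helpers are assumptions made for the example.

```python
import sqlite3


def make_queue(payloads):
    """Build an in-memory queue; isolation_level=None lets us issue BEGIN/COMMIT ourselves."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE MyQueueTable ("
        "ID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "Payload TEXT, "
        "Status INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO MyQueueTable (Payload) VALUES (?)", [(p,) for p in payloads]
    )
    return conn


def claim_next(conn):
    """Mark the oldest waiting item as being processed and return (ID, Payload), or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT ID, Payload FROM MyQueueTable WHERE Status = 0 ORDER BY ID LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE MyQueueTable SET Status = 1 WHERE ID = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Calling `claim_next` repeatedly hands out items in `ID` order and returns `None` once nothing is waiting.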

If processing an item fails, do you want to be able to try it again later? If so, you'll need to reset the status back to 0 or do something similar. That will require more thought.
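
One possible way to handle retries, sketched in Python/SQLite. Everything beyond the 0/1/2 status values is an assumption for illustration: the `Attempts` column, the extra `FAILED` state, the `max_attempts` limit and the `release_failed` helper are all hypothetical.

```python
import sqlite3

# FAILED is an assumed extra state, not part of the original answer.
WAITING, PROCESSING, PROCESSED, FAILED = 0, 1, 2, 3


def release_failed(conn, item_id, max_attempts=3):
    """Record a failed attempt; requeue the item unless it has run out of attempts."""
    conn.execute(
        """
        UPDATE MyQueueTable
        SET Attempts = Attempts + 1,
            Status = CASE WHEN Attempts + 1 >= ? THEN ? ELSE ? END
        WHERE ID = ? AND Status = ?
        """,
        (max_attempts, FAILED, WAITING, item_id, PROCESSING),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyQueueTable ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Status INTEGER NOT NULL DEFAULT 0, "
    "Attempts INTEGER NOT NULL DEFAULT 0)"  # hypothetical retry counter
)
conn.execute("INSERT INTO MyQueueTable (Status) VALUES (?)", (PROCESSING,))
release_failed(conn, 1)  # first failure: the item goes back to waiting
```

Capping the number of attempts keeps a permanently failing item from being picked up forever.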

Alternatively, don't use a database table as a queue, but something like MSMQ - just thought I'd throw that in the mix!

Answer by Peter Lang

If you do not remove your processed rows, then you are going to need some sort of flag that indicates that a row has already been processed.

Put an index on that flag, and on the column you are going to order by.

Partition your table over that flag, so the dequeued transactions are not clogging up your queries.

If you really get 1,000 messages every second, that would result in 86,400,000 rows a day. You might want to think of some way to clean up old rows.
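
The arithmetic, plus one possible cleanup routine, sketched in Python/SQLite. The `ProcessedAt` column, the `purge_processed` helper, and the batch size are assumptions for illustration; deleting in batches is just one way to keep each delete transaction short.

```python
import sqlite3

ROWS_PER_DAY = 1_000 * 60 * 60 * 24  # 86,400,000 rows at 1,000 messages per second


def purge_processed(conn, older_than_days, batch_size=10_000):
    """Delete processed rows older than the cutoff, in batches; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM MyQueueTable
            WHERE ID IN (
                SELECT ID FROM MyQueueTable
                WHERE Status = 2                        -- 2 = processed
                  AND ProcessedAt < datetime('now', ?)  -- hypothetical timestamp column
                LIMIT ?
            )
            """,
            (f"-{older_than_days} days", batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

A scheduled job could call `purge_processed(conn, 7)` to keep roughly a week of history.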

Answer by bluszcz

Everything depends on your database engine/implementation.

For me, a simple queue on a table with the following columns:

id / task / priority / date_added

usually works.

I used priority and task to group tasks, and in case of a duplicated task I chose the one with the higher priority.
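
One reading of that rule, sketched in Python/SQLite. The table name `tasks`, the tie-break on the lowest `id`, and the `pending_tasks` helper are assumptions; the answer only names the four columns.

```python
import sqlite3


def make_tasks_table(conn):
    """Create the id / task / priority / date_added table described above."""
    conn.execute(
        "CREATE TABLE tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "task TEXT NOT NULL, "
        "priority INTEGER NOT NULL DEFAULT 0, "
        "date_added TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def pending_tasks(conn):
    """Return one row per task name, keeping the duplicate with the highest priority."""
    return conn.execute(
        """
        SELECT id, task, priority
        FROM tasks AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM tasks AS d
            WHERE d.task = t.task
              AND (d.priority > t.priority
                   OR (d.priority = t.priority AND d.id < t.id))
        )
        ORDER BY priority DESC, date_added, id
        """
    ).fetchall()
```

Rows that lose to a higher-priority duplicate are simply skipped here; whether to delete them is a separate decision.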

And don't worry - for modern databases "thousands" is nothing special.

Answer by David Berger

This will not be any trouble at all as long as you use something to keep track of the datetime of the insert. See here for the MySQL options. The question is whether you only ever need the absolute most recently submitted item or whether you need to iterate. If you need to iterate, then what you need to do is grab a chunk with an ORDER BY statement, loop through it, and remember the last datetime so that you can use it when you grab your next chunk.
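
The chunked iteration could look roughly like this in Python/SQLite. The `tasks` table, its column names, and the `iter_queue` helper are assumptions. The sketch also remembers the last `id` next to the last datetime, an addition not in the answer, so that rows sharing the same timestamp are not skipped at a chunk boundary.

```python
import sqlite3


def iter_queue(conn, chunk_size=100):
    """Yield (id, date_added, task) rows in insertion order, one chunk per query."""
    last_added, last_id = "", 0  # "" sorts before any timestamp string
    while True:
        rows = conn.execute(
            """
            SELECT id, date_added, task
            FROM tasks
            WHERE date_added > ? OR (date_added = ? AND id > ?)
            ORDER BY date_added, id
            LIMIT ?
            """,
            (last_added, last_added, last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        # Remember where this chunk ended so the next query resumes after it.
        last_id, last_added = rows[-1][0], rows[-1][1]
```

Each query starts from the remembered position, so no chunk has to rescan rows that were already returned.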

Answer by Quassnoi

Since you don't delete the records from the table, you need to have a composite index on (processed, id), where processed is the column that indicates if the current record has been processed.
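
A small Python/SQLite sketch of that index and the lookup it is meant to serve. The table name `records`, the `body` column, the index name, and the helper functions are hypothetical.

```python
import sqlite3


def make_records_table(conn):
    """Create a table with a processed flag and a composite index on (processed, id)."""
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, "
        "processed INTEGER NOT NULL DEFAULT 0, "
        "body TEXT)"
    )
    conn.execute("CREATE INDEX ix_records_processed_id ON records (processed, id)")


def next_unprocessed(conn):
    """Return the smallest id among unprocessed records, or None if there are none."""
    row = conn.execute(
        "SELECT id FROM records WHERE processed = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    return None if row is None else row[0]


def mark_processed(conn, record_id):
    """Flip the flag, which moves the entry to the processed part of the index."""
    conn.execute("UPDATE records SET processed = 1 WHERE id = ?", (record_id,))
    conn.commit()
```

With the flag as the leading index column, the lookup can seek to the first unprocessed entry instead of scanning past every processed row.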

The best thing would be to create a partitioned table for your records and make the PROCESSED field the partitioning key. This way, you can keep three or more local indexes.

However, if you always process the records in id order, and have only two states, updating the record would mean just taking the record from the first leaf of the index and appending it to the last leaf.

The currently processed record would always have the least id of all unprocessed records and the greatest id of all processed records.

Answer by Reed Debaets

Perhaps adding LIMIT 1 to your SELECT statement would help, forcing the return after a single match.

Answer by David Schmitt

Create a clustered index over a date (or autoincrement) column. This will keep the rows in the table roughly in index order and allow fast index-based access when you ORDER BY the indexed column. Using TOP X (or LIMIT X, depending on your RDBMS) will then only retrieve the first x items from the index.

Performance warning: you should always review the execution plans of your queries (on real data) to verify that the optimizer doesn't do unexpected things. Also try to benchmark your queries (again on real data) to be able to make informed decisions.
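
As a rough illustration of checking a plan, here is how it might be done in Python/SQLite with `EXPLAIN QUERY PLAN`. SQL Server and MySQL expose plans through their own tools, so treat this purely as a sketch; the `events` table, the index name, and the `query_plan` helper are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    # The fourth column of each EXPLAIN QUERY PLAN row holds the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute("CREATE INDEX ix_events_created_at ON events (created_at)")

plan = query_plan(conn, "SELECT id FROM events ORDER BY created_at LIMIT ?", (10,))
for step in plan:
    print(step)  # look for the index name rather than a temporary sort
```

The exact wording of the plan differs between SQLite versions, so it is safer to look for the index name than to match the whole string.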

Answer by Daniel Kaplan

I had the same general question of "how do I turn a table into a queue" and couldn't find the answer I wanted anywhere.

Here is what I came up with for Node/SQLite/better-sqlite3. Basically just modify the inner WHERE and ORDER BY clauses for your use case.

const crypto = require("crypto");

// `status` is assumed to be defined elsewhere in the module, holding the
// INSTRUCTION_INPROGRESS and INSTRUCTION_COMPLETE values.
module.exports.pickBatchInstructions = (db, batchSize) => {
  const runId = crypto.randomBytes(8).toString("hex"); // Create a unique batch identifier

  // Claim a batch: change the status and set the run id.
  // better-sqlite3 runs statements through db.prepare(); values are bound as parameters.
  db.prepare(`
    UPDATE
      instructions
    SET
      status = ?,
      run_id = ?,
      mdate = datetime('now', 'localtime')
    WHERE
      id IN (SELECT id
        FROM instructions
        WHERE
          status IS NOT ?
          AND run_id IS NULL
        ORDER BY
          length(targetpath), id
        LIMIT ?)
  `).run(status.INSTRUCTION_INPROGRESS, runId, status.INSTRUCTION_COMPLETE, batchSize);

  // Get all rows with this batch id
  return db.prepare("SELECT * FROM instructions WHERE run_id = ?").all(runId);
};

Answer by George Mavritsakis

A very easy solution that avoids transactions, locks, etc. is to use the change tracking mechanism (not change data capture). It versions each added/updated/removed row, so you can track which changes happened after a specific version.

So, you persist the last version and query the new changes.

If a query fails, you can always go back and query data from the last version. Also, if you don't want to get all changes with one query, you can get the top n ordered by version and store the greatest version ID you have got, so you can query again from there.

See, for example, Using Change Tracking in SQL Server 2008.
