SQL: If I stop a long-running query, does it roll back?

Disclaimer: this page is a mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/161960/

Date: 2020-08-31 23:44:39 | Source: igfitidea

If I stop a long running query, does it rollback?

sql, sql-server, duplicate-data

Asked by RyanKeeter

A query that is used to loop through 17 million records to remove duplicates has been running now for about 16 hours, and I wanted to know: if the query is stopped right now, will it finalize the delete statements, or has it been deleting while running? Indeed, if I do stop it, does it finalize the deletes or roll back?

I have found that when I do a

 select count(*) from myTable

That the rows that it returns (while doing this query) is about 5 less than what the starting row count was. Obviously the server resources are extremely poor, so does that mean that this process has taken 16 hours to find 5 duplicates (when there are actually thousands), and this could be running for days?

This query took 6 seconds on 2000 rows of test data, and it works great on that set of data, so I figured it would take 15 hours for the complete set.

Any ideas?

Below is the query:

--Declare the looping variable
DECLARE @LoopVar char(10)


    DECLARE
     --Set private variables that will be used throughout
      @long DECIMAL,
      @lat DECIMAL,
      @phoneNumber char(10),
      @businessname varchar(64),
      @winner char(10)

    SET @LoopVar = (SELECT MIN(RecordID) FROM MyTable)

    WHILE @LoopVar is not null
    BEGIN

      --initialize the private variables (essentially this is a .ctor)
      SELECT 
        @long = null,
        @lat = null,
        @businessname = null,
        @phoneNumber = null,
        @winner = null

      -- load data from the row declared when setting @LoopVar  
      SELECT
        @long = longitude,
        @lat = latitude,
        @businessname = BusinessName,
        @phoneNumber = Phone
      FROM MyTable
      WHERE RecordID = @LoopVar

      --find the winning row with that data. The winning row is the duplicate
      --we keep: rows with a webAddress and captions are preferred.
      SELECT top 1 @Winner = RecordID
      FROM MyTable
      WHERE @long = longitude
        AND @lat = latitude
        AND @businessname = BusinessName
        AND @phoneNumber = Phone
      ORDER BY
        CASE WHEN webAddress is not null THEN 1 ELSE 2 END,
        CASE WHEN caption1 is not null THEN 1 ELSE 2 END,
        CASE WHEN caption2 is not null THEN 1 ELSE 2 END,
        RecordID

      --delete any losers.
      DELETE FROM MyTable
      WHERE @long = longitude
        AND @lat = latitude
        AND @businessname = BusinessName
        AND @phoneNumber = Phone
        AND @winner != RecordID

      -- prep the next loop value to go ahead and perform the next duplicate query.
      SET @LoopVar = (SELECT MIN(RecordID) 
    FROM MyTable
    WHERE @LoopVar < RecordID)
    END

Answered by user10635

No, SQL Server will not roll back the deletes it has already performed if you stop query execution. Oracle requires an explicit commit of data-modifying queries or the data gets rolled back, but not MSSQL.

With SQL Server it will not roll back unless you are specifically running in the context of a transaction and you roll back that transaction, or the connection closes without the transaction having been committed. But I don't see a transaction context in your above query.


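For the all-or-nothing behaviour the question asks about, the loop could be wrapped in an explicit transaction. A minimal sketch (the loop body itself is the one from the question):

```sql
-- Sketch: run the whole cleanup inside one explicit transaction so that
-- cancelling the batch (or closing the connection) undoes every delete.
BEGIN TRANSACTION;

-- ... the WHILE loop with its DELETE statements from the question ...

COMMIT TRANSACTION;   -- only here do the deletes become permanent
-- If the batch is cancelled before the COMMIT, run ROLLBACK TRANSACTION
-- (or simply close the connection) to undo all of the work.
```

Note that holding one transaction open across 17 million rows means long-lived locks and a large transaction log, so committing in smaller batches is often the practical middle ground.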
You could also try re-structuring your query to make the deletes a little more efficient, but essentially if the specs of your box are not up to snuff then you might be stuck waiting it out.

Going forward, you should create a unique index on the table to keep yourself from having to go through this again.


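A sketch of such an index, assuming the four columns from the question's query are what defines a duplicate:

```sql
-- Sketch: a unique index over the columns that define a duplicate.
-- Creating it will fail while duplicates still exist, so clean up first.
CREATE UNIQUE NONCLUSTERED INDEX UX_MyTable_NoDuplicates
    ON MyTable (longitude, latitude, BusinessName, Phone);
```

If later duplicate inserts should be silently discarded rather than raising an error, the index could instead be created `WITH (IGNORE_DUP_KEY = ON)`.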
Answered by jwanagel

Your query is not wrapped in a transaction, so it won't roll back the changes already made by the individual delete statements.

I specifically tested this myself on my own SQL Server using the following query, and the ApplicationLog table was empty even though I cancelled the query:

declare @count int
select @count = 5
WHILE @count > 0
BEGIN
  print @count
  delete from applicationlog;
  waitfor time '20:00';
  select @count = @count -1
END

However, your query is likely to take many days or weeks, much longer than 15 hours. Your estimate that you can process 2000 records every 6 seconds is wrong because each iteration of your while loop will take significantly longer with 17 million rows than it does with 2000 rows. So unless your query takes significantly less than a second for 2000 rows, it will take days for all 17 million.

You should ask a new question on how you can delete duplicate rows efficiently.

Answered by Rob Walker

If you don't do anything explicit about transactions, then the connection will be in autocommit transactions mode. In this mode every SQL statement is considered a transaction.

The question is whether this means the individual SQL statements are transactions and are therefore being committed as you go, or whether the outer WHILE loop counts as a transaction.

There doesn't seem to be any discussion of this in the description of the WHILE construct on MSDN. However, since a WHILE statement can't directly modify the database, it would seem logical that it doesn't start an auto-commit transaction.

Answered by Ricardo C

Implicit transactions

If 'Implicit transactions' has not been set, then each iteration in your loop committed its changes.

It is possible for any SQL Server to be set with 'Implicit transactions'. This is a database setting (OFF by default). You can also have implicit transactions in the properties of a particular query inside Management Studio (right-click in the query pane > Options), via default settings in the client, or via a SET statement:

SET IMPLICIT_TRANSACTIONS ON;

Either way, if this was the case, you would still need to execute an explicit COMMIT/ROLLBACK regardless of interruption of the query execution.


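A small sketch of that behaviour (the RecordID value is made up for illustration):

```sql
SET IMPLICIT_TRANSACTIONS ON;

-- The first data-modifying statement implicitly opens a transaction:
DELETE FROM MyTable WHERE RecordID = 42;

SELECT @@TRANCOUNT;     -- 1: a transaction is now open

ROLLBACK TRANSACTION;   -- undoes the delete; COMMIT would make it permanent

SET IMPLICIT_TRANSACTIONS OFF;
```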


Implicit transactions reference:

http://msdn.microsoft.com/en-us/library/ms188317.aspx

http://msdn.microsoft.com/en-us/library/ms190230.aspx

Answered by Corey Trager

I inherited a system which had logic something like yours implemented in SQL. In our case, we were trying to link together rows using fuzzy matching that had similar names/addresses, etc, and that logic was done purely in SQL. At the time I inherited it we had about 300,000 rows in the table and according to the timings, we calculated it would take A YEAR to match them all.

As an experiment to see how much faster I could do it outside of SQL, I wrote a program to dump the db table into flat files, read the flat files into a C++ program, build my own indexes, and do the fuzzy logic there, then reimport the flat files into the database. What took A YEAR in SQL took about 30 seconds in the C++ app.

So, my advice is, don't even try what you are doing in SQL. Export, process, re-import.

Answered by Amy B

DELETEs that have been performed up to this point will not be rolled back.



As the original author of the code in question, and having issued the caveat that performance will be dependent on indexes, I would propose the following items to speed this up.

RecordId had better be the PRIMARY KEY. I don't mean IDENTITY, I mean PRIMARY KEY. Confirm this using sp_help.

Some index should be used in evaluating this query. Figure out which of these four columns has the fewest repeats and index that...

SELECT *
FROM MyTable
WHERE @long = longitude
  AND @lat = latitude
  AND @businessname = BusinessName
  AND @phoneNumber = Phone

Before and after adding this index, check the query plan to see whether index scanning has been added.


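For example, if Phone turned out to be the most selective column, an index along these lines could be tried (a sketch; which column to lead with should be confirmed against the real data):

```sql
-- Sketch: lead with the most selective column; adding the other three
-- lets the duplicate lookup in the loop be satisfied by an index seek.
CREATE NONCLUSTERED INDEX IX_MyTable_DupLookup
    ON MyTable (Phone, longitude, latitude, BusinessName);
```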
Answered by Aheho

I think this query would be much more efficient if it were re-written as a single-pass algorithm using a cursor. You would order your cursor table by longitude, latitude, BusinessName and Phone. You'd step through the rows one at a time. If a row has the same longitude, latitude, business name, and phone number as the previous row, then delete it.


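A sketch of that single-pass idea (column names and types are taken from the question's query; it keeps the lowest RecordID per group rather than the question's webAddress/caption preference, and NULL key columns are not handled):

```sql
-- Sketch: walk the rows in duplicate-group order and delete any row whose
-- key columns match the previous row's.
DECLARE @id char(10), @long DECIMAL, @lat DECIMAL,
        @name varchar(64), @phone char(10),
        @prevLong DECIMAL, @prevLat DECIMAL,
        @prevName varchar(64), @prevPhone char(10);

DECLARE dup_cursor CURSOR FOR
    SELECT RecordID, longitude, latitude, BusinessName, Phone
    FROM MyTable
    ORDER BY longitude, latitude, BusinessName, Phone, RecordID;

OPEN dup_cursor;
FETCH NEXT FROM dup_cursor INTO @id, @long, @lat, @name, @phone;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @long = @prevLong AND @lat = @prevLat
       AND @name = @prevName AND @phone = @prevPhone
        DELETE FROM MyTable WHERE RecordID = @id;  -- same keys as previous row: a duplicate
    ELSE
        SELECT @prevLong = @long, @prevLat = @lat,   -- new group: remember its keys
               @prevName = @name, @prevPhone = @phone;

    FETCH NEXT FROM dup_cursor INTO @id, @long, @lat, @name, @phone;
END;
CLOSE dup_cursor;
DEALLOCATE dup_cursor;
```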
Answered by mancaus

As a loop, your query will struggle to scale well, even with appropriate indexes. The query should be rewritten as a single statement, as per the suggestions in your previous question on this.


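On SQL Server 2005 or later, one possible single-statement form uses ROW_NUMBER() (a sketch; the ORDER BY mirrors the question's preference for rows with webAddress and captions):

```sql
-- Sketch: number the rows within each duplicate group, then delete every
-- row except the top-ranked one, in a single statement.
WITH ranked AS (
    SELECT RecordID,
           ROW_NUMBER() OVER (
               PARTITION BY longitude, latitude, BusinessName, Phone
               ORDER BY CASE WHEN webAddress IS NOT NULL THEN 1 ELSE 2 END,
                        CASE WHEN caption1  IS NOT NULL THEN 1 ELSE 2 END,
                        CASE WHEN caption2  IS NOT NULL THEN 1 ELSE 2 END,
                        RecordID) AS rn
    FROM MyTable
)
DELETE FROM ranked WHERE rn > 1;
```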
If you're not running it explicitly within a transaction it will only roll back the executing statement.

Answered by HLGEM

I think you need to seriously consider your methodology. You need to start thinking in sets (although for performance you may need batch processing, but not row by row against a 17 million record table).

First, do all of your records have duplicates? I suspect not, so the first thing you want to do is limit your processing to only those records which have duplicates. Since this is a large table, and you may need to do the deletes in batches over time depending on what other processing is going on, you first pull the records you want to deal with into a table of their own, which you then index. You can also use a temp table if you are going to be able to do this all at the same time without ever stopping it; otherwise create a table in your database and drop it at the end.

Something like this (note I didn't write the create index statements; I figure you can look that up yourself):

-- Gather one "winner" RecordID per duplicate group into a work table;
-- the kept row in each group is the one with the lowest RecordID.
SELECT min(m.RecordID) as RecordID, m.longitude, m.latitude, m.businessname, m.phone
     into #RecordsToKeep
FROM MyTable m
join
(select longitude, latitude, businessname, phone
 from MyTable
 group by longitude, latitude, businessname, phone
 having count(*) > 1) a
on a.longitude = m.longitude and a.latitude = m.latitude and
   a.businessname = m.businessname and a.phone = m.phone
group by m.longitude, m.latitude, m.businessname, m.phone



-- Delete the losing rows, one batch of 1000 duplicate groups at a time.
while (select count(*) from #RecordsToKeep) > 0
begin
select top 1000 *
into #Batch
from #RecordsToKeep

Delete m
from mytable m
join #Batch b
        on b.longitude = m.longitude and b.latitude = m.latitude and
        b.businessname = m.businessname and b.phone = m.phone
where m.recordid <> b.RecordID

Delete r
from #RecordsToKeep r
join #Batch b on r.recordid = b.recordid

drop table #Batch
end

-- Alternative to the batch loop above: delete all the losers at once.
Delete m
from mytable m
join #RecordsToKeep r
        on r.longitude = m.longitude and r.latitude = m.latitude and
        r.businessname = m.businessname and r.phone = m.phone
where r.recordid <> m.recordID

drop table #RecordsToKeep
Answered by endo64

Also consider another method to remove duplicate rows:

delete t1 from table1 as t1 where exists (
    select * from table1 as t2 where
        t1.column1=t2.column1 and
        t1.column2=t2.column2 and
        t1.column3=t2.column3 and
        --add other colums if any
        t1.id>t2.id
)

I suppose that you have an integer id column in your table.
