具有 SQL Server 数据库调用的多线程 C# 应用程序

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9952137/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-09 11:22:23  来源:igfitidea点击:

Multi threading C# application with SQL Server database calls

c#sqlsql-servermultithreadingarchitecture

提问by Barka

I have a SQL Server database with 500,000 records in table main. There are also three other tables called child1, child2, and child3. The many to many relationships between child1, child2, child3, and mainare implemented via the three relationship tables: main_child1_relationship, main_child2_relationship, and main_child3_relationship. I need to read the records in main, update main, and also insert into the relationship tables new rows as well as insert new records in the child tables. The records in the child tables have uniqueness constraints, so the pseudo-code for the actual calculation (CalculateDetails) would be something like:

我有一个 SQL Server 数据库,表中有 500,000 条记录main。还有其他三个表叫child1child2child3。很多之间一对多的关系child1child2child3,并main通过三个关系表来实现:main_child1_relationshipmain_child2_relationship,和main_child3_relationship。我需要读取main、 update 中的记录main,并在关系表中插入新行以及在子表中插入新记录。子表中的记录具有唯一性约束,因此实际计算的伪代码 (CalculateDetails) 将类似于:

for each record in main
{
   find its child1 like qualities
   for each one of its child1 qualities
   {
      find the record in child1 that matches that quality
      if found
      {
          add a record to main_child1_relationship to connect the two records
      }
      else
      {
          create a new record in child1 for the quality mentioned
          add a record to main_child1_relationship to connect the two records
      }
   }
   ...repeat the above for child2
   ...repeat the above for child3 
}

This works fine as a single threaded app. But it is too slow. The processing in C# is pretty heavy duty and takes too long. I want to turn this into a multi-threaded app.

这作为单线程应用程序运行良好。但它太慢了。C# 中的处理非常繁重并且需要很长时间。我想把它变成一个多线程应用程序。

What is the best way to do this? We are using Linq to Sql.

做这个的最好方式是什么?我们正在使用 Linq 到 Sql。

So far my approach has been to create a new DataContextobject for each batch of records from mainand use ThreadPool.QueueUserWorkItemto process it. However these batches are stepping on each other's toes because one thread adds a record and then the next thread tries to add the same one and ... I am getting all kinds of interesting SQL Server dead locks.

到目前为止,我的方法是DataContext为每批记录创建一个新对象main并用于ThreadPool.QueueUserWorkItem处理它。然而,这些批次正在互相踩踏,因为一个线程添加了一条记录,然后下一个线程尝试添加相同的记录……我遇到了各种有趣的 SQL Server 死锁。

Here is the code:

这是代码:

    int skip = 0;
    List<int> thisBatch;
    Queue<List<int>> allBatches = new Queue<List<int>>();
    do
    {
        thisBatch = allIds
                .Skip(skip)
                .Take(numberOfRecordsToPullFromDBAtATime).ToList();
        allBatches.Enqueue(thisBatch);
        skip += numberOfRecordsToPullFromDBAtATime;

    } while (thisBatch.Count() > 0);

    while (allBatches.Count() > 0)
    {
        RRDataContext rrdc = new RRDataContext();

        var currentBatch = allBatches.Dequeue();
        lock (locker)  
        {
            runningTasks++;
        }
        System.Threading.ThreadPool.QueueUserWorkItem(x =>
                    ProcessBatch(currentBatch, rrdc));

        lock (locker) 
        {
            while (runningTasks > MAX_NUMBER_OF_THREADS)
            {
                 Monitor.Wait(locker);
                 UpdateGUI();
            }
        }
    }

And here is ProcessBatch:

这是 ProcessBatch:

    private static void ProcessBatch( 
        List<int> currentBatch, RRDataContext rrdc)
    {
        var topRecords = GetTopRecords(rrdc, currentBatch);
        CalculateDetails(rrdc, topRecords);
        rrdc.Dispose();

        lock (locker)
        {
            runningTasks--;
            Monitor.Pulse(locker);
        };
    }

And

    private static List<Record> GetTopRecords(RecipeRelationshipsDataContext rrdc, 
                                              List<int> thisBatch)
    {
        List<Record> topRecords;

        topRecords = rrdc.Records
                    .Where(x => thisBatch.Contains(x.Id))
                    .OrderBy(x => x.OrderByMe).ToList();
        return topRecords;
    }

CalculateDetailsis best explained by the pseudo-code at the top.

CalculateDetails最好通过顶部的伪代码来解释。

I think there must be a better way to do this. Please help. Many thanks!

我认为必须有更好的方法来做到这一点。请帮忙。非常感谢!

采纳答案by Phil

Here's my take on the problem:

这是我对这个问题的看法:

  • When using multiple threads to insert/update/query data in SQL Server, or any database, then deadlocks are a fact of life. You have to assume they will occur and handle them appropriately.

  • That's not so say we shouldn't attempt to limit the occurence of deadlocks. However, it's easy to read up on the basic causes of deadlocksand take steps to prevent them, but SQL Server will always surprise you :-)

  • 当使用多线程在 SQL Server 或任何数据库中插入/更新/查询数据时,死锁是一个现实。你必须假设它们会发生并适当地处理它们。

  • 这并不是说我们不应该试图限制死锁的发生。但是,很容易了解死锁的基本原因并采取措施防止它们发生,但是 SQL Server 总是会让您感到惊讶:-)

Some reason for deadlocks:

死锁的一些原因:

  • Too many threads - try to limit the number of threads to a minimum, but of course we want more threads for maximum performance.

  • Not enough indexes. If selects and updates aren't selective enough SQL will take out larger range locks than is healthy. Try to specify appropriate indexes.

  • Too many indexes. Updating indexes causes deadlocks, so try to reduce indexes to the minimum required.

  • Transaction isolational level too high. The default isolation levelwhen using .NET is 'Serializable', whereas the default using SQL Server is 'Read Committed'. Reducing the isolation level can help a lot (if appropriate of course).

  • 线程太多 - 尝试将线程数限制为最少,但当然我们需要更多线程以获得最大性能。

  • 没有足够的索引。如果选择和更新没有足够的选择性,SQL 将取出比正常范围更大的范围锁。尝试指定适当的索引。

  • 索引太多。更新索引会导致死锁,因此请尝试将索引减少到所需的最小值。

  • 事务隔离级别太高。使用 .NET 时的默认隔离级别是“Serializable”,而使用 SQL Server 时的默认隔离级别是“Read Committed”。降低隔离级别可以有很大帮助(当然,如果合适的话)。

This is how I might tackle your problem:

这就是我可以解决您的问题的方式:

  • I wouldn't roll my own threading solution, I would use the TaskParallel library. My main method would look something like this:

    using (var dc = new TestDataContext())
    {
        // Get all the ids of interest.
        // I assume you mark successfully updated rows in some way
        // in the update transaction.
        List<int> ids = dc.TestItems.Where(...).Select(item => item.Id).ToList();
    
        var problematicIds = new List<ErrorType>();
    
        // Either allow the TaskParallel library to select what it considers
        // as the optimum degree of parallelism by omitting the 
        // ParallelOptions parameter, or specify what you want.
        Parallel.ForEach(ids, new ParallelOptions {MaxDegreeOfParallelism = 8},
                            id => CalculateDetails(id, problematicIds));
    }
    
  • Execute the CalculateDetails method with retries for deadlock failures

    private static void CalculateDetails(int id, List<ErrorType> problematicIds)
    {
        try
        {
            // Handle deadlocks
            DeadlockRetryHelper.Execute(() => CalculateDetails(id));
        }
        catch (Exception e)
        {
            // Too many deadlock retries (or other exception). 
            // Record so we can diagnose problem or retry later
            problematicIds.Add(new ErrorType(id, e));
        }
    }
    
  • The core CalculateDetails method

    private static void CalculateDetails(int id)
    {
        // Creating a new DeviceContext is not expensive.
        // No need to create outside of this method.
        using (var dc = new TestDataContext())
        {
            // TODO: adjust IsolationLevel to minimize deadlocks
            // If you don't need to change the isolation level 
            // then you can remove the TransactionScope altogether
            using (var scope = new TransactionScope(
                TransactionScopeOption.Required,
                new TransactionOptions {IsolationLevel = IsolationLevel.Serializable}))
            {
                TestItem item = dc.TestItems.Single(i => i.Id == id);
    
                // work done here
    
                dc.SubmitChanges();
                scope.Complete();
            }
        }
    }
    
  • And of course my implementation of a deadlock retry helper

    public static class DeadlockRetryHelper
    {
        private const int MaxRetries = 4;
        private const int SqlDeadlock = 1205;
    
        public static void Execute(Action action, int maxRetries = MaxRetries)
        {
            if (HasAmbientTransaction())
            {
                // Deadlock blows out containing transaction
                // so no point retrying if already in tx.
                action();
            }
    
            int retries = 0;
    
            while (retries < maxRetries)
            {
                try
                {
                    action();
                    return;
                }
                catch (Exception e)
                {
                    if (IsSqlDeadlock(e))
                    {
                        retries++;
                        // Delay subsequent retries - not sure if this helps or not
                        Thread.Sleep(100 * retries);
                    }
                    else
                    {
                        throw;
                    }
                }
            }
    
            action();
        }
    
        private static bool HasAmbientTransaction()
        {
            return Transaction.Current != null;
        }
    
        private static bool IsSqlDeadlock(Exception exception)
        {
            if (exception == null)
            {
                return false;
            }
    
            var sqlException = exception as SqlException;
    
            if (sqlException != null && sqlException.Number == SqlDeadlock)
            {
                return true;
            }
    
            if (exception.InnerException != null)
            {
                return IsSqlDeadlock(exception.InnerException);
            }
    
            return false;
        }
    }
    
  • One further possibility is to use a partitioning strategy

  • 我不会推出自己的线程解决方案,我会使用 TaskParallel 库。我的主要方法看起来像这样:

    using (var dc = new TestDataContext())
    {
        // Get all the ids of interest.
        // I assume you mark successfully updated rows in some way
        // in the update transaction.
        List<int> ids = dc.TestItems.Where(...).Select(item => item.Id).ToList();
    
        var problematicIds = new List<ErrorType>();
    
        // Either allow the TaskParallel library to select what it considers
        // as the optimum degree of parallelism by omitting the 
        // ParallelOptions parameter, or specify what you want.
        Parallel.ForEach(ids, new ParallelOptions {MaxDegreeOfParallelism = 8},
                            id => CalculateDetails(id, problematicIds));
    }
    
  • 执行CalculateDetails 方法并重试死锁失败

    private static void CalculateDetails(int id, List<ErrorType> problematicIds)
    {
        try
        {
            // Handle deadlocks
            DeadlockRetryHelper.Execute(() => CalculateDetails(id));
        }
        catch (Exception e)
        {
            // Too many deadlock retries (or other exception). 
            // Record so we can diagnose problem or retry later
            problematicIds.Add(new ErrorType(id, e));
        }
    }
    
  • 核心CalculateDetails方法

    private static void CalculateDetails(int id)
    {
        // Creating a new DeviceContext is not expensive.
        // No need to create outside of this method.
        using (var dc = new TestDataContext())
        {
            // TODO: adjust IsolationLevel to minimize deadlocks
            // If you don't need to change the isolation level 
            // then you can remove the TransactionScope altogether
            using (var scope = new TransactionScope(
                TransactionScopeOption.Required,
                new TransactionOptions {IsolationLevel = IsolationLevel.Serializable}))
            {
                TestItem item = dc.TestItems.Single(i => i.Id == id);
    
                // work done here
    
                dc.SubmitChanges();
                scope.Complete();
            }
        }
    }
    
  • 当然,我实现了一个死锁重试助手

    public static class DeadlockRetryHelper
    {
        private const int MaxRetries = 4;
        private const int SqlDeadlock = 1205;
    
        public static void Execute(Action action, int maxRetries = MaxRetries)
        {
            if (HasAmbientTransaction())
            {
                // Deadlock blows out containing transaction
                // so no point retrying if already in tx.
                action();
            }
    
            int retries = 0;
    
            while (retries < maxRetries)
            {
                try
                {
                    action();
                    return;
                }
                catch (Exception e)
                {
                    if (IsSqlDeadlock(e))
                    {
                        retries++;
                        // Delay subsequent retries - not sure if this helps or not
                        Thread.Sleep(100 * retries);
                    }
                    else
                    {
                        throw;
                    }
                }
            }
    
            action();
        }
    
        private static bool HasAmbientTransaction()
        {
            return Transaction.Current != null;
        }
    
        private static bool IsSqlDeadlock(Exception exception)
        {
            if (exception == null)
            {
                return false;
            }
    
            var sqlException = exception as SqlException;
    
            if (sqlException != null && sqlException.Number == SqlDeadlock)
            {
                return true;
            }
    
            if (exception.InnerException != null)
            {
                return IsSqlDeadlock(exception.InnerException);
            }
    
            return false;
        }
    }
    
  • 另一种可能性是使用分区策略

If your tables can naturally be partitioned into several distinct sets of data, then you can either use SQL Server partitioned tables and indexes, or you could manually splityour existing tables into several sets of tables. I would recommend using SQL Server's partitioning, since the second option would be messy. Also built-in partitioning is only available on SQL Enterprise Edition.

如果您的表可以自然地分区为多个不同的数据集,那么您可以使用SQL Server 分区表和索引,或者您可以手动将现有表拆分为多个表集。我建议使用 SQL Server 的分区,因为第二个选项会很麻烦。此外,内置分区仅适用于 SQL 企业版。

If partitioning is possible for you, you could choose a partion scheme that broke you data in lets say 8 distinct sets. Now you could use your original single threaded code, but have 8 threads each targetting a separate partition. Now there won't be any (or at least a minimum number of) deadlocks.

如果您可以进行分区,您可以选择一个分区方案,将您的数据分成 8 个不同的集合。现在您可以使用原始的单线程代码,但有 8 个线程,每个线程针对一个单独的分区。现在不会有任何(或至少最少数量的)死锁。

I hope that makes sense.

我希望这是有道理的。

回答by Ben

Overview

概述

The root of your problem is that the L2S DataContext, like the Entity Framework's ObjectContext, is not thread-safe. As explained in this MSDN forum exchange, support for asynchronous operations in the .NET ORM solutions is still pending as of .NET 4.0; you'll have to roll your own solution, which as you've discovered isn't always easy to do when your framework assume single-threadedness.

您问题的根源在于 L2S DataContext 与实体框架的 ObjectContext 一样,不是线程安全的。正如在这个 MSDN 论坛交流中解释的那样,从 .NET 4.0 开始,对 .NET ORM 解决方案中的异步操作的支持仍然悬而未决;您将不得不推出自己的解决方案,正如您所发现的,当您的框架采用单线程时,这并不总是那么容易。

I'll take this opportunity to note that L2S is built on top of ADO.NET, which itself fully supports asynchronous operation - personally, I would much prefer to deal directly with that lower layer and write the SQL myself, just to make sure that I fully understood what was transpiring over the network.

我将借此机会指出 L2S 建立在 ADO.NET 之上,ADO.NET 本身完全支持异步操作 - 就我个人而言,我更愿意直接处理该较低层并自己编写 SQL,只是为了确保我完全理解网络上正在发生的事情。

SQL Server Solution?

SQL Server 解决方案?

That being said, I have to ask - must this be a C# solution? If you can compose your solution out of a set of insert/update statements, you can just send over the SQL directly and your threading and performance problems vanish.* It seems to me that your problems are related not to the actual data transformations to be made, but center around making them performant from .NET. If .NET is removed from the equation, your task becomes simpler. After all, the best solution is often the one that has you writing the smallest amount of code, right? ;)

话虽如此,我不得不问 - 这必须是 C# 解决方案吗?如果您可以从一组插入/更新语句中组合出您的解决方案,您就可以直接发送 SQL,您的线程和性能问题就会消失。* 在我看来,您的问题与要进行的实际数据转换无关制作,但集中在使它们从 .NET 中提高性能。如果从等式中删除 .NET,您的任务将变得更简单。毕竟,最好的解决方案通常是让您编写最少代码的解决方案,对吗?;)

Even if your update/insert logic can't be expressed in a strictly set-relational manner, SQL Server does have a built-in mechanism for iterating over records and performing logic - while they are justly maligned for many use cases, cursors may in fact be appropriate for your task.

即使您的更新/插入逻辑不能以严格的设置关系方式表达,SQL Server 确实具有用于迭代记录和执行逻辑的内置机制 - 虽然它们在许多用例中受到了公正的诽谤,但游标可能在事实上适合您的任务。

If this is a task that has to happen repeatedly, you could benefit greatly from coding it as a stored procedure.

如果这是一项必须重复发生的任务,您可以将其编码为存储过程,从中受益匪浅。

*of course, long-running SQL brings its own problems like lock escalation and index usage that you'll have to contend with.

*当然,长时间运行的 SQL 会带来其自身的问题,例如您必须应对的锁升级和索引使用。

C# Solution

C# 解决方案

Of course, it may be that doing this in SQL is out of the question - maybe your code's decisions depend on data that comes from elsewhere, for example, or maybe your project has a strict 'no-SQL-allowed' convention. You mention some typical multithreading bugs, but without seeing your code I can't really be helpful with them specifically.

当然,在 SQL 中执行此操作可能是不可能的 - 例如,您的代码决策可能取决于来自其他地方的数据,或者您的项目可能具有严格的“禁止 SQL”约定。您提到了一些典型的多线程错误,但没有看到您的代码,我无法真正对它们提供帮助。

Doing this from C# is obviously viable, but you need to deal with the fact that a fixed amount of latency will exist for each and every call you make. You can mitigate the effects of network latency by using pooled connections, enabling multiple active result sets, and using the asynchronous Begin/End methods for executing your queries. Even with all of those, you will still have to accept that there is a cost to shipping data from SQL Server to your application.

从 C# 中执行此操作显然是可行的,但您需要处理这样一个事实,即您进行的每个调用都会存在固定数量的延迟。您可以通过使用池连接、启用多个活动结果集以及使用异步开始/结束方法来执行查询来减轻网络延迟的影响。即使有了所有这些,您仍然必须接受将数据从 SQL Server 传送到应用程序的成本。

One of the best ways to keep your code from stepping all over itself is to avoid sharing mutable data between threads as much as possible. That would mean not sharing the same DataContext across multiple threads. The next best approach is to lock critical sections of code that touch the shared data - lockblocks around all DataContext access, from the first read to the final write. That approach might just obviate the benefits of multithreading entirely; you can likely make your locking more fine-grained, but be ye warned that this is a path of pain.

防止代码自始至终单步执行的最佳方法之一是尽可能避免在线程之间共享可变数据。这意味着不能跨多个线程共享相同的 DataContext。下一个最好的方法是锁定接触共享数据的关键代码部分 -lock从第一次读取到最后一次写入,所有 DataContext 访问都被阻塞。这种方法可能会完全消除多线程的好处;您可能可以使您的锁定更细粒度,但请注意,这是一条痛苦的道路。

Far better is to keep your operations separate from each other entirely. If you can partition your logic across 'main' records, that's ideal - that is to say, as long as there aren't relationships between the various child tables, and as long as one record in 'main' doesn't have implications for another, you can split your operations across multiple threads like this:

更好的是让您的操作完全相互独立。如果您可以跨“主”记录划分逻辑,那是理想的 - 也就是说,只要各个子表之间没有关系,并且只要“主”中的一个记录对另一个,您可以将您的操作拆分到多个线程中,如下所示:

private IList<int> GetMainIds()
{
    using (var context = new MyDataContext())
        return context.Main.Select(m => m.Id).ToList();
}

private void FixUpSingleRecord(int mainRecordId)
{
    using (var localContext = new MyDataContext())
    {
        var main = localContext.Main.FirstOrDefault(m => m.Id == mainRecordId);

        if (main == null)
            return;

        foreach (var childOneQuality in main.ChildOneQualities)
        {
            // If child one is not found, create it
            // Create the relationship if needed
        }

        // Repeat for ChildTwo and ChildThree

        localContext.SaveChanges();
    }
}

public void FixUpMain()
{
    var ids = GetMainIds();
    foreach (var id in ids)
    {
        var localId = id; // Avoid closing over an iteration member
        ThreadPool.QueueUserWorkItem(delegate { FixUpSingleRecord(id) });
    }
}

Obviously this is as much a toy example as the pseudocode in your question, but hopefully it gets you thinking about how to scope your tasks such that there is no (or minimal) shared state between them. That, I think, will be the key to a correct C# solution.

显然,这与您问题中的伪代码一样是一个玩具示例,但希望它能让您思考如何确定任务的范围,以便它们之间没有(或最少)共享状态。我认为,这将是正确 C# 解决方案的关键。

EDIT Responding to updates and comments

编辑响应更新和评论

If you're seeing data consistency issues, I'd advise enforcing transaction semantics - you can do this by using a System.Transactions.TransactionScope (add a reference to System.Transactions). Alternately, you might be able to do this on an ADO.NET level by accessing the inner connection and calling BeginTransactionon it (or whatever the DataConnection method is called).

如果您看到数据一致性问题,我建议您强制执行事务语义 - 您可以通过使用 System.Transactions.TransactionScope(添加对 System.Transactions 的引用)来执行此操作。或者,您可以通过访问内部连接并调用BeginTransaction它(或调用任何 DataConnection 方法)在 ADO.NET 级别上执行此操作。

You also mention deadlocks. That you're battling SQL Server deadlocks indicates that the actual SQL queries are stepping on each other's toes. Without knowing what is actually being sent over the wire, it's difficult to say in detail what's happening and how to fix it. Suffice to say that SQL deadlocks result from SQL queries, and not necessarily from C# threading constructs - you need to examine what exactly is going over the wire. My gut tells me that if each 'main' record is truly independent of the others, then there shouldn't be a need for row and table locks, and that Linq to SQL is likely the culprit here.

您还提到了死锁。您正在与 SQL Server 死锁作斗争表明实际的 SQL 查询正在互相踩踏。在不知道实际通过线路发送什么的情况下,很难详细说明发生了什么以及如何修复它。可以说 SQL 死锁是由 SQL 查询引起的,而不一定是由 C# 线程构造引起的——您需要检查网络上到底发生了什么。我的直觉告诉我,如果每个“主”记录都真正独立于其他记录,那么就不需要行锁和表锁,而 Linq to SQL 可能是这里的罪魁祸首。

You can get a dump of the raw SQL emitted by L2S in your code by setting the DataContext.Logproperty to something e.g. Console.Out. Though I've never personally used it, I understand the LINQPad offers L2S facilities and you may be able to get at the SQL there, too.

通过将DataContext.Log属性设置为诸如 Console.Out 之类的内容,您可以在代码中获取 L2S 发出的原始 SQL 的转储。虽然我从未亲自使用过它,但我知道 LINQPad 提供了 L2S 设施,您也可以在那里获得 SQL。

SQL Server Management Studio will get you the rest of the way there - using the Activity Monitor, you can watch for lock escalation in real time. Using the Query Analyzer, you can get a view of exactly how SQL Server will execute your queries. With those, you should be able to get a good notion of what your code is doing server-side, and in turn how to go about fixing it.

SQL Server Management Studio 将帮助您完成剩下的工作 - 使用活动监视器,您可以实时监视锁升级。使用查询分析器,您可以准确了解 SQL Server 将如何执行您的查询。有了这些,您应该能够很好地了解您的代码在服务器端做了什么,以及如何修复它。

回答by Johnv2020

sql server deadlocks are normal & to be expected in this type of scenario - MS's recommendation is that these should be handled on the application siderather than the db side.

sql server 死锁是正常的,在这种情况下是可以预料的 - MS 的建议是这些应该在应用程序端而不是 db 端处理。

However if you do need to make sure that a stored procedure is only called once then you can use a sql mutex lock using sp_getapplock. Here's an example of how to implement this

但是,如果您确实需要确保存储过程只调用一次,那么您可以使用 sp_getapplock 使用 sql 互斥锁。这是一个如何实现这个的例子

BEGIN TRAN
DECLARE @mutex_result int;
EXEC @mutex_result = sp_getapplock @Resource = 'CheckSetFileTransferLock',
 @LockMode = 'Exclusive';

IF ( @mutex_result < 0)
BEGIN
    ROLLBACK TRAN

END

-- do some stuff

EXEC @mutex_result = sp_releaseapplock @Resource = 'CheckSetFileTransferLock'
COMMIT TRAN  

回答by Scott Smith

This may be obvious, but looping through each tuple and doing your work in your servlet container involves a lot of per-record overhead.

这可能是显而易见的,但是循环遍历每个元组并在 servlet 容器中进行工作涉及大量的每条记录开销。

If possible, move some or all of that processing to the SQL server by rewriting your logic as one or more stored procedures.

如果可能,通过将您的逻辑重写为一个或多个存储过程,将部分或全部处理移至 SQL 服务器。

回答by Jirka Hanika

I would recommend moving all the XML processing into the SQL server, too. Not only will all your deadlocks disappear, but you will see such a boost in performance that you will never want to go back.

我也建议将所有 XML 处理转移到 SQL 服务器中。不仅所有的僵局都会消失,而且您会看到性能的大幅提升,以至于您再也不想回去了。

It will be best explained by an example. In this example I assume that the XML blob already is going into your main table (I call it closet). I will assume the following schema:

最好通过一个例子来解释。在这个例子中,我假设 XML blob 已经进入你的主表(我称之为壁橱)。我将假设以下架构:

CREATE TABLE closet (id int PRIMARY KEY, xmldoc ntext) 
CREATE TABLE shoe(id int PRIMARY KEY IDENTITY, color nvarchar(20))
CREATE TABLE closet_shoe_relationship (
    closet_id int REFERENCES closet(id),
    shoe_id int REFERENCES shoe(id)
)

And I expect that your data (main table only) initially looks like this:

我希望您的数据(仅限主表)最初如下所示:

INSERT INTO closet(id, xmldoc) VALUES (1, '<ROOT><shoe><color>blue</color></shoe></ROOT>')
INSERT INTO closet(id, xmldoc) VALUES (2, '<ROOT><shoe><color>red</color></shoe></ROOT>')

Then your whole task is as simple as the following:

那么你的整个任务就像以下一样简单:

INSERT INTO shoe(color) SELECT DISTINCT CAST(CAST(xmldoc AS xml).query('//shoe/color/text()') AS nvarchar) AS color from closet
INSERT INTO closet_shoe_relationship(closet_id, shoe_id) SELECT closet.id, shoe.id FROM shoe JOIN closet ON CAST(CAST(closet.xmldoc AS xml).query('//shoe/color/text()') AS nvarchar) = shoe.color

But given that you will do a lot of similar processing, you can make your life easier by declaring your main blob as XML type, and further simplifying to this:

但考虑到您将进行很多类似的处理,您可以通过将主 blob 声明为 XML 类型并进一步简化为以下内容,从而使您的生活更轻松:

INSERT INTO shoe(color)
    SELECT DISTINCT CAST(xmldoc.query('//shoe/color/text()') AS nvarchar)
    FROM closet
INSERT INTO closet_shoe_relationship(closet_id, shoe_id)
    SELECT closet.id, shoe.id
    FROM shoe JOIN closet
        ON CAST(xmldoc.query('//shoe/color/text()') AS nvarchar) = shoe.color

There are additional performance optimizations possible, like pre-computing repeatedly invoked Xpath results in a temporary or permanent table, or converting the initial population of the main table into a BULK INSERT, but I don't expect that you will really need those to succeed.

还有其他可能的性能优化,例如预先计算重复调用的 Xpath 结果在临时或永久表中,或将主表的初始填充转换为 BULK INSERT,但我不希望您真的需要这些才能成功.

回答by Fabske

If

如果

  • You don't have a lot of time to spend on this issue and need it to fix it right now
  • You are sure that your code is done so that different thread will NOT modify the same record
  • You are not afraid
  • 您没有太多时间花在这个问题上,现在需要它来解决它
  • 您确定您的代码已完成,以便不同的线程不会修改相同的记录
  • 你不怕

Then ... you can just add "WITH NO LOCK" to your queries so that MSSQL doesn't apply the locks.

然后……您可以在查询中添加“WITH NO LOCK”,以便 MSSQL 不应用锁。

To use with caution :)

谨慎使用:)

But anyway, you didn't tell us where the time is lost (in the mono-threaded version). Because if it's in the code, I'll advise you to write everything in the DB directly to avoid continuous data exchange. If it's in the db, I'll advise to check index (too much ?), i/o, cpu etc.

但是无论如何,您没有告诉我们时间在哪里丢失(在单线程版本中)。因为如果是在代码里,我建议你直接把所有的东西都写在DB里,避免连续的数据交换。如果它在数据库中,我会建议检查索引(太多?)、i/o、cpu 等。