在 Java 中,AtomicInteger 的 compareAndSet() 与 synchronized 关键字相比,性能表现如何?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3556283/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
In Java what is the performance of AtomicInteger compareAndSet() versus synchronized keyword?
提问 by Alan Kent
I was implementing a FIFO queue of request instances (preallocated request objects for speed) and started with using the "synchronized" keyword on the add method. The method was quite short (check if there is room in the fixed-size buffer, then add the value to the array). Using VisualVM it appeared the thread was blocking more often than I liked ("monitor" to be precise). So I converted the code over to use AtomicInteger values for things such as keeping track of the current size, then using compareAndSet() in while loops (as AtomicInteger does internally for methods such as incrementAndGet()). The code now looks quite a bit longer.
我当时在实现一个请求实例的 FIFO 队列(为了速度预先分配好请求对象),一开始在 add 方法上使用了 synchronized 关键字。该方法很短(检查固定大小的缓冲区里是否还有空间,然后把值加入数组)。通过 VisualVM 观察,线程阻塞的频率比我预期的要高(准确地说是处于 "monitor" 状态)。于是我把代码改成用 AtomicInteger 来维护当前大小之类的值,并在 while 循环中使用 compareAndSet()(就像 AtomicInteger 内部为 incrementAndGet() 等方法所做的那样)。现在代码看起来长了不少。
What I was wondering is: what is the performance overhead of using synchronized with the shorter code, versus the longer code without the synchronized keyword (which should never block on a lock)?
我想知道的是:使用 synchronized 的较短代码,与不使用 synchronized 关键字的较长代码(因此永远不会在锁上阻塞)相比,性能开销如何。
Here is the old get method with the synchronized keyword:
这是带有 synchronized 关键字的旧 get 方法:
public synchronized Request get()
{
    if (head == tail)
    {
        return null;
    }
    Request r = requests[head];
    head = (head + 1) % requests.length;
    return r;
}
Here is the new get method without the synchronized keyword:
这是不带 synchronized 关键字的新 get 方法:
public Request get()
{
    // First, reserve an element by atomically decrementing the size.
    while (true)
    {
        int current = size.get();
        if (current <= 0)
        {
            return null;
        }
        if (size.compareAndSet(current, current - 1))
        {
            break;
        }
    }
    // Then, claim a slot by atomically advancing the head index.
    while (true)
    {
        int current = head.get();
        int nextHead = (current + 1) % requests.length;
        if (head.compareAndSet(current, nextHead))
        {
            return requests[current];
        }
    }
}
My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc), even though the code is shorter.
我的猜测是,尽管代码更短,synchronized 关键字还是更差,因为存在在锁上阻塞的风险(可能导致线程上下文切换等)。
Thanks!
谢谢!
回答 by Péter Török
My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc)
我的猜测是,synchronized 关键字更差,因为存在在锁上阻塞的风险(可能导致线程上下文切换等)
Yes, in the common case you are right. Java Concurrency in Practice discusses this in section 15.3.2:
是的,在一般情况下您是对的。《Java Concurrency in Practice》在 15.3.2 节中讨论了这一点:
[...] at high contention levels locking tends to outperform atomic variables, but at more realistic contention levels atomic variables outperform locks. This is because a lock reacts to contention by suspending threads, reducing CPU usage and synchronization traffic on the shared memory bus. (This is similar to how blocking producers in a producer-consumer design reduces the load on consumers and thereby lets them catch up.) On the other hand, with atomic variables, contention management is pushed back to the calling class. Like most CAS-based algorithms, AtomicPseudoRandom reacts to contention by trying again immediately, which is usually the right approach but in a high-contention environment just creates more contention.
Before we condemn AtomicPseudoRandom as poorly written or atomic variables as a poor choice compared to locks, we should realize that the level of contention in Figure 15.1 is unrealistically high: no real program does nothing but contend for a lock or atomic variable. In practice, atomics tend to scale better than locks because atomics deal more effectively with typical contention levels.
The performance reversal between locks and atomics at differing levels of contention illustrates the strengths and weaknesses of each. With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance. (CAS-based algorithms also outperform lock-based ones on single-CPU systems, since a CAS always succeeds on a single-CPU system except in the unlikely case that a thread is preempted in the middle of the read-modify-write operation.)
[...] 在高争用级别下,锁的性能往往优于原子变量;但在更接近实际的争用级别下,原子变量的性能优于锁。这是因为锁通过挂起线程来响应争用,从而降低了 CPU 使用率和共享内存总线上的同步流量。(这类似于在生产者-消费者设计中阻塞生产者可以减轻消费者的负载,让它们得以赶上。)另一方面,使用原子变量时,争用管理被推回给调用方的类。与大多数基于 CAS 的算法一样,AtomicPseudoRandom 通过立即重试来应对争用,这通常是正确的做法,但在高争用环境中只会制造更多争用。
在我们断言 AtomicPseudoRandom 写得不好、或者原子变量相比锁是个糟糕选择之前,应该意识到图 15.1 中的争用程度高得不切实际:没有哪个真实程序除了争抢一个锁或原子变量之外什么都不做。在实践中,原子变量往往比锁扩展得更好,因为原子变量能更有效地处理典型的争用级别。
锁与原子变量在不同争用级别下的性能反转,说明了两者各自的优势和劣势。在中低争用下,原子变量提供更好的可伸缩性;在高争用下,锁能更好地避免争用。(基于 CAS 的算法在单 CPU 系统上也优于基于锁的算法,因为在单 CPU 系统上 CAS 总是会成功,除非线程恰好在读-改-写操作中间被抢占,而这种情况不太可能发生。)
(On the figures referred to by the text, Figure 15.1 shows that the performance of AtomicInteger and ReentrantLock is more or less equal when contention is high, while Figure 15.2 shows that under moderate contention the former outperforms the latter by a factor of 2-3.)
(关于文中引用的图:图 15.1 显示在高争用下 AtomicInteger 和 ReentrantLock 的性能大致相当,而图 15.2 显示在中等争用下前者比后者快 2-3 倍。)
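To make the trade-off concrete, here is a minimal sketch (not from the book or the original post; the class names SynchronizedCounter and CasCounter are made up for illustration) of the same bounded decrement written both ways. Under low to moderate contention the CAS loop tends to win because a failed compareAndSet simply retries, while under heavy contention the synchronized version parks waiting threads instead of burning CPU on retries.

import java.util.concurrent.atomic.AtomicInteger;

// Lock-based version: contention is handled by suspending threads.
class SynchronizedCounter {
    private int value;

    SynchronizedCounter(int initial) { this.value = initial; }

    synchronized int decrementIfPositive() {
        if (value <= 0) {
            return -1;            // nothing left to take
        }
        return --value;
    }
}

// CAS-based version: contention is handled by retrying immediately.
class CasCounter {
    private final AtomicInteger value;

    CasCounter(int initial) { this.value = new AtomicInteger(initial); }

    int decrementIfPositive() {
        while (true) {
            int current = value.get();
            if (current <= 0) {
                return -1;        // nothing left to take
            }
            if (value.compareAndSet(current, current - 1)) {
                return current - 1;
            }
        }
    }
}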
Update: on nonblocking algorithms
更新:关于非阻塞算法
As others have noted, nonblocking algorithms, although potentially faster, are more complex and thus more difficult to get right. A hint from section 15.4 of JCiP:
正如其他人所指出的,非阻塞算法虽然可能更快,但也更复杂,因此更难写对。《Java Concurrency in Practice》第 15.4 节给出了一个提示:
Good nonblocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tables, though designing new ones is a task best left to experts.
Nonblocking algorithms are considerably more complicated than their lock-based equivalents. The key to creating nonblocking algorithms is figuring out how to limit the scope of atomic changes to a single variable while maintaining data consistency. In linked collection classes such as queues, you can sometimes get away with expressing state transformations as changes to individual links and using an AtomicReference to represent each link that must be updated atomically.
对于许多常见的数据结构,包括栈、队列、优先级队列和哈希表,人们已经设计出了优秀的非阻塞算法;不过设计新的非阻塞算法最好还是留给专家去做。
非阻塞算法比基于锁的等价实现复杂得多。创建非阻塞算法的关键在于:在保持数据一致性的同时,想办法把原子修改的范围限制在单个变量上。在链式集合类(例如队列)中,有时可以把状态转换表示为对单个链接的修改,并用一个 AtomicReference 来表示每个必须原子更新的链接。
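As an illustration of the "one AtomicReference per link that must change atomically" idea, here is a minimal sketch of the classic Treiber stack, the well-known nonblocking-stack pattern discussed in JCiP section 15.4 (this sketch is not part of the original answer). Each push or pop compresses the whole state transition into a single CAS on the head reference.

import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber-stack sketch: every push/pop is a single CAS on the head reference.
class ConcurrentStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> head = new AtomicReference<>();

    public void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = head.get();
            newHead.next = oldHead;
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won the race
    }

    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;                              // stack is empty
            }
            newHead = oldHead.next;
        } while (!head.compareAndSet(oldHead, newHead)); // retry if the head moved under us
        return oldHead.item;
    }
}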
回答 by irreputable
I wonder if the JVM already does a few spins before really suspending the thread. It anticipates that well-written critical sections, like yours, are very short and complete almost immediately. Therefore it should optimistically busy-wait for, I don't know, dozens of loops before giving up and suspending the thread. If that's the case, it should behave the same as your 2nd version.
我想知道 JVM 在真正挂起线程之前,是否已经会先自旋几次。它预期像您这样编写良好的临界区非常短,几乎立刻就能执行完。因此它应该先乐观地忙等待,比如说几十次循环,然后才放弃并挂起线程。如果是这样,它的行为应该和您的第二个版本相同。
What a profiler shows might be very different from what's really happening in a JVM running at full speed, with all kinds of crazy optimizations. It's better to measure and compare throughput without a profiler.
分析器显示的内容,可能与全速运行、应用了各种激进优化的 JVM 中实际发生的情况大不相同。最好在不使用分析器的情况下测量并比较吞吐量。
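A rough sketch of what "measure and compare throughput without a profiler" could look like: start N threads against each queue implementation and count completed operations in a fixed time window. The names ThroughputBench, Queue and measure are hypothetical, invented here for illustration, and the harness is deliberately crude (no warm-up, no JMH).

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Crude throughput harness: run N threads against an implementation and count operations per second.
class ThroughputBench {
    interface Queue { Object get(); }                      // stand-in for the question's request queue

    static long measure(Queue queue, int threads, long millis) throws InterruptedException {
        AtomicLong ops = new AtomicLong();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                long deadline = System.currentTimeMillis() + millis;
                long local = 0;
                while (System.currentTimeMillis() < deadline) {
                    queue.get();                           // the operation under test
                    local++;
                }
                ops.addAndGet(local);
            });
            workers[i].start();
        }
        start.countDown();                                 // release all workers at roughly the same time
        for (Thread t : workers) {
            t.join();
        }
        return ops.get() * 1000 / millis;                  // operations per second across all threads
    }
}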
回答 by Alexander Pogrebnyak
Before doing this kind of synchronization optimizations, you really need a profiler to tell you that it's absolutely necessary.
在进行这种同步优化之前,您确实需要一个分析器来告诉您这是绝对必要的。
Yes, synchronized under some conditions may be slower than an atomic operation, but compare your original and replacement methods. The former is really clear and easy to maintain; the latter is definitely more complex. Because of this there may be very subtle concurrency bugs that you will not find during initial testing. I already see one problem: size and head can really get out of sync, because, though each of these operations is atomic, the combination is not, and sometimes this may lead to an inconsistent state.
是的,在某些情况下 synchronized 可能比原子操作慢,但请比较一下您原来的方法和替换后的方法。前者非常清晰、易于维护;后者显然更复杂。正因为如此,其中可能隐藏着非常微妙的并发错误,在初始测试中发现不了。我已经看到一个问题:size 和 head 确实可能失去同步,因为虽然这两个操作各自是原子的,但它们的组合不是,这有时会导致不一致的状态。
So, my advice:
所以,我的建议是:
- Start simple
- Profile
- If performance is good enough, leave the simple implementation as is
- If you need a performance improvement, then start to get clever (possibly using a more specialized lock at first, as in the sketch after this list), and TEST, TEST, TEST
- 从简单的实现开始
- 进行性能分析
- 如果性能足够好,就保持简单实现不变
- 如果确实需要提升性能,再开始用更巧妙的办法(也许可以先用更专门的锁,见本列表后的示意代码),然后测试、测试、再测试
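On the "more specialized lock" point, one hedged possibility is to replace synchronized with an explicit ReentrantLock while keeping the single-lock design, so the emptiness check and the head update stay atomic together and the size/head race of the CAS version cannot occur. This is only a sketch: LockedRequestQueue is a made-up wrapper class, and Request, requests, head and tail mirror the fields of the question's code.

import java.util.concurrent.locks.ReentrantLock;

// Sketch: one explicit lock guards both the check and the index update.
public class LockedRequestQueue {
    private final ReentrantLock lock = new ReentrantLock();
    private final Request[] requests;
    private int head;
    private int tail;

    public LockedRequestQueue(int capacity) {
        this.requests = new Request[capacity];
    }

    public Request get() {
        lock.lock();
        try {
            if (head == tail) {
                return null;                       // queue is empty
            }
            Request r = requests[head];
            head = (head + 1) % requests.length;
            return r;
        } finally {
            lock.unlock();                         // always release, even if an exception escapes
        }
    }
}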
回答 by Prof Mo
Here's code for a busy wait lock.
这是忙等待锁的代码。
import java.util.concurrent.atomic.AtomicBoolean;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BusyWaitLock
{
    private static final boolean LOCK_VALUE = true;
    private static final boolean UNLOCK_VALUE = false;

    private final static Logger log = LoggerFactory.getLogger(BusyWaitLock.class);

    /**
     * Thrown when a thread has busy-waited longer than the maximum wait time.
     *
     * @author Rod Moten
     */
    public static class BusyWaitLockException extends RuntimeException
    {
        private static final long serialVersionUID = 1L;

        /**
         * @param message
         */
        public BusyWaitLockException(String message)
        {
            super(message);
        }
    }

    private final AtomicBoolean lock = new AtomicBoolean(UNLOCK_VALUE);
    private final long maximumWaitTime;

    /**
     * Create a busy wait lock that uses the default wait time of two minutes.
     */
    public BusyWaitLock()
    {
        this(1000 * 60 * 2); // default is two minutes
    }

    /**
     * Create a busy wait lock that uses the given value as the maximum wait time.
     * @param maximumWaitTime - a positive value that represents the maximum number of milliseconds that a thread will busy wait.
     */
    public BusyWaitLock(long maximumWaitTime)
    {
        if (maximumWaitTime < 1)
            throw new IllegalArgumentException("Max wait time of " + maximumWaitTime + " is too low. It must be at least 1 millisecond.");
        this.maximumWaitTime = maximumWaitTime;
    }

    /**
     * Spin until the lock is acquired, or throw if the maximum wait time is exceeded.
     */
    public void lock()
    {
        long startTime = System.currentTimeMillis();
        long lastLogTime = 0; // elapsed milliseconds at which we last logged
        int logMessageCount = 0;
        // Note the negation: keep spinning while the CAS fails, i.e. while another thread holds the lock.
        while (!lock.compareAndSet(UNLOCK_VALUE, LOCK_VALUE)) {
            long waitTime = System.currentTimeMillis() - startTime;
            if (waitTime - lastLogTime > 5000) {
                log.debug("Waiting for lock. Log message # {}", logMessageCount++);
                lastLogTime = waitTime;
            }
            if (waitTime > maximumWaitTime) {
                log.warn("Wait time of {} exceeded maximum wait time of {}", waitTime, maximumWaitTime);
                throw new BusyWaitLockException("Exceeded maximum wait time of " + maximumWaitTime + " ms.");
            }
        }
    }

    public void unlock()
    {
        lock.set(UNLOCK_VALUE);
    }
}
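A hedged usage sketch (not part of the answer): as with any explicit lock, unlock() belongs in a finally block so that an exception thrown in the critical section cannot leave the flag stuck at LOCK_VALUE. BusyWaitLockDemo is a made-up class name.

public class BusyWaitLockDemo {
    public static void main(String[] args) {
        BusyWaitLock queueLock = new BusyWaitLock(5000); // give up after 5 seconds of spinning
        queueLock.lock();
        try {
            // critical section: touch the shared state here
            System.out.println("holding the lock");
        } finally {
            queueLock.unlock();                          // always release so other spinners can proceed
        }
    }
}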

