ArrayList and Multithreading in Java
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3589308/
Asked by user433500
Under what circumstances would an unsynchronized collection, say an ArrayList, cause a problem? I can't think of any. Can someone please give me an example where an ArrayList causes a problem and a Vector solves it? I wrote a program that has 2 threads, both modifying an ArrayList that has one element. One thread puts "bbb" into the ArrayList while the other puts "aaa" into it. I don't really see an instance where the string is half modified. Am I on the right track here?
Also, I remember being told that multiple threads are not really running simultaneously: one thread runs for some time and another thread runs after that (on computers with a single CPU). If that is correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Many thanks in advance.
Answer by Tom
When will it cause trouble?
Any time one thread is reading the ArrayList while another one is writing, or when they are both writing. Here's a well-known example.
Also, I remember being told that multiple threads are not really running simultaneously: one thread runs for some time and another thread runs after that (on computers with a single CPU). If that is correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yes, single-core CPUs can execute only one instruction at a time (not really, pipelining has been here for a while, but as a professor once said, that's "free" parallelism). Even so, each process running on your computer is only executed for a period of time, then it goes into an idle state. At that moment, another process may start or continue its execution, and then go into an idle state or finish. Process execution is interleaved.
With threads the same thing happens, only they are contained inside a process. How they execute depends on the operating system, but the concept remains the same. They change from active to idle constantly throughout their lifetime.
Answer by Thilo
I don't really see an instance where the string is half modified. Am I on the right track here?
That won't happen. However, what could happen is that only one of the strings gets added. Or that an exception occurs during the call to add.
can someone please give me an example where an ArrayList causes a problem and a Vector solves it?
If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:
// broken, even though vector is "thread-safe"
if (vector.isEmpty())
    vector.add(1);
The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have. You need to synchronize in your application code (and then there is no need to use Vector).
synchronized (list) {
    if (list.isEmpty())
        list.add(1);
}
The concurrency utility package (java.util.concurrent) also has a number of collections that provide the atomic operations necessary for thread-safe queues and such.
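As one possible illustration of such an atomic operation, here is a minimal sketch using CopyOnWriteArrayList.addIfAbsent, which performs the check and the add as a single step. The class name AddIfAbsentDemo, the run method and the thread count are made up for this example and do not come from the original answer.

```java
import java.util.concurrent.CopyOnWriteArrayList;

final class AddIfAbsentDemo {
    // Starts threadCount threads that all try to add the same value,
    // then returns the final size of the list.
    static int run(int threadCount) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            // addIfAbsent checks for the element and adds it atomically,
            // so no other thread can slip in between the check and the add.
            threads[i] = new Thread(() -> list.addIfAbsent("aaa"));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(run(8)); // prints 1: only the first addIfAbsent succeeds
    }
}
```

Whether a copy-on-write list is a good fit depends on the workload; it is usually chosen when reads greatly outnumber writes.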
Answer by Nikita Rybak
A practical example. At the end, the list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?
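import java.util.ArrayList;
import java.util.List;

// Wrapper class (the name is arbitrary) so that the snippet compiles on its own.
public class ListTest {
    static class ListTester implements Runnable {
        private final List<Integer> a;

        public ListTester(List<Integer> a) {
            this.a = a;
        }

        public void run() {
            try {
                for (int i = 0; i < 20; ++i) {
                    a.add(i); // unsynchronized write to the shared list
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Integer> a = new ArrayList<Integer>();
        Thread t1 = new Thread(new ListTester(a));
        Thread t2 = new Thread(new ListTester(a));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(a.size()); // should be 40, but is often less
        for (int i = 0; i < a.size(); ++i) {
            System.out.println(i + " " + a.get(i));
        }
    }
}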
Edit
There are more comprehensive explanations around (for example, Stephen C's post), but I'll add a short comment since mfukar asked. (I should have done it right away when posting the answer.)
This is the famous problem of incrementing an integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. Only in that example they have --i and ++i, while we have ++size twice. (++size is part of the ArrayList#add implementation.)
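For comparison, here is a minimal sketch of the same kind of test with the list wrapped by Collections.synchronizedList, so that each add (including its internal size increment) runs under a lock. The class name SynchronizedListTest and its run method are made up for this sketch, and the sleep from the test above is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedListTest {
    // Two threads each append 20 integers to the shared list;
    // returns the size of the list after both have finished.
    static int run(List<Integer> shared) {
        Runnable adder = () -> {
            for (int i = 0; i < 20; ++i) {
                shared.add(i);
            }
        };
        Thread t1 = new Thread(adder);
        Thread t2 = new Thread(adder);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        // Every method of the wrapper synchronizes on the wrapper itself.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(run(safe)); // prints 40: no add is lost
    }
}
```

Passing a plain ArrayList to run instead brings back the lost updates described above, although they may not show up on every run.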
Answer by Amit
The first part of your question has already been answered. I will try to answer the second part:
Also, I remember being told that multiple threads are not really running simultaneously: one thread runs for some time and another thread runs after that (on computers with a single CPU). If that is correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
In the wait-notify framework, the thread that has acquired the lock on an object releases it while waiting on some condition. A great example is the producer-consumer problem.
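A minimal sketch of the producer-consumer problem with wait and notifyAll might look like the following. The class WaitNotifyDemo, its capacity of 2 and the run helper are assumptions made for this example; they are not taken from the answer or from the page it linked to.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class WaitNotifyDemo {
    private final Queue<Integer> queue = new ArrayDeque<>();
    private final int capacity;

    WaitNotifyDemo(int capacity) {
        this.capacity = capacity;
    }

    synchronized void put(int value) throws InterruptedException {
        while (queue.size() == capacity) {
            wait(); // releases the lock on this object until another thread notifies
        }
        queue.add(value);
        notifyAll(); // wake up a consumer that may be waiting for an item
    }

    synchronized int take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait(); // the lock is released here, so the producer can enter put()
        }
        int value = queue.remove();
        notifyAll(); // wake up a producer that may be waiting for free space
        return value;
    }

    // One producer sends 1..items, one consumer sums what it receives.
    static int run(int items) {
        WaitNotifyDemo buffer = new WaitNotifyDemo(2);
        int[] sum = new int[1];
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    buffer.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    sum[0] += buffer.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // prints 55 (1 + 2 + ... + 10)
    }
}
```

The condition is re-checked in a while loop because a waiting thread can wake up when the condition it waits for does not hold.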
Answer by Stephen C
There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.
The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:
public void add(T element) {
    if (!haveSpace(size + 1)) {
        expand(size + 1);
    }
    elements[size] = element;
    // HERE
    size++;
}
Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)
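The interleaving described above can be replayed deterministically in a single thread by writing out the steps of the two threads by hand. The following is only a sketch: TinyList is a hypothetical stand-in for the simplified add logic, not the real ArrayList, and the expansion check is omitted.

```java
final class LostUpdateReplay {
    // Hypothetical stripped-down list exposing the two steps of add separately.
    static final class TinyList {
        final Object[] elements = new Object[4];
        int size;

        // First step of add: store the element in the slot the caller computed from size.
        void storeAt(int index, Object element) {
            elements[index] = element;
        }

        // Second step of add: bump the size.
        void incrementSize() {
            size++;
        }
    }

    // Replays the schedule: thread A stores, is preempted at HERE, thread B runs a full add, A resumes.
    static TinyList replay() {
        TinyList list = new TinyList();
        int slotSeenByA = list.size;      // thread A reads size == 0
        list.storeAt(slotSeenByA, "aaa"); // thread A stores its element, then is preempted at HERE
        int slotSeenByB = list.size;      // thread B also reads size == 0
        list.storeAt(slotSeenByB, "bbb"); // thread B overwrites thread A's element
        list.incrementSize();             // thread B: size becomes 1
        list.incrementSize();             // thread A resumes: size becomes 2
        return list;
    }

    public static void main(String[] args) {
        TinyList list = replay();
        System.out.println(list.size + " " + list.elements[0] + " " + list.elements[1]);
        // prints: 2 bbb null
    }
}
```

The list reports two elements, but "aaa" is gone and the second slot holds null, which matches the outcome described in the paragraph above.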
The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When one thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int) for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that this kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)
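To make the visibility issue concrete, here is a minimal sketch of a field that is published safely through a volatile flag. VisibilityDemo and its fields ready and payload are hypothetical names for this example. The stale-read case itself is not shown, because it depends on the JIT compiler and the hardware and cannot be reproduced reliably.

```java
final class VisibilityDemo {
    // volatile: a write to this flag by one thread is guaranteed to be
    // seen by subsequent reads of the flag in other threads.
    private volatile boolean ready;

    // Plain field. It is written before the volatile write to ready, so a
    // thread that reads ready == true is also guaranteed to see this value.
    private int payload;

    static int run() {
        VisibilityDemo demo = new VisibilityDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!demo.ready) {
                Thread.yield(); // spin until the writer's update becomes visible
            }
            seen[0] = demo.payload;
        });
        reader.start();
        demo.payload = 42;
        demo.ready = true; // the volatile write publishes payload as well
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

If ready were a plain field, the Java memory model would allow the reader to keep seeing false indefinitely, or to see true together with a stale payload. The fields of an unsynchronized ArrayList are exposed to the same kind of effect.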
The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.
List list = Collections.synchronizedList(new ArrayList());

// Thread 1
List list2 = ...
for (Object element : list2) {
    list.add(element);
}

// Thread 2
List list3 = ...
for (Object element : list) {
    list3.add(element);
}
If thread2's list is an ArrayList or LinkedList and the two threads run simultaneously, thread 2 will fail with a ConcurrentModificationException. If it is some other (home brew) list, then the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make it thread-safe with respect to a sequence of list operations performed by different threads. To get that, the application would typically need to synchronize at a higher level / coarser grain.
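One way to synchronize at a coarser grain is to hold the list's own lock for the whole traversal, which is what the Collections.synchronizedList documentation asks callers to do when iterating. The following is a sketch under that assumption; the class name SynchronizedIterationDemo and the run helper are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedIterationDemo {
    // One thread appends 0..count-1 while another copies the list;
    // returns whatever the copying thread saw.
    static List<Integer> run(int count) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        List<Integer> copy = new ArrayList<>();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                list.add(i); // each add takes the list's lock on its own
            }
        });
        Thread reader = new Thread(() -> {
            // Hold the list's lock for the whole loop, so no add() can run
            // in the middle of the iteration.
            synchronized (list) {
                for (Integer element : list) {
                    copy.add(element);
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return copy;
    }

    public static void main(String[] args) {
        // The copy is a consistent snapshot: some prefix 0, 1, 2, ... of the
        // written values. Its length depends on thread timing.
        System.out.println(run(1000).size());
    }
}
```

The cost of this approach is that the writer is blocked for as long as the traversal runs, which is why the grain of the lock is a design decision for the application rather than for the list.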
Also, I remember being told that multiple threads are not really running simultaneously: one thread runs for some time and another thread runs after that (on computers with a single CPU).
Correct. If there is only one core available to run the application, obviously only one thread gets to run at a time. This makes some of the hazards impossible, and others become much less likely to occur. However, it is possible for the OS to switch from one thread to another at any point in the code, and at any time.
If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yup. That's possible. The probability of it happening is very small [1], but that just makes this kind of problem more insidious.
1 - This is because thread time-slicing events are extremely infrequent, when measured on the timescale of hardware clock cycles.
Answer by fastcodejava
You cannot control when one thread will be stopped and another will start. Thread 2 will not wait until thread 1 has completely finished adding data, so it is always possible to corrupt the data.