Java: best practices for creating millions of small temporary objects

Question

Asked by Humble Programmer

What are the "best practices" for creating (and releasing) millions of small objects?

I am writing a chess program in Java and the search algorithm generates a single "Move" object for each possible move, and a nominal search can easily generate over a million move objects per second. The JVM GC has been able to handle the load on my development system, but I'm interested in exploring alternative approaches that would:

Minimize the overhead of garbage collection, and
reduce the peak memory footprint for lower-end systems.

A vast majority of the objects are very short-lived, but about 1% of the moves generated are persisted and returned as the persisted value, so any pooling or caching technique would have to provide the ability to exclude specific objects from being re-used.

I don't expect fully-fleshed out example code, but I would appreciate suggestions for further reading/research, or open source examples of a similar nature.

Answer 1

Answered by Niels Bech Nielsen

Run the application with verbose garbage collection:

java -verbose:gc

It will tell you whenever it collects. There are two types of sweep: a fast one and a full one.

[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

The numbers on either side of the arrow are the heap usage before and after the collection; the number in parentheses is the total heap size.

As long as it is just doing GC and not a full GC you are home safe. The regular GC is a copy collector in the 'young generation', so objects that are no longer referenced are simply just forgotten about, which is exactly what you would want.
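
If reading the log by hand is inconvenient, the same counters can be read from inside the program. The following is only a rough sketch using the standard `java.lang.management` API; the class name `GcStats` and the allocation loop in `main` are made up for illustration and are not part of the original answer.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Total number of collections reported by all collectors in this JVM.
    // getCollectionCount() returns -1 when a collector does not report a count.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            int[] move = {i & 63, (i >> 6) & 63}; // short-lived garbage
            sink += move[0] + move[1];
        }
        System.out.println("collections: " + (totalCollections() - countBefore)
                + ", ms in GC: " + (totalCollectionMillis() - millisBefore)
                + " (sink=" + sink + ")");
    }
}
```

The numbers depend entirely on the JVM, the collector, and the heap settings, so treat them as a before/after comparison tool rather than as absolute values.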

Reading Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning is probably helpful.

Answer 2

Answered by Mikhail

Since version 6, the server mode of the JVM employs an escape analysis technique. For objects that never escape the method that creates them, it can avoid heap allocation, and therefore GC, altogether.
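
As a rough illustration of the kind of code escape analysis targets, here is a minimal sketch. The `Move` class and the `sumDistances` method are hypothetical stand-ins, and whether the JIT actually removes the allocation depends on the JVM version, inlining decisions, and flags, so this is not a guarantee.

```java
class EscapeDemo {
    // Small value object; a hypothetical stand-in for a chess move.
    static final class Move {
        final int from;
        final int to;

        Move(int from, int to) {
            this.from = from;
            this.to = to;
        }

        int distance() {
            return Math.abs(to - from);
        }
    }

    // Each Move is created, used, and dropped inside the loop body.
    // It is never stored in a field, returned, or handed to other code,
    // which makes it a candidate for being optimised away by the JIT.
    static long sumDistances(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Move m = new Move(i & 63, (i * 7) & 63);
            sum += m.distance();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumDistances(1_000_000));
    }
}
```

By contrast, a move that is added to a list or returned to the caller escapes, and will be allocated on the heap as usual.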

Answer 3

Answered by Pierre Laporte

Well, there are several questions in one here!

1 - How are short-lived objects managed?

As previously stated, the JVM deals perfectly well with a huge number of short-lived objects, since its garbage collector is built around the Weak Generational Hypothesis (most objects die young).

Note that we are speaking of objects that reached the main memory (heap). This is not always the case. A lot of the objects you create do not even leave a CPU register. For instance, consider this for-loop:

for (int i = 0; i < max; i++) {
  // work that uses i
}

Let's not think about loop unrolling (an optimisation that the JVM heavily performs on your code). If max is equal to Integer.MAX_VALUE, your loop might take some time to execute. However, the i variable will never escape the loop block. Therefore the JVM will put that variable in a CPU register and regularly increment it, but will never send it back to main memory.

So, creating millions of objects is not a big deal if they are used only locally. They will be dead before being stored in Eden, so the GC won't even notice them.

2 - Is it useful to reduce the overhead of the GC?

As usual, it depends.

First, you should enable GC logging to have a clear view of what is going on. You can enable it with -Xloggc:gc.log -XX:+PrintGCDetails (on JDK 9 and later, the unified-logging equivalent is -Xlog:gc*:file=gc.log).

If your application is spending a lot of time in GC cycles, then, yes, tune the GC; otherwise, it might not really be worth it.

For instance, if you have a young GC every 100ms that takes 10ms, you spend 10% of your time in the GC, and you have 10 collections per second (which is huuuuuge). In such a case, I would not spend any time in GC tuning, since those 10 GC/s would still be there.

3 - Some experience

I had a similar problem on an application that was creating a huge number of instances of a given class. In the GC logs, I noticed that the allocation rate of the application was around 3 GB/s, which is way too much (come on... 3 gigabytes of data every second?!).

The problem: overly frequent GCs caused by too many objects being created.

In my case, I attached a memory profiler and noticed that one class accounted for a huge percentage of all my objects. I tracked down the instantiations and found that this class was basically a pair of booleans wrapped in an object. In that case, two solutions were available:

Rework the algorithm so that I do not return a pair of booleans but instead I have two methods that return each boolean separately
Cache the objects, knowing that there were only 4 different instances

I chose the second one, as it had the least impact on the application and was easy to introduce. It took me minutes to put in place a factory with a non-thread-safe cache (I did not need thread safety since I would eventually have only 4 different instances).
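
A factory of that kind might look roughly like the sketch below. The class name `BoolPair` and its field names are invented for illustration, and unlike the lazily filled cache described above, this version creates all four instances up front, which also happens to make it safe to share between threads.

```java
final class BoolPair {
    // All four possible instances, indexed by (first ? 2 : 0) + (second ? 1 : 0).
    private static final BoolPair[] CACHE = {
        new BoolPair(false, false),
        new BoolPair(false, true),
        new BoolPair(true, false),
        new BoolPair(true, true)
    };

    final boolean first;
    final boolean second;

    // Private so that callers cannot bypass the cache.
    private BoolPair(boolean first, boolean second) {
        this.first = first;
        this.second = second;
    }

    // Factory method: returns a shared instance and never allocates.
    static BoolPair of(boolean first, boolean second) {
        return CACHE[(first ? 2 : 0) + (second ? 1 : 0)];
    }
}
```

Because the instances are immutable and shared, call sites only need to replace `new BoolPair(a, b)` with `BoolPair.of(a, b)`.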

The allocation rate went down to 1 GB/s, and so did the frequency of young GC (divided by 3).

Hope that helps!

Answer 4

Answered by bestsss

If you have just value objects (that is, no references to other objects) and really, but I mean really, tons and tons of them, you can use direct ByteBuffers with native byte ordering [the latter is important], and you need a few hundred lines of code to allocate/reuse plus getters/setters. Getters look similar to long getQuantity(int tupleIndex){return buffer.getLong(tupleIndex+QUANTITY_OFFSET);}

That would solve the GC problem almost entirely as long as you allocate only once, that is, one huge chunk, and then manage the objects yourself. Instead of references you'd have only an index (that is, an int) into the ByteBuffer that has to be passed along. You may need to handle memory alignment yourself as well.

The technique feels like using C and void*, but with some wrapping it's bearable. A performance downside could be bounds checking, if the compiler fails to eliminate it. A major upside is locality if you process the tuples like vectors; the lack of an object header reduces the memory footprint as well.
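
To make the idea more concrete, here is a heavily simplified sketch of such a wrapper. The record layout (a quantity and a price), the class name `TupleStore`, and the bump-style `allocate()` are all assumptions made for the example; a real version would also need a free list for reuse, which is where most of the "few hundred lines" would go.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TupleStore {
    // Hypothetical record layout: an 8-byte quantity followed by a 4-byte price.
    static final int QUANTITY_OFFSET = 0;
    static final int PRICE_OFFSET = 8;
    static final int TUPLE_SIZE = 16; // padded so every long stays 8-byte aligned

    private final ByteBuffer buffer;
    private final int capacity;
    private int next = 0; // number of slots handed out so far

    TupleStore(int capacity) {
        this.capacity = capacity;
        // One big off-heap allocation, in the platform's native byte order.
        this.buffer = ByteBuffer.allocateDirect(capacity * TUPLE_SIZE)
                                .order(ByteOrder.nativeOrder());
    }

    // Returns the byte offset of a fresh tuple; this int plays the role of a reference.
    int allocate() {
        if (next == capacity) {
            throw new IllegalStateException("store is full");
        }
        return next++ * TUPLE_SIZE;
    }

    long getQuantity(int tupleIndex) { return buffer.getLong(tupleIndex + QUANTITY_OFFSET); }
    void setQuantity(int tupleIndex, long value) { buffer.putLong(tupleIndex + QUANTITY_OFFSET, value); }

    int getPrice(int tupleIndex) { return buffer.getInt(tupleIndex + PRICE_OFFSET); }
    void setPrice(int tupleIndex, int value) { buffer.putInt(tupleIndex + PRICE_OFFSET, value); }

    // Discards every tuple at once, e.g. at the end of a search iteration.
    void reset() { next = 0; }
}
```

Here `tupleIndex` is a byte offset, matching the getter shown in the answer; passing that int around replaces passing an object reference.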

Other than that, it's likely you won't need such an approach, as in virtually all JVMs the young generation dies trivially and the allocation cost is just a pointer bump. Allocation cost can be a bit higher if you use final fields, as they require a memory fence on some platforms (namely ARM/Power); on x86 it is free, though.

Answer 5

Answered by Nitsan Wakart

Assuming you find GC is an issue (as others point out, it might not be), you will be implementing your own memory management for your special case, i.e. a class which suffers massive churn. Give object pooling a go; I've seen cases where it works quite well. Implementing object pools is a well-trodden path, so no need to revisit it here. Look out for:

multi-threading: using thread local pools might work for your case
backing data structure: consider using ArrayDeque as it performs well on remove and has no allocation overhead
limit the size of your pool :)
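
The three points above can be combined in a small sketch. `BoundedPool`, `MovePools`, and the use of `int[]` as the pooled type are illustrative choices, not something prescribed by the answer.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

final class BoundedPool<T> {
    private final ArrayDeque<T> free;
    private final int maxSize;
    private final Supplier<T> factory;

    BoundedPool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.free = new ArrayDeque<>(Math.max(1, maxSize));
    }

    // Reuses a pooled object if one is available, otherwise creates a new one.
    T acquire() {
        T obj = free.pollLast();
        return obj != null ? obj : factory.get();
    }

    // Returns an object to the pool; silently drops it once the pool is full.
    // Objects that must survive are simply never released.
    void release(T obj) {
        if (free.size() < maxSize) {
            free.addLast(obj);
        }
    }

    int size() {
        return free.size();
    }
}

final class MovePools {
    // One pool per thread, so the pool itself needs no locking.
    static final ThreadLocal<BoundedPool<int[]>> MOVES =
        ThreadLocal.withInitial(() -> new BoundedPool<int[]>(256, () -> new int[2]));
}
```

A caller would use `MovePools.MOVES.get().acquire()` and later `release(...)`; the roughly 1% of moves that are kept as results are just not handed back, which takes care of excluding them from reuse.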

Measure before and after, and so on.

Answer 6

Answered by OldCurmudgeon

I dealt with this scenario with some XML processing code some time ago. I found myself creating millions of XML tag objects which were very small (usually just a string) and extremely short-lived (failure of an XPath check meant no match, so discard).

I did some serious testing and came to the conclusion that I could only achieve about a 7% improvement in speed using a list of discarded tags instead of making new ones. However, once implemented I found that the free queue needed a mechanism added to prune it if it got too big - this completely nullified my optimisation so I switched it to an option.

In summary - probably not worth it - but I'm glad to see you are thinking about it, it shows you care.

Answer 7

Answered by StanislavL

I've met a similar problem. First of all, try to reduce the size of the small objects. We introduced some shared default field values and referenced them from each object instance.

For example, MouseEvent has a reference to the Point class. We cached Points and referenced them instead of creating new instances. The same goes for empty strings, for example.

Another source was multiple booleans, which we replaced with a single int, using just one bit of the int for each boolean.
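
Packing booleans into an int could be sketched as follows, assuming one bit per flag. The flag names are made up for a chess move and do not come from the answer.

```java
final class Flags {
    // Hypothetical flags; each one occupies a single bit of the int.
    static final int CAPTURE   = 1 << 0;
    static final int CHECK     = 1 << 1;
    static final int PROMOTION = 1 << 2;

    private Flags() {}

    // Returns a copy of 'flags' with the given bit switched on or off.
    static int set(int flags, int mask, boolean value) {
        return value ? (flags | mask) : (flags & ~mask);
    }

    // Tests whether the given bit is set.
    static boolean isSet(int flags, int mask) {
        return (flags & mask) != 0;
    }
}
```

A single int field then holds up to 32 such flags, replacing up to 32 separate boolean fields in each instance.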

Answer 8

Answered by David Plumpton

Given that you are writing a chess program, there are some special techniques you can use for decent performance. One simple approach is to create a large array of longs (or bytes) and treat it as a stack. Each time your move generator creates moves, it pushes a couple of numbers onto the stack, e.g. the from-square and the to-square. As you evaluate the search tree, you pop moves off and update a board representation.
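
A bare-bones version of such a stack might look like this. Packing both squares into a single long (6 bits each, squares numbered 0 to 63) is an assumption made for the sketch; the answer only says that a couple of numbers are pushed per move.

```java
final class MoveStack {
    private final long[] moves;
    private int top = 0; // index of the next free slot

    MoveStack(int capacity) {
        moves = new long[capacity];
    }

    // Packs the from-square and to-square (0..63 each) into one array element.
    void push(int from, int to) {
        moves[top++] = ((long) from << 6) | to;
    }

    // Removes and returns the most recently pushed move.
    long pop() {
        return moves[--top];
    }

    static int from(long move) {
        return (int) (move >>> 6) & 63;
    }

    static int to(long move) {
        return (int) move & 63;
    }

    int size() {
        return top;
    }
}
```

The array is allocated once, so generating and discarding moves during the search creates no garbage at all; the remaining bits of each long are free for extra data such as a score or a captured piece.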

If you want expressive power, use objects. If you want speed (in this case), go native.

Answer 9

Answered by rkj

One solution I've used for such search algorithms is to create just one Move object, mutate it with each new move, and then undo the move before leaving the scope. You are probably analyzing just one move at a time, and then just storing the best move somewhere.
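
As a toy illustration of the make/undo pattern, the sketch below searches for the best swap of two cells on a small int array. The "board", the swap move, and the evaluation function are all invented for the example and have nothing to do with real chess rules.

```java
final class ReusedMoveSearch {
    // Mutable move, reused for every candidate instead of allocating a new one.
    static final class Move {
        int from;
        int to;

        void set(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    // Swaps two cells. A swap is its own inverse, so applying it twice undoes it.
    static void apply(int[] board, Move m) {
        int tmp = board[m.from];
        board[m.from] = board[m.to];
        board[m.to] = tmp;
    }

    // Toy evaluation: position-weighted sum of the cells.
    static int evaluate(int[] board) {
        int score = 0;
        for (int i = 0; i < board.length; i++) {
            score += i * board[i];
        }
        return score;
    }

    // Returns the best swap; only this one result object outlives the search.
    static Move bestSwap(int[] board) {
        Move scratch = new Move();
        Move best = new Move();
        int bestScore = Integer.MIN_VALUE;
        for (int from = 0; from < board.length; from++) {
            for (int to = from + 1; to < board.length; to++) {
                scratch.set(from, to);
                apply(board, scratch);   // make the move
                int score = evaluate(board);
                apply(board, scratch);   // undo it before trying the next one
                if (score > bestScore) {
                    bestScore = score;
                    best.set(from, to);  // copy the values, not the reference
                }
            }
        }
        return best;
    }
}
```

Only two Move objects are ever created, regardless of how many candidates are examined, and the board is left unchanged when the search returns.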

If that's not feasible for some reason, and you want to decrease peak memory usage, a good article about memory efficiency is here: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

Answer 10

Answered by Ilya Gazman

I am not a big fan of GC, so I always try to find ways around it. In this case I would suggest using the Object Pool pattern:

The idea is to avoid creating new objects by storing them in a stack so you can reuse them later.

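import java.util.LinkedList;

class MyPool {
    private final LinkedList<Object> stack = new LinkedList<>();

    // Takes from the stack; creates a new object if the stack is empty.
    Object getObject() { return stack.isEmpty() ? new Object() : stack.pop(); }

    // Adds the object back to the stack for later reuse.
    void returnObject(Object o) { stack.push(o); }
}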
