java - How can an EJB parallelize a long, CPU-intensive process?

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2005934/

Time: 2020-10-29 18:53:39  Source: igfitidea

Tags: java, architecture, ejb, parallel-processing

Question by b.roth

The application has a long, CPU-intensive process (an EJB method) that currently runs serially on one server whenever a client requests it.

Conceptually, it is possible to split that process into N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined before being sent back to the client that initiated the process. I'd like to use this parallelization to improve performance.
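The split/join step itself is independent of how the chunks end up being executed. A minimal sketch in plain Java, assuming the work can be indexed as items 0..total-1; `ChunkPlanner`, `split` and `join` are hypothetical names, and the sketch deliberately leaves out the parallel execution, which is exactly what the answers below debate.

```java
import java.util.ArrayList;
import java.util.List;

public final class ChunkPlanner {

    /** Half-open index range [start, end) handled by one parallel job. */
    public static final class Chunk {
        public final int index;
        public final int start; // inclusive
        public final int end;   // exclusive

        Chunk(int index, int start, int end) {
            this.index = index;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [0, total) into at most n contiguous chunks whose sizes differ by at most one. */
    public static List<Chunk> split(int total, int n) {
        if (total < 0 || n <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and n > 0");
        }
        int parts = Math.min(n, Math.max(total, 1)); // never more chunks than items
        int base = total / parts;
        int remainder = total % parts;                // the first `remainder` chunks get one extra item
        List<Chunk> chunks = new ArrayList<>(parts);
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < remainder ? 1 : 0);
            chunks.add(new Chunk(i, start, start + size));
            start += size;
        }
        return chunks;
    }

    /** Joins per-chunk results; the caller must pass them in chunk-index order. */
    public static <T> List<T> join(List<List<T>> partialResults) {
        List<T> joined = new ArrayList<>();
        for (List<T> part : partialResults) {
            joined.addAll(part);
        }
        return joined;
    }
}
```

Keeping the chunk index on each `Chunk` is what later lets out-of-order results be put back in sequence before they are returned to the client.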

How can I implement this parallelization with EJBs? I know that we should not create threads in an EJB method. Instead, we should publish messages (one per job) to be consumed by message-driven beans (MDBs). But then it would no longer be a synchronous call. And being synchronous seems to be a requirement in this case, since I need to collect the output of all jobs before sending it back to the client.

Is there a solution for this?

Accepted answer by Robin

This question has come up on multiple occasions. To summarize, there are several possible solutions, only one of which I would recommend.

Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these APIs are already available in your server. For other servers you will have to add a third-party implementation yourself.

WorkManager info

Related question: Why spawning threads is discouraged

Answer by Will Hartung

There are all sorts of ways to do this.

First, you can use an EJB Timer to create a run-once process that starts immediately. This is a good technique for spawning processes in the background. An EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that needs to do this, or you can have a single Session Bean that calls your application logic through some dispatch mechanism.

Personally, I pass a serializable blob of parameters, along with the name of a class that implements a specific interface, to a generic Session Bean, which then executes that class. This way I can easily run almost anything in the background.
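The dispatch half of that idea can be sketched in plain Java. This is a hypothetical illustration, not the answerer's code: `BackgroundTask`, `TaskRequest` and `WordCountTask` are invented names. In an EJB 3.0 bean, the `TaskRequest` would be the `info` argument of `TimerService.createTimer(long, Serializable)`, and the bean's `@Timeout` method would call `dispatch((TaskRequest) timer.getInfo())`; that container wiring is omitted here so the sketch runs without an EJB API jar.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class TaskDispatcher {

    /** Contract every background task class must implement (hypothetical interface). */
    public interface BackgroundTask {
        void execute(Serializable params) throws Exception;
    }

    /** Serializable request: a task class name plus its parameter blob. */
    public static final class TaskRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String taskClassName;
        private final Serializable params;

        public TaskRequest(String taskClassName, Serializable params) {
            this.taskClassName = taskClassName;
            this.params = params;
        }
    }

    /** Loads the named class, checks that it implements BackgroundTask, and runs it. */
    public static void dispatch(TaskRequest request) {
        try {
            Class<? extends BackgroundTask> type =
                    Class.forName(request.taskClassName).asSubclass(BackgroundTask.class);
            BackgroundTask task = type.getDeclaredConstructor().newInstance();
            task.execute(request.params);
        } catch (Exception e) {
            throw new IllegalStateException("Task " + request.taskClassName + " failed", e);
        }
    }

    /** Demo task: counts the words in a String parameter and records the result. */
    public static final class WordCountTask implements BackgroundTask {
        public static final Map<String, Integer> RESULTS = new ConcurrentHashMap<>();

        @Override
        public void execute(Serializable params) {
            String text = ((String) params).trim();
            RESULTS.put(text, text.isEmpty() ? 0 : text.split("\\s+").length);
        }
    }
}
```

Because the bean only ever sees a class name and a serializable blob, new kinds of background work need no new timer or bean code, only a new `BackgroundTask` implementation.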

One caveat about EJB Timers is that they are persistent. Once you create an EJB Timer, it stays in the container until its job is finished or canceled. The gotcha is that if you have a long-running process and the server goes down, the process will continue and pick back up when the server restarts. This can be a good thing, but only if your process is prepared to be restarted. If you have a simple process iterating through "10,000 items" and the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.
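One way to make such a job "prepared to be restarted" is to checkpoint progress as it goes and skip completed items on the next run. A minimal sketch; `CheckpointStore` and `RestartableJob` are hypothetical names, and the in-memory store is only a stand-in for durable storage such as a database table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

public final class RestartableJob {

    /** Where progress survives a restart; in practice a database row (assumption). */
    public interface CheckpointStore {
        /** Index of the last item finished, or -1 if the job never made progress. */
        int lastCompleted(String jobId);

        void save(String jobId, int itemIndex);
    }

    /** In-memory stand-in, only for this sketch; it would not survive a real crash. */
    public static final class InMemoryStore implements CheckpointStore {
        private final Map<String, Integer> checkpoints = new HashMap<>();

        @Override
        public int lastCompleted(String jobId) {
            return checkpoints.getOrDefault(jobId, -1);
        }

        @Override
        public void save(String jobId, int itemIndex) {
            checkpoints.put(jobId, itemIndex);
        }
    }

    /**
     * Processes items 0..totalItems-1, starting after the last checkpoint, so a persistent
     * timer that fires again after a restart resumes instead of starting over at item 0.
     * Returns how many items this invocation processed.
     */
    public static int run(String jobId, int totalItems, CheckpointStore store, IntConsumer processItem) {
        int processed = 0;
        for (int i = store.lastCompleted(jobId) + 1; i < totalItems; i++) {
            processItem.accept(i);
            store.save(jobId, i); // ideally in the same transaction as the item's own work
            processed++;
        }
        return processed;
    }
}
```

If the checkpoint and the item's effects are not committed together, an item can still be processed twice after a crash, so per-item work should be idempotent.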

Another way to run something in the background is to use a JMS queue. Put a message on the queue, and the handler runs asynchronously from the rest of your application.

The clever part here (something I have also done with the Timer Bean) is that you can control how many "jobs" run at once through the number of MDB instances you configure the system to have.

So, for the specific task of running a process in multiple parallel chunks, I take the task, break it up into "pieces", and then send each piece to the message queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.
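To make the throttling effect concrete without a JMS provider, here is a plain-Java simulation: a `LinkedBlockingQueue` stands in for the JMS queue and a fixed number of consumer threads stands in for the configured MDB pool size. It only illustrates the behaviour and is not EJB code; inside a container, the MDB pool, not your code, owns these threads. `ScatterQueueSimulation`, `run` and the `STOP` sentinel are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

public final class ScatterQueueSimulation {

    /** Sentinel telling a consumer to stop; real pieces must never use this value. */
    private static final int STOP = Integer.MIN_VALUE;

    /**
     * Puts every piece on a queue that `instances` consumers drain concurrently, the way
     * N pooled MDB instances would. Adds each result to `results` (must be thread-safe)
     * and returns the peak number of pieces that were in progress at the same time.
     */
    public static int run(List<Integer> pieces, int instances, IntUnaryOperator work, Queue<Integer> results) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        List<Thread> consumers = new ArrayList<>();
        for (int c = 0; c < instances; c++) {            // one thread per "MDB instance"
            Thread consumer = new Thread(() -> {
                try {
                    for (int piece = queue.take(); piece != STOP; piece = queue.take()) {
                        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                        try {
                            results.add(work.applyAsInt(piece)); // the "onMessage" work
                        } finally {
                            active.decrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers.add(consumer);
            consumer.start();
        }

        queue.addAll(pieces);                            // "send" one message per piece
        for (int c = 0; c < instances; c++) {
            queue.add(STOP);                             // queued after all pieces (FIFO)
        }
        try {
            for (Thread consumer : consumers) {
                consumer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for consumers", e);
        }
        return peak.get();
    }
}
```

However many pieces are queued, the returned peak never exceeds `instances`, which is the "free" throttling the answer attributes to the MDB pool.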

This actually works surprisingly well. There's a little overhead in splitting the process up and routing it through the JMS queue, but that's all basically "start-up time". Once it gets going, you get a real benefit.

Another benefit of using the message queue is that you can have your actual long-running processes execute on a separate machine, or you can readily create a cluster of machines to handle these processes. Either way, the interface is the same, and the code doesn't know the difference.

I've found that once you've relegated a long-running process to the background, you can afford to pay the price of less-than-instant access to that process. That is, there's no reason to monitor the executing classes themselves directly; just have them publish interesting information and statistics to the database, JMX, or whatever, rather than having something that monitors the object directly because it shares the same memory space.
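The JMX variant of that advice can be sketched with the JDK's own management API (`java.lang.management.ManagementFactory` and `javax.management`). The `JobMonitoring`/`JobProgress` names and the `example.jobs` domain are hypothetical; inside an application server you may prefer the server's own MBean facilities, or simply a database table as the answer suggests.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class JobMonitoring {

    /** Read-only management view; a standard MBean needs a public interface named <Class>MBean. */
    public interface JobProgressMBean {
        long getTotal();
        long getCompleted();
        double getPercentDone();
    }

    /** Counters a background job (timer, MDB, ...) updates as it works. */
    public static final class JobProgress implements JobProgressMBean {
        private final long total;
        private final AtomicLong completed = new AtomicLong();

        public JobProgress(long total) {
            this.total = total;
        }

        public void itemDone() {
            completed.incrementAndGet();
        }

        @Override public long getTotal() { return total; }
        @Override public long getCompleted() { return completed.get(); }
        @Override public double getPercentDone() {
            return total == 0 ? 100.0 : 100.0 * completed.get() / total;
        }
    }

    /** Registers the counters so jconsole or any other JMX client can read them. */
    public static ObjectName register(JobProgress progress, String jobName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example.jobs", "name", ObjectName.quote(jobName));
            server.registerMBean(progress, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register progress MBean for " + jobName, e);
        }
    }
}
```

The job only increments a counter; monitoring tools read the `Total`, `Completed` and `PercentDone` attributes without holding a reference to the running object.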

I was easily able to set up a framework that lets tasks run either on the EJB Timer or on the MDB scatter queue; the tasks are the same, and I can monitor their progress, stop them, etc.

You could also use the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB, though, is that it acts as a thread pool that can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.

Finally, Java EE 6 has a new "asynchronous" qualifier (the @Asynchronous annotation) for Session Bean methods. I do not know the details of how this works, as I've yet to play with a new Java EE 6 container. But I imagine you probably won't want to change containers just for this facility.

Answer by alphazero

An EJB is ultimately a transactional component for a client-server system that provides request/reply semantics. If you find yourself needing to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken a wrong turn.

The situation you describe is cleanly and correctly handled by an event-based architecture with a messaging back end. An initial event starts the process (which can then be trivially parallelized by having the workers subscribe to the event topic), and the aggregating process itself raises an event on its completion. You can still squeeze this sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.
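The aggregating step is described only in words above. A minimal sketch of such an aggregator, assuming every worker reply carries a correlation ID and a chunk index (hypothetical message fields); in a real system `accept` would be called by whatever consumes the reply queue or topic, and the `onComplete` callback would publish the completion event.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

public final class ResultAggregator<R> {

    /** Partial results for one request, ordered by chunk index. */
    private static final class Pending<T> {
        final int expected;
        final Map<Integer, T> parts = new TreeMap<>();
        boolean completed;

        Pending(int expected) {
            this.expected = expected;
        }
    }

    private final Map<String, Pending<R>> pending = new ConcurrentHashMap<>();
    private final BiConsumer<String, Map<Integer, R>> onComplete; // publishes the completion event

    public ResultAggregator(BiConsumer<String, Map<Integer, R>> onComplete) {
        this.onComplete = onComplete;
    }

    /** Registers a request before its initial event is published. */
    public void expect(String correlationId, int chunkCount) {
        pending.put(correlationId, new Pending<>(chunkCount));
    }

    /**
     * Records one worker reply. When the last missing chunk arrives, onComplete fires exactly
     * once with all parts; replies for unknown or finished requests (e.g. redeliveries) are ignored.
     */
    public void accept(String correlationId, int chunkIndex, R partial) {
        Pending<R> p = pending.get(correlationId);
        if (p == null) {
            return;
        }
        Map<Integer, R> done = null;
        synchronized (p) {
            if (p.completed) {
                return;
            }
            p.parts.put(chunkIndex, partial);
            if (p.parts.size() == p.expected) {
                p.completed = true;
                pending.remove(correlationId);
                done = Collections.unmodifiableMap(p.parts);
            }
        }
        if (done != null) {
            onComplete.accept(correlationId, done); // called outside the lock
        }
    }
}
```

Replies may arrive in any order; keying them by chunk index in a `TreeMap` lets the completion handler join them in their original sequence.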

Answer by Alex Punnen

Back to the future: Java EE 7 has much more concurrency support via ManagedThreadFactory, ManagedExecutorService, etc. (JSR 236: Concurrency Utilities for Java EE), with which you can create your own "managed" threads. It is no longer taboo in EE application servers that support it (WildFly?) when you use the ManagedThread* APIs.
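With JSR 236, the split/join from the question can stay synchronous: submit one `Callable` per chunk and block on `invokeAll`. A minimal sketch, written against the plain `ExecutorService` interface (which `ManagedExecutorService` extends) so it compiles without a Java EE API jar; in a Java EE 7 component you would typically obtain the executor with `@Resource ManagedExecutorService executor;` (the default instance is `java:comp/DefaultManagedExecutorService`). `ParallelSum` and the sum-of-squares workload are hypothetical stand-ins for the real CPU-bound job. By default, tasks on a managed executor do not run in the caller's transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public final class ParallelSum {

    /**
     * Splits [0, n) into about `chunks` ranges, runs them on `executor`, and blocks until
     * all partial sums are joined. In a Java EE 7 bean, `executor` could be an injected
     * ManagedExecutorService.
     */
    public static long sumOfSquares(ExecutorService executor, int n, int chunks) {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        int size = Math.max(1, (n + chunks - 1) / chunks); // ceiling division
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int start = 0; start < n; start += size) {
            final int from = start;
            final int to = Math.min(n, start + size);
            tasks.add(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += (long) i * i;                     // CPU-bound work for one chunk
                }
                return sum;
            });
        }
        try {
            long total = 0;
            for (Future<Long> partial : executor.invokeAll(tasks)) { // waits for every chunk
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk failed", e.getCause());
        }
    }
}
```

The calling EJB method stays a normal synchronous request/reply call; only the chunks run in parallel, on threads the container manages.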

More details

https://jcp.org/aboutJava/communityprocess/ec-public/materials/2013-01-1516/JSR236-EC-F2F-Jan2013.pdf
http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities002.htm

Answer by Carl Smotricz

I once participated in a project where EJB transactions ran for up to 5 hours at a time. Aargh!

This same application also had a BEA specialist consultant who approved starting additional threads from within the transactions. While this is discouraged in the specs and elsewhere, it doesn't automatically result in failure. You need to be aware that your extra threads are outside the container's control, so if something goes wrong it's your fault. But if you can ensure that the number of threads started in the worst case doesn't exceed reasonable limits, and that they all terminate cleanly within a reasonable time, then it is quite possible to work like this. In fact, in your case it sounds like almost the only solution.

There are some slightly esoteric solutions possible where your EJB app reaches out to another app for a service, which then does the multithreading in itself before returning to the EJB caller. But this is essentially just shifting the problem around.

You may, however, consider a thread pooling solution to keep an upper limit on the number of threads spawned. If you have too many threads your application will behave horribly.
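A minimal sketch of that advice with `java.util.concurrent`: a fixed-size pool caps the thread count, a deadline bounds how long the EJB method waits, and the pool is always shut down before returning. This still creates unmanaged threads, with all the caveats discussed in these answers; `BoundedFanOut`, `runAll` and the null-on-timeout convention are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public final class BoundedFanOut {

    /**
     * Runs the tasks on at most maxThreads threads and waits at most timeoutMillis.
     * Tasks not finished by the deadline are cancelled and yield null in the result list.
     */
    public static <T> List<T> runAll(List<Callable<T>> tasks, int maxThreads, long timeoutMillis) {
        int threads = Math.max(1, Math.min(maxThreads, tasks.size())); // hard upper limit
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            List<T> results = new ArrayList<>(futures.size());
            for (Future<T> f : futures) {
                try {
                    results.add(f.get());
                } catch (CancellationException timedOut) {
                    results.add(null);                   // chunk missed the deadline
                } catch (ExecutionException failed) {
                    throw new IllegalStateException("Chunk failed", failed.getCause());
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for chunks", e);
        } finally {
            // Interrupts leftover workers; tasks must honour interruption to actually stop.
            pool.shutdownNow();
        }
    }
}
```

Creating the pool per call keeps the example self-contained; a shared, long-lived pool would avoid the start-up cost but then must be shut down when the application stops.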

Answer by ewernli

You've analyzed the situation quite well, and no, there is no pattern for this that matches the EJB model.

Creating threads is forbidden mainly because it bypasses the app server's thread-management strategy, and also because of transactions.

I worked on a project with similar requirements and decided to spawn additional threads (going against the spec at the time). The parallelized operation was read-only, so it was fine with regard to transactions (the threads basically had no transaction associated with them). I also knew that I wouldn't spawn too many threads per EJB call, so the number of threads was not an issue. But if your threads are supposed to modify data, then you seriously break the transactional model of EJB. If your operation is pure computation, though, that might be OK.

Hope it helps...
