Best approach for GPGPU/CUDA/OpenCL in Java?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address. Original StackOverflow question: http://stackoverflow.com/questions/2633483/

Date: 2020-08-13 10:07:23  Source: igfitidea

Tags: java, cuda, gpgpu, opencl

Asked by Frederik

General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing.

I'd love to use GPGPU for image processing, particles, and fast geometric operations.

Right now, it seems the two contenders in this space are CUDA and OpenCL. I'd like to know:

  • Is OpenCL usable yet from Java on Windows/Mac?
  • Which libraries or approaches exist for interfacing with OpenCL/CUDA?
  • Is using JNA directly an option?
  • Am I forgetting something?

Any real-world experience/examples/war stories are appreciated.

Accepted answer by zOlive

AFAIK, JavaCL / OpenCL4Java is the only OpenCL binding that is available on all platforms right now (including MacOS X, FreeBSD, Linux, Windows, and Solaris, all in Intel 32-bit, 64-bit, and PPC variants, thanks to its use of JNA).

It has demos, such as this Particles Demo, that actually run fine from Java Web Start, at least on Mac and Windows (to avoid random crashes on Linux, please see this wiki page).

It also comes with a few utilities (GPGPU random number generation, basic parallel reduction, linear algebra) and a Scala DSL.

Finally, it is the oldest binding available (since June 2009), and it has an active user community.

(Disclaimer: I'm JavaCL's author :-))

Answer by Ivan

Well, CUDA is a modification of C: to write a CUDA kernel you have to code in C and then compile it to executable form with NVIDIA's CUDA compiler. The native code produced can then be linked with Java using JNI. So technically you can't write kernel code from Java. There is JCUDA (http://www.jcuda.de/jcuda/JCuda.html), which provides CUDA's APIs for general memory/device management, plus some Java methods that are implemented in CUDA and wrapped with JNI (FFT, some linear algebra methods, etc.).

On the other hand, OpenCL is just an API. OpenCL kernels are plain strings passed to the API, so when using OpenCL from Java you should be able to specify your own kernels. An OpenCL binding for Java can be found at http://www.jocl.org/.
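
To illustrate the "kernels are plain strings" point, here is a minimal sketch that uses only the JDK. The kernel name vector_add, the class name, and the helper method are made up for this example. The step that hands the string to the OpenCL driver is left out on purpose, because the exact call depends on which binding you choose.

```java
import java.util.Arrays;

class KernelAsString {
    // OpenCL C source for an element-wise vector addition, kept as an ordinary Java String.
    // A Java binding would pass this text to the OpenCL driver, which compiles it at runtime.
    // (The kernel name "vector_add" is an assumption made for this sketch.)
    static final String KERNEL_SOURCE =
        "__kernel void vector_add(__global const float* a,\n" +
        "                         __global const float* b,\n" +
        "                         __global float* out,\n" +
        "                         const int n) {\n" +
        "    int i = get_global_id(0);\n" +
        "    if (i < n) {\n" +
        "        out[i] = a[i] + b[i];\n" +
        "    }\n" +
        "}\n";

    // Plain-Java reference of what the kernel computes; useful for checking GPU results.
    static float[] vectorAddReference(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(KERNEL_SOURCE);
        float[] sum = vectorAddReference(new float[] {1f, 2f, 3f}, new float[] {10f, 20f, 30f});
        System.out.println(Arrays.toString(sum));
    }
}
```

Keeping a CPU reference next to the kernel text is one way to test a GPU port, since both should produce the same values for the same input.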

Answer by halfwarp

I've been using JOCL and I'm very happy with it.

The main disadvantage of OpenCL compared to CUDA (at least for me) is the lack of available libraries (Thrust, CUDPP, etc.). However, CUDA code can be easily ported to OpenCL, and looking at how those libraries work (algorithms, strategies, etc.) is actually very nice, as you learn a lot from it.

Answer by gfrost

You may also consider Aparapi. It allows you to write your code in Java and will attempt to convert bytecode to OpenCL at runtime.
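
As a rough picture of the programming style this implies, the sketch below writes the per-element work as a small Java body that is run once per index. Note that this does not use Aparapi's API at all: IndexKernel and execute are hypothetical names, and a parallel IntStream on the CPU stands in for the GPU.

```java
import java.util.stream.IntStream;

class DataParallelSketch {
    // Hypothetical stand-in for a GPU kernel: the body is invoked once per index.
    interface IndexKernel {
        void run(int globalId);
    }

    // Runs the kernel body for every index in [0, range), in parallel on CPU threads.
    // A real bytecode-to-OpenCL tool would run the translated body on the device instead.
    static void execute(int range, IndexKernel kernel) {
        IntStream.range(0, range).parallel().forEach(kernel::run);
    }

    // Example computation: each index reads one input element and writes one output element,
    // so no two invocations touch the same slot.
    static float[] squares(float[] in) {
        float[] out = new float[in.length];
        execute(in.length, gid -> out[gid] = in[gid] * in[gid]);
        return out;
    }

    public static void main(String[] args) {
        float[] result = squares(new float[] {1f, 2f, 3f, 4f});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

The body only touches primitive arrays and its own index, which is the kind of restricted Java that is plausible to translate to a GPU kernel.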

Full disclosure: I am the Aparapi developer.

Answer by karl

I know it's late but take a look at this: https://github.com/pcpratts/rootbeer1

I have not worked with it, but it seems much easier to use than other solutions.

From the project page:

Rootbeer is more advanced than CUDA or OpenCL Java Language Bindings. With bindings the developer must serialize complex graphs of objects into arrays of primitive types. With Rootbeer this is done automatically. Also with language bindings, the developer must write the GPU kernel in CUDA or OpenCL. With Rootbeer a static analysis of the Java Bytecode is done (using Soot) and CUDA code is automatically generated.
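
To make the serialization burden concrete, here is a small sketch of what has to be written by hand when using plain bindings: packing objects into a primitive array before the transfer and unpacking them afterwards. The Particle class and the interleaved layout are assumptions for illustration, and a flat list is the simplest case (real object graphs with references are considerably more work).

```java
import java.util.ArrayList;
import java.util.List;

class ParticleFlattening {
    // Hypothetical object type, invented for this example.
    static final class Particle {
        final float x;
        final float y;
        final float mass;

        Particle(float x, float y, float mass) {
            this.x = x;
            this.y = y;
            this.mass = mass;
        }
    }

    // Number of float slots each particle occupies in the packed array.
    static final int FIELDS = 3;

    // Packs the objects into one interleaved array: [x0, y0, mass0, x1, y1, mass1, ...].
    static float[] flatten(List<Particle> particles) {
        float[] packed = new float[particles.size() * FIELDS];
        int offset = 0;
        for (Particle p : particles) {
            packed[offset++] = p.x;
            packed[offset++] = p.y;
            packed[offset++] = p.mass;
        }
        return packed;
    }

    // Rebuilds the objects from the packed array, e.g. after results are copied back.
    static List<Particle> unflatten(float[] packed) {
        if (packed.length % FIELDS != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + FIELDS);
        }
        List<Particle> result = new ArrayList<>(packed.length / FIELDS);
        for (int i = 0; i < packed.length; i += FIELDS) {
            result.add(new Particle(packed[i], packed[i + 1], packed[i + 2]));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Particle> particles = new ArrayList<>();
        particles.add(new Particle(0f, 1f, 2f));
        particles.add(new Particle(3f, 4f, 5f));
        float[] packed = flatten(particles);
        System.out.println(packed.length + " floats for " + unflatten(packed).size() + " particles");
    }
}
```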

Answer by Michael Dorner

I can also recommend JOCL by jogamp.org, which works on Linux, Mac, and Windows. CONRAD, for example, makes heavy use of OpenCL in combination with JOCL.

Answer by Michael Dorner

You can take a look at the CUDA4J API:

http://sett.com/gpgpu/the-cuda4j-api

Answer by Guillaume Surroca

If you want to do some image processing or geometric operations, you may want a linear algebra library with GPU support (with CUDA, for instance). I would suggest ND4J, which is the linear algebra library with CUDA GPU support on which DeepLearning4J is built. With it, you don't have to deal with CUDA directly or write low-level code in C. Plus, if you want to do more with images using DL4J, you will have access to specific image processing operations such as convolution.
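
For readers unsure what a convolution operation computes, here is a plain-Java CPU reference. It does not use the ND4J or DL4J API, and the class and method names are made up. It implements the sliding-window weighted sum without flipping the kernel, which is what deep learning libraries usually call convolution, and it produces output only where the kernel fits entirely inside the image.

```java
class ConvolutionSketch {
    // Slides the kernel over the image and stores the weighted sum at each position.
    // Output size is (imageHeight - kernelHeight + 1) x (imageWidth - kernelWidth + 1).
    static float[][] convolveValid(float[][] image, float[][] kernel) {
        int kernelHeight = kernel.length;
        int kernelWidth = kernel[0].length;
        int outHeight = image.length - kernelHeight + 1;
        int outWidth = image[0].length - kernelWidth + 1;
        if (outHeight <= 0 || outWidth <= 0) {
            throw new IllegalArgumentException("kernel must not be larger than the image");
        }
        float[][] out = new float[outHeight][outWidth];
        for (int y = 0; y < outHeight; y++) {
            for (int x = 0; x < outWidth; x++) {
                float sum = 0f;
                for (int ky = 0; ky < kernelHeight; ky++) {
                    for (int kx = 0; kx < kernelWidth; kx++) {
                        sum += image[y + ky][x + kx] * kernel[ky][kx];
                    }
                }
                out[y][x] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] image = {{1f, 2f, 3f}, {4f, 5f, 6f}, {7f, 8f, 9f}};
        float[][] kernel = {{1f, 0f}, {0f, 1f}};
        System.out.println(java.util.Arrays.deepToString(convolveValid(image, kernel)));
    }
}
```

Every output pixel is computed independently of the others, which is why this operation maps well onto a GPU.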
