JNI - Passing large amounts of data between Java and native code
Original source: http://stackoverflow.com/questions/17709210/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Rajiv
I am trying to achieve the following:
1) I have a byte array on the Java side that represents an image.
2) I need to give my native code access to it.
3) The native code decodes this image using GraphicsMagick and creates a bunch of thumbnails by calling resize. It also calculates a perceptual hash of the image, which is either a vector or a uint8_t array.
4) Once I return this data back to the Java side different threads will read it. The thumbnails will be uploaded to some external storage service via HTTP.
My questions are:
1) What would be the most efficient way to pass the bytes from Java to my native code? I have access to it as a byte array. I don't see any particular advantage to passing it as a byte buffer (wrapping this byte array) vs a byte array here.
2) What would be the best way to return these thumbnails and the perceptual hash back to the Java code? I thought of a few options:
(i) I could allocate a byte buffer in Java and then pass it along to my native method. The native method could then write to it and set a limit after it is done and return the number of bytes written or some boolean indicating success. I could then slice and dice the byte buffer to extract the distinct thumbnails and perceptual hash and pass it along to the different threads that will upload the thumbnails. The problem with this approach is I don't know what size to allocate. The needed size will depend on the size of the thumbnails generated which I don't know in advance and the number of thumbnails (I do know this in advance).
(ii) I could also allocate the byte buffer in native code once I know the size needed. I could memcpy my blobs to the right region based on my custom packing protocol and return this byte buffer. Both (i) and (ii) seem complicated because of the custom packing protocol, which would have to indicate the length of each thumbnail and of the perceptual hash.
(iii) Define a Java class that has fields for the thumbnails (an array of byte buffers) and the perceptual hash (a byte array). I could allocate the byte buffers in native code when I know the exact sizes needed. I can then memcpy the bytes from my GraphicsMagick blob to the direct address of each byte buffer. I am assuming that there is also some method to set the number of bytes written on the byte buffer so that the Java code knows how big the byte buffers are. After the byte buffers are set, I could fill in my Java object and return it. Compared to (i) and (ii) I create more byte buffers here and also a Java object, but I avoid the complexity of a custom protocol.
Rationale behind (i), (ii) and (iii): given that the only thing I do with these thumbnails is upload them, I was hoping to save an extra copy by using byte buffers (vs byte arrays) when uploading them via NIO.
(iv) Define a Java class that has an array of byte arrays (instead of byte buffers) for the thumbnails and a byte array for the perceptual hash. I create these Java arrays in my native code and copy over the bytes from my GraphicsMagick blob using SetByteArrayRegion. The disadvantage vs the previous methods is that there will be yet another copy in Java land when this byte array is copied from the heap to some direct buffer for uploading. I am not sure that I would be saving anything in terms of complexity vs (iii) here either.
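To make the "custom packing protocol" in (i) and (ii) concrete, here is a minimal Java sketch of one possible layout. The layout itself is an assumption, not something from the question: each blob is stored as a 4-byte big-endian length followed by its bytes. The class name `PackedBlobs` and its `pack` helper are hypothetical; `pack` only stands in for what the native code would write.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout: [int length][bytes] repeated, big-endian lengths.
// Native code writing this layout would have to use the same byte order.
final class PackedBlobs {
    /** Packs blobs into one direct buffer; stands in for the native writer. */
    static ByteBuffer pack(List<byte[]> blobs) {
        int total = 0;
        for (byte[] b : blobs) {
            total += Integer.BYTES + b.length;
        }
        ByteBuffer out = ByteBuffer.allocateDirect(total);
        for (byte[] b : blobs) {
            out.putInt(b.length);
            out.put(b);
        }
        out.flip(); // limit = bytes written, position = 0
        return out;
    }

    /** Splits the packed buffer into slices that share its memory (no copying). */
    static List<ByteBuffer> unpack(ByteBuffer packed) {
        ByteBuffer in = packed.duplicate(); // own position/limit, shared bytes
        List<ByteBuffer> slices = new ArrayList<>();
        while (in.remaining() >= Integer.BYTES) {
            int length = in.getInt();
            if (length < 0 || length > in.remaining()) {
                throw new IllegalArgumentException("corrupt length prefix: " + length);
            }
            ByteBuffer slice = in.slice(); // starts at the blob's first byte
            slice.limit(length);
            slices.add(slice);
            in.position(in.position() + length);
        }
        return slices;
    }
}
```

With this layout the first slice could be the hash and the rest the thumbnails. The bookkeeping above is the complexity that options (iii) and (iv) avoid.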
Any advice would be awesome.
EDIT: @main suggested an interesting solution. I am editing my question to follow up on that option. If I wanted to wrap native memory in a DirectBuffer like how @main suggests, how would I know when I can safely free the native memory?
Accepted answer by main--
What would be the most efficient way to pass the bytes from Java to my native code? I have access to it as a byte array. I don't see any particular advantage to passing it as a byte buffer (wrapping this byte array) vs a byte array here.
The big advantage of a direct ByteBuffer is that you can call GetDirectBufferAddress on the native side and you immediately have a pointer to the buffer contents, without any overhead. If you pass a byte array, you have to use GetByteArrayElements and ReleaseByteArrayElements (they might copy the array) or the critical versions, GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical (they can pause the GC while the array is held). So using a direct ByteBuffer can have a positive impact on your code's performance.
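A small Java sketch of the distinction that matters here: ByteBuffer.wrap on an existing byte array produces a heap buffer, not a direct one, and GetDirectBufferAddress returns NULL for a non-direct buffer. Getting a direct buffer from an existing array therefore costs one copy on the Java side. The class and method names below (`DirectInput`, `toDirect`) are made up for illustration.

```java
import java.nio.ByteBuffer;

final class DirectInput {
    /** Copies a heap byte[] into a newly allocated direct buffer (one copy). */
    static ByteBuffer toDirect(byte[] image) {
        ByteBuffer direct = ByteBuffer.allocateDirect(image.length);
        direct.put(image);
        direct.flip(); // position = 0, limit = image.length
        return direct;
    }

    public static void main(String[] args) {
        byte[] image = {1, 2, 3, 4};
        ByteBuffer wrapped = ByteBuffer.wrap(image); // heap buffer backed by the array
        ByteBuffer direct = toDirect(image);         // off-heap copy
        System.out.println(wrapped.isDirect()); // false
        System.out.println(direct.isDirect());  // true
    }
}
```

Note that GetDirectBufferCapacity reports the buffer's capacity, not its limit, so sizing the direct buffer exactly (as `toDirect` does) keeps the two equal.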
As you said, (i) won't work because you don't know how much data the method is going to return. (ii) is too complex because of that custom packing protocol. I would go for a modified version of (iii): you don't need that object, you can just return an array of ByteBuffers where the first element is the hash and the other elements are the thumbnails. And you can throw away all the memcpys! That's the entire point of a direct ByteBuffer: avoiding copying.
Code:
#include <jni.h>

// extern "C" prevents C++ name mangling so the JVM can resolve the symbol.
extern "C" JNIEXPORT void JNICALL
Java_MyClass_createThumbnails(JNIEnv* env, jobject, jobject input, jobjectArray output)
{
    // output[0] receives the hash; the remaining slots receive the thumbnails.
    jsize nThumbnails = env->GetArrayLength(output) - 1;
    // input must be a direct ByteBuffer; otherwise GetDirectBufferAddress returns NULL.
    void* inputPtr = env->GetDirectBufferAddress(input);
    jlong inputLength = env->GetDirectBufferCapacity(input);
    // ...
    void* hash = ...; // a pointer to the hash data
    int hashDataLength = ...;
    void** thumbnails = ...; // an array of pointers, each one points to thumbnail data
    int* thumbnailDataLengths = ...; // an array of ints, each one is the length of the thumbnail data with the same index
    // NewDirectByteBuffer wraps the existing memory; nothing is copied.
    jobject hashBuffer = env->NewDirectByteBuffer(hash, hashDataLength);
    env->SetObjectArrayElement(output, 0, hashBuffer);
    for (int i = 0; i < nThumbnails; i++) {
        jobject thumbnail = env->NewDirectByteBuffer(thumbnails[i], thumbnailDataLengths[i]);
        env->SetObjectArrayElement(output, i + 1, thumbnail);
    }
}
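On the Java side, the matching declaration would presumably look like `native void createThumbnails(ByteBuffer input, ByteBuffer[] output);` in a class named MyClass (inferred from the JNI function name, not stated in the answer). The following sketch shows one way the caller could split the filled array afterwards. `ThumbnailOutput` is a hypothetical helper; it hands out read-only views because a ByteBuffer's position and limit are per-object state and the uploading threads should not share them.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: interprets the array filled by the native method,
// where element 0 is the hash and elements 1..n are the thumbnails.
final class ThumbnailOutput {
    final ByteBuffer hash;
    final List<ByteBuffer> thumbnails;

    private ThumbnailOutput(ByteBuffer hash, List<ByteBuffer> thumbnails) {
        this.hash = hash;
        this.thumbnails = thumbnails;
    }

    static ThumbnailOutput from(ByteBuffer[] output) {
        if (output.length == 0 || output[0] == null) {
            throw new IllegalArgumentException("missing hash buffer");
        }
        ByteBuffer[] views = new ByteBuffer[output.length - 1];
        for (int i = 1; i < output.length; i++) {
            // Each view has its own position/limit but shares the native memory.
            views[i - 1] = output[i].asReadOnlyBuffer();
        }
        return new ThumbnailOutput(
                output[0].asReadOnlyBuffer(),
                Collections.unmodifiableList(Arrays.asList(views)));
    }
}
```

A caller would allocate `new ByteBuffer[1 + numberOfThumbnails]`, pass it to the native method, and then call `ThumbnailOutput.from(output)`. The views stay direct, so they can be written to an NIO channel without an intermediate heap copy.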
Edit:
I only have a byte array available to me for the input. Wouldn't wrapping the byte array in a byte buffer still incur the same tax? I also saw this syntax for arrays: http://developer.android.com/training/articles/perf-jni.html#region_calls. Though a copy is still possible.
GetByteArrayRegion always writes to a buffer that you supply, therefore creating a copy every time, so I would suggest GetByteArrayElements instead. Copying the array to a direct ByteBuffer on the Java side is also not the best idea, because then you always pay for that copy, whereas with GetByteArrayElements you avoid it whenever the JVM pins the array.
If I create byte buffers that wrap native data, who is responsible for cleaning it up? I did the memcpy only because I thought Java would have no idea when to free this. This memory could be on the stack, on the heap or from some custom allocator, which seems like it would cause bugs.
If the data is on the stack, then you must copy it into a Java array, into a direct ByteBuffer that was created in Java code, or to somewhere on the heap (with a direct ByteBuffer that points to that location). If it's on the heap, then you can safely use the direct ByteBuffer that you created using NewDirectByteBuffer as long as you can ensure that nobody frees the memory. Once the heap memory is freed, you must no longer use the ByteBuffer object. Java does not try to free the native memory when a direct ByteBuffer that was created using NewDirectByteBuffer is GC'd. You have to take care of that manually, because you also created the buffer manually.
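One way to handle the "who frees it, and when" question on the Java side is to tie the buffer to an explicit owner object. This is only a sketch under assumptions: `NativeBlob` is a hypothetical class, and the `Runnable` stands in for a native method that would free the underlying memory. It does not make buffers that were already handed out safe; callers still have to stop using them before `close()` is called.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical owner of a buffer created with NewDirectByteBuffer.
final class NativeBlob implements AutoCloseable {
    private final ByteBuffer buffer;
    private final Runnable releaser; // stand-in for a native "free" call
    private final AtomicBoolean closed = new AtomicBoolean(false);

    NativeBlob(ByteBuffer buffer, Runnable releaser) {
        this.buffer = buffer;
        this.releaser = releaser;
    }

    /** Returns a fresh view; refuses once the native memory has been released. */
    ByteBuffer view() {
        if (closed.get()) {
            throw new IllegalStateException("native memory already released");
        }
        return buffer.duplicate();
    }

    /** Releases the native memory exactly once, even if called repeatedly. */
    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            releaser.run();
        }
    }
}
```

A typical use would be to call `close()` only after every upload that reads from the blob has completed, for example from the code that joins the upload tasks.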
Answer by tallen
Byte array
I had to do something similar; I returned a container (a Vector or something like it) of byte arrays. One of the other programmers implemented this as a callback (I think this is easier but a bit silly): the JNI code would call a Java method for each response, and then the original call (into the JNI code) would return. This does work okay though.
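The callback variant could look roughly like this on the Java side. The names are hypothetical; the answer does not show its code. In a real implementation the native code would look up `onResult` with GetMethodID and invoke it with CallVoidMethod once per result, before the original native call returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver for results delivered one at a time from native code.
final class ThumbnailCollector {
    private final List<byte[]> results = new ArrayList<>();

    /** Invoked once per result; in the callback design the JNI code calls this. */
    void onResult(byte[] data) {
        results.add(data);
    }

    /** Results in the order they were delivered. */
    List<byte[]> results() {
        return results;
    }
}
```

Because the callbacks run on the thread that made the native call, the list needs no synchronization until it is handed to other threads.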