Original source: http://stackoverflow.com/questions/16697389/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Why is it so slow with 100,000 records when using a pipeline in Redis (Java)?
Asked by znlyj
It is said that a pipeline is the better approach when many set/get operations are required in Redis, so this is my test code:
import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    public static void main(String[] args) {
        // Single local Redis instance wrapped in a sharded client
        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        ShardedJedis jedis = new ShardedJedis(list);

        long startTime = System.currentTimeMillis();
        ShardedJedisPipeline pipeline = jedis.pipelined();
        // Queue all 100,000 HMSET commands in one pipeline
        for (int i = 0; i < 100000; i++) {
            Map<String, String> map = new HashMap<String, String>();
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
        }
        // Send the queued commands and read back all replies
        pipeline.sync();
        long endTime = System.currentTimeMillis();
        System.out.println(endTime - startTime);
    }
}
When I run it, the program does not respond for a while, but without the pipeline it takes only 20073 ms. I am confused about why it performs better without the pipeline, and by such a wide gap!
Thanks for the answers. A few questions: how did you calculate the 6 MB of data? When I send 10K records, the pipeline is always faster than normal mode, but with 100K the pipeline does not respond. I think 100-1000 operations is an advisable choice, as suggested below. Does the JIT have anything to do with this? I don't understand it.
Answered by Didier Spezia
There are a few points you need to consider before writing such a benchmark (and especially a benchmark using the JVM):
On most (physical) machines, Redis is able to process more than 100K ops/s when pipelining is used. Your benchmark only deals with 100K items, so it does not last long enough to produce meaningful results. Furthermore, there is no time for the successive stages of the JIT to kick in.
The absolute time is not a very relevant metric. Displaying the throughput (i.e. the number of operations per second) while keeping the benchmark running for at least 10 seconds would be a better and more stable metric.
Your inner loop generates a lot of garbage. If you plan to benchmark Jedis+Redis, then you need to keep the overhead of your own program low.
Because you have defined everything in the main function, your loop will not be compiled by the JIT (depending on the JVM you use). Only the inner method calls may be. If you want the JIT to be efficient, make sure to encapsulate your code in methods that can be compiled by the JIT.
Optionally, you may want to add a warm-up phase before performing the actual measurement, to avoid counting the overhead of running the first iterations with the bare-bones interpreter, and the cost of the JIT itself.
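The points above (report throughput rather than absolute time, keep the measured code in its own methods, discard a warm-up phase) can be sketched as a small harness. This is only an illustration, not part of the original answer: the class name ThroughputBenchmark, its methods, and the placeholder workload are all hypothetical, and the short durations in main are for demonstration only. A real run should measure for at least 10 seconds, as recommended above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Jedis or of the original answer.
class ThroughputBenchmark {

    /** Runs the batch repeatedly until durationMillis has elapsed; returns the operations executed. */
    static long runFor(Runnable batch, int opsPerBatch, long durationMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        long ops = 0;
        // Subtraction keeps the comparison correct even if nanoTime wraps around.
        while (System.nanoTime() - deadline < 0) {
            batch.run();
            ops += opsPerBatch;
        }
        return ops;
    }

    /** Runs a discarded warm-up phase, then measures and returns operations per second. */
    static double measure(Runnable batch, int opsPerBatch, long warmupMillis, long measureMillis) {
        runFor(batch, opsPerBatch, warmupMillis); // gives the JIT time to compile the hot path
        long start = System.nanoTime();
        long ops = runFor(batch, opsPerBatch, measureMillis);
        long elapsedNanos = System.nanoTime() - start;
        return ops * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a batch of 1000 Redis commands.
        long[] sink = new long[1];
        Runnable batch = () -> {
            for (int k = 0; k < 1000; k++) {
                sink[0] += k;
            }
        };
        // Demo durations only; use a measurement window of 10 seconds or more for real numbers.
        double opsPerSecond = measure(batch, 1000, 200, 1000);
        System.out.printf("Throughput: %.0f ops/s%n", opsPerSecond);
    }
}
```

To benchmark Redis with such a harness, the batch would be a method that queues a fixed number of commands on a pipeline and then calls sync().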
Now, regarding Redis pipelining, your pipeline is way too long. 100K commands in the pipeline means Jedis has to build a 6 MB buffer before sending anything to Redis. It means the socket buffers (on the client side, and perhaps on the server side) will be saturated, and that Redis will have to deal with 6 MB communication buffers as well.
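The answer does not say how the 6 MB figure was obtained. One way to arrive at a number of that order is to add up the bytes of each command as encoded in the Redis protocol (RESP), where a request is an array header followed by one length-prefixed bulk string per argument. The sketch below is my own back-of-envelope estimate under that assumption, not the author's calculation; the class and method names are hypothetical, and it counts only request bytes for the question's HMSET m<i> id <i> name lyj<i> commands.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical estimator, not part of Jedis or of the original answer.
class PipelineSizeEstimate {

    /** Bytes of one RESP bulk string: '$', the length digits, CRLF, the data, CRLF. */
    static long bulkSize(String s) {
        int len = s.getBytes(StandardCharsets.UTF_8).length;
        return 1 + String.valueOf(len).length() + 2 + len + 2;
    }

    /** Bytes of one request equivalent to: HMSET m<i> id <i> name lyj<i> */
    static long commandSize(int i) {
        String[] parts = {"HMSET", "m" + i, "id", "" + i, "name", "lyj" + i};
        // Array header: '*', the argument count digits, CRLF.
        long size = 1 + String.valueOf(parts.length).length() + 2;
        for (String part : parts) {
            size += bulkSize(part);
        }
        return size;
    }

    /** Total request bytes for commands numbered 0 to count - 1. */
    static long totalSize(int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += commandSize(i);
        }
        return total;
    }

    public static void main(String[] args) {
        long bytes = totalSize(100_000);
        System.out.printf("%d bytes (about %.1f MB)%n", bytes, bytes / 1e6);
    }
}
```

Under these assumptions each command is roughly 60 to 70 bytes, so 100K commands add up to several megabytes, the same order of magnitude as the figure quoted above.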
Furthermore, your benchmark is still synchronous (using a pipeline does not magically make it asynchronous). In other words, Jedis will not start reading replies until the last query of your pipeline has been sent to Redis. When the pipeline is too long, it has the potential to block things.
Consider limiting the size of the pipeline to 100-1000 operations. Of course, it will generate more roundtrips, but the pressure on the communication stack will be reduced to an acceptable level. For instance, consider the following program:
import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    // Running counter of operations issued so far
    int i = 0;
    // Reused for every command to avoid allocating a new map per iteration
    Map<String, String> map = new HashMap<String, String>();
    ShardedJedis jedis;

    // Number of iterations
    // Use 1000 to test with the pipeline, 100 otherwise
    static final int N = 1000;

    public TestPipeline() {
        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        jedis = new ShardedJedis(list);
    }

    // Pipelined version: queue n commands, then sync once
    public void push(int n) {
        ShardedJedisPipeline pipeline = jedis.pipelined();
        for (int k = 0; k < n; k++) {
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
            ++i;
        }
        pipeline.sync();
    }

    // Non-pipelined version: one roundtrip per command
    public void push2(int n) {
        for (int k = 0; k < n; k++) {
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            jedis.hmset("m" + i, map);
            ++i;
        }
    }

    public static void main(String[] args) {
        TestPipeline obj = new TestPipeline();
        long startTime = System.currentTimeMillis();
        for (int j = 0; j < N; j++) {
            // Use push2 instead to test without pipeline
            obj.push(1000);
            // Uncomment to see the acceleration
            //System.out.println(obj.i);
        }
        long endTime = System.currentTimeMillis();
        // Operations per second: total operations * 1000 / elapsed milliseconds
        double d = 1000.0 * obj.i;
        d /= (double) (endTime - startTime);
        System.out.println("Throughput: " + d);
    }
}
With this program, you can test with or without pipelining. Be sure to increase the number of iterations (the N parameter) when pipelining is used, so that it runs for at least 10 seconds. If you uncomment the println in the loop, you will see that the program is slow at the beginning and gets quicker as the JIT starts to optimize things (that is why the program should run for at least several seconds to give a meaningful result).
On my hardware (an old Athlon box), I can get 8-9 times more throughput when the pipeline is used. The program could be further improved by optimizing key/value formatting in the inner loop and adding a warm-up phase.