Java: What are optimal scrypt work factors?
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license, link the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/11126315/
What are optimal scrypt work factors?
Asked by Michael Louis Thaler
I'm using a Java scrypt library for password storage. It calls for an N, r and p value when I encrypt things, which its documentation refers to as "CPU cost", "memory cost" and "parallelization cost" parameters. Only problem is, I don't actually know what they specifically mean, or what good values would be for them; perhaps they correspond somehow to the -t, -m and -M switches on Colin Percival's original app?
Does anyone have any suggestions for this? The library itself lists N = 16384, r = 8 and p = 1, but I don't know if this is strong or weak or what.
Answered by gimpf
As a start:
cpercival mentioned in his slides from 2009 something around:
- (N = 2^14, r = 8, p = 1) for < 100ms (interactive use), and
- (N = 2^20, r = 8, p = 1) for < 5s (sensitive storage).
These values happen to be good enough for general use (password-db for some WebApp) even today (2012-09). Of course, specifics depend on the application.
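The two presets above, written out as Java constants. The names are illustrative (mine, not from any library); only the numbers come from cpercival's slides.

```java
public class ScryptPresets {
    static final int INTERACTIVE_N = 1 << 14; // 16384, targets < 100 ms
    static final int SENSITIVE_N   = 1 << 20; // 1048576, targets < 5 s
    static final int R = 8;
    static final int P = 1;

    public static void main(String[] args) {
        System.out.println(INTERACTIVE_N + ", " + SENSITIVE_N);
    }
}
```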
Also, those values (mostly) mean:
- N: general work factor, iteration count.
- r: blocksize in use for underlying hash; fine-tunes the relative memory-cost.
- p: parallelization factor; fine-tunes the relative cpu-cost.
r and p are meant to accommodate the potential issue that CPU speed and memory size and bandwidth do not increase as anticipated. Should CPU performance increase faster, you increase p; should instead a breakthrough in memory technology provide an order-of-magnitude improvement, you increase r. And N is there to keep up with the general doubling of performance per some timespan.
Important: All values change the result. (Updated:) This is the reason why all scrypt parameters are stored in the result string.
Answered by Ian Boyd
The memory required for scrypt to operate is calculated as:
128 bytes × N_cost × r_blockSizeFactor
for the parameters you quote (N=16384, r=8, p=1):
128×16384×8 = 16,777,216 bytes = 16 MB
You have to take this into account when choosing parameters.
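A quick sketch of that formula in Java (the class and method names are mine, not from any library):

```java
public class ScryptMemory {
    // Memory needed by scrypt for one hash: 128 bytes × N × r.
    static long memoryBytes(int n, int r) {
        return 128L * n * r; // long arithmetic avoids int overflow for large N·r
    }

    public static void main(String[] args) {
        System.out.println(memoryBytes(16384, 8));   // 16777216 bytes = 16 MB
        System.out.println(memoryBytes(16384, 512)); // 1073741824 bytes = 1 GB
    }
}
```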
Bcrypt is "weaker" than Scrypt (although still three orders of magnitude stronger than PBKDF2) because it only requires 4 KB of memory. You want to make it difficult to parallelize cracking in hardware. For example, if a video card has 1.5 GB of on-board memory and you tuned scrypt to consume 1 GB of memory:
128×16384×512 = 1,073,741,824 bytes = 1 GB
then an attacker could not parallelize it on their video card. But then your application/phone/server would need to use 1 GB of RAM every time they calculated a password.
It helps me to think about the scrypt parameters as a rectangle. Where:
- the width is the amount of memory required (128*N*r)
- the height is the number of iterations performed
- and the resulting area is the overall hardness
- the cost (N) increases both memory usage and iterations.
- the blockSizeFactor (r) increases memory usage.
The remaining parameter, parallelization (p), means that you have to do the entire thing 2, 3, or more times:
If you had more memory than CPU, you could calculate the three separate paths in parallel - requiring triple the memory:
But in all real-world implementations, it is calculated in series, tripling the calculations needed:
In reality, nobody has ever chosen a p factor other than p=1.
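A sketch of that trade-off under the answer's rough cost model (names are mine): a truly parallel implementation pays p times the memory, while the usual serial implementation keeps memory at 128·N·r and instead pays roughly p times the time.

```java
public class ParallelizationCost {
    // Memory if all p lanes are computed at once (rarely done in practice).
    static long parallelMemoryBytes(int n, int r, int p) {
        return 128L * n * r * p;
    }

    // Memory of the usual serial implementation: independent of p;
    // the running time, not shown here, grows linearly with p instead.
    static long serialMemoryBytes(int n, int r) {
        return 128L * n * r;
    }

    public static void main(String[] args) {
        System.out.println(parallelMemoryBytes(16384, 8, 3)); // 50331648 = 48 MB
        System.out.println(serialMemoryBytes(16384, 8));      // 16777216 = 16 MB
    }
}
```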
What are the ideal factors?
- As much RAM as you can spare
- for as much time as you can spare!
Bonus Chart
Graphical version of above:
Notes:
- the vertical axis is log scale
- Cost factor (horizontal) itself is log (iterations = 2^CostFactor)
- Highlighted is the r=8 curve
And a version of the above zoomed in to the reasonable area:
Answered by Nick Bauer
I don't want to step on the excellent answers provided above, but no one really talks about why "r" has the value it has. The low-level answer provided by Colin Percival's Scrypt paper is that it relates to the "memory latency-bandwidth product". But what does that actually mean?
If you're doing Scrypt right, you should have a large memory block which is mostly sitting in main memory. Main memory takes time to pull from. When an iteration of the block-jumping loop first selects an element from the large block to mix into the working buffer, it has to wait on the order of 100 ns for the first chunk of data to arrive. Then it has to request another, and wait for it to arrive.
For r = 1, you'd be doing 4nr Salsa20/8 iterations and 2n latency-imbued reads from main memory.
This isn't good, because it means that an attacker could get an advantage over you by building a system with reduced latency to main memory.
But if you increase r and proportionally decrease N, you are able to achieve the same memory requirements and do the same number of computations as before--except that you've traded some random accesses for sequential accesses. Extending sequential access allows either the CPU or the library to prefetch the next required blocks of data efficiently. While the initial latency is still there, the reduced or eliminated latency for the later blocks averages the initial latency out to a minimal level. Thus, an attacker would gain little from improving their memory technology over yours.
However, there is a point of diminishing returns with increasing r, and that is related to the "memory latency-bandwidth product" referred to before. What this product indicates is how many bytes of data can be in transit from main memory to the processor at any given time. It's the same idea as a highway--if it takes 10 minutes to travel from point A to point B (latency), and the road delivers 10 cars/minute to point B from point A (bandwidth), the roadway between points A and B contains 100 cars. So, the optimal r relates to how many 64-byte chunks of data you can request at once, in order to cover up the latency of that initial request.
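The highway arithmetic is just "items in transit = latency × throughput" (Little's law). A sketch, using the answer's highway numbers plus purely illustrative memory figures (the 16 bytes/ns bandwidth is an assumption of mine, not a measurement):

```java
public class LatencyBandwidthProduct {
    // items in transit = latency × throughput (Little's law)
    static long inTransit(long latency, long throughputPerUnit) {
        return latency * throughputPerUnit;
    }

    public static void main(String[] args) {
        // The highway: 10 minutes × 10 cars/minute = 100 cars on the road.
        System.out.println(inTransit(10, 10));
        // Illustrative memory numbers: 100 ns latency × 16 bytes/ns bandwidth
        // = 1600 bytes in flight, i.e. about 1600 / 64 = 25 of the 64-byte
        // chunks the answer talks about.
        System.out.println(inTransit(100, 16) / 64);
    }
}
```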
This improves the speed of the algorithm, allowing you to either increase N for more memory and computations or to increase p for more computations, as desired.
There are some other issues with increasing "r" too much, which I haven't seen discussed much:
- Increasing r while decreasing N reduces the number of pseudorandom jumps around memory. Sequential accesses are easier to optimize, and could give an attacker a window. As Colin Percival noted to me on Twitter, larger r could allow an attacker to use a lower cost, slower storage technology, reducing their costs considerably (https://twitter.com/cperciva/status/661373931870228480).
- The size of the working buffer is 1024r bits, so the number of possible end products, which will eventually be fed into PBKDF2 to produce the Scrypt output key, is 2^1024r. The number of permutations (possible sequences) of jumps around the large memory block is 2^NlogN. Which means that there are 2^NlogN possible products of the memory-jumping loop. If 1024r > NlogN, that would seem to indicate that the working buffer is being under-mixed. While I don't know this for sure, and would love to see a proof or disproof, it may be possible for correlations to be found between the working buffer's result and the sequence of jumps, which could allow an attacker an opportunity to reduce their memory requirements without a greatly increased computational cost. Again, this is an observation based on the numbers--it may be that everything is so well-mixed in every round that this isn't a problem. r = 8 is well below this potential threshold for the standard N = 2^14 -- for N = 2^14, this threshold would be r = 224.
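That last back-of-envelope threshold can be checked directly: by this heuristic the buffer is under-mixed once 1024·r > N·log2(N), so the crossover r is N·log2(N)/1024. A sketch (names are mine):

```java
public class MixingThreshold {
    // Largest r for which 1024·r does not exceed N·log2(N), with N = 2^log2N.
    static long rThreshold(int log2N) {
        long n = 1L << log2N;
        return (n * log2N) / 1024;
    }

    public static void main(String[] args) {
        System.out.println(rThreshold(14)); // 224, as stated above for N = 2^14
    }
}
```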
To sum up all the recommendations:
- Choose r to be just large enough to average out the effects of memory latency on your device and no more. Keep in mind that the value Colin Percival recommended, r = 8, appears to remain fairly optimal broadly for memory technology, and this apparently hasn't changed much in 8 years; 16 may be ever so slightly better.
- Decide on how big a chunk of memory you want to use per thread, keeping in mind this affects computation time as well, and set N accordingly.
- Increase p arbitrarily high to what your usage can tolerate (note: on my system and using my own implementation, p = 250 (4 threads) with N = 16384 and r = 8 takes ~5 seconds), and enable threading if you can deal with the added memory cost.
- When tuning, prefer large N and memory block size to increased p and computation time. Scrypt's primary benefit comes from its large memory block size.
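One way to act on the first two recommendations is a calibration loop: fix r = 8, then keep doubling N until a single hash takes as long as you can tolerate. A sketch; the Hasher interface is a stand-in for a real scrypt call (for example from a Java scrypt library), and the main method simulates it with a fake hasher for demonstration only.

```java
public class ScryptTuner {
    interface Hasher { void hash(int n, int r); }

    // Keep doubling N while a single hash stays within targetMillis.
    // Returns the largest tested N that did not exceed the budget.
    static int calibrateN(Hasher hasher, int r, long targetMillis) {
        int n = 1 << 14; // start at the interactive-use default
        while (n < (1 << 24)) { // hard upper bound for safety
            long start = System.nanoTime();
            hasher.hash(n << 1, r);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs > targetMillis) break; // doubling would be too slow
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        // Fake hasher whose cost is proportional to N; replace with a real
        // scrypt invocation when tuning for real.
        Hasher fake = (n, r) -> {
            try { Thread.sleep(n / 1024); } catch (InterruptedException e) { }
        };
        System.out.println("calibrated N = " + calibrateN(fake, 8, 100));
    }
}
```

Run the loop on the production hardware that will actually verify passwords, since the whole point is to size N against your own machine's speed.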
A benchmark of my own implementation of Scrypt on a Surface Pro 3 with an i5-4300 (2 cores, 4 threads), using a constant 128Nr = 16 MB and p = 230; left-axis is seconds, bottom axis is r value, error bars are +/- 1 standard deviation: