How to generate a random, unique, alphanumeric ID of length N in Postgres 9.6+?

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/41970461/

Date: 2020-10-21 02:30:47  Source: igfitidea

Tags: sql, database, postgresql, random

Asked by Ian Storm Taylor

I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want to ask again to see if there is a simpler solution in newer versions.

Given IDs which contain a-zA-Z0-9, and vary in size depending on where they're used, like...

bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS

uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA

HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK

(Like the IDs that Stripe uses.)

How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?

I'm thinking that ideally the solution has a signature similar to:

generate_uid(size integer) returns text

Where size is customizable depending on your own tradeoffs between lowering the chance of collisions and reducing the string size for usability.

From what I can tell, it must use gen_random_bytes() instead of random() for true randomness, to reduce the chance that they can be guessed.

Thanks!

I know there's gen_random_uuid() for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like: "id": "ch_19iRv22eZvKYlo2CAxkjuHxZ" and that are as short as possible while still containing only alphanumeric characters.

This requirement is also why encode(gen_random_bytes(), 'hex') isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.

I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:

var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function generate(length) {
  var bytes = crypto.randomBytes(length);
  var chars = [];

  for (var i = 0; i < bytes.length; i++) {
    chars.push(set[bytes[i] % set.length]);
  }

  return chars.join('');
}
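
One caveat about the snippet above: 256 is not a multiple of 62, so bytes[i] % set.length maps byte values 248 to 255 back onto the first 8 characters, which makes 'A' through 'H' slightly more likely than the rest. A minimal sketch of one way around this, assuming rejection sampling is acceptable for your use case (the name generateUnbiased is hypothetical and not part of the original post):

```javascript
var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Largest multiple of set.length that fits in one byte: 248 for 62 characters.
var limit = 256 - (256 % set.length);

// Hypothetical helper: same idea as generate(), but bytes >= limit are
// discarded so that every character is equally likely.
function generateUnbiased(length) {
  var chars = [];

  while (chars.length < length) {
    // Request a few extra bytes, since roughly 3% of them will be rejected.
    var bytes = crypto.randomBytes(length - chars.length + 8);

    for (var i = 0; i < bytes.length && chars.length < length; i++) {
      if (bytes[i] < limit) {
        chars.push(set[bytes[i] % set.length]);
      }
    }
  }

  return chars.join('');
}

console.log(generateUnbiased(16));
```

The bias in the simpler version is small, so whether this extra step is worth it depends on how much predictability matters for the IDs in question.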

Accepted answer by Ian Storm Taylor

Figured this out, here's a function that does it:

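-- gen_random_bytes() is provided by the pgcrypto extension.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);  -- one random byte per output character
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    -- Map each byte (0-255) onto the 62-character set; substr() is 1-based.
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;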

And then to run it simply do:

SELECT generate_uid(10);
-- '3Rls4DjWxJ'


Warning

When doing this you need to be sure that the IDs you create are long enough to avoid collisions as the number of objects grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object; I just used 10 as a simple example.
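
To put rough numbers on this, the birthday approximation estimates the chance of at least one collision among n random IDs drawn from 62^length possibilities as 1 - exp(-n(n-1) / (2 * 62^length)). A small sketch follows (collisionProbability is a hypothetical helper name, and the result is an approximation rather than an exact figure):

```javascript
// Approximate probability that at least two of `count` random IDs collide,
// for IDs of `length` characters over an alphabet of `alphabetSize` symbols.
function collisionProbability(count, length, alphabetSize) {
  var space = Math.pow(alphabetSize, length);
  // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
  return -Math.expm1(-(count * (count - 1)) / (2 * space));
}

// One million IDs at a few candidate lengths.
[10, 16, 22].forEach(function (length) {
  console.log(length, collisionProbability(1e6, length, 62).toExponential(2));
});
```

Under this approximation, a billion IDs of length 10 already have roughly a 45% chance of containing a collision, which is why longer IDs are the safer default.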



Usage

With the function defined, you can use it in a table definition, like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT generate_uid(10),
  name TEXT NOT NULL,
  ...
);

And then when inserting data, like so:

INSERT INTO collections (name) VALUES ('ian');
INSERT INTO collections (name) VALUES ('victor');
SELECT * FROM collections;

It will automatically generate the id values:

    id     |  name  | ...
-----------+--------+-----
owmCAx552Q | ian    |
ZIofD6l3X9 | victor |


Usage with a Prefix

Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT ('col_' || generate_uid(10)),
  name TEXT NOT NULL,
  ...
);

INSERT INTO collections (name) VALUES ('ian');
INSERT INTO collections (name) VALUES ('victor');
SELECT * FROM collections;

      id       |  name  | ...
---------------+--------+-----
col_wABNZRD5Zk | ian    |
col_ISzGcTVj8f | victor |

Answer by Evan Carroll

Review,

  1. 26 characters in [a-z]
  2. 26 characters in [A-Z]
  3. 10 characters in [0-9]
  4. 62 characters in [a-zA-Z0-9] (base62)
  5. The function substring(string [from int] [for int]) looks useful.

So it looks something like this. First we demonstrate that we can take the random-range and pull from it.

SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  1, -- 1 is 'a', 62 is '9'
  1
);

Now we need a random integer between 1 and 62:

SELECT trunc(random()*62)::int+1
FROM generate_series(1,1e2) AS gs(x);

This gets us there. Now we just have to join the two.

SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  trunc(random()*62)::int+1,
  1
)
FROM generate_series(1,1e2) AS gs(x);

Then we wrap it in an ARRAY constructor (because this is fast)

SELECT ARRAY(
  SELECT substring(
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
    trunc(random()*62)::int+1,
    1
  )
  FROM generate_series(1,1e2) AS gs(x)
);

And we call array_to_string() to get a text value.

SELECT array_to_string(
  ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1,1e2) AS gs(x)
  )
  , ''
);

From here we can even turn it into a function.

CREATE FUNCTION random_string(randomLength int)
RETURNS text AS $$
SELECT array_to_string(
  ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1,randomLength) AS gs(x)
  )
  , ''
)
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE LEAKPROOF;

And then call it:

SELECT * FROM random_string(10);

Answer by Dantio

Thanks to Evan Carroll's answer, I took a look at hashids.org. For Postgres you have to compile the extension or run some TSQL functions. But for my needs, I created something simpler based on the hashids ideas (short, unguessable, unique, custom alphabet, avoids curse words).

Shuffle alphabet:

CREATE OR REPLACE FUNCTION consistent_shuffle(alphabet TEXT, salt TEXT) RETURNS TEXT AS $$
DECLARE
    SALT_LENGTH INT := length(salt);
    integer INT = 0;
    temp TEXT = '';
    j INT = 0;
    v INT := 0;
    p INT := 0;
    i INT := length(alphabet) - 1;
    output TEXT := alphabet;
BEGIN
    IF salt IS NULL OR length(LTRIM(RTRIM(salt))) = 0 THEN
        RETURN alphabet;
    END IF;
    WHILE i > 0 LOOP
        v := v % SALT_LENGTH;
        integer := ASCII(substr(salt, v + 1, 1));
        p := p + integer;
        j := (integer + v + p) % i;

        temp := substr(output, j + 1, 1);
        output := substr(output, 1, j) || substr(output, i + 1, 1) || substr(output, j + 2);
        output := substr(output, 1, i) || temp || substr(output, i + 2);

        i := i - 1;
        v := v + 1;
    END LOOP;
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;

The main function:

CREATE OR REPLACE FUNCTION generate_uid(id INT, min_length INT, salt TEXT) RETURNS TEXT AS $$
DECLARE
    clean_alphabet TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    curse_chars TEXT := 'csfhuit';
    curse TEXT := curse_chars || UPPER(curse_chars);
    alphabet TEXT := regexp_replace(clean_alphabet, '[' || curse  || ']', '', 'gi');
    shuffle_alphabet TEXT := consistent_shuffle(alphabet, salt);
    char_length INT := length(alphabet);
    output TEXT := '';
BEGIN
    WHILE id != 0 LOOP
        output := output || substr(shuffle_alphabet, (id % char_length) + 1, 1);
        id := trunc(id / char_length);
    END LOOP;
    curse := consistent_shuffle(curse, output || salt);
    output := RPAD(output, min_length, curse);
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;

Usage examples:

-- 3: min-length
select generate_uid(123, 3, 'salt'); -- output: "0mH"

-- or as default value in a table
CREATE SEQUENCE IF NOT EXISTS my_id_serial START 1;
CREATE TABLE collections (
    id TEXT PRIMARY KEY DEFAULT generate_uid(CAST (nextval('my_id_serial') AS INTEGER), 3, 'salt')
);
insert into collections DEFAULT VALUES ;

Answer by Evan Carroll

I'm looking for something that gives me "shortcodes" (similar to what Youtube uses for video IDs) that are as short as possible while still containing only alphanumeric characters.

This is a fundamentally different question from what you first asked. What you want here, then, is to put a serial type on the table and to use the hashids.org code for PostgreSQL.

  • This returns 1:1 with the unique number (serial)
  • Never repeats or has a chance of collision.
  • Also base62 [a-zA-Z0-9]

Code looks like this,

SELECT id, hash_encode(foo.id)
FROM foo; -- Result: jNl for 1001

SELECT hash_decode('jNl'); -- returns 1001

This module also supports salts.
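
The idea of a collision-free, reversible mapping from a serial number to a short string can be illustrated with plain base62. This sketch is not the hashids algorithm (it has no salt and no shuffling, so the output is trivially guessable), and toBase62 / fromBase62 are hypothetical names:

```javascript
var alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Encode a non-negative safe integer as a base62 string.
function toBase62(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative safe integer');
  }
  var out = '';
  do {
    out = alphabet[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out;
}

// Decode a base62 string back to the original integer.
function fromBase62(s) {
  var n = 0;
  for (var i = 0; i < s.length; i++) {
    var digit = alphabet.indexOf(s[i]);
    if (digit < 0) {
      throw new RangeError('invalid base62 character: ' + s[i]);
    }
    n = n * 62 + digit;
  }
  return n;
}

var code = toBase62(1001);
console.log(code, fromBase62(code));
```

Because every integer has exactly one encoding, two different serial values can never produce the same string, which is the property the answer relies on.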

Answer by Roman Tkachuk

This query generates the required string. Just change the second parameter of generate_series to choose the length of the random string.

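SELECT
     string_agg(c, '')
FROM (
     SELECT
          -- r in 0..9 -> '0'..'9', 10..35 -> 'A'..'Z', 36..61 -> 'a'..'z'
          chr(r + CASE WHEN r > 35 THEN 97 - 36 WHEN r > 9 THEN 65 - 10 ELSE 48 END) AS c
     FROM (
           SELECT
                 i,
                 floor(random() * 62)::int AS r  -- uniform integer in 0..61
           FROM
                 generate_series(1, 10) AS i     -- second argument = string length
          ) AS a
     ) AS b;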