Generating a Unique ID in C++

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverFlow original address: http://stackoverflow.com/questions/65524/

Time: 2020-08-27 12:36:06  Source: igfitidea

c++ hash

Asked by Deathbob

What is the best way to generate a Unique ID from two (or more) short ints in C++? I am trying to uniquely identify vertices in a graph. The vertices contain two to four short ints as data, and ideally the ID would be some kind of a hash of them. Prefer portability and uniqueness over speed or ease.

There are a lot of great answers here, I will be trying them all tonight to see what fits my problem the best. A few more words on what I'm doing.

The graph is a collection of samples from an audio file. I use the graph as a Markov Chain to generate a new audio file from the old file. Since each vertex stores a few samples and points to another sample, and the samples are all short ints, it seemed natural to generate an ID from the data. Combining them into a long long sounds good, but maybe something as simple as just a 0 1 2 3 generateID is all I need. I'm not sure how much space is necessary to guarantee uniqueness: if each vertex stores 2 16-bit samples, there are 2^32 possible combinations, correct? And so if each vertex stores 4 samples, there are 2^64 possible combinations?

Library- and platform-specific solutions are not really relevant to this question. I don't want anyone else who might compile my program to have to download additional libraries or change the code to suit their OS.

Accepted answer by Doug T.

A simple solution is to use a 64-bit integer where the lower 16 bits are the first vertex coordinate, the next 16 bits are the second, and so on. This will be unique for all your vertices, though not very compact.

So here's some half-assed code to do this. Hopefully I got the casts right.

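#include <cstdint>

// Pack four 16-bit values into one 64-bit ID; v1 occupies the lowest 16 bits.
uint64_t generateId( uint16_t v1, uint16_t v2, uint16_t v3, uint16_t v4)
{ 
   uint64_t id;
   id = (uint64_t)v1 | (((uint64_t)v2) << 16) | (((uint64_t)v3) << 32) | (((uint64_t)v4) << 48);
   return id;
}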
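
To see why this packing is unique, it may help to check that every component can be recovered from the ID. The following is a minimal sketch, assuming the same bit layout as above; packId, unpackComponent and checkRoundTrip are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Pack four 16-bit values into one 64-bit ID (v1 in the lowest 16 bits).
std::uint64_t packId(std::uint16_t v1, std::uint16_t v2,
                     std::uint16_t v3, std::uint16_t v4)
{
    return  static_cast<std::uint64_t>(v1)
         | (static_cast<std::uint64_t>(v2) << 16)
         | (static_cast<std::uint64_t>(v3) << 32)
         | (static_cast<std::uint64_t>(v4) << 48);
}

// Recover the component at position index (0..3) from a packed ID.
std::uint16_t unpackComponent(std::uint64_t id, unsigned index)
{
    return static_cast<std::uint16_t>((id >> (16 * index)) & 0xFFFFu);
}

// Hypothetical self-check: every component survives the round trip.
void checkRoundTrip()
{
    const std::uint64_t id = packId(2, 5, 7, 65535);
    assert(unpackComponent(id, 0) == 2);
    assert(unpackComponent(id, 1) == 5);
    assert(unpackComponent(id, 2) == 7);
    assert(unpackComponent(id, 3) == 65535);
}
```

Since the mapping can be inverted, two different vertices can never share an ID.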

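Optionally this could be done with a union (great idea from Leon Timmermans, see comment). It reads very cleanly, but note that reading a union member other than the one last written is undefined behavior in standard C++ (many compilers support it as an extension), and the resulting ID value depends on the platform's byte order: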

#include <cstdint>
#include <iostream>

struct vertex
{
    uint16_t v1;
    uint16_t v2;
    uint16_t v3;
    uint16_t v4;
};

union vertexWithId
{
    vertex v;
    uint64_t id;
};

int main()
{
    vertexWithId vWithId;
    // Setup your vertices (set every field, otherwise the id contains garbage)
    vWithId.v.v1 = 2;
    vWithId.v.v2 = 5;
    vWithId.v.v3 = 0;
    vWithId.v.v4 = 0;

    // Your id is automatically setup for you!
    std::cout << "Id is " << vWithId.id << std::endl;
    return 0;
}
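
If reading through a union is a concern, one alternative is to copy the bytes with std::memcpy. This is only a sketch under the assumption that the struct is exactly 64 bits with no padding (checked at compile time); VertexData, idFromBytes and checkByteIds are hypothetical names. The numeric value of the ID still depends on the platform's byte order, but equal data always gives equal IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical vertex payload: four 16-bit samples.
struct VertexData
{
    std::uint16_t v1, v2, v3, v4;
};

static_assert(sizeof(VertexData) == sizeof(std::uint64_t),
              "VertexData must be exactly 64 bits with no padding");

// Copy the raw bytes of the vertex into a 64-bit integer.
std::uint64_t idFromBytes(const VertexData& v)
{
    std::uint64_t id = 0;
    std::memcpy(&id, &v, sizeof id);
    return id;
}

// Hypothetical self-check: same data gives same ID, different data differs.
void checkByteIds()
{
    const VertexData a = { 2, 5, 0, 0 };
    const VertexData b = { 2, 5, 0, 0 };
    const VertexData c = { 5, 2, 0, 0 };
    assert(idFromBytes(a) == idFromBytes(b));
    assert(idFromBytes(a) != idFromBytes(c));
}
```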

Answer by Jeroen Dirks

Sometimes the simplest things work best.

Can you just add an id field to the Vertex object and assign it a number in order of construction?

static int sNextId = 0;
int getNextId() { return ++sNextId; }
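
A minimal sketch of how the counter might be wired into the vertex itself, so that each object receives its number in order of construction. The Vertex layout and the names nextId and checkConstructionOrder are assumptions made for illustration, not part of the original answer.

```cpp
#include <cassert>

// Hypothetical vertex type: two samples plus an ID assigned at construction.
struct Vertex
{
    short a;
    short b;
    int id;

    Vertex(short a_, short b_) : a(a_), b(b_), id(nextId()) {}

private:
    // Hands out 1, 2, 3, ... in order of construction (not thread-safe).
    static int nextId()
    {
        static int next = 0;
        return ++next;
    }
};

// Hypothetical self-check: IDs follow construction order, not the data.
void checkConstructionOrder()
{
    Vertex first(1, 2);
    Vertex second(1, 2);  // same data, but still a different ID
    assert(second.id == first.id + 1);
}
```

Note that copying a Vertex copies its id as well, so this identifies objects by when they were constructed rather than by their contents.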

Answer by Fire Lancer

Well, the only way to guarantee the ID is unique is to have at least as many possible IDs as there are combinations of the data you're generating the IDs from.

E.g. for 2 shorts (assuming 16-bit), you should use a 32-bit int:

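// Convert to unsigned 16-bit first so a negative short2 cannot sign-extend into the upper bits.
uint32_t ID = ((uint32_t)(uint16_t)short1 << 16) | (uint16_t)short2;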
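
A pitfall to watch for when packing signed shorts (audio samples usually are signed): if the low value is OR-ed in as a plain short, a negative value is sign-extended to int and overwrites the upper 16 bits, so different pairs can end up with the same ID. A hedged sketch of one way to avoid this, with packPair and checkNegativeSamples as hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Pack two signed 16-bit samples into an unsigned 32-bit ID.
std::uint32_t packPair(std::int16_t hi, std::int16_t lo)
{
    // Converting to uint16_t first keeps only the 16 value bits, so a
    // negative lo cannot sign-extend into the bits that belong to hi.
    return (static_cast<std::uint32_t>(static_cast<std::uint16_t>(hi)) << 16)
         |  static_cast<std::uint32_t>(static_cast<std::uint16_t>(lo));
}

// Hypothetical self-check: pairs that differ only in hi stay distinct
// even when lo is negative.
void checkNegativeSamples()
{
    assert(packPair(0, -1) != packPair(-1, -1));
    assert(packPair(0, -1) == 0x0000FFFFu);
}
```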

and for 4 shorts you would need a 64bit int, etc...

With basically anything smaller, collisions (multiple things getting the same id) become possible.

However, a different approach (which I think would be better) to get ids would be to hand them out as vertices are inserted:

unsigned LastId = 0;//global

unsigned GetNewId(){return ++LastId;}

This also has the effect of allowing you to add more/different data to each vertex. However if you expect to create more than 2^32 vertices without resetting it, this is probably not the best method.

Answer by Fire Lancer

Use a long long so you can store all 4 values, then bitshift each short:

((long long)shortNumberX) << 0, 16, 32, or 48

make sure you cast before shifting, or your data could drop off the end.

Edit: forgot to add, you should OR them together.
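
Put together, the cast, the shift and the OR could look like the loop below. This is a sketch, assuming 16-bit samples and at most four of them per vertex; packSamples is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack up to four 16-bit samples; samples[0] ends up in the lowest 16 bits.
std::uint64_t packSamples(const std::int16_t* samples, std::size_t count)
{
    assert(count <= 4);
    std::uint64_t id = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Widen *before* shifting, then OR the value into its 16-bit slot.
        const std::uint64_t wide = static_cast<std::uint16_t>(samples[i]);
        id |= wide << (16 * i);
    }
    return id;
}
```

The IDs are only unique among vertices that store the same number of samples: for example, {1} and {1, 0} pack to the same value.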

Answer by David Dolson

If you prefer portability, then boost::tuple is nice:

You would want a tuple of 4 items:

typedef boost::tuple<uint16_t, uint16_t, uint16_t, uint16_t> VertexID;

You can assign like this:

VertexID id = boost::make_tuple(1,2,3,4);

The boost tuple already has support for comparison, equality, etc., so it is easy to use in containers and algorithms.
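
Since the question asks to avoid extra libraries, it may be worth noting that the standard library has offered std::tuple with the same comparison operators since C++11. A minimal sketch, assuming a C++11 compiler is acceptable; VertexKey, insertVertex and checkTupleKeys are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <tuple>

// Four 16-bit samples used directly as the vertex key.
typedef std::tuple<std::uint16_t, std::uint16_t,
                   std::uint16_t, std::uint16_t> VertexKey;

// Returns true if the key was new, false if an equal key was already stored.
bool insertVertex(std::set<VertexKey>& seen, const VertexKey& key)
{
    return seen.insert(key).second;
}

// Hypothetical self-check: tuples compare element by element.
void checkTupleKeys()
{
    std::set<VertexKey> seen;
    assert(insertVertex(seen, VertexKey(1, 2, 3, 4)));
    assert(!insertVertex(seen, VertexKey(1, 2, 3, 4)));  // duplicate
    assert(insertVertex(seen, VertexKey(1, 2, 3, 5)));
}
```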

Answer by xtofl

The definition of the "ID" in the question isn't really clear: do you need to use it as a key for fast Vertex lookup? You could define a comparator for the std::map (see below for an example).

Do you need to be able to differentiate between two Vertex objects with the same coordinates (but differing in another field)? Define some 'id factory' (cf. the singleton pattern) that generates e.g. a sequence of ints, unrelated to the values of the Vertex objects, much in the way Fire Lancer suggests (but beware of thread-safety issues!).
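
One way to address the thread-safety concern is an atomic counter. The following is a sketch only, assuming C++11 (for std::atomic and thread-safe initialization of function-local statics); nextVertexId and checkDistinctIds are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hands out 1, 2, 3, ...; safe to call from several threads at once.
std::uint64_t nextVertexId()
{
    static std::atomic<std::uint64_t> counter(0);
    // fetch_add returns the value before the increment.
    return counter.fetch_add(1) + 1;
}

// Hypothetical self-check: consecutive calls never return the same ID.
void checkDistinctIds()
{
    const std::uint64_t a = nextVertexId();
    const std::uint64_t b = nextVertexId();
    assert(a != b);
}
```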

In my opinion, two vertices with identical coordinates are identical. So why would you even need an extra ID?

As soon as you define a 'strict weak ordering' on this type, you can use it as a key in e.g. a std::map or a std::set:

#include <cassert>
#include <map>
#include <set>

struct Vertex {
  typedef short int Value;
  Value v1, v2;

  // Strict weak ordering: compare v1 first, then v2 (lexicographic).
  bool operator<( const Vertex& other ) const {
    return v1 < other.v1 || ( v1 == other.v1 && v2 < other.v2 ) ;
  }
};

int main() {
  Vertex x1 = { 1, 2 };
  Vertex x2 = { 1, 3 };
  Vertex y1 = { 1, 2 }; // same coordinates as x1, so it is equivalent to x1

  typedef std::set<Vertex> t_vertices;

  t_vertices vertices;
  vertices.insert( x1 );
  vertices.insert( x2 );
  vertices.insert( y1 ); // won't do a thing since { 1, 2 } is already in the set.

  typedef std::map<Vertex, int> t_vertex_to_counter;
  t_vertex_to_counter count;
  count[ x1 ]++;
  assert( count[x1] == 1 );
  assert( count[y1] == 1 ); // x1 and y1 refer to the same map entry
  count[ x2 ]++;
  count[ y1 ]++;
  assert( count[x1] == 2 );
  assert( count[y1] == 2 );
  return 0;
}

Answer by xtofl

If you are on Windows, you could use the CoCreateGuid API; on Linux you can use /proc/sys/kernel/random/uuid; you can also look at 'libuuid'.

Answer by bk1e

If you're building a hash table in which to store your vertices, I can think of a couple of ways to avoid collisions:

  1. Generate IDs directly from the input data without throwing any bits away, and use a hash table that is large enough to hold all possible IDs. With 64-bit IDs, the latter will be extremely problematic: you will have to use a table that is smaller than your range of IDs, therefore you will have to deal with collisions. Even with 32-bit IDs, you would need well over 4GB of RAM to pull this off without collisions.
  2. Generate IDs sequentially as you read in the vertices. Unfortunately, this makes it very expensive to search for previously read vertices in order to update their probabilities, since a sequential ID generator is not a hash function. If the amount of data used to construct the Markov chain is significantly smaller than the amount of data that the Markov chain is used to generate (or if they are both small), this may not be an issue.

Alternatively, you could use a hash table implementation that handles collisions for you (such as unordered_map/hash_map), and concentrate on the rest of your application.
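
As a rough illustration of letting the container handle collisions, the sketch below maps a packed 64-bit key to a small sequential ID using std::unordered_map, so previously seen vertices can be found again without a linear search. It assumes C++11 and that the vertex data has already been packed into a 64-bit key; VertexIndex, idFor and checkVertexIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical index: packed vertex data -> sequential ID (0, 1, 2, ...).
class VertexIndex
{
public:
    // Returns the existing ID for packedKey, or assigns the next one.
    std::size_t idFor(std::uint64_t packedKey)
    {
        auto it = ids_.find(packedKey);
        if (it != ids_.end())
            return it->second;
        const std::size_t id = ids_.size();
        ids_.emplace(packedKey, id);
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    // The map resolves hash collisions internally; keys stay distinct.
    std::unordered_map<std::uint64_t, std::size_t> ids_;
};

// Hypothetical self-check: a repeated key gets its original ID back.
void checkVertexIndex()
{
    VertexIndex index;
    const std::size_t first = index.idFor(0x50002ULL);
    index.idFor(0x70001ULL);
    assert(index.idFor(0x50002ULL) == first);
    assert(index.size() == 2);
}
```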

Answer by basszero

Off the cuff, I'd say use prime numbers:

id = 3 * value1 + 5 * value2 + .... + somePrime * valueN

Make sure you don't overflow your id space (long? long long?). Since you've got a fixed number of values, just grab some random primes. Don't bother generating them; there are enough available in lists to get you going for a while.

I'm a little sketchy on the proof though; maybe someone more mathematical can help me out. It probably has something to do with the unique prime factorization of a number.
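
A word of caution on the proof: unique factorization is a statement about products of prime powers, not about weighted sums, so the formula as written can map different inputs to the same id. A quick check, using the hypothetical names primeSumId and checkPrimeSumCollision and the primes 3 and 5 from the formula:

```cpp
#include <cassert>
#include <cstdint>

// The weighted sum from the formula, for two values with primes 3 and 5.
std::int64_t primeSumId(std::int16_t value1, std::int16_t value2)
{
    return 3 * static_cast<std::int64_t>(value1)
         + 5 * static_cast<std::int64_t>(value2);
}

// Hypothetical self-check: (5, 0) and (0, 3) are different inputs,
// yet 3*5 + 5*0 and 3*0 + 5*3 are both 15.
void checkPrimeSumCollision()
{
    assert(primeSumId(5, 0) == primeSumId(0, 3));
}
```

So this works as a hash, but not as a guaranteed-unique ID.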
