C语言 什么是哈希表以及如何在 C 中创建它?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/31930046/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-02 12:04:02  来源:igfitidea点击:

What is a hash table and how do you make it in C?

cdata-structuresstructhashtable

提问by maxib7

I have a few questions on a data-structure called a hash table (also know as associative array) and how it is implemented in C.

我对称为哈希表(也称为关联数组)的数据结构以及它如何在 C 中实现有一些疑问。

How do you make a hash table in C? What is a hashtable and how do you implement it? Why would I want to use a hash table rather than an array?

你如何在C中制作哈希表?什么是哈希表以及如何实现它?为什么我要使用哈希表而不是数组?

NOTE: I know this is a really broad question which will require a large answer but, I made this because I had some people asking me what it all was. so I put it on here to fully explain it and help anyone else out.

注意:我知道这是一个非常广泛的问题,需要一个很大的答案,但是,我之所以这样做是因为有人问我这一切是什么。所以我把它放在这里是为了充分解释它并帮助其他人。

回答by maxib7

Prerequisites

先决条件

For this answer I'm going to assume you know how to use pointers, structs, and have a basic understanding of the C language.

对于这个答案,我将假设您知道如何使用指针、结构,并且对 C 语言有基本的了解。

Also if you don't know. When talking about the speed of algorithms and data structures you should know the terms:

还有如果你不知道。在谈论算法和数据结构的速度时,您应该知道以下术语:

O() = (it's pronounced "Big-oh") Big-oh or O() refers to the "worst-case-scenario" runtime. Similarly, in math, it's big O notation and describes the limiting behavior of a function. If somethings O(1) that's constant time "really good". If somethings O(n) that means if the list is a million long. It is at worst going to run a million time. O() is generally the one used to determine how fast something runs because that's how fast it'll run in it's worst case.

O() =(发音为“Big-oh”)Big-oh 或 O() 指的是“最坏情况”运行时。同样,在数学中,它是大 O 符号并描述了函数的极限行为。如果某事 O(1) 那是恒定时间“真的很好”。如果某事 O(n) 那意味着如果列表是一百万长。最坏的情况是运行一百万次。O() 通常用于确定某事物运行的速度,因为这是它在最坏情况下的运行速度。

Ω = (greek letter Omega) refers to it's best case scenario. It's not used that as much as O() so I won't go into too much detail about it. But just know that if somethings Ω(1), in it's best case scenario it'll take just one time.

Ω =(希腊字母 Omega)指的是最好的情况。它不像 O() 那样被使用,所以我不会详细介绍它。但是要知道,如果是 Ω(1),在最好的情况下,它只需要一次。

Θ = (greek letter theta) is unique in that it is only used when the O() and Ω() runtime are the same. So like in the case of the recursive sorting algorithm merge sort. It's run time is Θ(n(log(n))). Which means that it's O(n(log(n))) and it's Ω(n(log(n))).

Θ =(希腊字母 theta)的独特之处在于它仅在 O() 和 Ω() 运行时相同时使用。所以就像在递归排序算法合并排序的情况下一样。它的运行时间是 Θ(n(log(n)))。这意味着它是 O(n(log(n))) 和 Ω(n(log(n)))。

What is a Hash table?

什么是哈希表?

A hash table or associative array is a popular data structure used in programming. A hash table is just a linked list (I'll get to what a linked list is later on) with a hash function. A hash function basically just takes things and puts them in different "baskets". Each "basket" is just another linked list or something else depending on how you implement it. I'll explain more details on hash tables when I show you how to implement one.

哈希表或关联数组是编程中常用的数据结构。哈希表只是一个带有哈希函数的链表(稍后我将介绍链表是什么)。散列函数基本上只是将事物放入不同的“篮子”中。每个“篮子”只是另一个链表或其他东西,具体取决于您如何实现它。当我向您展示如何实现哈希表时,我将解释有关哈希表的更多细节。

Why would I want to use a hash table rather than an array?

为什么我要使用哈希表而不是数组?

An array is very easy to use and simple to make, but it also has its downsides. For this example, let's say we have a program and in which we want to keep all its users in an array.

数组非常易于使用且易于制作,但它也有其缺点。在这个例子中,假设我们有一个程序,我们希望将它的所有用户保存在一个数组中。

That's pretty simple. Let's just say we plan on this program having no more than 100 users and fill that array with our users

这很简单。假设我们计划让这个程序的用户不超过 100 个,并用我们的用户填充该数组

char* users[100];

// iterate over every user and "store" their name
for (int i = 0; i < userCount; i++)
{
    users[i] = "New username here";
}

So that works all well and fine and really fast too. That's O(1) right there. We can access any user in constant time.

所以这一切都很好,而且非常快。这就是 O(1)。我们可以在恒定时间内访问任何用户。

But let's now assume that our program gets really popular. It now has over 80 users. Uh-Oh! We better increase the size of that array or else we'll get a buffer overflow.

但是现在让我们假设我们的程序变得非常流行。它现在有80多个用户。哦哦!我们最好增加该数组的大小,否则我们会出现缓冲区溢出。

So how do we do that? Well we're gonna have to make a new array that's bigger and copy over the contents of the old array into the new array.

那么我们该怎么做呢?好吧,我们将不得不创建一个更大的新数组,并将旧数组的内容复制到新数组中。

That's very costly and we don't want to do that. We want to think cleverly and not use a something that has a fixed size. Well we already know how to use pointers to our advantage and we can bundle information into a structif we wanted to.

这是非常昂贵的,我们不想这样做。我们想聪明地思考,而不是使用具有固定大小的东西。好吧,我们已经知道如何使用指针来发挥我们的优势,struct如果我们愿意,我们可以将信息捆绑到一个中。

So we could create a structto store the username and then have it point (via a pointer) to a new struct. Voila! We now have a data structure that is expandable. It's a list of bundled information that's linked together by pointers. Thus the name linked list.

所以我们可以创建一个struct来存储用户名,然后让它(通过指针)指向一个新的struct. !我们现在有一个可扩展的数据结构。它是通过指针链接在一起的捆绑信息列表。因此名称链表。

Linked Lists

链表

So let's create that linked list. First we're gonna need a struct

所以让我们创建那个链表。首先我们需要一个struct

typedef struct node
{
    char* name;
    struct node* next;
}
node;

Alright so we have a string nameand a... Wait a sec... I've never heard of a data type called a struct node. Well for our convenience I typedefa new "data type" called nodethat also happens to be our structcalled node.

好的,所以我们有一个字符串name和一个......等一下......我从来没有听说过一种叫做 a 的数据类型struct node。好吧,为了方便起见,我使用typedef了一种新的“数据类型” node,它也恰好是我们的struct调用node.

So now that we have our node for our list, what do we need next? Well we need to create a "root" to our list so we can traverseit (I'll explain what I mean by traverselater). So let's assign a root. (remember that nodedata type I typdefed earlier)

所以现在我们有了列表的节点,接下来我们需要什么?好吧,我们需要为我们的列表创建一个“根”,以便我们可以traverse(稍后我将解释我的意思traverse)。所以让我们分配一个根。(记住nodetypdef之前编辑过的数据类型)

node* first = NULL;

So now that we have our root all we need to do is make a function to insert new usernames into our list.

所以现在我们有了我们的根,我们需要做的就是创建一个函数来将新的用户名插入到我们的列表中。

/*
 * inserts a name called buffer into
 * our linked list
 */
void insert(char* buffer)
{     
    // try to instantiate node for number
    node* newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return;
    }

    // make a new ponter
    newptr->name = buffer;
    newptr->next = NULL;

    // check for empty list
    if (first == NULL)
    {
        first = newptr;
    }
    // check for insertion at tail
    else
    {
        // keep track of the previous spot in list
        node* predptr = first;

        // because we don't know how long this list is
        // we must induce a forever loop until we find the end
        while (true)
        {
            // check if it is the end of the list
            if (predptr->next == NULL)
            {
                // add new node to end of list
                predptr->next = newptr;

                // break out of forever loop
                break;
            }

            // update pointer
            predptr = predptr->next;
        }
    }         
}

So there you go. We have a basic linked list and now we can keep adding users all we want and we don't have to worry about running out of room. But this does come with down sides. The big problem with this is that every node or "user" in our list is "anonymous". We don't know were they are at or even how many users we have with this. (of course there are ways of making this much better -- I just want to show a very basic linked list) We have to traverse the entire list to add a user because we cannot access the end directly.

所以你去。我们有一个基本的链表,现在我们可以随心所欲地添加用户,而不必担心空间不足。但这确实有不利的一面。最大的问题是我们列表中的每个节点或“用户”都是“匿名的”。我们不知道他们是否在,甚至我们有多少用户。(当然有很多方法可以让这个更好——我只想展示一个非常基本的链表)我们必须遍历整个列表来添加一个用户,因为我们不能直接访问最后。

It's like we are in a huge dust storm and you can't see anything and we need to get to our barn. We can't see where our barn is but we have a solution. There are people standing our there (our nodes) and they are all holding two ropes (our pointers). Each person only owns one rope but that rope is being held at the other end by someone else. Just like our struct, the rope acts as a pointer to where they are. So how do we get to our barn? (for this example the barn is the last "person" in the list). Well we have no idea how big our line of people are or where they go. In fact, all we see is a fence post with a rope tied to it. (Our root!) that fence post will never change so we can grab the post and start moving along until we see our first person. That person is holding two ropes (the post's pointer and their pointer).

就像我们在一场巨大的沙尘暴中,你什么也看不见,我们需要去我们的谷仓。我们看不到我们的谷仓在哪里,但我们有一个解决方案。有人站在我们那里(我们的nodes),他们都拿着两条绳子(我们的指针)。每个人只拥有一根绳子,但绳子的另一端被其他人握住。就像我们的struct,绳索充当指向它们所在位置的指针。那么我们怎么去我们的谷仓呢?(对于这个例子,谷仓是列表中的最后一个“人”)。好吧,我们不知道我们的人员有多大,也不知道他们去了哪里。事实上,我们所看到的只是一根系着绳子的栅栏柱。(我们的根!)那个栅栏柱永远不会改变,所以我们可以抓住柱子并开始前进,直到我们看到我们的第一个人。那个人拿着两条绳子(帖子'

So we keep traveling along the rope until we reach a new person and grab onto their rope. Eventually, we get to the end and find our barn!

所以我们一直沿着绳子走,直到我们遇到一个新人并抓住他们的绳子。最终,我们到达终点并找到我们的谷仓!

So that is a linked list in a nutshell. Its benefits are that it can expand as much as you want but its runtime depends on how big the list is, namely O(n). So if there are 1 million users, it would have to run 1 million times to insert a new name! Wow that seems really wasteful just to insert 1 name.

简而言之,这就是一个链表。它的好处是它可以随心所欲地扩展,但它的运行时间取决于列表的大小,即 O(n)。因此,如果有 100 万用户,则必须运行 100 万次才能插入新名称!哇,仅仅插入 1 个名字似乎真的很浪费。

Luckily, we are clever and can create a better solution. Why don't we, instead of having just one linked list, have a few linked lists. An array of linked lists if you will. Why don't we make an array of size 26. So we can have a unique linked list for every letter of the alphabet. Now instead of a run time of n. We can reasonably say that our new run time will be n/26. Now that won't make much of a difference at all if you have a list 1 million big. But we're just going to keep it simple for this example.

幸运的是,我们很聪明,可以创建更好的解决方案。我们为什么不,而不是只有一个链表,而是有几个链表。一个链表数组,如果你愿意的话。为什么我们不创建一个大小为 26 的数组。这样我们就可以为字母表中的每个字母创建一个唯一的链表。现在而不是 n 的运行时间。我们可以合理地说,我们的新运行时间将是 n/26。现在,如果您有一个 100 万大的列表,那根本就没有太大区别。但是对于这个例子,我们只是要保持简单。

So we have an array of linked lists but how are we going to sort our users into the array. Well... why don't we make a function that decides which user should go where. This function will "hash" the users if you will into an array or "table". So let's create this "hashed" linked list. Thus the name hash table

所以我们有一个链表数组,但是我们如何将我们的用户分类到数组中。嗯...为什么我们不做一个功能来决定哪个用户应该去哪里。如果您将用户放入数组或“表”,则此函数将“散列”用户。所以让我们创建这个“散列”链表。因此名称哈希表

Hash Table

哈希表

As I just said, our hash table will be an array of linked lists and will be hashed by the first letter of their username. Awill go to position 0, Bto 1, and so on.

正如我刚才所说,我们的哈希表将是一个链表数组,并将按其用户名的第一个字母进行哈希处理。A将转到位置 0、B到 1,依此类推。

The structfor the this hash table will be the same as the struct for our previous linked list

struct为这个哈希表将是相同的结构我们以前的链表

typedef struct node
{
    char* name;
    struct node* next;
}
node;

Now just like our linked list, we need a root for our hash table

现在就像我们的链表一样,我们的哈希表需要一个根

node* first[26] = {NULL};

The root will be an array the size of the alphabet and all positions in it will be initialized to NULL. (Remember: the last element in a linked list always has to point to NULLor else we wouldn't know it was the end)

根将是一个大小为字母表的数组,其中的所有位置都将被初始化为NULL。(记住:链表中的最后一个元素必须始终指向,NULL否则我们不知道它是结尾)

Let's make a main function. It takes a username we are going to hash then insert.

让我们做一个主函数。它需要一个我们要散列然后插入的用户名。

int main(char* name)
{
    // hash the name into a spot
    int hashedValue = hash(name);

    // insert the name in table with hashed value
    insert(hashedValue, name);
}

So here's our hash function. It's pretty simple. All we want to do is look at the first letter in the word and give a value from 0 - 25 based on what letter it is

所以这是我们的哈希函数。这很简单。我们想要做的就是查看单词中的第一个字母,并根据它是什么字母给出一个 0 - 25 的值

/*
 * takes a string and hashes it into the correct bucket
 */
int hash(const char* buffer)
{
    // assign a number to the first char of buffer from 0-25
    return tolower(buffer[0]) - 'a';
}

So now all we need is to create our insert function. It's going to look just like our insert function before except every time we reference our root, we're going to reference it as an array.

所以现在我们只需要创建我们的插入函数。它看起来就像我们之前的插入函数,除了每次我们引用我们的根时,我们都将它作为一个数组来引用。

/*
 * takes a string and inserts it into a linked list at a part of the hash table
 */
void insert(int key, const char* buffer)
{
    // try to instantiate node to insert word
    node* newptr = malloc(sizeof(node));
    if (newptr == NULL)
    {
        return;
    }

    // make a new pointer
    strcpy(newptr->name, buffer);
    newptr->next = NULL;

    // check for empty list
    if (first[key] == NULL)
    {
       first[key] = newptr;
    }
    // check for insertion at tail
    else
    {
        node* predptr = first[key];
        while (true)
        {
            // insert at tail
            if (predptr->next == NULL)
            {
                predptr->next = newptr;
                break;
            }

            // update pointer
            predptr = predptr->next;
        }
    }
}

So that's basics of a hash table. It's pretty simple if you know how to use pointers and structs. I know that was a pretty simple example of a hash table with only an insert function, but you can make it a lot better and get more creative with your hashing function. You can also make the array as big as you want or even a use multi-dimensional array.

这是哈希表的基础知识。如果您知道如何使用指针和结构,那就很简单了。我知道这是一个非常简单的哈希表示例,只有一个插入函数,但是您可以将它做得更好,并通过哈希函数获得更多创意。您还可以根据需要使数组尽可能大,甚至可以使用多维数组。