What is the most efficient way to store tags in a database?

Note: this is a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/334183/

Date: 2020-09-08 07:06:27. Source: igfitidea

Tags: database, database-design, tags, tagging

Asked by Logan Serman

I am implementing a tagging system on my website, similar to the one Stack Overflow uses. My question is: what is the most effective way to store tags so that they can be searched and filtered?

My idea is this:

Table: Items
Columns: Item_ID, Title, Content

Table: Tags
Columns: Title, Item_ID
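
To make the proposed schema concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows, the index name idx_tags_title and the helper items_with_tag are illustrative assumptions. With an index on Tags.Title, a tag filter does not need to scan the whole table. The trade-off is that each tag's title text is repeated on every row that uses it:

```python
import sqlite3

# In-memory database for illustration; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (
        Item_ID INTEGER PRIMARY KEY,
        Title   TEXT NOT NULL,
        Content TEXT
    );
    -- One row per (tag title, item) pair; the title text repeats for every item.
    CREATE TABLE Tags (
        Title   TEXT NOT NULL,
        Item_ID INTEGER NOT NULL REFERENCES Items(Item_ID)
    );
    -- Index so that filtering by tag title avoids a full table scan.
    CREATE INDEX idx_tags_title ON Tags(Title);
""")

conn.executemany("INSERT INTO Items VALUES (?, ?, ?)", [
    (1, "Indexing basics", "..."),
    (2, "Join performance", "..."),
    (3, "CSS grid layout", "..."),
])
conn.executemany("INSERT INTO Tags VALUES (?, ?)", [
    ("sql", 1), ("indexing", 1),
    ("sql", 2), ("performance", 2),
    ("css", 3),
])

def items_with_tag(tag):
    """Return the titles of all items carrying the given tag (hypothetical helper)."""
    rows = conn.execute(
        """SELECT i.Title
             FROM Items AS i
             JOIN Tags  AS t ON t.Item_ID = i.Item_ID
            WHERE t.Title = ?
            ORDER BY i.Item_ID""",
        (tag,),
    )
    return [title for (title,) in rows]

print(items_with_tag("sql"))  # ['Indexing basics', 'Join performance']
```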

Is this too slow? Is there a better way?

Answer by Simon Scarfe

One item is going to have many tags, and one tag will belong to many items. That is a many-to-many relationship, which implies to me that you'll quite possibly need an intermediary table to model it.

Something like:

Table: Items
Columns: Item_ID, Item_Title, Content

Table: Tags
Columns: Tag_ID, Tag_Title

Table: Items_Tags
Columns: Item_ID, Tag_ID
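
Here is a minimal sketch of the three-table design using Python's built-in sqlite3 module and an in-memory database. The sample items and tags, and the helper items_with_all_tags, are invented for illustration. The composite primary key on Items_Tags stops the same tag from being attached to an item twice. The query shows one common way to filter for items that carry all of several tags:

1. Join through the junction table.
2. Group the rows by item.
3. Keep only the groups whose match count equals the number of requested tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
    CREATE TABLE Items (
        Item_ID    INTEGER PRIMARY KEY,
        Item_Title TEXT NOT NULL,
        Content    TEXT
    );
    CREATE TABLE Tags (
        Tag_ID    INTEGER PRIMARY KEY,
        Tag_Title TEXT NOT NULL UNIQUE  -- the text of each tag is stored exactly once
    );
    -- Junction table: one row per (item, tag) association.
    CREATE TABLE Items_Tags (
        Item_ID INTEGER NOT NULL REFERENCES Items(Item_ID),
        Tag_ID  INTEGER NOT NULL REFERENCES Tags(Tag_ID),
        PRIMARY KEY (Item_ID, Tag_ID)
    );
""")
conn.executemany("INSERT INTO Items VALUES (?, ?, ?)", [
    (1, "Indexing basics", "..."),
    (2, "Join performance", "..."),
    (3, "CSS grid layout", "..."),
])
conn.executemany("INSERT INTO Tags VALUES (?, ?)", [
    (1, "sql"), (2, "indexing"), (3, "performance"), (4, "css"),
])
conn.executemany("INSERT INTO Items_Tags VALUES (?, ?)", [
    (1, 1), (1, 2), (2, 1), (2, 3), (3, 4),
])

def items_with_all_tags(tag_titles):
    """Return titles of items that carry every tag in tag_titles (an AND filter)."""
    wanted = sorted(set(tag_titles))  # de-duplicate so the HAVING count is right
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)  # only "?" markers are interpolated
    rows = conn.execute(
        f"""SELECT i.Item_Title
              FROM Items      AS i
              JOIN Items_Tags AS it ON it.Item_ID = i.Item_ID
              JOIN Tags       AS t  ON t.Tag_ID   = it.Tag_ID
             WHERE t.Tag_Title IN ({placeholders})
             GROUP BY i.Item_ID
            HAVING COUNT(*) = ?
             ORDER BY i.Item_ID""",
        (*wanted, len(wanted)),
    )
    return [title for (title,) in rows]

print(items_with_all_tags(["sql"]))              # ['Indexing basics', 'Join performance']
print(items_with_all_tags(["sql", "indexing"]))  # ['Indexing basics']
```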

It might be that your web app becomes insanely popular and needs denormalising down the road, but it's pointless muddying the waters too early.

Answer by Rob Kennedy

You should read Philipp Keller's blog posts about tagging database schemas. He tries a few and reports his results, both in terms of how easy it is to construct common queries and in terms of performance. The number of tags, the number of tagged items, and the number of tags per item were all factors. The posts are from 2005; I'm not aware of any updates since then.

Answer by Neil Barnwell

Actually I believe de-normalising the tags table might be a better way forward, depending on scale.

This way, the tags table simply has tagid, itemid, tagname.

You'll get duplicate tagnames, but it makes adding, removing, and editing tags for specific items MUCH simpler. You don't have to create a new tag, remove the allocation of the old one, and allocate the new one; you just edit the tagname.
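
Here is a minimal sketch of that in-place edit. It assumes the denormalised table described above (tagid, itemid, tagname) in an in-memory SQLite database, accessed via Python's sqlite3 module. The sample rows and the helper names rename_tag_on_item and tags_for_item are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalised layout: the tag text lives on every assignment row.
conn.execute("""
    CREATE TABLE tags (
        tagid   INTEGER PRIMARY KEY,
        itemid  INTEGER NOT NULL,
        tagname TEXT    NOT NULL
    )""")
conn.executemany("INSERT INTO tags (itemid, tagname) VALUES (?, ?)", [
    (1, "sql"), (1, "indexing"),
    (2, "sql"), (2, "performance"),
])

def rename_tag_on_item(itemid, old, new):
    """Edit one item's tag in place; other items tagged `old` are untouched."""
    conn.execute(
        "UPDATE tags SET tagname = ? WHERE itemid = ? AND tagname = ?",
        (new, itemid, old),
    )

def tags_for_item(itemid):
    """Return the tag names of one item, alphabetically."""
    rows = conn.execute(
        "SELECT tagname FROM tags WHERE itemid = ? ORDER BY tagname", (itemid,))
    return [name for (name,) in rows]

rename_tag_on_item(1, "sql", "mysql")
print(tags_for_item(1))  # ['indexing', 'mysql']
print(tags_for_item(2))  # ['performance', 'sql']
```

In a normalised three-table design, the same change would take more than one UPDATE. You would look up (or create) the row for the new tag, then repoint the item's association row to it.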

To display a list of tags, you simply use DISTINCT or GROUP BY, and of course you can easily count how many times a tag is used, too.
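
The following sketch uses the same kind of denormalised table, with invented sample rows, via Python's sqlite3 module and an in-memory database. DISTINCT gives the tag list, and GROUP BY gives per-tag usage counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (tagid INTEGER PRIMARY KEY, itemid INTEGER NOT NULL, tagname TEXT NOT NULL)")
conn.executemany("INSERT INTO tags (itemid, tagname) VALUES (?, ?)", [
    (1, "sql"), (1, "indexing"),
    (2, "sql"), (2, "performance"),
    (3, "sql"),
])

# DISTINCT: the list of tags in use, each name once.
distinct_tags = [name for (name,) in conn.execute(
    "SELECT DISTINCT tagname FROM tags ORDER BY tagname")]

# GROUP BY: how many times each tag is used, most-used first.
tag_counts = conn.execute("""
    SELECT tagname, COUNT(*) AS uses
      FROM tags
     GROUP BY tagname
     ORDER BY uses DESC, tagname""").fetchall()

print(distinct_tags)  # ['indexing', 'performance', 'sql']
print(tag_counts)     # [('sql', 3), ('indexing', 1), ('performance', 1)]
```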

Answer by Dmitry Shvedov

If you don't mind using something a bit non-standard, Postgres version 9.4 and up gives you the option of storing the tags as a JSON text array.

Your schema would be:

Table: Items
Columns: Item_ID:int, Title:text, Content:text

Table: Tags
Columns: Item_ID:int, Tag_Title:text[]
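
Running PostgreSQL does not fit in a self-contained snippet, so the following plain-Python sketch only models the idea. Each item keeps its tags in one array value (the Tag_Title:text[] column above), and tag filters become array comparisons. In PostgreSQL itself, the two filters map to array operators, and a GIN index on the column can serve both:

- "Has all of these tags" is the containment operator, e.g. Tag_Title @> ARRAY['sql', 'indexing'].
- "Has any of these tags" is the overlap operator, &&.

The sample data and function names below are invented for illustration:

```python
# Each entry models one row of the Tags table: Item_ID -> array of tag titles.
tags_by_item = {
    1: ["sql", "indexing"],
    2: ["sql", "performance"],
    3: ["css"],
}

def contains_all(tag_array, wanted):
    """Models `tag_array @> wanted`: every wanted tag appears in the array."""
    return set(wanted).issubset(tag_array)

def overlaps(tag_array, wanted):
    """Models `tag_array && wanted`: the two arrays share at least one tag."""
    return not set(wanted).isdisjoint(tag_array)

def items_matching(predicate, wanted):
    """Return the sorted Item_IDs whose tag array satisfies the predicate."""
    return sorted(item_id for item_id, tag_array in tags_by_item.items()
                  if predicate(tag_array, wanted))

print(items_matching(contains_all, ["sql", "indexing"]))  # [1]
print(items_matching(overlaps, ["indexing", "css"]))      # [1, 3]
```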

For more info, see this excellent post by Josh Berkus: http://www.databasesoup.com/2015/01/tag-all-things.html

The post thoroughly compares several options for performance, and the one suggested above is the best overall.

Answer by Valentin Vasilyev

I'd suggest using an intermediary third table to store tag<=>item associations, since tags and items have a many-to-many relationship: one item can be associated with multiple tags, and one tag can be associated with multiple items. HTH, Valve.

Answer by Adam Pope

If space is going to be an issue, add a third table, Tags(Tag_Id, Title), to store the text for each tag, and then change your existing Tags table to (Tag_Id, Item_Id). Those two columns together should also form a unique composite primary key.
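
Here is a minimal sketch of that layout in SQLite via Python's sqlite3 module. Because the answer gives both tables the same name, Tags, the assignment table is called Item_Tags here (a hypothetical name). The helper tag_item is also illustrative. The composite primary key means a second attempt to attach the same tag to the same item is rejected with an IntegrityError:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: the text of each tag is stored once.
    CREATE TABLE Tags (
        Tag_Id INTEGER PRIMARY KEY,
        Title  TEXT NOT NULL UNIQUE
    );
    -- Assignment table (hypothetical name Item_Tags); the composite
    -- primary key makes each (Tag_Id, Item_Id) pair unique.
    CREATE TABLE Item_Tags (
        Tag_Id  INTEGER NOT NULL REFERENCES Tags(Tag_Id),
        Item_Id INTEGER NOT NULL,
        PRIMARY KEY (Tag_Id, Item_Id)
    );
""")
conn.execute("INSERT INTO Tags VALUES (1, 'sql')")
conn.execute("INSERT INTO Item_Tags VALUES (1, 42)")

def tag_item(tag_id, item_id):
    """Attach a tag to an item; return False if that pair already exists."""
    try:
        conn.execute("INSERT INTO Item_Tags VALUES (?, ?)", (tag_id, item_id))
        return True
    except sqlite3.IntegrityError:  # raised when the composite key is duplicated
        return False

print(tag_item(1, 42))  # False: item 42 already has tag 1
print(tag_item(1, 7))   # True
print(conn.execute("SELECT COUNT(*) FROM Item_Tags").fetchone()[0])  # 2
```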

Answer by Rockcoder

You can't really talk about slowness based on the data you provided in the question. And I don't think you should worry too much about performance at this stage of development; that's called premature optimization.

However, I'd suggest that you include a Tag_ID column in the Tags table. It's usually good practice for every table to have an ID column.

Answer by Timothy Khouri

Items should have an "ID" field, and Tags should have an "ID" field (Primary Key, Clustered).

Then make an intermediate table of ItemID/TagID and put the "Perfect Index" on there.
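
The answer does not define the "Perfect Index". One plausible reading, sketched here as an assumption, is a pair of composite indexes on the intermediate table, one per lookup direction. That way both "tags for an item" and "items for a tag" can be answered from an index alone (a covering index).

The sketch uses SQLite through Python's sqlite3 module, and the table and index names are hypothetical. SQL Server's clustered indexes have no exact SQLite equivalent. EXPLAIN QUERY PLAN reports which index each query uses, though its exact wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- INTEGER PRIMARY KEY aliases the SQLite rowid, and rowid tables are stored
    -- in rowid order: the closest SQLite analogue of a clustered primary key.
    CREATE TABLE Items (ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    CREATE TABLE Tags  (ID INTEGER PRIMARY KEY, Name  TEXT NOT NULL UNIQUE);

    -- Intermediate table; the composite primary key also acts as the
    -- (ItemID, TagID) index for tags-of-an-item lookups.
    CREATE TABLE ItemTags (
        ItemID INTEGER NOT NULL REFERENCES Items(ID),
        TagID  INTEGER NOT NULL REFERENCES Tags(ID),
        PRIMARY KEY (ItemID, TagID)
    );
    -- Reverse-order index for items-with-a-tag lookups.
    CREATE INDEX idx_itemtags_tag_item ON ItemTags (TagID, ItemID);
""")

def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

by_tag = query_plan("SELECT ItemID FROM ItemTags WHERE TagID = ?", (1,))
by_item = query_plan("SELECT TagID FROM ItemTags WHERE ItemID = ?", (1,))
print(by_tag)   # expected to mention a COVERING INDEX on idx_itemtags_tag_item
print(by_item)  # expected to mention a COVERING INDEX on the primary-key index
```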
