Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/20856/

Date: 2020-08-31 23:09:03, source: igfitidea

Recommended SQL database design for tags or tagging

Tags: sql, database-design, tags, data-modeling, tagging

Asked by dlamblin

I've heard of a few ways to implement tagging: using a mapping table between TagID and ItemID (makes sense to me, but does it scale?), adding a fixed number of possible TagID columns to the item table (seems like a bad idea), or keeping tags in a comma-separated text column (sounds crazy but could work). I've even heard someone recommend a sparse matrix, but then how do the tag names grow gracefully?

Am I missing a best practice for tags?

Answered by Yaakov Ellis

Three tables (one for storing all items, one for all tags, and one for the relation between the two), properly indexed, with foreign keys set, running on a proper database, should work well and scale properly.

Table: Item
Columns: ItemID, Title, Content

Table: Tag
Columns: TagID, Title

Table: ItemTag
Columns: ItemID, TagID
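
To make "properly indexed, with foreign keys set" concrete, here is a minimal sketch of the three tables using Python's built-in sqlite3 module. The column types, the UNIQUE constraint on Tag.Title, the index name idx_itemtag_tag and the helper create_three_table_db are assumptions added for illustration; the answer itself only names the tables and columns.

```python
import sqlite3

# Assumed DDL: the answer gives only table and column names.
SCHEMA = """
CREATE TABLE Item (
    ItemID  INTEGER PRIMARY KEY,
    Title   TEXT NOT NULL,
    Content TEXT
);
CREATE TABLE Tag (
    TagID INTEGER PRIMARY KEY,
    Title TEXT NOT NULL UNIQUE           -- one row per distinct tag name
);
CREATE TABLE ItemTag (
    ItemID INTEGER NOT NULL REFERENCES Item(ItemID) ON DELETE CASCADE,
    TagID  INTEGER NOT NULL REFERENCES Tag(TagID)  ON DELETE CASCADE,
    PRIMARY KEY (ItemID, TagID)          -- serves "all tags for one item"
);
-- Reverse index, serves "all items for one tag".
CREATE INDEX idx_itemtag_tag ON ItemTag (TagID, ItemID);
"""


def create_three_table_db():
    """Return an in-memory SQLite database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key on ItemTag also prevents the same tag from being attached to an item twice.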

Answered by Scheintod

Normally I would agree with Yaakov Ellis, but in this special case there is another viable solution:

Use two tables:

Table: Item
Columns: ItemID, Title, Content
Indexes: ItemID

Table: Tag
Columns: ItemID, Title
Indexes: ItemID, Title

This has some major advantages:

First, it makes development much simpler: in the three-table solution, to insert or update an item you have to look up the Tag table to see whether the entries already exist. Then you have to combine them with the new ones. This is no trivial task.
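
For reference, the lookup-and-link step described above might look roughly like this with Python's sqlite3 module. This is only a sketch: the helpers make_db and set_item_tags are hypothetical names, and the UNIQUE constraint on Tag.Title (which INSERT OR IGNORE relies on) is an assumption that the three-table answer does not spell out.

```python
import sqlite3


def make_db():
    """Minimal three-table schema; the UNIQUE on Tag.Title is assumed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Title TEXT, Content TEXT);
    CREATE TABLE Tag (TagID INTEGER PRIMARY KEY, Title TEXT NOT NULL UNIQUE);
    CREATE TABLE ItemTag (ItemID INTEGER, TagID INTEGER,
                          PRIMARY KEY (ItemID, TagID));
    """)
    return conn


def set_item_tags(conn, item_id, titles):
    """Replace the tags of one item, creating missing Tag rows on the way."""
    titles = sorted(set(titles))
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1: add tag names that do not exist yet; existing ones are skipped.
        conn.executemany("INSERT OR IGNORE INTO Tag (Title) VALUES (?)",
                         [(t,) for t in titles])
        # Step 2: drop the old links of this item.
        conn.execute("DELETE FROM ItemTag WHERE ItemID = ?", (item_id,))
        # Step 3: link the item to each tag, resolving TagID by title.
        conn.executemany(
            "INSERT INTO ItemTag (ItemID, TagID) "
            "SELECT ?, TagID FROM Tag WHERE Title = ?",
            [(item_id, t) for t in titles])
```

Note that tags no longer used by any item stay behind in the Tag table; cleaning them up would be an extra step. In the two-table design the same update is a DELETE followed by plain INSERTs into Tag.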

Second, it makes queries simpler (and perhaps faster). There are three major database queries you will run: output all tags for one item, draw a tag cloud, and select all items for one tag title.

All Tags for one Item:

3-Table:

SELECT Tag.Title 
  FROM Tag 
  JOIN ItemTag ON Tag.TagID = ItemTag.TagID
 WHERE ItemTag.ItemID = :id

2-Table:

SELECT Tag.Title
FROM Tag
WHERE Tag.ItemID = :id

Tag-Cloud:

3-Table:

SELECT Tag.Title, count(*)
  FROM Tag
  JOIN ItemTag ON Tag.TagID = ItemTag.TagID
 GROUP BY Tag.Title

2-Table:

SELECT Tag.Title, count(*)
  FROM Tag
 GROUP BY Tag.Title

Items for one Tag:

3-Table:

SELECT Item.*
  FROM Item
  JOIN ItemTag ON Item.ItemID = ItemTag.ItemID
  JOIN Tag ON ItemTag.TagID = Tag.TagID
 WHERE Tag.Title = :title

2-Table:

SELECT Item.*
  FROM Item
  JOIN Tag ON Item.ItemID = Tag.ItemID
 WHERE Tag.Title = :title
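
To try the two-table variant end to end, the sketch below creates the schema in an in-memory SQLite database and wraps the three two-table queries above in small Python functions. The function names, the column types and the choice of a composite primary key on (ItemID, Title) plus a separate index on Title are assumptions; the answer only says that ItemID and Title are indexed.

```python
import sqlite3


def make_two_table_db():
    """Two-table design: every Tag row stores the tag text itself."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Title TEXT, Content TEXT);
    CREATE TABLE Tag (
        ItemID INTEGER NOT NULL REFERENCES Item(ItemID),
        Title  TEXT NOT NULL,
        PRIMARY KEY (ItemID, Title)      -- one row per (item, tag name)
    );
    CREATE INDEX idx_tag_title ON Tag (Title);
    """)
    return conn


def tags_for_item(conn, item_id):
    """All tags for one item, no join needed."""
    rows = conn.execute(
        "SELECT Title FROM Tag WHERE ItemID = ? ORDER BY Title", (item_id,))
    return [title for (title,) in rows]


def tag_cloud(conn):
    """Map each tag title to the number of items carrying it."""
    return dict(conn.execute("SELECT Title, count(*) FROM Tag GROUP BY Title"))


def items_for_tag(conn, title):
    """IDs of all items with the given tag title (single join)."""
    rows = conn.execute(
        "SELECT Item.ItemID FROM Item "
        "JOIN Tag ON Item.ItemID = Tag.ItemID "
        "WHERE Tag.Title = ? ORDER BY Item.ItemID", (title,))
    return [item_id for (item_id,) in rows]
```

A quick way to exercise it is to insert two items, tag both with "sql", and compare tag_cloud(conn) with the counts you expect.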

But there are some drawbacks, too: it could take more space in the database (which could lead to more disk operations, which are slow), and it is not normalized, which could lead to inconsistencies.

The size argument is not that strong, because tags are by nature pretty short, so the size increase is not large. One could argue that a lookup by tag title is much faster in a small table that contains each tag only once, and this is certainly true. But the savings from not having to join, together with the fact that you can build a good index on the tag titles, could easily compensate for this. How it plays out of course depends heavily on the size of your database.

The inconsistency argument is a little moot too. Tags are free text fields and there is no expected operation like 'rename all tags "foo" to "bar"'.

So, TL;DR: I would go for the two-table solution. (In fact I'm going to. I found this article while checking whether there are valid arguments against it.)

Answered by Nick Retallack

If you are using a database that supports map-reduce, like CouchDB, storing tags in a plain text field or list field is indeed the best way. Example:

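tagcloud: {
  map: function (doc) {
    // Emit one (tag, 1) pair for every tag on the document.
    (doc.tags || []).forEach(function (tag) {
      emit(tag, 1);
    });
  },
  reduce: function (keys, values) {
    // Summing stays correct on rereduce, where values are partial counts.
    return sum(values);
  }
}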

Running this with group=true will group the results by tag name, and even return a count of the number of times that tag was encountered. It's very similar to counting the occurrences of a word in text.
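
The word-count analogy can be sketched in plain Python, without a CouchDB server. This is only an illustration of what the view computes when queried with group=true; the function names map_doc, reduce_counts and tag_cloud, and the shape of the documents (a "tags" list per document), are assumptions.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: yield one (tag, 1) pair per tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values):
    """Reduce step: add up the 1s (or partial sums) emitted for one key."""
    return sum(values)


def tag_cloud(docs):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


docs = [
    {"title": "first", "tags": ["sql", "tagging"]},
    {"title": "second", "tags": ["sql"]},
    {"title": "untagged"},
]
print(tag_cloud(docs))
```

Each distinct tag becomes one key, and its count is the number of documents carrying that tag, just as each distinct word would be one key in a word count.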

Answered by David Schmitt

Use a single formatted text column[1] for storing the tags and use a capable full-text search engine to index it. Otherwise you will run into scaling problems when trying to implement boolean queries.

If you need details about the tags you have, you can either keep track of them in an incrementally maintained table or run a batch job to extract the information.

[1] Some RDBMS even provide a native array type which might be even better suited for storage by not needing a parsing step, but might cause problems with the full text search.
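
The batch-job option mentioned above could look something like the following sketch. It assumes the "formatted text column" is a comma-separated column called Tags on the Item table and that the summary lives in a table called TagCount; those names, the format and the helpers make_db and rebuild_tag_counts are all hypothetical, and the full-text index itself is not shown.

```python
import sqlite3
from collections import Counter


def make_db():
    """Hypothetical schema: tags stored as comma-separated text per item."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Title TEXT, Tags TEXT);
    CREATE TABLE TagCount (Title TEXT PRIMARY KEY, Uses INTEGER NOT NULL);
    """)
    return conn


def rebuild_tag_counts(conn):
    """Batch job: re-parse every Tags value and rewrite the summary table."""
    counts = Counter()
    for (tags,) in conn.execute("SELECT Tags FROM Item").fetchall():
        # Split on commas, trim whitespace, skip empty pieces and NULLs.
        counts.update(t.strip() for t in (tags or "").split(",") if t.strip())
    with conn:  # replace the old summary in a single transaction
        conn.execute("DELETE FROM TagCount")
        conn.executemany("INSERT INTO TagCount (Title, Uses) VALUES (?, ?)",
                         list(counts.items()))
    return counts
```

An incrementally maintained table would instead adjust TagCount whenever an item's Tags value changes, which avoids rescanning every item at the cost of more write-path logic.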

Answered by Mark Biek

I've always kept the tags in a separate table and then had a mapping table. Of course I've never done anything on a really large scale either.

Having a "tags" table and a map table makes it pretty trivial to generate tag clouds & such since you can easily put together SQL to get a list of tags with counts of how often each tag is used.

Answered by user236575

I would suggest the following design: an Item table with the columns Itemid, taglist1, taglist2.
This will be fast and makes it easy to save and retrieve the data at the item level.

In parallel, build another table, Tags, with a tag column and a second column holding the items for that tag. Do not make tag a unique identifier: if you run out of space in the second column (which holds, let's say, 100 items), create another row for the same tag.

Searching for the items that carry a given tag will then be very fast.
