Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/126271/
Key value pairs in relational database
Asked by horace
Does someone have experience with storing key-value pairs in a database?
I've been using this type of table:
CREATE TABLE key_value_pairs (
    itemid    varchar(32) NOT NULL,
    itemkey   varchar(32) NOT NULL,
    itemvalue varchar(32) NOT NULL,
    CONSTRAINT ct_primarykey PRIMARY KEY(itemid,itemkey)
)
Then for example the following rows could exist:
itemid  itemkey  itemvalue
------  -------  ---------
123     Colour   Red
123     Size     Medium
123     Fabric   Cotton
The trouble with this scheme is that the SQL required to extract data is quite complex. Would it be better to just create a series of key/value columns?
CREATE TABLE key_value_pairs (
    itemid     varchar(32) NOT NULL,
    itemkey1   varchar(32) NOT NULL,
    itemvalue1 varchar(32) NOT NULL,
    itemkey2   varchar(32) NOT NULL,
    itemvalue2 varchar(32) NOT NULL,
    . . .etc . . .
)
This will be easier and faster to query but lacks the extensibility of the first approach. Any advice?
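To make the "complex SQL" concern concrete, here is a minimal sketch of two queries against the first design. It assumes SQLite (through Python's sqlite3 module) rather than whatever database the question was about, and the second item, 456, is made up for illustration. Turning the rows back into one row per item takes a conditional aggregate, and filtering on two attributes at once takes one self-join per extra attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE key_value_pairs (
        itemid    varchar(32) NOT NULL,
        itemkey   varchar(32) NOT NULL,
        itemvalue varchar(32) NOT NULL,
        CONSTRAINT ct_primarykey PRIMARY KEY (itemid, itemkey)
    )
""")
conn.executemany(
    "INSERT INTO key_value_pairs VALUES (?, ?, ?)",
    [
        ("123", "Colour", "Red"),
        ("123", "Size", "Medium"),
        ("123", "Fabric", "Cotton"),
        ("456", "Colour", "Red"),   # hypothetical second item
        ("456", "Size", "Large"),
    ],
)

# Pivot: one row per item, one column per key (conditional aggregation).
pivot_sql = """
    SELECT itemid,
           MAX(CASE WHEN itemkey = 'Colour' THEN itemvalue END) AS colour,
           MAX(CASE WHEN itemkey = 'Size'   THEN itemvalue END) AS size,
           MAX(CASE WHEN itemkey = 'Fabric' THEN itemvalue END) AS fabric
    FROM key_value_pairs
    GROUP BY itemid
    ORDER BY itemid
"""
pivoted = conn.execute(pivot_sql).fetchall()

# Filter on two attributes at once: one self-join per extra attribute.
match_sql = """
    SELECT c.itemid
    FROM key_value_pairs AS c
    JOIN key_value_pairs AS s ON s.itemid = c.itemid
    WHERE c.itemkey = 'Colour' AND c.itemvalue = 'Red'
      AND s.itemkey = 'Size'   AND s.itemvalue = 'Medium'
"""
red_medium = [row[0] for row in conn.execute(match_sql)]

print(pivoted)
print(red_medium)
```

Every new key that should appear as a column means editing the pivot query, which is the trade-off the question is asking about.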
Answered by Darrel Miller
Before you continue with your approach, I would humbly suggest you step back and consider whether you really want to store this data in a "Key-Value Pair" table. I don't know your application, but in my experience, every time I have done what you are doing, I later wished I had created a color table, a fabric table and a size table.
Think about referential integrity constraints. If you take the key-value pair approach, the database can't tell you when you are trying to store a color id in a size field.
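As a rough illustration of that point, the sketch below uses separate lookup tables with foreign keys, so the database itself rejects a bad reference. The table and column names (colors, sizes, items, color_id, size_id) are assumptions, not part of the answer, and SQLite is used only because it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE sizes  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE items (
        id       INTEGER PRIMARY KEY,
        color_id INTEGER NOT NULL REFERENCES colors (id),
        size_id  INTEGER NOT NULL REFERENCES sizes (id)
    );
    INSERT INTO colors (id, name) VALUES (1, 'Red'), (2, 'Blue');
    INSERT INTO sizes  (id, name) VALUES (10, 'Medium');
""")

# A valid row: colour 1 (Red), size 10 (Medium).
conn.execute("INSERT INTO items (id, color_id, size_id) VALUES (1, 1, 10)")

try:
    # Mistake: a colour id (2 = Blue) ends up in the size field.
    # No size has id 2, so the foreign key rejects the row.
    conn.execute("INSERT INTO items (id, color_id, size_id) VALUES (2, 1, 2)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(item_count)
```

Note that the foreign key only checks that the id exists in the referenced table, so the protection is strongest when the id ranges of different domains do not overlap. A generic key-value table has no equivalent check at all.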
Think about the performance benefits of joining on a table with 10 values versus a generic value table that may hold thousands of values across multiple domains. How useful is an index on Key Value really going to be?
Usually the reason for doing what you are doing is that the domains need to be "user definable". If that is the case, then even I am not going to push you towards creating tables on the fly (although that is a feasible approach).
However, if your reason is that you think it will be easier to manage than multiple tables, or that you are envisioning a maintenance user interface that is generic for all domains, then stop and think really hard before you continue.
Answered by Peter Marshall
There is another solution that falls somewhere between the two. You can use an XML type column for the keys and values. You keep the itemid field, then add an XML field that contains the key-value pairs, like <items> <item key="colour" value="red"/><item key="xxx" value="blah"/></items>
Then, when you extract your data from the database, you can process the XML in a number of different ways, depending on your usage. This is an extensible solution.
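One possible way to process such a column on the application side is sketched below. It stores the XML in a plain text column in SQLite (an assumption, since a database with a native XML type could also query inside the document), and the items table, the attrs_xml column and the load_attributes helper are all hypothetical names.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemid TEXT PRIMARY KEY, attrs_xml TEXT NOT NULL)")
conn.execute(
    "INSERT INTO items VALUES (?, ?)",
    ("123", '<items><item key="colour" value="red"/><item key="xxx" value="blah"/></items>'),
)

def load_attributes(conn, itemid):
    """Return the key/value pairs stored in the XML column as a dict."""
    row = conn.execute(
        "SELECT attrs_xml FROM items WHERE itemid = ?", (itemid,)
    ).fetchone()
    if row is None:
        return {}
    root = ET.fromstring(row[0])
    # Each <item key="..." value="..."/> element becomes one dict entry.
    return {el.get("key"): el.get("value") for el in root.findall("item")}

attrs = load_attributes(conn, "123")
print(attrs)
```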
Answered by Matthew Watson
In most cases where you would use the first method, it's because you haven't really sat down and thought out your model: "Well, we don't know what the keys will be yet". Generally, this is pretty poor design. It's going to be slower than actually having your keys as columns, which they should be.
I'd also question why your id is a varchar.
In the rare case that you really must implement a key/value table, the first solution is fine, although I'd generally want to have the keys in a separate table so you aren't storing varchars as the keys in your key/value table.
e.g.,
CREATE TABLE valid_keys (
    id          NUMBER(10)   NOT NULL,
    description VARCHAR2(32) NOT NULL,
    CONSTRAINT pk_valid_keys PRIMARY KEY (id)
);
CREATE TABLE item_values (
    item_id    NUMBER(10)   NOT NULL,
    key_id     NUMBER(10)   NOT NULL,
    item_value VARCHAR2(32) NOT NULL,
    -- composite key, so that one item can hold a value for each key
    CONSTRAINT pk_item_values PRIMARY KEY (item_id, key_id),
    CONSTRAINT fk_item_values_iv FOREIGN KEY (key_id) REFERENCES valid_keys (id)
);
You can then even go nuts and add a "TYPE" to the keys, allowing some type checking.
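The "TYPE" idea could look roughly like the sketch below. It swaps the Oracle types above for SQLite ones so it can run anywhere, and the value_type column, the two type names and the set_value helper are hypothetical additions. The type check is done in application code here; the answer does not say where it should live.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE valid_keys (
        id          INTEGER PRIMARY KEY,
        description TEXT NOT NULL UNIQUE,
        value_type  TEXT NOT NULL CHECK (value_type IN ('text', 'integer'))
    );
    CREATE TABLE item_values (
        item_id    INTEGER NOT NULL,
        key_id     INTEGER NOT NULL REFERENCES valid_keys (id),
        item_value TEXT NOT NULL,
        PRIMARY KEY (item_id, key_id)
    );
    INSERT INTO valid_keys VALUES (1, 'Colour', 'text'), (2, 'Weight', 'integer');
""")

def set_value(conn, item_id, key, value):
    """Store a value after checking it against the key's declared type."""
    row = conn.execute(
        "SELECT id, value_type FROM valid_keys WHERE description = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)  # unknown keys are refused outright
    key_id, value_type = row
    if value_type == "integer":
        int(value)  # raises ValueError when the value is not an integer
    # INSERT OR REPLACE is SQLite syntax for an upsert on the primary key.
    conn.execute(
        "INSERT OR REPLACE INTO item_values VALUES (?, ?, ?)",
        (item_id, key_id, str(value)),
    )

set_value(conn, 123, "Colour", "Red")
set_value(conn, 123, "Weight", "250")
try:
    set_value(conn, 123, "Weight", "heavy")
    type_error = False
except ValueError:
    type_error = True

rows = conn.execute(
    """SELECT k.description, v.item_value
       FROM item_values AS v JOIN valid_keys AS k ON k.id = v.key_id
       WHERE v.item_id = ? ORDER BY k.id""",
    (123,),
).fetchall()
print(rows, type_error)
```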
Answered by Mario
I once used key-value pairs in a database for the purpose of creating a spreadsheet (used for data entry) in which a teller would summarize his activity from working a cash drawer. Each k/v pair represented a named cell into which the user entered a monetary amount. The primary reason for this approach is that the spreadsheet was highly subject to change. New products and services were added routinely (thus new cells appeared). Also, certain cells were not needed in certain situations and could be dropped.
The app I wrote was a rewrite of an application that did break the teller sheet into separate sections each represented in a different table. The trouble here was that as products and services were added, schema modifications were required. As with all design choices there are pros and cons to taking a certain direction as compared to another. My redesign certainly performed slower and more quickly consumed disk space; however, it was highly agile and allowed for new products and services to be added in minutes. The only issue of note, however, was disk consumption; there were no other headaches I can recall.
As already mentioned, I usually consider a key-value pair approach when users (this could be the business owner) want to create their own types, each with a user-specific set of attributes. In such situations I have come to the following determination.
If there is either no need to retrieve data by these attributes or searching can be deferred to the application once a chunk of data has been retrieved, I recommend storing all the attributes in a single text field (using JSON, YAML, XML, etc.). If there is a strong need to retrieve data by these attributes, it gets messy.
You can create a single "attributes" table (id, item_id, key, value, data_type, sort_value) where the sort column converts the actual value into a string-sortable representation (e.g. date: “2010-12-25 12:00:00”, number: “0000000001”). Or you can create separate attribute tables by data type (e.g. string_attributes, date_attributes, number_attributes). Among the numerous pros and cons of both approaches: the first is simpler, the second is faster. Both will cause you to write ugly, complex queries.
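A minimal sketch of the single "attributes" table follows, to show what the sort column buys you. The to_sort_value and add_attribute helpers are hypothetical, the padding width of 10 is taken from the example above, and the number encoding shown only handles non-negative integers.

```python
import sqlite3
from datetime import datetime

def to_sort_value(value, data_type):
    """Encode a value so that plain string comparison sorts it correctly."""
    if data_type == "number":
        return f"{int(value):010d}"  # zero-padded; non-negative integers only
    if data_type == "date":
        return value.strftime("%Y-%m-%d %H:%M:%S")
    return str(value)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attributes (
        id         INTEGER PRIMARY KEY,
        item_id    INTEGER NOT NULL,
        key        TEXT NOT NULL,
        value      TEXT NOT NULL,
        data_type  TEXT NOT NULL,
        sort_value TEXT NOT NULL
    )
""")

def add_attribute(conn, item_id, key, value, data_type):
    conn.execute(
        "INSERT INTO attributes (item_id, key, value, data_type, sort_value) "
        "VALUES (?, ?, ?, ?, ?)",
        (item_id, key, str(value), data_type, to_sort_value(value, data_type)),
    )

for item_id, weight in [(1, 100), (2, 9), (3, 10)]:
    add_attribute(conn, item_id, "weight", weight, "number")
add_attribute(conn, 1, "released", datetime(2010, 12, 25, 12, 0, 0), "date")

# Sorting on the raw text puts "10" before "9"; sort_value restores numeric order.
by_text = [r[0] for r in conn.execute(
    "SELECT item_id FROM attributes WHERE key = 'weight' ORDER BY value")]
by_sort = [r[0] for r in conn.execute(
    "SELECT item_id FROM attributes WHERE key = 'weight' ORDER BY sort_value")]
print(by_text, by_sort)
```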
Answered by Jarod Elliott
From experience, I have found that certain keys will be more widely used or queried more often. We have usually then slightly de-normalized the design to include a specific field back in the main "item" table.
For example, if every Item has a Colour, you might add the Colour column to your item table. Fabric and Size may be used less often and can be kept separate in the key-value pair table. You may even keep the colour in the key-value pair table, but duplicate the data in the item table to get the performance benefits.
Obviously this varies depending on the data and how flexible you need the key-value pairs to be. It can also result in your attribute data not being located consistently. However, de-normalizing does greatly simplify the queries and improves their performance as well.
I would usually only consider de-normalizing when performance becomes an issue, not just to simplify a query.
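The split described above might look like the following sketch, assuming SQLite and hypothetical names (an items table with a colour column, plus the key_value_pairs table from the question for the rarer attributes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        itemid TEXT PRIMARY KEY,
        colour TEXT               -- promoted out of the key-value table
    );
    CREATE TABLE key_value_pairs (
        itemid    TEXT NOT NULL REFERENCES items (itemid),
        itemkey   TEXT NOT NULL,
        itemvalue TEXT NOT NULL,
        PRIMARY KEY (itemid, itemkey)
    );
    CREATE INDEX idx_items_colour ON items (colour);
    INSERT INTO items VALUES ('123', 'Red'), ('456', 'Blue');
    INSERT INTO key_value_pairs VALUES
        ('123', 'Size', 'Medium'), ('123', 'Fabric', 'Cotton'), ('456', 'Size', 'Large');
""")

# The frequently queried attribute is now a plain indexed column lookup.
red_items = [r[0] for r in conn.execute(
    "SELECT itemid FROM items WHERE colour = ?", ("Red",))]

# Less common attributes still come from the key-value table.
rare_attrs = dict(conn.execute(
    "SELECT itemkey, itemvalue FROM key_value_pairs WHERE itemid = ?", ("123",)))
print(red_items, rare_attrs)
```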
Answered by Amar
PostgreSQL 8.4 supports the hstore data type for storing sets of (key, value) pairs within a single PostgreSQL data field. Please refer to http://www.postgresql.org/docs/8.4/static/hstore.html for usage information. Though this is a very old question, I thought I would pass on this info in case it helps someone.
Answered by Adam Pierce
I don't understand why the SQL to extract data should be complex for your first design. Surely to get all values for an item, you just do this:
SELECT itemkey,itemvalue FROM key_value_pairs WHERE itemid='123';
or if you just want one particular key for that item:
SELECT itemvalue FROM key_value_pairs WHERE itemid='123' AND itemkey='Fabric';
The first design also gives you the flexibility to easily add new keys whenever you like.
Answered by mansu
I think the best way to design such tables is as follows:
- Make the frequently used fields columns in the database.
- Provide a Misc column which contains a dictionary (in JSON/XML/other string format) holding the remaining fields as key-value pairs.
Salient points:
- You can write normal SQL queries against the columns in most situations.
- You can do a FullTextSearch on the key-value pairs. MySQL has a full text search engine; otherwise you can use "like" queries, which are a little slower. While full text search is bad, we assume that such queries are fewer, so that should not cause too many issues.
- If your key-value pairs are simple boolean flags, this technique has the same power as having a separate column for the key. Any more complex operation on the key-value pairs should be done outside the database.
- Looking at the frequency of queries over a period of time will tell you which key-value pairs need to be converted into columns.
- This technique also makes it easy to enforce integrity constraints on the database.
- It provides a more natural path for developers to refactor their schema and code.
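The layout above can be sketched as follows, assuming SQLite and a JSON-encoded Misc column. The items table and the save_item and load_item helpers are hypothetical names, and the "like" query relies on the exact text that json.dumps produces, which is one reason such searches are best kept rare.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        itemid TEXT PRIMARY KEY,
        colour TEXT NOT NULL,              -- frequently used field: a real column
        misc   TEXT NOT NULL DEFAULT '{}'  -- everything else, as a JSON dictionary
    )
""")

def save_item(conn, itemid, colour, **misc):
    conn.execute(
        "INSERT INTO items (itemid, colour, misc) VALUES (?, ?, ?)",
        (itemid, colour, json.dumps(misc, sort_keys=True)),
    )

def load_item(conn, itemid):
    row = conn.execute(
        "SELECT itemid, colour, misc FROM items WHERE itemid = ?", (itemid,)
    ).fetchone()
    if row is None:
        return None
    item = json.loads(row[2])                # rarely used fields first
    item.update(itemid=row[0], colour=row[1])  # real columns take precedence
    return item

save_item(conn, "123", "Red", fabric="Cotton", gift_wrap=True)
save_item(conn, "456", "Red", fabric="Wool")

# Normal SQL for the common case.
red = [r[0] for r in conn.execute(
    "SELECT itemid FROM items WHERE colour = ? ORDER BY itemid", ("Red",))]

# A "like" query for a simple boolean flag stored in the Misc column.
wrapped = [r[0] for r in conn.execute(
    "SELECT itemid FROM items WHERE misc LIKE ?", ('%"gift_wrap": true%',))]

item = load_item(conn, "123")
print(red, wrapped, item)
```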
Answered by Jarrett Meyer
Violating normalization rules is fine as long as the business requirement can still be fulfilled. Having key_1, value_1, key_2, value_2, ... key_n, value_n can be OK, right up until the point that you need key_n+1, value_n+1.
My solution has been a table of data for shared attributes and XML for unique attributes. That means I use both. If everything (or most things) has a size, then size is a column in the table. If only object A has attribute Z, then Z is stored as XML, similar to Peter Marshall's answer already given.
Answered by Hannes Ovrén
If you have very few possible keys, then I would just store them as columns. But if the set of possible keys is large then your first approach is good (and the second approach would be impossible).
Or is it the case that each item can only have a limited number of keys, but those keys could come from a large set?
You could also consider using an Object Relational Mapper to make querying easier.