SQL: How to store a list in a column of a database table

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3070384/

Time: 2020-09-01 06:36:11  Source: igfitidea

How to store a list in a column of a database table

sql linq linq-to-sql database-design linq-to-entities

Asked by JnBrymn

So, per Mehrdad's answer to a related question, I get it that a "proper" database table column doesn't store a list. Rather, you should create another table that effectively holds the elements of said list and then link to it directly or through a junction table. However, the type of list I want to create will be composed of unique items (unlike the linked question's fruit example). Furthermore, the items in my list are explicitly sorted, which means that if I stored the elements in another table, I'd have to sort them every time I accessed them. Finally, the list is basically atomic in that any time I wish to access the list, I will want to access the entire list rather than just a piece of it, so it seems silly to have to issue a database query to gather together pieces of the list.

AKX's solution (linked above) is to serialize the list and store it in a binary column. But this also seems inconvenient because it means that I have to worry about serialization and deserialization.

Is there any better solution? If there is no better solution, then why? It seems that this problem should come up from time to time.

... just a little more info to let you know where I'm coming from. As soon as I had just begun understanding SQL and databases in general, I was turned on to LINQ to SQL, and so now I'm a little spoiled because I expect to deal with my programming object model without having to think about how the objects are queried or stored in the database.

Thanks All!

John

UPDATE: So in the first flurry of answers I'm getting, I see "you can go the CSV/XML route... but DON'T!". So now I'm looking for explanations of why. Point me to some good references.

Also, to give you a better idea of what I'm up to: In my database I have a Function table that will have a list of (x,y) pairs. (The table will also have other information that is of no consequence for our discussion.) I will never need to see part of the list of (x,y) pairs. Rather, I will take all of them and plot them on the screen. I will allow the user to drag the nodes around to change the values occasionally or add more values to the plot.

Answered by Adam Robinson

No, there is no "better" way to store a sequence of items in a single column. Relational databases are designed specifically to store one value per row/column combination. In order to store more than one value, you must serialize your list into a single value for storage, then deserialize it upon retrieval. There is no other way to do what you're talking about (because what you're talking about is a bad idea that should, in general, never be done).

I understand that you think it's silly to create another table to store that list, but this is exactly what relational databases do. You're fighting an uphill battle and violating one of the most basic principles of relational database design for no good reason. Since you state that you're just learning SQL, I would strongly advise you to avoid this idea and stick with the practices recommended to you by more seasoned SQL developers.

The principle you're violating is called first normal form, which is the first step in database normalization.

At the risk of oversimplifying things, database normalization is the process of defining your database based upon what the data is, so that you can write sensible, consistent queries against it and be able to maintain it easily. Normalization is designed to limit logical inconsistencies and corruption in your data, and there are a lot of levels to it. The Wikipedia article on database normalization is actually pretty good.

Basically, the first rule (or form) of normalization states that your table must represent a relation. This means that:

  • You must be able to differentiate one row from any other row (in other words, your table must have something that can serve as a primary key). This also means that no row should be duplicated.
  • Any ordering of the data must be defined by the data, not by the physical ordering of the rows (SQL is based upon the idea of a set, meaning that the only ordering you should rely on is that which you explicitly define in your query).
  • Every row/column intersection must contain one and only one value.

The last point is obviously the salient point here. SQL is designed to store your sets for you, not to provide you with a "bucket" for you to store a set yourself. Yes, it's possible to do. No, the world won't end. You have, however, already crippled yourself in understanding SQL and the best practices that go along with it by immediately jumping into using an ORM. LINQ to SQL is fantastic, just like graphing calculators are. In the same vein, however, they should not be used as a substitute for knowing how the processes they employ actually work.

Your list may be entirely "atomic" now, and that may not change for this project. But you will, however, get into the habit of doing similar things in other projects, and you'll eventually (likely quickly) run into a scenario where you're now fitting your quick-n-easy list-in-a-column approach where it is wholly inappropriate. There is not much additional work in creating the correct table for what you're trying to store, and you won't be derided by other SQL developers when they see your database design. Besides, LINQ to SQL is going to see your relation and give you the proper object-oriented interface to your list automatically. Why would you give up the convenience offered to you by the ORM so that you can perform nonstandard and ill-advised database hackery?
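
For the (x,y) pairs from the question, the "correct table" could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than LINQ to SQL, and the table names (func, func_point), the seq column, and the two helper functions are made up here, not taken from the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE func (
    func_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE func_point (
    func_id INTEGER NOT NULL REFERENCES func(func_id),
    seq     INTEGER NOT NULL,   -- position of the point within the list
    x       REAL NOT NULL,
    y       REAL NOT NULL,
    PRIMARY KEY (func_id, seq)  -- at most one point per position
);
""")


def save_points(conn, func_id, points):
    """Replace the whole list for one function in a single transaction."""
    with conn:
        conn.execute("DELETE FROM func_point WHERE func_id = ?", (func_id,))
        conn.executemany(
            "INSERT INTO func_point (func_id, seq, x, y) VALUES (?, ?, ?, ?)",
            [(func_id, i, x, y) for i, (x, y) in enumerate(points)],
        )


def load_points(conn, func_id):
    """Fetch the whole list in one query; the order comes from ORDER BY seq."""
    rows = conn.execute(
        "SELECT x, y FROM func_point WHERE func_id = ? ORDER BY seq",
        (func_id,),
    )
    return [(x, y) for x, y in rows]


conn.execute("INSERT INTO func (func_id, name) VALUES (1, 'demo')")
save_points(conn, 1, [(0.0, 1.0), (1.0, 3.0), (2.0, 2.5)])
points = load_points(conn, 1)
```

The whole list still comes back from a single query, and the ordering is stated in the data (the seq column) instead of being implied by storage order.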

Answered by jaltiere

You can just forget SQL altogether and go with a "NoSQL" approach. RavenDB, MongoDB and CouchDB jump to mind as possible solutions. With a NoSQL approach, you are not using the relational model; you aren't even constrained to schemas.

Answered by jaltiere

What I have seen many people do is this (it may not be the best approach, correct me if I am wrong):

The table I am using in the example is given below (the table includes nicknames that you have given to your specific girlfriends; each girlfriend has a unique id):

nicknames(id,seq_no,names)

Suppose you want to store many nicknames under an id. This is why we have included a seq_no field.

Now, fill these values to your table:

(1,1,'sweetheart'), (1,2,'pumpkin'), (2,1,'cutie'), (2,2,'cherry pie')

If you want to find all the names that you have given to the girlfriend with id 1, then you can use:

select names from nicknames where id = 1;
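
Put together, the example might run as in the sketch below. It assumes Python's built-in sqlite3 module and an in-memory database; the column types and the composite primary key are guesses, since the answer only gives the column names. An ORDER BY seq_no is added because, without it, the order of the returned rows is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nicknames ("
    " id INTEGER NOT NULL,"      # which girlfriend the nickname belongs to
    " seq_no INTEGER NOT NULL,"  # position of the nickname within her list
    " names TEXT NOT NULL,"
    " PRIMARY KEY (id, seq_no))"
)
conn.executemany(
    "INSERT INTO nicknames (id, seq_no, names) VALUES (?, ?, ?)",
    [(1, 1, "sweetheart"), (1, 2, "pumpkin"), (2, 1, "cutie"), (2, 2, "cherry pie")],
)

# Same query as in the answer, plus an explicit ordering by seq_no.
rows = conn.execute(
    "SELECT names FROM nicknames WHERE id = ? ORDER BY seq_no", (1,)
).fetchall()
names_for_1 = [name for (name,) in rows]
print(names_for_1)  # ['sweetheart', 'pumpkin']
```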

Answered by Haakon Løtveit

Simple answer: If, and only if, you're certain that the list will always be used as a list, then join the list together on your end with a character (such as '\0') that will not be used in the text ever, and store that. Then when you retrieve it, you can split by '\0'. There are of course other ways of going about this stuff, but those are dependent on your specific database vendor.
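
A minimal sketch of the join-and-split idea, in plain Python. The function names pack and unpack are made up for illustration. One caveat to check against your own database: some databases reject the NUL character in text columns (PostgreSQL's text type does), in which case a different separator that never appears in the data would be needed.

```python
SEP = "\0"  # separator assumed never to occur inside an item


def pack(items):
    """Join a list of strings into one string for storage in a single column."""
    for item in items:
        if SEP in item:
            raise ValueError("item contains the separator character")
    return SEP.join(items)


def unpack(stored):
    """Split a stored string back into the list.

    An empty string is treated as the empty list, so this scheme cannot
    distinguish [] from [""].
    """
    return stored.split(SEP) if stored else []


stored = pack(["first", "second item", "third"])
restored = unpack(stored)
```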

As an example, you can store JSON in a Postgres database. If your list is text, and you just want the list without further hassle, that's a reasonable compromise.
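
The JSON route could be sketched as follows. This is an assumption-laden stand-in: it uses Python's json module and an in-memory SQLite TEXT column in place of a Postgres JSON column, and the table name doc and column name items are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, items TEXT NOT NULL)")

items = ["red", "green", "blue, sort of"]

# Serialize the list to a JSON string on the way in...
conn.execute("INSERT INTO doc (id, items) VALUES (?, ?)", (1, json.dumps(items)))

# ...and parse it back into a list on the way out.
(stored,) = conn.execute("SELECT items FROM doc WHERE id = ?", (1,)).fetchone()
restored = json.loads(stored)
```

Unlike a hand-picked delimiter, JSON handles items that contain commas or quotes, and it can be read from most languages.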

Others have ventured suggestions of serializing, but I don't really think that serializing is a good idea: Part of the neat thing about databases is that several programs written in different languages can talk to one another. And programs serialized using Java's format would not do all that well if a Lisp program wanted to load it.

If you want a good way to do this sort of thing, there are usually array-or-similar types available. Postgres, for instance, offers array as a type and lets you store an array of text, if that's what you want. There are similar tricks for MySQL and MS SQL using JSON, and IBM's DB2 offers an array type as well (in their own helpful documentation). This would not be so common if there wasn't a need for it.

What you do lose by going that road is the notion of the list as a bunch of things in sequence. At least nominally, databases treat fields as single values. But if that's all you want, then you should go for it. It's a value judgement you have to make for yourself.

Answered by Thomas

In addition to what everyone else has said, I would suggest you analyze your approach in longer terms than just now. It is currently the case that items are unique. It is currently the case that resorting the items would require a new list. It is almost required that the lists are currently short. Even though I don't have the domain specifics, it is not much of a stretch to think those requirements could change. If you serialize your list, you are baking in an inflexibility that is not necessary in a more-normalized design. Btw, that does not necessarily mean a full Many:Many relationship. You could just have a single child table with a foreign key to the parent and a character column for the item.

If you still want to go down this road of serializing the list, you might consider storing the list in XML. Some databases such as SQL Server even have an XML data type. The only reason I'd suggest XML is that almost by definition, this list needs to be short. If the list is long, then serializing it in general is an awful approach. If you go the CSV route, you need to account for the values containing the delimiter which means you are compelled to use quoted identifiers. Presuming that the lists are short, it probably will not make much difference whether you use CSV or XML.
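
To illustrate the delimiter problem on the CSV route, here is a small sketch using Python's standard csv module, which quotes values that contain the delimiter or a quote character. The helper names to_csv and from_csv are made up for this example.

```python
import csv
import io


def to_csv(items):
    """Serialize a list of strings as one CSV record, quoting where needed."""
    buf = io.StringIO()
    csv.writer(buf).writerow(items)
    return buf.getvalue().rstrip("\r\n")  # drop the record terminator


def from_csv(stored):
    """Parse one CSV record back into a list of strings."""
    return next(csv.reader([stored]), [])


items = ["a,b", 'say "hi"', "plain"]
stored = to_csv(items)
restored = from_csv(stored)
```

A naive split on "," would break the first item in two; the quoting is what keeps the round trip lossless.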

Answered by hometoast

If you need to query on the list, then store it in a table.

If you always want the list, you could store it as a delimited list in a column. Even in this case, unless you have VERY specific reasons not to, store it in a lookup table.

Answered by David Neale

I'd just store it as CSV. If it's simple values, then it should be all you need (XML is very verbose, and serializing to/from it would probably be overkill, but that would be an option as well).

Here's a good answer for how to pull out CSVs with LINQ.

Answered by Alsin

One option isn't mentioned in the other answers: you can de-normalize your DB design. You then need two tables. One table contains the proper list, one item per row; another table contains the whole list in one column (comma-separated, for example).

Here is the 'traditional' DB design:

List(ListID, ListName) 
Item(ItemID,ItemName) 
List_Item(ListID, ItemID, SortOrder)

Here is the de-normalized table:

Lists(ListID, ListContent)

The idea here is that you maintain the Lists table using triggers or application code. Every time you modify List_Item content, the appropriate rows in Lists get updated automatically. If you mostly read lists, it could work quite fine. Pros: you can read lists in one statement. Cons: updates take more time and effort.
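
A rough sketch of the "application code" variant follows, again assuming Python's built-in sqlite3 module and an in-memory database. The column types and the helper functions refresh_list_content and add_item are assumptions for illustration, and INSERT OR REPLACE is SQLite-specific syntax. Item names containing a comma would need quoting, which this sketch does not handle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE List (ListID INTEGER PRIMARY KEY, ListName TEXT NOT NULL);
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, ItemName TEXT NOT NULL);
CREATE TABLE List_Item (
    ListID    INTEGER NOT NULL REFERENCES List(ListID),
    ItemID    INTEGER NOT NULL REFERENCES Item(ItemID),
    SortOrder INTEGER NOT NULL,
    PRIMARY KEY (ListID, ItemID)
);
-- De-normalized copy: the whole list in one column.
CREATE TABLE Lists (ListID INTEGER PRIMARY KEY, ListContent TEXT NOT NULL);
""")


def refresh_list_content(conn, list_id):
    """Rebuild the comma-separated copy of one list from the normalized rows."""
    rows = conn.execute(
        "SELECT i.ItemName FROM List_Item li "
        "JOIN Item i ON i.ItemID = li.ItemID "
        "WHERE li.ListID = ? ORDER BY li.SortOrder",
        (list_id,),
    )
    content = ",".join(name for (name,) in rows)
    conn.execute(
        "INSERT OR REPLACE INTO Lists (ListID, ListContent) VALUES (?, ?)",
        (list_id, content),
    )


def add_item(conn, list_id, item_id, sort_order):
    """Modify List_Item and refresh the copy in the same transaction."""
    with conn:
        conn.execute(
            "INSERT INTO List_Item (ListID, ItemID, SortOrder) VALUES (?, ?, ?)",
            (list_id, item_id, sort_order),
        )
        refresh_list_content(conn, list_id)


with conn:
    conn.execute("INSERT INTO List (ListID, ListName) VALUES (1, 'groceries')")
    conn.executemany(
        "INSERT INTO Item (ItemID, ItemName) VALUES (?, ?)",
        [(1, "milk"), (2, "eggs"), (3, "bread")],
    )

add_item(conn, 1, 2, 20)
add_item(conn, 1, 1, 10)
add_item(conn, 1, 3, 30)

# Reading the whole list is now a single-row lookup.
(content,) = conn.execute(
    "SELECT ListContent FROM Lists WHERE ListID = ?", (1,)
).fetchone()
```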

Answered by David Daniel

If you really wanted to store it in a column and have it queryable, a lot of databases support XML now. If you are not querying it, you can store the values as comma-separated values and parse them out with a function when you need them separated. I agree with everyone else, though: if you are looking to use a relational database, a big part of normalization is separating data like that. I am not saying that all data fits a relational database, though. You could always look into other types of databases if a lot of your data doesn't fit the model.

Answered by Antonin GAVREL

I was very reluctant to choose the path I finally decided to take because of the many answers. While they add more understanding of what SQL is and its principles, I decided to become an outlaw. I was also hesitant to post my findings, as for some it's more important to vent frustration at someone breaking the rules than to understand that there are very few universal truths.

I have tested it extensively and, in my specific case, it was way more efficient than either using the array type (generously offered by PostgreSQL) or querying another table.

Here is my answer: I have successfully implemented a list in a single field in PostgreSQL by making use of the fixed length of each item of the list. Let's say each item is a color as an ARGB hex value, which means 8 characters. So you can create your array of at most 10 items by multiplying by the length of each item:

ALTER TABLE product ADD COLUMN color varchar(80);

In case your list items length differ you can always fill the padding with \0

NB: Obviously this is not necessarily the best approach for hex number since a list of integers would consume less storage but this is just for the purpose of illustrating this idea of array by making use of a fixed length allocated to each item.

The reasons why: 1/ Very convenient: retrieve item i at substring i*n, (i+1)*n. 2/ No overhead of cross-table queries. 3/ More efficient and cost-saving on the server side. The list is like a mini blob that the client will have to split.
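
On the client side, the fixed-width scheme could be sketched like this in plain Python. WIDTH matches the 8-character ARGB example from the answer; the function names pack, item_at and unpack are invented for illustration.

```python
WIDTH = 8  # characters per item, e.g. one ARGB hex color such as "FF00FF00"


def pack(items):
    """Concatenate fixed-width items into one string for a varchar column."""
    for item in items:
        if len(item) != WIDTH:
            raise ValueError(f"every item must be exactly {WIDTH} characters")
    return "".join(items)


def item_at(packed, i):
    """Return item i, i.e. the substring from i*WIDTH to (i+1)*WIDTH."""
    start = i * WIDTH
    if i < 0 or start + WIDTH > len(packed):
        raise IndexError("item index out of range")
    return packed[start:start + WIDTH]


def unpack(packed):
    """Split the stored string back into the full list of items."""
    return [packed[k:k + WIDTH] for k in range(0, len(packed), WIDTH)]


packed = pack(["FF000000", "FFFF0000", "FF00FF00"])
second = item_at(packed, 1)
colors = unpack(packed)
```

With at most 10 items, the packed string never exceeds the 80 characters of the column defined above.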

While I respect people following rules, many explanations are very theoretical and often fail to acknowledge that, in some specific cases, especially when aiming for cost optimal with low-latency solutions, some minor tweaks are more than welcome.

"God forbid that it is violating some holy sacred principle of SQL": Adopting a more open-minded and pragmatic approach before reciting the rules is always the way to go. Else you might end up like a candid fanatic reciting the Three Laws of Robotics before being obliterated by Skynet.

I don't pretend that this solution is a breakthrough, nor that it is ideal in terms of readability and database flexibility, but it can certainly give you an edge when it comes to latency.