SQL 实体属性值数据库与严格的关系模型电子商务
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/870808/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Entity Attribute Value Database vs. strict Relational Model Ecommerce
提问by Zachary Scott
It is safe to say that the EAV/CRdatabase model is bad. That said,
可以肯定地说,EAV/CR数据库模型很糟糕。那说,
Question: What database model, technique, or pattern should be used to deal with "classes" of attributes describing e-commerce products which can be changed at run time?
问题:应该使用什么数据库模型、技术或模式来处理描述电子商务产品的属性“类”,这些属性可以在运行时更改?
In a good E-commerce database, you will store classes of options (like TV resolution then have a resolution for each TV, but the next product may not be a TV and not have "TV resolution"). How do you store them, search efficiently, and allow your users to setup product types with variable fields describing their products? If the search engine finds that customers typically search for TVs based on console depth, you could add console depth to your fields, then add a single depth for each tv product type at run time.
在一个好的电子商务数据库中,您将存储选项类别(例如电视分辨率然后为每台电视提供一个分辨率,但下一个产品可能不是电视并且没有“电视分辨率”)。您如何存储它们、高效搜索并允许您的用户使用描述其产品的可变字段来设置产品类型?如果搜索引擎发现客户通常根据控制台深度搜索电视,您可以将控制台深度添加到您的字段中,然后在运行时为每个电视产品类型添加一个深度。
There is a nice common feature among good e-commerce apps where they show a set of products, then have "drill down" side menus where you can see "TV Resolution" as a header, and the top five most common TV Resolutions for the found set. You click one and it only shows TVs of that resolution, allowing you to further drill down by selecting other categories on the side menu. These options would be the dynamic product attributes added at run time.
优秀的电子商务应用程序有一个很好的共同功能,它们显示一组产品,然后有“向下钻取”侧边菜单,您可以在其中看到“电视分辨率”作为标题,以及最常见的前五种电视分辨率发现集。您单击一个,它只会显示该分辨率的电视,允许您通过在侧边菜单上选择其他类别来进一步深入了解。这些选项将是在运行时添加的动态产品属性。
Further discussion:
进一步讨论:
So long story short, are there any links out on the Internet or model descriptions that could "academically" fix the following setup?I thank Noel Kennedy for suggesting a category table, but the need may be greater than that. I describe it a different way below, trying to highlight the significance. I may need a viewpoint correction to solve the problem, or I may need to go deeper in to the EAV/CR.
长话短说,互联网上是否有任何链接或模型描述可以“学术上”修复以下设置?我感谢 Noel Kennedy 提出了一个类别表,但需求可能不止于此。我在下面以不同的方式描述它,试图突出其重要性。我可能需要进行视点校正来解决问题,或者我可能需要更深入地研究 EAV/CR。
Love the positive response to the EAV/CR model. My fellow developers all say what Jeffrey Kemp touched on below: "new entities must be modeled and designed by a professional" (taken out of context, read his response below). The problem is:
喜欢对 EAV/CR 模型的积极响应。我的开发人员同事都说了 Jeffrey Kemp 在下面提到的内容:“新实体必须由专业人士建模和设计”(断章取意,阅读下面的回复)。问题是:
- entities add and remove attributes weekly
(search keywords dictate future attributes) - new entities arrive weekly
(products are assembled from parts) - old entities go away weekly
(archived, less popular, seasonal)
- 实体每周添加和删除属性
(搜索关键字决定未来的属性) - 新实体每周到达
(产品由零件组装而成) - 旧实体每周消失
(存档的、不受欢迎的、季节性的)
The customer wants to add attributes to the products for two reasons:
客户想要为产品添加属性有两个原因:
- department / keyword search / comparison chart between like products
- consumer product configuration before checkout
- 部门/关键词搜索/同类产品对比图
- 结账前的消费品配置
The attributes must have significance, not just a keyword search. If they want to compare all cakes that have a "whipped cream frosting", they can click cakes, click birthday theme, click whipped cream frosting, then check all cakes that are interesting knowing they all have whipped cream frosting. This is not specific to cakes, just an example.
属性必须有意义,而不仅仅是关键字搜索。如果他们想比较所有有“奶油糖霜”的蛋糕,他们可以点击蛋糕,点击生日主题,点击奶油糖霜,然后检查所有有趣的蛋糕,知道它们都有奶油糖霜。这不是特定于蛋糕的,只是一个例子。
采纳答案by Jeffrey Kemp
There's a few general pros and cons I can think of, there are situations where one is better than the other:
我能想到一些一般的利弊,在某些情况下,一个比另一个更好:
Option 1, EAV Model:
选项 1,EAV 型号:
- Pro: less time to design and develop a simple application
- Pro: new entities easy to add (might even be added by users?)
- Pro: "generic" interface components
- Con: complex code required to validate simple data types
- Con: much more complex SQL for simple reports
- Con: complex reports can become almost impossible
- Con: poor performance for large data sets
- 优点:设计和开发简单应用程序的时间更少
- 优点:易于添加的新实体(甚至可能由用户添加?)
- Pro:“通用”界面组件
- 缺点:验证简单数据类型所需的复杂代码
- 缺点:用于简单报告的更复杂的 SQL
- 缺点:复杂的报告几乎是不可能的
- 缺点:大型数据集性能不佳
Option 2, Modelling each entity separately:
选项 2,分别为每个实体建模:
- Con: more time required to gather requirements and design
- Con: new entities must be modelled and designed by a professional
- Con: custom interface components for each entity
- Pro: data type constraints and validation simple to implement
- Pro: SQL is easy to write, easy to understand and debug
- Pro: even the most complex reports are relatively simple
- Pro: best performance for large data sets
- 缺点:需要更多时间来收集需求和设计
- 缺点:新实体必须由专业人士建模和设计
- 缺点:每个实体的自定义界面组件
- 优点:数据类型约束和验证易于实现
- 优点:SQL 易于编写,易于理解和调试
- 优点:即使是最复杂的报表也相对简单
- Pro:大型数据集的最佳性能
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
选项 3,组合(模型实体“正确”,但为某些/所有实体的自定义属性添加“扩展”)
- Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
- Con: new entities must be modelled and designed by a professional
- Pro: new attributes might be easily added later on
- Con: complex code required to validate simple data types (for the custom attributes)
- Con: custom interface components still required, but generic interface components may be possible for the custom attributes
- Con: SQL becomes complex as soon as any custom attribute is included in a report
- Con: good performance generally, unless you start need to search by or report by the custom attributes
- 优点/缺点:收集需求和设计所需的时间比选项 1 多,但可能不如选项 2 *
- 缺点:新实体必须由专业人士建模和设计
- 优点:稍后可以轻松添加新属性
- 缺点:验证简单数据类型所需的复杂代码(用于自定义属性)
- 缺点:仍然需要自定义界面组件,但对于自定义属性,通用界面组件可能是可能的
- 缺点:只要报告中包含任何自定义属性,SQL 就会变得复杂
- 缺点:一般性能良好,除非您开始需要通过自定义属性进行搜索或报告
* I'm not sure if Option 3 would necessarily save any time in the design phase.
*我不确定选项 3 是否一定会在设计阶段节省任何时间。
Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.
就我个人而言,我倾向于选项 2,并尽可能避免使用 EAV。但是,对于某些场景,用户需要 EAV 带来的灵活性;但这需要付出很大的代价。
回答by Javier
It is safe to say that the EAV/CR database model is bad.
可以肯定地说,EAV/CR 数据库模型很糟糕。
No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.
不,这不对。只是它们对关系数据库的使用效率低下。纯粹的键/值存储非常适合这种模型。
Now, to your real question: How to store various attributes and keep them searchable?
现在,真正的问题是:如何存储各种属性并使其可搜索?
Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.
只需使用EAV。在您的情况下,它将是一个额外的表。在属性名称和值上对其进行索引,大多数 RDBM 会在属性名称重复上使用前缀压缩,使其非常快速和紧凑。
EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.
当您使用 EAV/CR 替换“真实”字段时,它会变得丑陋。与每个工具一样,过度使用它是“坏的”,并给它带来不好的形象。
回答by Vee
// At this point, I'd like to take a moment to speak to you about the Magento/Adobe PSD format. // Magento/PSDis not a good ecommerce platform/format. Magento/PSDis not even a bad ecommerce platform/format. Calling it such would be an // insult to other bad ecommerce platform/formats, such as Zencart or OsCommerce. No, Magento/PSDis an abysmal ecommerce platform/format. Having // worked on this code for several weeks now, my hate for Magento/PSDhas grown to a raging fire // that burns with the fierce passion of a million suns.
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...
内部模型充其量是古怪的,就像有人将模式放入一个boggle游戏,将其密封并将其放入油漆罐中......
Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.
现实世界:我正在开发一个中间件履行应用程序,这是获取地址信息的查询之一。
CREATE OR REPLACE VIEW sales_flat_addresses AS
SELECT sales_order_entity.parent_id AS order_id,
sales_order_entity.entity_id,
CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type,
GROUP_CONCAT(
CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value )
ORDER BY sales_order_entity_varchar.value DESC
SEPARATOR '!!!!!'
) as data
FROM sales_order_entity
INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id
INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id
AND sales_order_entity.entity_type_id =12
GROUP BY sales_order_entity.entity_id
ORDER BY eav_attribute.attribute_code = 'address_type'
Exacts address information for an order, lazily
订单的确切地址信息,懒惰
--
——
Summary:Only use Magento if:
总结:仅在以下情况下使用 Magento:
- You are being given large sacks of money
- You must
- Enjoy pain
- 你得到了一大袋钱
- 你必须
- 享受痛苦
回答by Lucas T
I'm surprised nobody mentioned NoSQL databases.
我很惊讶没有人提到 NoSQL 数据库。
I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".
我从未在生产环境中实践过 NoSQL(刚刚测试了 MongoDB 并印象深刻),但 NoSQL 的全部意义在于能够在同一个“文档”中保存具有不同属性的项目。
回答by Jerry Jasperson
Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.
在性能不是主要要求的情况下,如在 ETL 类型的应用程序中,EAV 具有另一个明显的优势:差异化保存。
I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.
我已经实现了许多应用程序,其中一个首要要求是能够查看域对象从其第一个“版本”到当前状态的历史记录。如果该域对象具有大量属性,则意味着每次更改都需要将新行插入到其对应的表中(不是更新,因为历史会丢失,而是插入)。假设这个域对象是一个 Person,我有 500k 个 Person 需要跟踪,在 Persons 生命周期中对各种属性的平均变化超过 100 次。再加上很少有应用程序只有 1 个主要域对象的事实,您很快就会推测数据库的大小会很快失控。
An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.
一个简单的解决方案是仅保存对主要域对象的差异更改,而不是重复保存冗余信息。
All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".
所有模型都会随着时间的推移而变化,以反映新的业务需求。时期。使用 EAV 只是我们盒子里的工具之一;但它永远不应该被自动归类为“坏”。
回答by aaimnr
I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure): https://forum.virtuemart.net/index.php?topic=58686.0
我正在努力解决同样的问题。查看以下关于两个现有电子商务解决方案的讨论可能会很有趣:Magento (EAV) 和 Joomla(常规关系结构):https: //forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's EAV performance is a real showstopper.
看来,Magento 的 EAV 性能是一个真正的表演者。
That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.
这就是我倾向于标准化结构的原因。为了克服缺乏灵活性,我正在考虑在未来添加一些可以编辑的单独数据字典(XML 或单独的数据库表),并基于此,用于显示和比较具有新属性集的产品类别的应用程序代码将是生成,连同 SQL 脚本。
Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.
在这种情况下,这种架构似乎是最佳选择——同时具有灵活性和高性能。
The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.
问题可能是在实时环境中频繁使用 ALTER TABLE。我正在使用 Postgres,因此它的 MVCC 和事务性 DDL 有望减轻痛苦。
回答by Amanda Xu
I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.
我仍然投票支持在 EAV 的最低意义原子级别建模。让面向特定用户社区的标准、技术和应用程序来决定内容模型、属性的重复需求、谷物等。
回答by bob
If it's just about the product catalog attributes and hence validation requirements for those attributes are rather limited, the only real downside to EAV is query performance and even that is only a problem when your query deals with multiple "things" (products) with attributes, the performance for the query "give me all attributes for the product with id 234" while not optimal is still plenty fast.
如果它只是关于产品目录属性,因此对这些属性的验证要求相当有限,那么 EAV 唯一真正的缺点是查询性能,即使当您的查询处理多个具有属性的“事物”(产品)时,这也只是一个问题,查询“为我提供 ID 为 234 的产品的所有属性”的性能虽然不是最佳的,但仍然非常快。
One solution is to use the SQL database / EAV model only for the admin / edit side of the product catalog and have some process that denormalizes the products into something that makes it searchable. Since you already have attributes and hence it's rather likely that you want faceting, this something could be Solr or ElasticSearch. This approach avoids basically all downsides to the EAV model and the added complexity is limited to serializing a complete product to JSON on update.
一种解决方案是仅将 SQL 数据库/EAV 模型用于产品目录的管理/编辑端,并通过一些过程将产品非规范化为可搜索的内容。由于您已经拥有属性,因此很可能您想要分面,这可能是 Solr 或 ElasticSearch。这种方法基本上避免了 EAV 模型的所有缺点,并且增加的复杂性仅限于在更新时将完整产品序列化为 JSON。
回答by Gabriel Voinea
EAV has many drawbacks:
EAV 有很多缺点:
- Performance degradation over time Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
- The SQL queries are very complex and difficult to write.
- Data Integrity problems. You can't define foreign keys for all the fields needed.
- You have to define and maintain your own metadata.
- 随着时间的推移,性能下降 一旦应用程序中的数据量增长超过特定大小,该数据的检索和操作可能会变得越来越低效。
- SQL 查询非常复杂且难以编写。
- 数据完整性问题。您不能为所有需要的字段定义外键。
- 您必须定义和维护自己的元数据。
回答by z0r
I have a slightly different problem: instead of many attributes with sparse values (which is possibly a good reason to use EAV), I want to store something more like a spreadsheet. The columns in the sheet can change, but within a sheet all cells will contain data (not sparse).
我有一个稍微不同的问题:我想存储更像电子表格的东西,而不是许多具有稀疏值的属性(这可能是使用 EAV 的一个很好的理由)。工作表中的列可以更改,但在工作表内所有单元格都将包含数据(不是稀疏的)。
I made a small set of teststo benchmark two designs: one using EAV, and the other using a Postgres ARRAY to store cell data.
我做了一小组测试来对两种设计进行基准测试:一种使用 EAV,另一种使用 Postgres ARRAY 来存储单元数据。
Both schemas have indexes on appropriate columns, and the indexes are used by the planner.
两种模式在适当的列上都有索引,并且这些索引由规划器使用。
It turned out the array-based schema was an order of magnitude fasterfor both inserts and queries. From quick tests, it seemed that both scaled linearly. The tests aren't very thorough, though. Suggestions and forks welcome - they're under an MIT licence.
事实证明,基于数组的模式对于插入和查询都快了一个数量级。从快速测试来看,两者似乎都是线性缩放的。不过,测试不是很彻底。欢迎提出建议和分叉——他们获得了麻省理工学院的许可。