SQL: Database design for a survey

Notice: this page originates from a Chinese-English bilingual translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1764435/

Time: 2020-09-01 04:29:30  Source: igfitidea

Database design for a survey

Tags: sql, database-design

Asked by Michael

I need to create a survey where answers are stored in a database. I'm just wondering what would be the best way to implement this in the database, specifically the tables required. The survey contains different types of questions. For example: text fields for comments, multiple choice questions, and possibly questions that could contain more than one answer (i.e. check all that apply).

I've come up with two possible solutions:

  1. Create a giant table which contains the answers for each survey submission. Each column would correspond to an answer from the survey. i.e. SurveyID, Answer1, Answer2, Answer3

    I don't think this is the best way, since there are a lot of questions in this survey and it doesn't seem very flexible if the survey is to change.

  2. The other thing I thought of was creating a Question table and Answer table. The question table would contain all the questions for the survey. The answer table would contain individual answers from the survey, each row linked to a question.

    A simple example:

    tblSurvey: SurveyID

    tblQuestion: QuestionID, SurveyID, QuestionType, Question

    tblAnswer: AnswerID, UserID, QuestionID, Answer

    tblUser: UserID, UserName

    My problem with this is that there could be tons of answers which would make the Answer table pretty huge. I'm not sure that's so great when it comes to performance.

I'd appreciate any ideas and suggestions.

Accepted answer by Damir Sudarevic

I think that your model #2 is fine; however, you can take a look at a more complex model that stores questions and pre-made (offered) answers and allows them to be reused in different surveys.

- One survey can have many questions; one question can be (re)used in many surveys.
- One (pre-made) answer can be offered for many questions. One question can have many answers offered. A question can have different answers offered in different surveys. An answer can be offered to different questions in different surveys. There is a default "Other" answer; if a person chooses it, their answer is recorded in Answer.OtherText.
- One person can participate in many surveys; one person can answer a specific question in a survey only once.
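
The rules above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the schema from the answer's diagram: every table and column name here (survey_question, survey_question_answer, other_text and so on) is an assumption made for the example, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE survey         (survey_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE question       (question_id INTEGER PRIMARY KEY, question_text TEXT NOT NULL);
CREATE TABLE offered_answer (offered_answer_id INTEGER PRIMARY KEY, answer_text TEXT NOT NULL);
CREATE TABLE person         (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Many-to-many link: a question can be reused in many surveys.
CREATE TABLE survey_question (
    survey_id   INTEGER NOT NULL REFERENCES survey(survey_id),
    question_id INTEGER NOT NULL REFERENCES question(question_id),
    PRIMARY KEY (survey_id, question_id)
);

-- Which pre-made answers are offered for a question within a given survey.
CREATE TABLE survey_question_answer (
    survey_id         INTEGER NOT NULL,
    question_id       INTEGER NOT NULL,
    offered_answer_id INTEGER NOT NULL REFERENCES offered_answer(offered_answer_id),
    PRIMARY KEY (survey_id, question_id, offered_answer_id),
    FOREIGN KEY (survey_id, question_id)
        REFERENCES survey_question(survey_id, question_id)
);

-- The primary key allows one answer per person per question per survey.
CREATE TABLE answer (
    person_id         INTEGER NOT NULL REFERENCES person(person_id),
    survey_id         INTEGER NOT NULL,
    question_id       INTEGER NOT NULL,
    offered_answer_id INTEGER NOT NULL,
    other_text        TEXT,  -- filled in only when "Other" is chosen
    PRIMARY KEY (person_id, survey_id, question_id),
    FOREIGN KEY (survey_id, question_id, offered_answer_id)
        REFERENCES survey_question_answer(survey_id, question_id, offered_answer_id)
);

-- Invented sample data.
INSERT INTO survey VALUES (1, 'Customer feedback');
INSERT INTO question VALUES (1, 'How did you hear about us?');
INSERT INTO offered_answer VALUES (1, 'Search engine'), (2, 'Other');
INSERT INTO person VALUES (1, 'alice');
INSERT INTO survey_question VALUES (1, 1);
INSERT INTO survey_question_answer VALUES (1, 1, 1), (1, 1, 2);
""")

# First answer: "Other", with free text.
conn.execute("INSERT INTO answer VALUES (1, 1, 1, 2, 'A friend told me')")

# A second answer by the same person to the same survey question is refused.
try:
    conn.execute("INSERT INTO answer VALUES (1, 1, 1, 1, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

Note that the composite primary key on answer is what enforces "only once"; a "check all that apply" question would need a different key (for example, one that includes offered_answer_id).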

[Image: survey_model_02]

Answer by Michael Durrant

My design is shown below.

The latest create script is at https://gist.github.com/durrantm/1e618164fd4acf91e372

The script and the MySQL Workbench .mwb file are also available at
https://github.com/durrantm/survey

[Image: design diagram]

Answer by tplaner

Definitely option #2. I also think there may be an oversight in the current schema; you might want another table:

+-----------+
| tblSurvey |
|-----------|
| SurveyId  |
+-----------+

+--------------+
| tblQuestion  |
|--------------|
| QuestionID   |
| SurveyID     |
| QuestionType |
| Question     |
+--------------+

+--------------+
| tblAnswer    |
|--------------|
| AnswerID     |
| QuestionID   |
| Answer       |
+--------------+

+------------------+
| tblUsersAnswer   |
|------------------|
| UserAnswerID     |
| AnswerID         |
| UserID           |
| Response         |
+------------------+

+-----------+
| tblUser   |
|-----------|
| UserID    |
| UserName  |
+-----------+

Each question is going to probably have a set number of answers which the user can select from, then the actual responses are going to be tracked in another table.
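
As a rough sketch, the tables above could be created and queried like this with Python's built-in sqlite3 module. The column types, the sample rows, and the reading of Response as an optional free-text field are assumptions for illustration; the answer itself only gives table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblSurvey (SurveyID INTEGER PRIMARY KEY);
CREATE TABLE tblUser   (UserID INTEGER PRIMARY KEY, UserName TEXT NOT NULL);
CREATE TABLE tblQuestion (
    QuestionID   INTEGER PRIMARY KEY,
    SurveyID     INTEGER NOT NULL REFERENCES tblSurvey(SurveyID),
    QuestionType TEXT NOT NULL,
    Question     TEXT NOT NULL
);
-- The fixed set of answers a user can pick from for each question.
CREATE TABLE tblAnswer (
    AnswerID   INTEGER PRIMARY KEY,
    QuestionID INTEGER NOT NULL REFERENCES tblQuestion(QuestionID),
    Answer     TEXT NOT NULL
);
-- What users actually chose; Response is assumed to hold optional free text.
CREATE TABLE tblUsersAnswer (
    UserAnswerID INTEGER PRIMARY KEY,
    AnswerID     INTEGER NOT NULL REFERENCES tblAnswer(AnswerID),
    UserID       INTEGER NOT NULL REFERENCES tblUser(UserID),
    Response     TEXT
);

-- Invented sample data.
INSERT INTO tblSurvey VALUES (1);
INSERT INTO tblUser VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tblQuestion VALUES (1, 1, 'multiple_choice', 'Favorite color?');
INSERT INTO tblAnswer VALUES (1, 1, 'Red'), (2, 1, 'Blue');
INSERT INTO tblUsersAnswer (AnswerID, UserID) VALUES (1, 1), (2, 2), (2, 3);
""")

# Tally how many users picked each offered answer for question 1.
tally = conn.execute("""
    SELECT a.Answer, COUNT(ua.UserAnswerID)
    FROM tblAnswer AS a
    LEFT JOIN tblUsersAnswer AS ua ON ua.AnswerID = a.AnswerID
    WHERE a.QuestionID = ?
    GROUP BY a.AnswerID
    ORDER BY a.AnswerID
""", (1,)).fetchall()

print(tally)  # [('Red', 1), ('Blue', 2)]
```

The LEFT JOIN keeps offered answers that nobody selected in the result with a count of 0, which is usually what a results page wants.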

Databases are designed to store a lot of data, and most scale very well. There is no real need to use a lesser normal form simply to save on space anymore.

Answer by Ryan Brunner

As a general rule, modifying schema based on something that a user could change (such as adding a question to a survey) should be considered fairly smelly. There are cases where it can be appropriate, particularly when dealing with large amounts of data, but know what you're getting into before you dive in. Having just a "responses" table for each survey means that adding or removing questions is potentially very costly, and it's very difficult to do analytics in a question-agnostic way.

I think your second approach is best, but if you're certain you're going to have a lot of scale concerns, one thing that has worked for me in the past is a hybrid approach:

  1. Create detailed response tables to store per-question responses as you've described in your option 2. This data would generally not be directly queried from your application, but would be used for generating summary data for reporting tables. You'd probably also want to implement some form of archiving or expunging for this data.
  2. Also create the responses table from your option 1 if necessary. This can be used whenever users want to see a simple table for results.
  3. For any analytics that need to be done for reporting purposes, schedule jobs to create additional summary data based on the data from step 1.

This is absolutely a lot more work to implement, so I really wouldn't advise this unless you know for certain that this table is going to run into massive scale concerns.
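
A minimal sketch of the detail-plus-summary idea, using Python's built-in sqlite3 module. The table names (response_detail, response_summary), the refresh_summary function, and the sample rows are all hypothetical; in practice the refresh would be triggered by whatever job scheduler you already use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detailed per-question responses (not queried directly by the application).
CREATE TABLE response_detail (
    user_id     INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    answer      TEXT NOT NULL
);
-- Pre-aggregated data that reports read from.
CREATE TABLE response_summary (
    question_id    INTEGER NOT NULL,
    answer         TEXT NOT NULL,
    response_count INTEGER NOT NULL,
    PRIMARY KEY (question_id, answer)
);
-- Invented sample data.
INSERT INTO response_detail VALUES
    (1, 1, 'Yes'), (2, 1, 'No'), (3, 1, 'Yes'), (1, 2, 'Blue');
""")

def refresh_summary(conn):
    """Rebuild the summary table from the detail rows (the scheduled job)."""
    with conn:  # run both statements in a single transaction
        conn.execute("DELETE FROM response_summary")
        conn.execute("""
            INSERT INTO response_summary (question_id, answer, response_count)
            SELECT question_id, answer, COUNT(*)
            FROM response_detail
            GROUP BY question_id, answer
        """)

refresh_summary(conn)
summary = conn.execute(
    "SELECT question_id, answer, response_count FROM response_summary "
    "ORDER BY question_id, answer"
).fetchall()

print(summary)  # [(1, 'No', 1), (1, 'Yes', 2), (2, 'Blue', 1)]
```

A full rebuild like this is the simplest variant; with very large detail tables an incremental refresh would likely be needed, which is part of the extra work mentioned above.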

Answer by Shiraz Bhaiji

The second approach is best.

If you want to normalize it further, you could create a table for question types.

The simple things to do are:

  • Place the database and the log on their own disks, rather than everything on C: as is the default
  • Create the database as large as needed so you do not have pauses while the database grows

We have had log tables in SQL Server with tens of millions of rows.

Answer by kevchadders

No. 2 looks fine.

For a table with only 4 columns it shouldn't be a problem, even with a good few million rows. Of course this can depend on what database you are using. If it's something like SQL Server then it would be no problem.

You'd probably want to create an index on the QuestionID field, on the tblAnswer table.
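
For illustration, here is what that index might look like, tried out with Python's built-in sqlite3 module. The index name and column types are assumptions; the CREATE INDEX statement itself is close to portable, but the query-plan check is SQLite-specific (SQL Server has its own tools for inspecting plans).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAnswer (
    AnswerID   INTEGER PRIMARY KEY,
    UserID     INTEGER NOT NULL,
    QuestionID INTEGER NOT NULL,
    Answer     TEXT
);
-- Index name is arbitrary; it speeds up lookups of all answers to a question.
CREATE INDEX idx_tblAnswer_QuestionID ON tblAnswer (QuestionID);
""")

# Ask SQLite how it would run a typical per-question lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Answer FROM tblAnswer WHERE QuestionID = ?",
    (42,),
).fetchall()

# The last column of each plan row is the human-readable detail text.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should mention idx_tblAnswer_QuestionID, showing that the lookup searches the index instead of scanning the whole table.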

Of course, you need to specify which database you are using as well as the estimated volumes.

Answer by Dave Swersky

Having a large Answer table, in and of itself, is not a problem. As long as the indexes and constraints are well defined you should be fine. Your second schema looks good to me.

Answer by Jorge Córdoba

Given the proper index, your second solution is normalized and good for a traditional relational database system.

I don't know how huge "huge" is, but it should hold a couple of million answers without a problem.

Answer by Ben Fransen

Looks pretty complete for a simple survey. Don't forget to add a table for 'open values', where a customer can provide their opinion via a textbox. Link that table with a foreign key to your answer and place indexes on all your relational columns for performance.
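
One possible shape for such an 'open values' table, sketched with Python's built-in sqlite3 module. The names tblOpenValue, OpenValueID and OpenText are made up for this example, and tblAnswer follows the layout from the question with assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE tblAnswer (
    AnswerID   INTEGER PRIMARY KEY,
    UserID     INTEGER NOT NULL,
    QuestionID INTEGER NOT NULL,
    Answer     TEXT
);
-- Free-text input, linked to the answer it belongs to.
CREATE TABLE tblOpenValue (
    OpenValueID INTEGER PRIMARY KEY,
    AnswerID    INTEGER NOT NULL REFERENCES tblAnswer(AnswerID),
    OpenText    TEXT NOT NULL
);
-- Index the relational (foreign key) column used for joins.
CREATE INDEX idx_tblOpenValue_AnswerID ON tblOpenValue (AnswerID);

-- Invented sample data.
INSERT INTO tblAnswer VALUES (1, 1, 1, 'Other');
INSERT INTO tblOpenValue (AnswerID, OpenText) VALUES (1, 'Found you via a podcast');
""")

# The foreign key rejects free text that points at a non-existent answer.
try:
    conn.execute("INSERT INTO tblOpenValue (AnswerID, OpenText) VALUES (999, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```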

Answer by Larry Lustig

Number 2 is correct. Use the correct design until and unless you detect a performance problem. Most RDBMS will not have a problem with a narrow but very long table.
