Original URL: http://stackoverflow.com/questions/65001/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Storing email messages in a database
Question by Alister Bulman
What sort of database schema would you use to store email messages in a database, keeping as much header information as is practical?
Assume that they have been fed into a script from the MTA and parsed into the relevant headers/body/attachments.
Would you store the whole message body in the database table, or split the MIME parts apart? What about attachments?
Answer by Milen A. Radev
You may want to check the architecture and the DB schema of "Archiveopteryx".
Answer by Chase Seibert
You may want to use a schema where the message body and attachment records can be shared between multiple recipients on the message. It's not uncommon to see email servers where fully 50% of the disk storage is used by duplicate emails.
A simple hash of the body or attachment is enough to check whether that record is already in the database. However, you would still need to keep separate headers for each recipient's copy.
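A minimal sketch of the hash-based sharing described above, assuming SQLite through Python's sqlite3 module and SHA-256 as the hash. The table and function names (blobs, deliveries, init_db, store_delivery) are hypothetical. Each recipient's copy keeps its own headers, while the body is stored once per distinct hash.

```python
import hashlib
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # One row per distinct body/attachment, keyed by its content hash.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        " sha256 TEXT PRIMARY KEY,"
        " content BLOB NOT NULL)"
    )
    # One row per recipient's copy; headers stay separate per copy.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deliveries ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " headers TEXT NOT NULL,"
        " body_sha256 TEXT NOT NULL REFERENCES blobs(sha256))"
    )


def store_delivery(conn: sqlite3.Connection, recipient: str,
                   headers: str, body: bytes) -> str:
    """Store one recipient's copy, reusing the body row if it already exists."""
    digest = hashlib.sha256(body).hexdigest()
    # INSERT OR IGNORE leaves the existing row alone when the hash is known.
    conn.execute(
        "INSERT OR IGNORE INTO blobs (sha256, content) VALUES (?, ?)",
        (digest, body),
    )
    conn.execute(
        "INSERT INTO deliveries (recipient, headers, body_sha256)"
        " VALUES (?, ?, ?)",
        (recipient, headers, digest),
    )
    return digest


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    body = b"Quarterly report attached."
    for rcpt in ("alice@example.com", "bob@example.com"):
        store_delivery(conn, rcpt, f"To: {rcpt}\nSubject: Report", body)
    print(conn.execute("SELECT COUNT(*) FROM blobs").fetchone()[0])       # 1
    print(conn.execute("SELECT COUNT(*) FROM deliveries").fetchone()[0])  # 2
```

Run as a script, it prints 1 and then 2: two deliveries share one stored body.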
Answer by ceejayoz
Depends on what you're going to be doing with it. If you're going to need to do frequent searching against certain bits of it, you'll want to break it up in a way that makes sense for your usage case. If it's just for something like storage of e-mail for Sarbanes-Oxley compliance, you'd probably be okay storing the whole thing - headers, parts, etc. - as one big text field.
Answer by Ivan Bosnic
Suggestion: create a well-defined table for storing e-mail, with a column for each relevant part of a message: sender, headers, subject, body. It will be much simpler later if you want to query, for example, by the subject field. In the same table you can define a field that holds the path of an attachment and store the attached file on the file system, rather than in a blob field.
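A minimal sketch of this single-table layout, assuming SQLite and a local directory for attachment files. The names emails, init_db, save_email, find_by_subject and attachment_dir are hypothetical and, as in the suggestion, each email holds at most one attachment path.

```python
import sqlite3
import uuid
from pathlib import Path
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    # One column per part of the message the application will query on.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        " id INTEGER PRIMARY KEY,"
        " sender TEXT NOT NULL,"
        " headers TEXT NOT NULL,"
        " subject TEXT,"
        " body TEXT,"
        " attachment_path TEXT)"  # file-system path, NULL when no attachment
    )


def save_email(conn: sqlite3.Connection, attachment_dir: Path, sender: str,
               headers: str, subject: str, body: str,
               attachment: Optional[bytes] = None) -> int:
    """Insert one email; write its attachment (if any) to attachment_dir."""
    path = None
    if attachment is not None:
        attachment_dir.mkdir(parents=True, exist_ok=True)
        # A generated name avoids collisions and untrusted sender-supplied names.
        path = attachment_dir / f"{uuid.uuid4().hex}.bin"
        path.write_bytes(attachment)
    cur = conn.execute(
        "INSERT INTO emails (sender, headers, subject, body, attachment_path)"
        " VALUES (?, ?, ?, ?, ?)",
        (sender, headers, subject, body,
         str(path) if path is not None else None),
    )
    return cur.lastrowid


def find_by_subject(conn: sqlite3.Connection, text: str) -> list:
    """IDs of emails whose subject contains `text` (plain LIKE match)."""
    rows = conn.execute(
        "SELECT id FROM emails WHERE subject LIKE ? ORDER BY id",
        (f"%{text}%",),
    )
    return [row[0] for row in rows]
```

Querying by subject never touches the attachment files. Keeping files outside the database keeps the table small, at the cost of having to keep the file system and the attachment_path column consistent yourself.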
Answer by Gareth Rees
An important step in database schema design is to figure out what types of entity you want to model. For this application the entities might be:
- Messages
- E-mail addresses
- Conversation threads (perhaps: if you want to do efficient threading)
- Attachments (perhaps: as suggested in other answers)
- ...
Once you know the entities, you can identify relationships between entities, which can be represented by tables:
- Messages have a many-many relationship to messages (In-Reply-To and References headers).
- Messages have a many-many relationship to e-mail addresses (From, To, Cc, etc. headers).
- Messages have a many-one relationship with threads.
- Messages have a many-many relationship with attachments.
- ...
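One way to turn these entities and relationships into tables, sketched as SQLite DDL driven from Python. All table and column names are hypothetical. Storing referenced Message-IDs as text rather than as foreign keys is an assumption, made so that a reply can be stored even when the message it refers to was never archived.

```python
import sqlite3

# Hypothetical schema: one table per entity, one link table per relationship.
SCHEMA = """
CREATE TABLE threads (
    id INTEGER PRIMARY KEY
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    message_id TEXT UNIQUE,                    -- Message-ID header value
    thread_id INTEGER REFERENCES threads(id),  -- many messages : one thread
    subject TEXT,
    sent_at TEXT
);
-- Message <-> message (In-Reply-To / References). The target is stored as a
-- Message-ID string because the referenced message may not be archived.
CREATE TABLE message_references (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    referenced_message_id TEXT NOT NULL,
    header TEXT NOT NULL CHECK (header IN ('In-Reply-To', 'References')),
    PRIMARY KEY (message_id, referenced_message_id, header)
);
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
-- Message <-> address; role records which header the address appeared in.
CREATE TABLE message_addresses (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    address_id INTEGER NOT NULL REFERENCES addresses(id),
    role TEXT NOT NULL CHECK (role IN ('From', 'To', 'Cc', 'Bcc')),
    PRIMARY KEY (message_id, address_id, role)
);
CREATE TABLE attachments (
    id INTEGER PRIMARY KEY,
    filename TEXT,
    content BLOB
);
-- Message <-> attachment, so one stored attachment can serve many messages.
CREATE TABLE message_attachments (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    attachment_id INTEGER NOT NULL REFERENCES attachments(id),
    PRIMARY KEY (message_id, attachment_id)
);
"""


def messages_for(conn: sqlite3.Connection, email: str, role: str) -> list:
    """Message-IDs of messages where `email` appears in the `role` header."""
    rows = conn.execute(
        "SELECT m.message_id FROM messages m"
        " JOIN message_addresses ma ON ma.message_id = m.id"
        " JOIN addresses a ON a.id = ma.address_id"
        " WHERE a.email = ? AND ma.role = ? ORDER BY m.id",
        (email, role),
    )
    return [row[0] for row in rows]


def replies_to(conn: sqlite3.Connection, message_id: str) -> list:
    """Message-IDs of stored messages that cite `message_id` in either header."""
    rows = conn.execute(
        "SELECT m.message_id FROM messages m WHERE m.id IN"
        " (SELECT r.message_id FROM message_references r"
        "  WHERE r.referenced_message_id = ?) ORDER BY m.id",
        (message_id,),
    )
    return [row[0] for row in rows]
```

The CHECK constraints keep the role and header columns limited to the headers named above; any other value is rejected with an IntegrityError.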
Answer by Kluge
You'll probably want to at least store attachments separately to optimize storage. It's astonishing to see the size and quantity of attachments (videos, etc.) that most users unhesitatingly attach to emails.
In the case of outgoing emails you may have multiple emails sending the same attachment. It's far more efficient to store a single copy of the attachment that is referenced by all emails that share it.
Another reason for storing attachments separately is that it gives you some archiving options later on. Should storage space become an issue, you can always go back and delete large attachments older than a given date in order to compact the database.
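A sketch of the clean-up described above, assuming SQLite, an attachments table that records each file's size and storage date, and a message_attachments link table so one attachment row can be shared by many messages. All names are hypothetical; purge_large_attachments is an illustration, not something from the answer.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS attachments (
            id INTEGER PRIMARY KEY,
            filename TEXT,
            size_bytes INTEGER NOT NULL,
            stored_at TEXT NOT NULL,  -- ISO-8601 date, so text order = date order
            content BLOB
        );
        -- One attachment row can be shared by many messages.
        CREATE TABLE IF NOT EXISTS message_attachments (
            message_id INTEGER NOT NULL,
            attachment_id INTEGER NOT NULL REFERENCES attachments(id),
            PRIMARY KEY (message_id, attachment_id)
        );
    """)


def purge_large_attachments(conn: sqlite3.Connection,
                            older_than: str, min_bytes: int) -> int:
    """Delete attachments of at least min_bytes stored before older_than."""
    # Fixed SQL fragment; the values themselves are bound as parameters.
    where = "size_bytes >= ? AND stored_at < ?"
    params = (min_bytes, older_than)
    # Drop the links first so no message references a deleted attachment.
    conn.execute(
        "DELETE FROM message_attachments WHERE attachment_id IN"
        f" (SELECT id FROM attachments WHERE {where})",
        params,
    )
    deleted = conn.execute(
        f"DELETE FROM attachments WHERE {where}", params
    ).rowcount
    conn.commit()
    # VACUUM rebuilds the database file to reclaim the freed pages. It cannot
    # run inside an open transaction, hence the commit above.
    conn.execute("VACUUM")
    return deleted
```

Deleting only the attachment rows leaves the message rows and their headers intact. VACUUM is what actually shrinks an SQLite file; other database engines have their own compaction commands.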
Answer by Sklivvz
It all depends on what you want to do with the data, but in general I would want to store all of the data and also make sure that the semantics interpreted by the MUA are preserved in the database. For example:
- Each parsed header should have its own column.
- One column should contain the complete headers.
- The attachments (including the body and the multipart sections) should be in a separate table with a many-to-one relationship to the email table.
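A sketch of this layout using Python's standard email package and SQLite: selected headers get their own columns, the unparsed header block is kept in raw_headers, and every leaf MIME part goes into a parts table that references its email. The choice of header columns and all table and function names are assumptions.

```python
import re
import sqlite3
from email import policy
from email.parser import BytesParser

SCHEMA = """
CREATE TABLE IF NOT EXISTS emails (
    id INTEGER PRIMARY KEY,
    message_id TEXT,           -- parsed headers, one column each
    from_addr TEXT,
    to_addrs TEXT,
    subject TEXT,
    sent_date TEXT,
    raw_headers TEXT NOT NULL  -- the complete, unparsed header block
);
CREATE TABLE IF NOT EXISTS parts (
    id INTEGER PRIMARY KEY,
    email_id INTEGER NOT NULL REFERENCES emails(id),  -- many parts : one email
    content_type TEXT NOT NULL,
    filename TEXT,
    payload BLOB
);
"""


def store_message(conn: sqlite3.Connection, raw: bytes) -> int:
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    def header(name):
        value = msg[name]
        return str(value) if value is not None else None

    # The header block ends at the first empty line (CRLF or LF endings).
    raw_headers = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)[0]
    email_id = conn.execute(
        "INSERT INTO emails (message_id, from_addr, to_addrs, subject,"
        " sent_date, raw_headers) VALUES (?, ?, ?, ?, ?, ?)",
        (header("Message-ID"), header("From"), header("To"),
         header("Subject"), header("Date"),
         raw_headers.decode("utf-8", errors="replace")),
    ).lastrowid
    # Every leaf MIME part (text body, HTML body, attachments) becomes a row;
    # multipart/* containers only group other parts, so they are skipped.
    for part in msg.walk():
        if part.is_multipart():
            continue
        conn.execute(
            "INSERT INTO parts (email_id, content_type, filename, payload)"
            " VALUES (?, ?, ?, ?)",
            (email_id, part.get_content_type(), part.get_filename(),
             part.get_payload(decode=True)),
        )
    return email_id
```

Keeping raw_headers alongside the parsed columns means headers that were not given a column of their own are still available later.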
Answer by Charles Graham
If the message is already split up, and you can be sure that the routine that splits the data is sound, then I would split up the table as granularly as possible. You can always reassemble it in your middle tier. If space is not an issue, you could also store it twice: once split into the relevant fields, and once as a single blob field holding the whole message, in case putting it back together is hard.
Answer by Allan Wind
Parsing an email is not trivial, so consider storing the email as a blob and then parsing it into whatever pieces you need afterwards.
/Allan
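A minimal sketch of store-first, parse-later, assuming SQLite and Python's standard email parser; the names raw_messages, init_db, archive and load_parsed are hypothetical. Because the exact bytes are kept, a message can be re-parsed later with improved code without losing anything.

```python
import sqlite3
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def init_db(conn: sqlite3.Connection) -> None:
    # The exact bytes received from the MTA, untouched.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_messages ("
        " id INTEGER PRIMARY KEY,"
        " raw BLOB NOT NULL)"
    )


def archive(conn: sqlite3.Connection, raw: bytes) -> int:
    """Store a message verbatim and return its row id."""
    return conn.execute(
        "INSERT INTO raw_messages (raw) VALUES (?)", (raw,)
    ).lastrowid


def load_parsed(conn: sqlite3.Connection, row_id: int) -> EmailMessage:
    """Parse a stored message on demand, when its pieces are actually needed."""
    row = conn.execute(
        "SELECT raw FROM raw_messages WHERE id = ?", (row_id,)
    ).fetchone()
    if row is None:
        raise KeyError(row_id)
    return BytesParser(policy=policy.default).parsebytes(row[0])
```

The parsed EmailMessage can then feed whatever per-header or per-part tables the application needs.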