MySQL: Many tables or many databases?
Original URL: http://stackoverflow.com/questions/696682/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Question by TheHippo
For a project, we have a lot of data that always has the same structure and is not linked together. There are two approaches to storing the data:
- Creating a new database for every pool (about 15-25 tables)
- Creating all the tables in one database and distinguishing the pools by table name.
Which one is easier and faster to handle for MySQL?
EDIT: I am not interested in database-design issues; I am only interested in which of the two options is faster.
EDIT 2: I will try to make it clearer. As I said, we will have data where records in different pools rarely belong together. Putting all the data of one type in one table and linking it with a pool ID is not a good idea:
- It is hard to back up or delete a specific pool (and we expect to run out of primary keys after a while, even when using BIGINT).
So the idea is to create a database for every pool, or to create a lot of tables in one database. 50% of the queries against the database will be simple INSERTs, and 49% will be simple SELECTs on a primary key.
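To make the two layouts concrete, here is a minimal Python sketch that builds (but does not run) the MySQL statements needed to delete one pool under each layout, which is the backup/delete concern from EDIT 2. The `shared` database, the `poolN` naming scheme, and the table names are hypothetical stand-ins for the real 15-25 tables.

```python
# Hypothetical stand-ins for the 15-25 tables that make up one pool.
POOL_TABLES = ["accounts", "orders", "events"]
SHARED_DB = "shared"  # hypothetical database used by the table-prefix layout


def quote(identifier: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def drop_pool(pool_id: int, layout: str) -> list[str]:
    """Return the statements that delete one pool under the given layout."""
    if layout == "database_per_pool":
        # One statement removes the pool's database and every table in it.
        return [f"DROP DATABASE {quote(f'pool{pool_id}')}"]
    if layout == "prefix_per_pool":
        # Every prefixed table in the shared database must be named explicitly.
        return [
            f"DROP TABLE {quote(SHARED_DB)}.{quote(f'pool{pool_id}_{table}')}"
            for table in POOL_TABLES
        ]
    raise ValueError(f"unknown layout: {layout}")


for statement in drop_pool(17, "prefix_per_pool"):
    print(statement)
```

With a database per pool, deleting a pool is a single DROP DATABASE; with prefixed tables it is one DROP TABLE per table, which is easy to generate but easy to get wrong by hand.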
The question is: which is faster for MySQL to handle, many tables or many databases?
Answer by Bill Karwin
There should be no significant performance difference between multiple tables in a single database versus multiple tables in separate databases.
In MySQL, databases (standard SQL uses the term "schema" for this) serve chiefly as a namespace for tables. A database has only a few attributes, e.g. the default character set and collation. Also, GRANT makes it convenient to control access privileges per database, but that has nothing to do with performance.
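As a sketch of what per-database GRANT looks like in practice, the Python helpers below only build MySQL GRANT strings; the `poolN` databases, the `shared` database, and the account names are hypothetical. A database-level grant (`db`.*) covers every table in that database, while the single-database layout needs one table-level grant per prefixed table.

```python
def quote(identifier: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def grant_for_pool(pool_id: int, user: str, host: str = "localhost") -> str:
    """Database layout: one grant on `poolN`.* covers every table of the pool."""
    return f"GRANT SELECT, INSERT ON {quote(f'pool{pool_id}')}.* TO '{user}'@'{host}'"


def grants_for_prefixed_pool(pool_id: int, tables: list[str], user: str,
                             host: str = "localhost") -> list[str]:
    """Single-database layout: one table-level grant per prefixed table."""
    return [
        f"GRANT SELECT, INSERT ON `shared`.{quote(f'pool{pool_id}_{table}')} "
        f"TO '{user}'@'{host}'"
        for table in tables
    ]


print(grant_for_pool(17, "pool17_app"))
```

The helpers do not escape quotes inside user or host names, so they are only meant for trusted, illustrative input.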
You can access tables in any database from a single connection (provided they are managed by the same instance of MySQL Server). You just have to qualify the table name:
SELECT * FROM database17.accounts_table;
This is purely a syntactical difference. It should have no effect on performance.
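MySQL cannot be embedded in a short runnable example, so this sketch uses Python's built-in sqlite3 module, where ATTACH DATABASE plays a role similar to a separate MySQL database, to show one connection reading tables from several databases through qualified names. This is an analogy only: the schema names mirror the database17.accounts_table example, and nothing here measures MySQL performance.

```python
import sqlite3

# One connection, several "databases". SQLite's ATTACH stands in for separate
# MySQL databases here; the schema and table names are hypothetical.
conn = sqlite3.connect(":memory:")
for pool in ("database16", "database17"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {pool}")
    conn.execute(f"CREATE TABLE {pool}.accounts_table (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO database17.accounts_table (name) VALUES ('alice')")
conn.execute("INSERT INTO database16.accounts_table (name) VALUES ('bob')")

# The qualifier is the only thing that changes between pools.
rows = conn.execute("SELECT * FROM database17.accounts_table").fetchall()
print(rows)  # [(1, 'alice')]
```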
Regarding storage, you can't organize tables into a file-per-database as @Chris speculates. With the MyISAM storage engine, you always have a file per table. With the InnoDB storage engine, you either have a single set of storage files that amalgamate all tables, or else you have a file per table (this is configured for the whole MySQL server, not per database). In either case, there's no performance advantage or disadvantage to creating the tables in a single database versus many databases.
There aren't many MySQL configuration parameters that work per database. Most parameters that affect server performance are server-wide in scope.
Regarding backups, you can specify a subset of tables as arguments to the mysqldump command. It may be more convenient to back up logical sets of tables per database, without having to name all the tables on the command line. But it should make no difference to performance, only convenience for you as you enter the backup command.
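The mysqldump invocation differs between the layouts only in its arguments: `mysqldump db_name` dumps the whole database, while `mysqldump db_name tbl_name ...` dumps just the named tables. The Python sketch below builds both argument lists without running anything; the database and table names, and the choice of `--single-transaction` (a consistent snapshot for InnoDB tables) and `--result-file`, are assumptions for illustration.

```python
import shlex


def dump_command(pool_id: int, layout: str, tables: list[str]) -> list[str]:
    """Build the mysqldump argv that backs up one pool (nothing is executed)."""
    base = ["mysqldump", "--single-transaction", f"--result-file=pool{pool_id}.sql"]
    if layout == "database_per_pool":
        # db_name alone dumps every table in that database.
        return base + [f"pool{pool_id}"]
    if layout == "prefix_per_pool":
        # db_name followed by table names dumps only those tables.
        return base + ["shared"] + [f"pool{pool_id}_{table}" for table in tables]
    raise ValueError(f"unknown layout: {layout}")


print(shlex.join(dump_command(17, "prefix_per_pool", ["accounts", "orders"])))
# mysqldump --single-transaction --result-file=pool17.sql shared pool17_accounts pool17_orders
```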
Answer by TheTXI
Why not create a single table to keep track of your pools (with PoolID and PoolName as your columns, plus whatever else you want to track), and then add a column to each of your 15-25 tables that is a foreign key back to your pool table, so you know which pool a particular record belongs to?
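A minimal sketch of TheTXI's suggestion, using Python's sqlite3 module as a stand-in for MySQL (in MySQL the foreign key would need InnoDB tables). The table and column names are hypothetical, and `accounts` represents just one of the 15-25 data tables. A composite primary key of (pool_id, account_id) lets each pool number its own rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE pools (
    pool_id   INTEGER PRIMARY KEY,
    pool_name TEXT NOT NULL UNIQUE
);
-- One of the 15-25 data tables; each one gets a pool_id column.
CREATE TABLE accounts (
    pool_id    INTEGER NOT NULL REFERENCES pools(pool_id),
    account_id INTEGER NOT NULL,
    name       TEXT,
    PRIMARY KEY (pool_id, account_id)
);
""")
conn.executemany("INSERT INTO pools VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(1, 1, "alice"), (2, 1, "bob")])

# Every query names the pool explicitly.
rows = conn.execute(
    "SELECT name FROM accounts WHERE pool_id = ? AND account_id = ?", (2, 1)
).fetchall()
print(rows)  # [('bob',)]
```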
If you don't want to mix the data like that, I would suggest making multiple databases. Creating multiple tables all for the same functionality makes my spider sense tingle.
Answer by Matthew Farwell
If you don't want one set of tables keyed by poolID (with a pool table holding poolID and poolName), as TheTXI suggested, use separate databases rather than multiple tables that all do the same thing.
That way, the only difference between accessing different pools is the initial "use database" statement: you won't have to recode your SELECTs each time or resort to dynamic SQL.
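To illustrate this point, the sketch below returns the statements a client would send for one pool: only the USE statement changes, and the SELECT text is identical for every pool. The `poolN` database names and the `accounts` table are hypothetical, and the `%s` placeholder style assumes a typical Python MySQL driver; nothing is executed.

```python
def statements_for_pool(pool_id: int, account_id: int) -> list[tuple[str, tuple]]:
    """Return (sql, params) pairs a client would send to look up one account."""
    # The only per-pool difference: which database the session switches to.
    use = (f"USE `pool{pool_id}`", ())
    # The query text is the same for every pool; only the parameter varies.
    lookup = ("SELECT * FROM accounts WHERE id = %s", (account_id,))
    return [use, lookup]


for sql, params in statements_for_pool(17, 42):
    print(sql, params)
```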
The other advantages of this approach are:
- Easy backup/restore
- Easy start/stop of a database instance.
Disadvantages are:
- a little bit more admin work, but not much.
I don't know what your application is, but really, really think carefully before creating all of the tables in one database. That way madness lies.
Edit: If performance is the only thing that concerns you, you need to measure it. Take a representative set of queries and measure their performance.
Edit 2: The difference in performance for a single query between the many-tables and many-databases models will be negligible. If you have one database, you can tune the hell out of it. If you have many databases, you can tune the hell out of all of them.
My (our? - can't speak for anyone else) point is that, for well tuned database(s), there will be practically no difference in performance between the three options (poolid in table, multiple tables, multiple databases), so you can pick the option which is easiest for you, in the short AND long term.
For me, the best option is still one database with a poolId column, as TheTXI suggested, followed by multiple databases, depending on your (mostly administrative) needs. If you need to know exactly what the difference in performance is between two options, we can't give you that answer. You need to set it up and test it.
With multiple databases, it becomes easy to throw hardware at it to improve performance.
Answer by chaos
In the situation you describe, experience has led me to believe that you'll find the separate databases to be faster when you have a large number of pools.
There's a really important general principle to observe here, though: don't think about how fast it'll be; profile it.
Answer by Josh Smeaton
I'm not too sure I completely understand your scenario. Do you want to have all the pools using the same tables, but just differing by a distinguishing key? Or do you want separate pools of tables within the one database, with a suffix on each table to distinguish the pools?
Either way, though, you should have multiple databases, for two major reasons. The first is that if you have to change the schema of one pool, it won't affect the others.
The second, if your load goes up (or for any other reason), you may want to move the pools onto separate physical machines with new database servers.
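One way to keep the option of moving pools to new machines open is a small routing table that maps each pool to a server and database, so that moving a pool changes one entry instead of application code. The `Location` class, host names, and `ROUTES` mapping below are hypothetical, and copying the data itself (for example with mysqldump) is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    host: str       # hypothetical MySQL server host name
    database: str   # the pool's database on that server


# Hypothetical routing table: initially every pool lives on the same server.
ROUTES: dict[int, Location] = {
    pool_id: Location("db1.internal", f"pool{pool_id}") for pool_id in range(1, 6)
}


def locate(pool_id: int) -> Location:
    """Return the server and database that currently hold a pool."""
    return ROUTES[pool_id]


# After pool 3's data has been copied to new hardware, only its entry changes;
# the database name, and therefore every query, stays the same.
ROUTES[3] = Location("db2.internal", "pool3")
print(locate(3))
```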
Also, security access to a database server can be more tightly locked down.
All of these things can still be accomplished without requiring separate databases - but the separation will make all of this easier and reduce the complexity of having to mentally track which tables you want to operate on.
Answer by Brent Baisley
Distinguishing the pools by table name or putting them in separate databases amounts to about the same thing. However, if you have lots of tables in one database, MySQL has to load the table information and do a security check on all those tables when logging in/connecting.
As others mentioned, separate databases will allow you to move things around and create optimizations specific to a certain pool (e.g. compressed tables). It is extra admin overhead, but there is considerably more flexibility.
Additionally, you can always "pool" the tables that are in separate databases by using federated or merge tables to simplify querying if needed.
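The answer mentions MySQL's FEDERATED and MERGE engines; MERGE presents several identically defined MyISAM tables as one. As a runnable stand-in, the sketch below swaps in a different technique: a plain UNION ALL view in SQLite over the same table in two attached databases, with a literal column recording each row's pool. The names are hypothetical, and this does not exercise either MySQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for pool in ("pool1", "pool2"):
    # Each pool is a separate (attached) database with an identically shaped table.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {pool}")
    conn.execute(f"CREATE TABLE {pool}.accounts (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO pool1.accounts (name) VALUES ('alice')")
conn.execute("INSERT INTO pool2.accounts (name) VALUES ('bob')")

# A TEMP view may reference tables in other attached databases (a view stored
# in 'main' may not). The literal column records which pool each row came from.
conn.execute("""
CREATE TEMP VIEW all_accounts AS
    SELECT 'pool1' AS pool, id, name FROM pool1.accounts
    UNION ALL
    SELECT 'pool2' AS pool, id, name FROM pool2.accounts
""")
rows = conn.execute("SELECT pool, id, name FROM all_accounts ORDER BY pool").fetchall()
print(rows)  # [('pool1', 1, 'alice'), ('pool2', 1, 'bob')]
```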
As for running out of primary keys, you could always use a compound primary key if you are using MyISAM tables. For example, say you have a field called groupCode (any type) and another called sequenceId (AUTO_INCREMENT), and you create your primary key as (groupCode, sequenceId). The sequenceId will then increment based on the next unique ID within each groupCode value, for example: AAA 1, AAA 2, BBB 1, AAA 3, CCC 1, AAA 4, BBB 2, ...
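The MySQL manual describes this MyISAM behavior as generating MAX(sequenceId) + 1 among the rows with the same groupCode. The Python class below is only a simulation of that rule (no database is involved), with the corresponding DDL shown in a comment; the table name `items` and the column types are assumptions. It reproduces the sequence from the example, and it also shows that deleting the highest value in a group lets that value be reused.

```python
from collections import defaultdict

# MySQL DDL this simulates (MyISAM only; InnoDB does not give per-group sequences):
#   CREATE TABLE items (
#     groupCode  CHAR(3)      NOT NULL,
#     sequenceId INT UNSIGNED NOT NULL AUTO_INCREMENT,
#     PRIMARY KEY (groupCode, sequenceId)
#   ) ENGINE=MyISAM;


class GroupedAutoIncrement:
    """Mimic MAX(sequenceId) + 1 WHERE groupCode = <given group>."""

    def __init__(self) -> None:
        self.rows: dict[str, list[int]] = defaultdict(list)

    def insert(self, group_code: str) -> tuple[str, int]:
        ids = self.rows[group_code]
        next_id = max(ids, default=0) + 1   # per-group, not table-wide
        ids.append(next_id)
        return group_code, next_id

    def delete(self, group_code: str, seq_id: int) -> None:
        self.rows[group_code].remove(seq_id)


seq = GroupedAutoIncrement()
rows = [seq.insert(g) for g in ["AAA", "AAA", "BBB", "AAA", "CCC", "AAA", "BBB"]]
print(rows)
# [('AAA', 1), ('AAA', 2), ('BBB', 1), ('AAA', 3), ('CCC', 1), ('AAA', 4), ('BBB', 2)]
```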
With large tables, though, you have to be careful about caching and make sure the file system you are using can handle large files.
Answer by Chris Shaffer
I don't know mysql very well, but I think I'll have to give the standard performance answer -- "It depends".
Some thoughts (dealing only with performance/maintenance, not database design):
- Creating a new database means a separate file (or files) in the file system. These files could then be put on different filesystems if performance of one needs to be separate from the others, etc.
- A new database will probably handle caching differently; e.g., putting all tables in one DB means a shared cache for that DB, whereas splitting the tables into separate databases means each database can have a separate cache [obviously all databases will share the same physical memory for cache, but there may be a limit per database, etc.].
- Related to the separate files, this means that if one of your datasets becomes more important than the others, it can easily be pulled off to a new server.
- Separating the databases has an added benefit of allowing you to deploy updates one-at-a-time more easily than with the single database.
However, to contrast, having multiple databases means the server will probably be using more memory (since it has multiple caches). I'm sure there are more "cons" for the multi-database approach, but I am drawing a blank now.
So I suppose I would recommend the multi-database approach. Obviously this is only with the understanding that there may very well be a better "database-designy" way of handling whatever you are actually doing.
Answer by aronchick
Given the restrictions you've placed on it, I'd rather spin up more tables in the existing database than have to connect to multiple databases. Managing connection strings tends to be harder, in addition to managing the different database optimizations you may have.
Answer by Tom Wright
FTR, in normal circumstances I'd take the approach described by TheTXI.
In answer to your specific question, though, I have found it to be dependent on usage. (A cop-out, I know, but hear me out.)
A single database is probably easier. You only have to worry about one connection, and you would still have to specify tables. Multiple databases could, under certain conditions, be faster, though.
If I were you I'd try both. There's no way we'll be able to give you a useful answer.