MySQL: Creating an index on a timestamp to optimize a query

Warning: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/9087767/

Time: 2020-08-31 11:57:57. Source: igfitidea

Tags: mysql, optimization, indexing, timestamp

Asked by DanielGibbs

I have a query of the following form:

SELECT * FROM MyTable WHERE Timestamp > [SomeTime] AND Timestamp < [SomeOtherTime]

I would like to optimize this query, and I am thinking about putting an index on timestamp, but am not sure if this would help. Ideally I would like to make timestamp a clustered index, but MySQL does not support clustered indexes, except for primary keys.

  • MyTable has 4 million+ rows.
  • Timestamp is actually of type INT.
  • Once a row has been inserted, it is never changed.
  • The number of rows with any given Timestamp is on average about 20, but could be as high as 200.
  • Newly inserted rows have a Timestamp that is greater than most of the existing rows, but could be less than some of the more recent rows.

Would an index on Timestamp help me to optimize this query?

Accepted answer by Chris Nash

No question about it. Without the index, your query has to look at every row in the table. With the index, the query will be pretty much instantaneous as far as locating the right rows goes. The price you'll pay is a slight performance decrease on inserts, since each insert also has to update the index; but that really will be slight.

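A minimal sketch of what this answer suggests, using the table and column names from the question. The index name `idx_timestamp` and the numeric bounds (standing in for [SomeTime] and [SomeOtherTime]) are illustrative assumptions; EXPLAIN shows whether the optimizer actually picks the index.

```sql
-- MyTable and Timestamp come from the question; idx_timestamp is a
-- hypothetical index name. TIMESTAMP is a non-reserved keyword in MySQL,
-- so the backticks are optional but make the column reference explicit.
ALTER TABLE MyTable ADD INDEX idx_timestamp (`Timestamp`);

-- 1325376000 and 1325462400 are example bounds (one day apart) standing in
-- for [SomeTime] and [SomeOtherTime].
EXPLAIN
SELECT *
FROM MyTable
WHERE `Timestamp` > 1325376000
  AND `Timestamp` < 1325462400;
```

If the index is used, EXPLAIN usually reports `range` in the `type` column and `idx_timestamp` in the `key` column. If the range covers a large share of the table, the optimizer may still decide that a full scan (`type` = `ALL`) is cheaper.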

Answer by Ryan P

You should definitely use an index. MySQL has no clue what order those timestamps are in, so to find the records for a given timestamp (or timestamp range) it has to look through every single record. And with 4 million of them, that's quite a bit of time! Indexes are your way of telling MySQL about your data: "I'm going to look at this field quite often, so keep a list of where I can find the records for each value."

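To see the full scan described here, you can ask the optimizer to ignore the index. This sketch assumes an index with the hypothetical name `idx_timestamp` already exists on `Timestamp` (IGNORE INDEX raises an error if the named index does not exist); the bounds are arbitrary example values.

```sql
-- Assumes an index named idx_timestamp (hypothetical) exists on `Timestamp`.
-- IGNORE INDEX tells the optimizer not to use it for this query, so the
-- plan typically shows type = ALL: every row of MyTable is examined.
EXPLAIN
SELECT *
FROM MyTable IGNORE INDEX (idx_timestamp)
WHERE `Timestamp` > 1325376000
  AND `Timestamp` < 1325462400;
```

Comparing this plan with the plan for the same query without the hint shows the difference between reading every row and reading only the matching range.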

Indexes are generally a good idea for regularly queried fields. The downsides of defining indexes are that they use extra storage space and add some work to every insert (and to every update of an indexed column), so unless you're really tight on space, you should try to use them. If an index doesn't apply to a query, MySQL will just ignore it.

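To check how much space the indexes actually take, InnoDB's size statistics are exposed through `information_schema.TABLES`. A sketch, assuming the table is named `MyTable` in the current database; the sizes are estimates and may lag behind the real data (running `ANALYZE TABLE MyTable` refreshes them).

```sql
-- Approximate sizes, in MiB, for MyTable in the current database.
-- For InnoDB, DATA_LENGTH includes the clustered (primary key) index and
-- INDEX_LENGTH covers the secondary indexes, such as one on `Timestamp`.
SELECT TABLE_NAME,
       DATA_LENGTH  / 1024 / 1024 AS data_mib,
       INDEX_LENGTH / 1024 / 1024 AS index_mib
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'MyTable';
```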

Answer by ypercube

If your queries mainly use this timestamp, you could test this design (extending the primary key with the timestamp as its first column):

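CREATE TABLE perf (
    ts    INT NOT NULL,
    oldPK INT NOT NULL,        -- assumed type: keep your existing PK's type
    -- ... other columns
    PRIMARY KEY (ts, oldPK),   -- clustered: rows stored in timestamp order
    UNIQUE (oldPK)             -- the old key still identifies each row
) ENGINE=InnoDB;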

This way, queries like the one you posted will use the clustered (primary) key.

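The CREATE TABLE above builds the table from scratch. For an existing table, a sketch of the same change might look like the following. It assumes, hypothetically, that `MyTable`'s current primary key is a single column named `id` and that `Timestamp` contains no NULLs (primary-key columns must be NOT NULL).

```sql
-- Hypothetical assumptions: MyTable's current primary key is the single
-- column `id`, and `Timestamp` holds no NULLs.
-- All three changes run in one ALTER TABLE statement.
ALTER TABLE MyTable
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (`Timestamp`, id),  -- rows clustered by timestamp
    ADD UNIQUE KEY uq_id (id);          -- old key stays unique; this also
                                        -- keeps id first in an index, which
                                        -- InnoDB requires if id is AUTO_INCREMENT
```

Changing the primary key of an InnoDB table rebuilds the whole table, so on 4 million+ rows this can take a while; trying it on a copy of the table first is prudent.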

The disadvantage is that your inserts will be a bit slower. Also, if you have other indexes on the table, they will use a bit more space, because every secondary index entry includes the primary key, which is now 4 bytes wider (the INT timestamp).

The biggest advantage of such a clustered index is for queries with big range scans, e.g. queries that have to read large parts of the table or the whole table: they will find the related rows sequentially and already in the wanted order (by timestamp). This is also useful if you want to group by day, week, month or year.

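As an illustration of the grouping use case, here is a sketch of a per-day count over one month using ypercube's `perf` table. It assumes `ts` stores Unix epoch seconds (the question only says the column is an INT); the dates are arbitrary.

```sql
-- Per-day row counts for January 2012 from ypercube's perf table.
-- Assumes ts holds Unix epoch seconds; UNIX_TIMESTAMP() and FROM_UNIXTIME()
-- both use the session time zone, so day boundaries follow that zone.
SELECT DATE(FROM_UNIXTIME(ts)) AS day_date,
       COUNT(*)                AS row_count
FROM perf
WHERE ts >= UNIX_TIMESTAMP('2012-01-01')
  AND ts <  UNIX_TIMESTAMP('2012-02-01')
GROUP BY day_date
ORDER BY day_date;
```

The WHERE range can be read from the clustered primary key in `ts` order; because the grouping is on a computed expression, MySQL may still use a temporary table for the GROUP BY.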

The old PK can still be used to identify rows by keeping a UNIQUE constraint on it.

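With the UNIQUE constraint in place, looking a row up by its old key stays cheap. A sketch, using ypercube's example `perf` table and an arbitrary key value:

```sql
-- perf is ypercube's example table; 12345 is an arbitrary key value.
-- The UNIQUE (oldPK) index locates the entry, which (like every InnoDB
-- secondary index entry) also stores the primary key (ts, oldPK); that key
-- is then used to fetch the full row from the clustered index.
EXPLAIN
SELECT *
FROM perf
WHERE oldPK = 12345;
```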



You may also want to have a look at TokuDB, an open-source storage engine for MySQL that allows multiple clustered indexes.

Answer by blackstrype

I don't dispute the importance of indexing for improving SELECT query times, but if you can index on other keys (and write your queries to use those indexes), an index on timestamp may not be needed.

For example, if you have a table with timestamp, category, and userId, it may be better to create an index on userId instead. In a table with many different users, this will considerably reduce the set of rows left to search by timestamp.

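A sketch of this alternative, assuming a hypothetical `events` table with the three columns mentioned. The column types, the surrogate `id` key, the index name and the example values are all assumptions, not part of the answer.

```sql
-- Hypothetical table: column types, the surrogate id key, the index name
-- and the example values below are assumptions, not from the answer.
CREATE TABLE events (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `timestamp` INT NOT NULL,
    category    VARCHAR(32) NOT NULL,
    userId      INT UNSIGNED NOT NULL,
    INDEX idx_user (userId)   -- index on userId rather than on timestamp
) ENGINE=InnoDB;

-- idx_user narrows the search to one user's rows; the timestamp range is
-- then checked only against those rows.
SELECT *
FROM events
WHERE userId = 42
  AND `timestamp` >= 1325376000
  AND `timestamp` <  1325462400;
```

If both conditions usually appear together, a composite index on (userId, `timestamp`) would let MySQL resolve both in the index, at the cost of a wider index to maintain on every insert.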

...and if I'm not mistaken, the advantage of this would be avoiding the overhead of updating the timestamp index on each insertion; in a table with high insertion rates and highly unique timestamps, this could be an important consideration.

I'm struggling with the same problems of indexing based on timestamps and other keys. I still have testing to do before I can back up what I say here with evidence. I'll try to post back with my results.

A scenario to explain this better:

  1. timestamp 99% unique
  2. userId 80% unique
  3. category 25% unique

    • Indexing on timestamp will quickly reduce query results to 1% of the table size
    • Indexing on userId will quickly reduce query results to 20% of the table size
    • Indexing on category will quickly reduce query results to 75% of the table size
    • Insertion with an index on timestamp will have high overhead **
    • Although we know our insertions will have incrementing timestamps, I haven't seen any discussion of MySQL optimisations based on incremental keys.
    • Insertion with an index on userId will have reasonably high overhead.
    • Insertion with an index on category will have reasonably low overhead.

** Sorry, I don't know the actual overhead of insertion with indexing.

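To find out how unique each column is in your own data (the percentages used in the scenario in this answer), you can compute the ratio of distinct values to rows. A sketch against a hypothetical `events` table with the columns `timestamp`, `userId` and `category` from the scenario; note that it reads the whole table.

```sql
-- Hypothetical table `events` with the scenario's three columns.
-- Each ratio is distinct values / rows: near 1 means almost every row has
-- its own value, near 0 means values repeat a lot. Reads the whole table.
SELECT COUNT(DISTINCT `timestamp`) / COUNT(*) AS timestamp_ratio,
       COUNT(DISTINCT userId)      / COUNT(*) AS userId_ratio,
       COUNT(DISTINCT category)    / COUNT(*) AS category_ratio
FROM events;
```

`SHOW INDEX FROM events` also reports a `Cardinality` estimate for each indexed column, without a full scan.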