Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/33350314/

Time: 2020-09-08 07:59:43  Source: igfitidea

Use cases: InfluxDB vs. Prometheus

database influxdb prometheus

Asked by SpaceMonkey

According to the Prometheus webpage, one main difference between Prometheus and InfluxDB is the use case: while Prometheus stores time series only, InfluxDB is better geared towards storing individual events. Since there was some major work done on the storage engine of InfluxDB, I wonder if this is still true.

I want to set up a time series database, and apart from the push/pull model (and probably a difference in performance) I can see no big thing which separates the two projects. Can someone explain the difference in use cases?

Answered by Paul Dix

InfluxDB CEO and developer here. The next version of InfluxDB (0.9.5) will have our new storage engine. With that engine we'll be able to efficiently store either single event data or regularly sampled series, i.e. both irregular and regular time series.

InfluxDB supports int64, float64, bool, and string data types using different compression schemes for each one. Prometheus only supports float64.

For compression, the 0.9.5 version will have compression competitive with Prometheus. For some cases we'll see better results since we vary the compression on timestamps based on what we see. The best case scenario is a regular series sampled at exact intervals. In that case, by default, we can compress the timestamps of 1k points as an 8-byte starting time, a delta (zig-zag encoded) and a count (also zig-zag encoded).
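
As a rough illustration of the best case described above, here is a minimal Python sketch of packing a perfectly regular series of timestamps into a start time, a delta and a count. This is not InfluxDB's actual code: the function names are hypothetical, and writing the zig-zag values as variable-length integers (varints) is an assumption, since the answer only says the delta and count are zig-zag encoded.

```python
import struct
from typing import List, Optional, Tuple


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit int to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(u: int) -> bytes:
    """Encode an unsigned int in 7-bit groups, low group first (assumed format)."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def compress_regular(timestamps: List[int]) -> Optional[bytes]:
    """Pack evenly spaced timestamps as start + delta + count, else None."""
    if len(timestamps) < 2:
        return None
    delta = timestamps[1] - timestamps[0]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev != delta:
            return None  # irregular series: this scheme does not apply
    return (
        struct.pack(">q", timestamps[0])  # 8-byte starting time
        + varint_encode(zigzag_encode(delta))
        + varint_encode(zigzag_encode(len(timestamps)))
    )


def decompress_regular(blob: bytes) -> List[int]:
    """Rebuild the full timestamp list from the packed form."""
    (start,) = struct.unpack_from(">q", blob, 0)
    z_delta, pos = varint_decode(blob, 8)
    z_count, _ = varint_decode(blob, pos)
    delta = zigzag_decode(z_delta)
    count = zigzag_decode(z_count)
    return [start + i * delta for i in range(count)]
```

Under these assumptions, 1,000 nanosecond timestamps spaced exactly ten seconds apart pack into 15 bytes in total, which is why a regular series is the best case. Any irregularity in the spacing makes `compress_regular` return `None`, and a real engine would fall back to a different scheme.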

Depending on the shape of the data, we've seen less than 2.5 bytes per point on average after compactions.

YMMV based on your timestamps, the data type, and the shape of the data. Random floats with nanosecond scale timestamps with large variable deltas would be the worst, for instance.

The variable precision in timestamps is another feature that InfluxDB has. It can represent second, millisecond, microsecond, or nanosecond scale times. Prometheus is fixed at milliseconds.

Another difference is that writes to InfluxDB are durable after a success response is sent to the client. Prometheus buffers writes in memory and by default flushes them every 5 minutes, which opens a window of potential data loss.

Our hope is that once 0.9.5 of InfluxDB is released, it will be a good choice for Prometheus users to use as long term metrics storage (in conjunction with Prometheus). I'm pretty sure that support is already in Prometheus, but until the 0.9.5 release drops it might be a bit rocky. Obviously we'll have to work together and do a bunch of testing, but that's what I'm hoping for.

For single server metrics ingest, I would expect Prometheus to have better performance (although we've done no testing here and have no numbers) because of their more constrained data model and because they don't append writes to disk before writing out the index.

The query languages of the two are very different. I'm not sure what they support that we don't yet, or vice versa, so you'd need to dig into the docs on both to see if there's something one can do that you need. Longer term, our goal is to have InfluxDB's query functionality be a superset of Graphite, RRD, Prometheus and other time series solutions. I say superset because we want to cover those in addition to more analytic functions later on. It'll obviously take us time to get there.

Finally, a longer term goal for InfluxDB is to support high availability and horizontal scalability through clustering. The current clustering implementation isn't feature complete yet and is only in alpha. However, we're working on it and it's a core design goal for the project. Our clustering design is that data is eventually consistent.

To my knowledge, Prometheus' approach is to use double writes for HA (so there's no eventual consistency guarantee) and to use federation for horizontal scalability. I'm not sure how querying across federated servers would work.

Within an InfluxDB cluster, you can query across the server boundaries without copying all the data over the network. That's because each query is decomposed into a sort of MapReduce job that gets run on the fly.
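
To make the "sort of MapReduce job" idea concrete, here is a hedged Python sketch of computing a mean across several servers without shipping raw points over the network. It is an illustration of the general scatter-gather pattern only, not InfluxDB's query engine: the `Partial` type and the function names are hypothetical, and each "shard" is just an in-memory list standing in for a server.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass
class Partial:
    """Small per-shard summary that is cheap to send over the network."""
    total: float = 0.0
    count: int = 0


def map_shard(points: Iterable[Point], start: int, end: int) -> Partial:
    """Map step: runs where the data lives, returns only a sum and a count."""
    partial = Partial()
    for ts, value in points:
        if start <= ts < end:
            partial.total += value
            partial.count += 1
    return partial


def reduce_mean(partials: Iterable[Partial]) -> Optional[float]:
    """Reduce step: the coordinating node merges the per-shard summaries."""
    total = 0.0
    count = 0
    for partial in partials:
        total += partial.total
        count += partial.count
    return total / count if count else None


def cluster_mean(shards: List[List[Point]], start: int, end: int) -> Optional[float]:
    """Mean of all values in [start, end) across every shard."""
    return reduce_mean(map_shard(shard, start, end) for shard in shards)
```

The point of the design is that each server sends back two numbers per query instead of every matching point, so network traffic stays roughly constant as the data grows.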

There's probably more, but that's what I can think of at the moment.

Answered by user5994461

We've got the marketing message from the two companies in the other answers. Now let's ignore it and get back to the sad real world of time-data series.

Some History

InfluxDB and Prometheus were made to replace old tools from the past era (RRDtool, Graphite).

InfluxDB is a time series database. Prometheus is a sort-of metrics collection and alerting tool, with a storage engine written just for that. (I'm actually not sure you could [or should] reuse the storage engine for something else)

Limitations

Sadly, writing a database is a very complex undertaking. The only way both these tools manage to ship something is by dropping all the hard features relating to high-availability and clustering.

To put it bluntly, each of them is a single application running on only a single node.

Prometheus has no goal to support clustering and replication whatsoever. The official way to support failover is to "run 2 nodes and send data to both of them". Ouch. (Note that it's seriously the ONLY existing way possible, it's written countless times in the official documentation).

InfluxDB has been talking about clustering for years... until it was officially abandoned in March. Clustering ain't on the table anymore for InfluxDB. Just forget it. If it ever gets done, it will only be available in the Enterprise Edition.

https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/

Within the next few years, we will hopefully have a well-engineered time-series database that is handling all the hard problems relating to databases: replication, failover, data safety, scalability, backup...

At the moment, there is no silver bullet.

What to do

Evaluate the volume of data to be expected.

100 metrics * 100 sources * 1 second => 10000 datapoints per second => 864 Mega-datapoints per day.

The nice thing about time series databases is that they use a compact format, they compress well, they aggregate datapoints, and they clean old data. (Plus they come with features relevant to time data series.)

Supposing that a datapoint is treated as 4 bytes, that's only a few Gigabytes per day. Lucky for us, there are systems with 10 cores and 10 TB drives readily available. That could probably run on a single node.
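
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, the 4 bytes per datapoint is the assumption stated above, and the 10 TB figure is treated as 10 * 10^12 bytes with no allowance for indexes, replication or filesystem overhead.

```python
SECONDS_PER_DAY = 86_400


def datapoints_per_day(metrics: int, sources: int, interval_seconds: float) -> float:
    """Number of datapoints produced per day at the given sampling interval."""
    return metrics * sources * SECONDS_PER_DAY / interval_seconds


def bytes_per_day(points: float, bytes_per_point: float) -> float:
    """Raw storage needed per day, ignoring indexes and other overhead."""
    return points * bytes_per_point


def days_until_full(disk_bytes: float, daily_bytes: float) -> int:
    """Whole days of data that fit on the disk at the given daily volume."""
    return int(disk_bytes // daily_bytes)


if __name__ == "__main__":
    points = datapoints_per_day(metrics=100, sources=100, interval_seconds=1)
    daily = bytes_per_day(points, bytes_per_point=4)
    print(f"{points / 1e6:.0f} million datapoints per day")
    print(f"{daily / 1e9:.3f} GB per day")
    print(f"{days_until_full(10e12, daily)} days fit on a 10 TB drive")
```

With the numbers from this answer, that works out to 864 million datapoints and about 3.5 GB per day, so a single 10 TB drive would hold several years of raw data before any downsampling or retention policy kicks in. Change the inputs to match your own volume before drawing conclusions.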

The alternative is to use a classic NoSQL database (Cassandra, ElasticSearch or Riak) then engineer the missing bits in the application. These databases may not be optimized for that kind of storage (or are they? modern databases are so complex and optimized, can't know for sure unless benchmarked).

You should evaluate the capacity required by your application. Write a proof of concept with these various databases and measure things.

See if it falls within the limitations of InfluxDB. If so, it's probably the best bet. If not, you'll have to make your own solution on top of something else.

Answered by user3091890

InfluxDB simply cannot hold production load (metrics) from 1000 servers. It has some real problems with data ingestion and ends up stalled or hung and unusable. We tried to use it for a while, but once the amount of data reached some critical level it could not be used anymore. No memory or CPU upgrades helped. Therefore our advice from experience is to definitely avoid it: it's not a mature product and has serious architectural design problems. And I am not even talking about the sudden shift to commercial by Influx.

Next we researched Prometheus, and while it required rewriting queries, it now ingests 4 times more metrics than what we tried to feed to Influx, without any problems whatsoever. And all that load is handled by a single Prometheus server; it's fast, reliable, and dependable. This is our experience running a huge international internet shop under pretty heavy load.

Answered by Travis Bear

IIRC the current Prometheus implementation is designed around all the data fitting on a single server. If you have gigantic quantities of data, it may not all fit in Prometheus.
