PostgreSQL: how to calculate percentiles in Postgres

Question

Asked by lampdev

I have a table called timings in which we store 1 million response timings from load testing. We now need to divide this data into 100 groups (i.e. the first 500 records as one group, and so on) and calculate a percentile for each group, rather than the average.

So far I have tried this query:

Select quartile
     , avg(data) 
     , max(data) 
  FROM (

        SELECT data
             , ntile(500) over (order by data) as quartile
          FROM data
       ) x
 GROUP BY quartile
 ORDER BY quartile

But how do I find the percentile?

Answer 1

Answered by Paul A Jungwirth

Usually, if you want to know the percentile, you are safer using cume_dist than ntile. That is because ntile behaves strangely when given few inputs. Consider:

=# select v, 
          ntile(100) OVER (ORDER BY v),
          cume_dist() OVER (ORDER BY v)
   FROM (VALUES (1), (2), (4), (4)) x(v);

 v | ntile | cume_dist 
---+-------+-----------
 1 |     1 |      0.25
 2 |     2 |       0.5
 4 |     3 |         1
 4 |     4 |         1

You can see that ntile only uses the first 4 out of 100 buckets, whereas cume_dist always gives you a number from 0 to 1. So if you want to find the 99th percentile, you can throw away everything with a cume_dist under 0.99 and take the smallest v from what's left.
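
The filtering step just described can be sketched as a single query. This is only an illustration: the CTE reuses the four sample values from the example above under the name timings with a column data (names borrowed from the question, so treat them as assumptions), and the alias cd is made up for readability.

```sql
-- Sample rows from the example above; drop the CTE to query a real timings table.
WITH timings(data) AS (
    VALUES (1), (2), (4), (4)
)
SELECT min(data) AS p99            -- smallest value among the rows that remain
FROM (
    SELECT data,
           cume_dist() OVER (ORDER BY data) AS cd
    FROM timings
) x
WHERE cd >= 0.99;                  -- discard everything below the 99th percentile
-- p99 = 4 for the four sample rows
```

For a different percentile, only the 0.99 threshold needs to change.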

If you are on Postgres 9.4+, then percentile_cont and percentile_disc make it even easier, because you don't have to construct the buckets yourself. The former even gives you interpolation between values, which again may be useful if you have a small data set.
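
As a rough sketch of how these two aggregates differ, the query below computes the median of the same four sample values both ways. The timings/data names are assumptions carried over from the question, and 0.5 is used instead of 0.99 only because it makes the interpolation visible on such a tiny data set.

```sql
-- Sample rows only; drop the CTE to run this against a real timings table.
WITH timings(data) AS (
    VALUES (1), (2), (4), (4)
)
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY data) AS median_disc,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY data) AS median_cont
FROM timings;
-- median_disc = 2 (an actual input value)
-- median_cont = 3 (interpolated between 2 and 4)
```

Replacing 0.5 with 0.99 would give the 99th percentile in the same way.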

Answer 2

Answered by khampson

Edit:

Please note that since I originally answered this question, Postgres has gained additional aggregate functions to help with this: see percentile_disc and percentile_cont. These were introduced in 9.4.

Original answer:

ntile is how one calculates percentiles (among other n-tiles, such as quartiles, deciles, etc.).

ntile groups the table into the specified number of buckets as equally as possible. If you specified 4 buckets, that would be a quartile. 10 would be a decile.

For percentiles, you would set the number of buckets to 100.

I'm not sure where the 500 comes in here... If you want to determine which percentile your data is in (i.e. divide the million timings as equally as possible into 100 buckets), you would use ntile with an argument of 100, and each group would have far more than 500 entries (10,000 each for a million rows).

If you don't care about avg or max, you can drop a bunch from your query. So it would look something like this:

SELECT data, ntile(100) over (order by data) AS percentile
FROM data
ORDER BY data
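
If the goal is one percentile per group, as the question asks, ntile can do the grouping and one of the 9.4+ aggregates can do the calculation. The following is only a sketch under several assumptions: Postgres 9.4 or later, a hypothetical id column that records arrival order (the question does not say how rows are ordered), and the 95th percentile as an arbitrary choice. The first CTE generates synthetic rows purely so the query runs on its own.

```sql
-- Synthetic stand-in for the real table; `id` is a hypothetical ordering column.
WITH timings(id, data) AS (
    SELECT g, (g * 37) % 101
    FROM generate_series(1, 1000) AS g
),
grouped AS (
    -- Split the rows into 100 equally sized groups by arrival order.
    SELECT data,
           ntile(100) OVER (ORDER BY id) AS grp
    FROM timings
)
SELECT grp,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY data) AS p95
FROM grouped
GROUP BY grp
ORDER BY grp;
```

Against a real table, the first CTE would be dropped and the column in ORDER BY id replaced with whatever defines the grouping order.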
