PostgreSQL: querying DAU/MAU over time (daily)

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/24494373/

Date: 2020-10-21 01:31:29  Source: igfitidea

Querying DAU/MAU over time (daily)

Tags: sql, postgresql

Asked by David Bailey

I have a daily sessions table with columns user_id and date. I'd like to graph out DAU/MAU (daily active users / monthly active users) on a daily basis. For example:


Date         MAU      DAU     DAU/MAU
2014-06-01   20,000   5,000   20%
2014-06-02   21,000   4,000   19%
2014-06-03   20,050   3,050   17%
...          ...      ...     ...

Daily actives are straightforward to calculate, but the monthly actives (e.g. the number of distinct users who logged in during the 30 days ending on each date) are causing problems. How can this be achieved without a left join for each day?


Edit: I'm using Postgres.


Answered by Gordon Linoff

Assuming you have a row for every day, you can get the total counts using a CTE and a rolling window frame (rows between):


with dau as (
      select date, count(user_id) as dau
      from dailysessions ds
      group by date
     )
select date, dau,
       sum(dau) over (order by date rows between 29 preceding and current row) as mau
from dau;
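As a quick sanity check of the rolling-frame idea, the same query runs unchanged on SQLite (3.25+ supports the identical ROWS BETWEEN syntax), so it can be tried from Python's standard library against a toy table. Note the frame counts rows, not days, which is why the answer assumes one row per calendar day:

```python
import sqlite3

# Toy check of the rolling-sum window frame. The frame spans 30 rows
# (29 preceding + current), so it only equals "last 30 days" when
# every day has exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dau (date TEXT, dau INTEGER);
INSERT INTO dau VALUES
  ('2014-06-01', 3), ('2014-06-02', 5), ('2014-06-03', 2);
""")

rows = conn.execute("""
SELECT date, dau,
       SUM(dau) OVER (ORDER BY date
                      ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS mau
FROM dau
ORDER BY date
""").fetchall()

for r in rows:
    print(r)
```

With only three rows the frame never fills, so mau is simply the running total: 3, 8, 10.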

Unfortunately, I think you want distinct users rather than just user counts. That makes the problem much more difficult, especially because Postgres doesn't support count(distinct) as a window function.


I think you have to do some sort of self join for this. Here is one method:


with dau as (
      select date, count(distinct user_id) as dau
      from dailysessions ds
      group by date
     )
select d.date, d.dau,
       (select count(distinct ds.user_id)
        from dailysessions ds
        where ds.date between d.date - 29 * interval '1 day' and d.date
       ) as mau
from dau d;
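The correlated-subquery pattern also translates directly to SQLite, so it can be sanity-checked from Python with a toy dailysessions table (dates stored as ISO strings; SQLite's date() function replaces the Postgres interval arithmetic):

```python
import sqlite3

# Toy check of the distinct-count self join: for each active day,
# count distinct users in the 30-day window ending on that day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dailysessions (user_id INTEGER, date TEXT);
INSERT INTO dailysessions VALUES
  (1, '2014-06-01'), (2, '2014-06-01'),
  (1, '2014-06-02'), (3, '2014-06-02'),
  (2, '2014-07-10');
""")

rows = conn.execute("""
WITH dau AS (
  SELECT date, COUNT(DISTINCT user_id) AS dau
  FROM dailysessions
  GROUP BY date
)
SELECT d.date, d.dau,
       (SELECT COUNT(DISTINCT ds.user_id)
        FROM dailysessions ds
        WHERE ds.date BETWEEN date(d.date, '-29 days') AND d.date
       ) AS mau
FROM dau d
ORDER BY d.date
""").fetchall()

for r in rows:
    print(r)
```

User 2 appears on 2014-06-01 and again on 2014-07-10, so the 30-day window for 2014-07-10 counts only that one user, while 2014-06-02 picks up all three users seen so far.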

Answered by Felipe Hoffa

This one uses COUNT DISTINCT to get the rolling 30 days DAU/MAU:


(calculating reddit's user engagement in BigQuery - but the SQL is standard enough to be used on other databases)


SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM (
    SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit') a
  JOIN (
    SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
    FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
    CROSS JOIN (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
      FROM [fh-bigquery:reddit_comments.2015_09]
      GROUP BY 1
    ) b
    WHERE subreddit='AskReddit'
    AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
    GROUP BY 1
  ) b
  ON a.day=b.stopday
  GROUP BY 1
)
ORDER BY 1

I went further at How to calculate DAU/MAU with BigQuery (engagement)


Answered by Ilkka Peltola

I've written about this on my blog.


The DAU is easy, as you noticed. You can solve the MAU by first creating a view with boolean values for when a user activates and de-activates, like so:


CREATE OR REPLACE VIEW "vw_login" AS
 SELECT *
    , LEAST(LEAD("date") OVER w, "date" + 30) AS "activeExpiry"
    , CASE WHEN LAG("date") OVER w IS NULL THEN true ELSE false END AS "activated"
    , CASE
        WHEN LEAD("date") OVER w IS NULL THEN true
        WHEN LEAD("date") OVER w - "date" > 30 THEN true
        ELSE false
      END AS "churned"
    , CASE
        WHEN LAG("date") OVER w IS NULL THEN false
        WHEN "date" - LAG("date") OVER w <= 30 THEN false
        WHEN row_number() OVER w > 1 THEN true
        ELSE false
      END AS "resurrected"
   FROM "login"
   WINDOW w AS (PARTITION BY "user_id" ORDER BY "date");

This creates boolean values per user per day when they become active, when they churn and when they re-activate.


Then do a daily aggregate of the same:


CREATE OR REPLACE VIEW "vw_activity" AS
SELECT 
    SUM("activated"::int) "activated"
  , SUM("churned"::int) "churned"
  , SUM("resurrected"::int) "resurrected"
  , "date"
  FROM "vw_login"
  GROUP BY "date"
  ;

And finally calculate running totals of active MAUs by calculating the cumulative sums over the columns. You need to join the vw_activity twice, since the second one is joined to the day when the user becomes inactive (i.e. 30 days since their last login).


I've included a date series in order to ensure that all days are present in your dataset. You can do without it too, but you might skip days in your dataset.


SELECT
   d."date"
 , SUM(COALESCE(a.activated::int, 0)
     - COALESCE(a2.churned::int, 0)
     + COALESCE(a.resurrected::int, 0)) OVER w AS "active"
 , a."activated", a2."churned", a."resurrected"
FROM generate_series('2010-01-01'::date, CURRENT_DATE, '1 day'::interval) d ("date")
LEFT OUTER JOIN vw_activity a ON d."date" = a."date"
LEFT OUTER JOIN vw_activity a2 ON d."date" = (a2."date" + INTERVAL '30 days')::date
WINDOW w AS (ORDER BY d."date")
ORDER BY d."date";

You can of course do this in a single query, but this helps understand the structure better.

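The per-user bookkeeping that vw_login encodes can be cross-checked with a small plain-Python sketch (a toy model, not part of the answer; the 30-day threshold and the activated/churned/resurrected flags mirror the view above, with each user's login dates assumed sorted):

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # same 30-day inactivity threshold as the view

def classify(logins):
    """For one user's sorted login dates, flag each login the way
    vw_login does: activated on the first login, churned when the
    next login is missing or more than 30 days away, resurrected
    when the gap since the previous login exceeds 30 days."""
    out = []
    for i, d in enumerate(logins):
        prev = logins[i - 1] if i > 0 else None
        nxt = logins[i + 1] if i + 1 < len(logins) else None
        activated = prev is None
        churned = nxt is None or (nxt - d) > WINDOW
        resurrected = prev is not None and (d - prev) > WINDOW
        out.append((d, activated, churned, resurrected))
    return out

# A user who logs in, stays active, disappears, then comes back:
flags = classify([date(2014, 1, 1), date(2014, 1, 20), date(2014, 4, 1)])
for f in flags:
    print(f)
```

Note that, as in the view, the last login is always flagged churned; the final query above only subtracts that churn 30 days later, via the second join on a2.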

Answered by a_horse_with_no_name

You didn't show us your complete table definition, but maybe something like this:


select date,
       count(*) over (partition by date_trunc('day', date) order by date) as dau,
       count(*) over (partition by date_trunc('month', date) order by date) as mau
from sessions
order by date;

To get the percentage without repeating the window functions, just wrap this in a derived table:


select date, 
       dau,
       mau,
       dau::numeric / (case when mau = 0 then null else mau end) as pct
from (
    select date,
           count(*) over (partition by date_trunc('day', date) order by date) as dau,
           count(*) over (partition by date_trunc('month', date) order by date) as mau
    from sessions
) t
order by date;
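Note that this variant counts per calendar month (a running count within the month) rather than a rolling 30-day window. The query translates to SQLite almost verbatim, with date_trunc('month', ...) swapped for strftime('%Y-%m', ...), so it can be reproduced from Python on the sample data:

```python
import sqlite3

# Reproduce the derived-table query on SQLite (3.25+ for window
# functions); strftime('%Y-%m', ...) stands in for date_trunc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("2014-06-01", u) for u in (1, 2, 3)]
    + [("2014-06-02", u) for u in (1, 2, 3, 4)]
    + [("2014-06-03", u) for u in (1, 2, 3, 4, 5)],
)

rows = conn.execute("""
SELECT DISTINCT session_date, dau, mau,
       ROUND(CAST(dau AS REAL) / mau, 2) AS pct
FROM (
    SELECT session_date,
           COUNT(*) OVER (PARTITION BY session_date
                          ORDER BY session_date) AS dau,
           COUNT(*) OVER (PARTITION BY strftime('%Y-%m', session_date)
                          ORDER BY session_date) AS mau
    FROM sessions
) t
ORDER BY session_date
""").fetchall()

for r in rows:
    print(r)
```

The running mau within June climbs 3, 7, 12, matching the June rows of the sample output below (DISTINCT collapses the per-session duplicates).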

Here is an example output:


postgres=> select * from sessions;
 session_date | user_id
--------------+---------
 2014-05-01   |       1
 2014-05-01   |       2
 2014-05-01   |       3
 2014-05-02   |       1
 2014-05-02   |       2
 2014-05-02   |       3
 2014-05-02   |       4
 2014-05-02   |       5
 2014-06-01   |       1
 2014-06-01   |       2
 2014-06-01   |       3
 2014-06-02   |       1
 2014-06-02   |       2
 2014-06-02   |       3
 2014-06-02   |       4
 2014-06-03   |       1
 2014-06-03   |       2
 2014-06-03   |       3
 2014-06-03   |       4
 2014-06-03   |       5
(20 rows)

postgres=> select session_date,
postgres->        dau,
postgres->        mau,
postgres->        round(dau::numeric / (case when mau = 0 then null else mau end),2) as pct
postgres-> from (
postgres(>     select session_date,
postgres(>            count(*) over (partition by date_trunc('day', session_date) order by session_date) as dau,
postgres(>            count(*) over (partition by date_trunc('month', session_date) order by session_date) as mau
postgres(>     from sessions
postgres(> ) t
postgres-> order by session_date;
 session_date | dau | mau | pct
--------------+-----+-----+------
 2014-05-01   |   3 |   3 | 1.00
 2014-05-01   |   3 |   3 | 1.00
 2014-05-01   |   3 |   3 | 1.00
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-06-01   |   3 |   3 | 1.00
 2014-06-01   |   3 |   3 | 1.00
 2014-06-01   |   3 |   3 | 1.00
 2014-06-02   |   4 |   7 | 0.57
 2014-06-02   |   4 |   7 | 0.57
 2014-06-02   |   4 |   7 | 0.57
 2014-06-02   |   4 |   7 | 0.57
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
(20 rows)

postgres=>