摘要:查询每分钟最大在线人数,这里处理逻辑基本和最高峰在线人数是一致的,但有一个问题,如果某一分钟内无任何操作记录,则不会出现该分钟的数据,我们就统计不到。所以这里额外考察了生成数据;
有如下数据记录直播平台主播上播及下播时间,根据该数据计算出平台每分钟最大直播人数。
++++| user_id | start_time | end_time |++++| 1 | 2024-04-29 01:00:00 | 2024-04-29 02:01:05 || 2 | 2024-04-29 01:05:00 | 2024-04-29 02:03:18 || 3 | 2024-04-29 02:00:00 | 2024-04-29 04:03:22 || 4 | 2024-04-29 03:15:07 | 2024-04-29 04:33:21 || 5 | 2024-04-29 03:34:16 | 2024-04-29 06:10:45 || 6 | 2024-04-29 05:22:00 | 2024-04-29 07:01:08 || 7 | 2024-04-29 06:11:03 | 2024-04-29 09:26:05 || 3 | 2024-04-29 08:00:00 | 2024-04-29 12:34:27 || 1 | 2024-04-29 11:00:00 | 2024-04-29 16:03:18 || 8 | 2024-04-29 15:00:00 | 2024-04-29 17:01:05 |++++查询每分钟最大在线人数,这里处理逻辑基本和最高峰在线人数是一致的,但有一个问题,如果某一分钟内无任何操作记录,则不会出现该分钟的数据,我们就统计不到。所以这里额外考察了生成数据;
1.首先对原始数据进行处理,生成主播上下播的日志数据,同时增加人数变化字段,主播上播为1,主播下播-1。新数据包含 user_id,action_time,change_cnt
--开播记录selectuser_id,start_time as action_time,1 as change_cntfrom t2_livestream_logunion all --下播记录selectuser_id,end_time as action_time,-1 as change_cntfrom t2_livestream_log查询结果
++++| _u1.user_id | _u1.action_time | _u1.change_cnt |++++| 1 | 2024-04-29 01:00:00 | 1 || 1 | 2024-04-29 02:01:05 | -1 || 2 | 2024-04-29 01:05:00 | 1 || 2 | 2024-04-29 02:03:18 | -1 || 3 | 2024-04-29 02:00:00 | 1 || 3 | 2024-04-29 04:03:22 | -1 || 4 | 2024-04-29 03:15:07 | 1 || 4 | 2024-04-29 04:33:21 | -1 || 5 | 2024-04-29 03:34:16 | 1 || 5 | 2024-04-29 06:10:45 | -1 || 6 | 2024-04-29 05:22:00 | 1 || 6 | 2024-04-29 07:01:08 | -1 || 7 | 2024-04-29 06:11:03 | 1 || 7 | 2024-04-29 09:26:05 | -1 || 3 | 2024-04-29 08:00:00 | 1 || 3 | 2024-04-29 12:34:27 | -1 || 1 | 2024-04-29 11:00:00 | 1 || 1 | 2024-04-29 16:03:18 | -1 || 8 | 2024-04-29 15:00:00 | 1 || 8 | 2024-04-29 17:01:05 | -1 |++++2.生成0~24*60-1条记录数据
select idx from (select posexplode(split(space(24*60),' ')) as (idx,value)) t查询结果,这里仅显示前10行数据
+------+| idx |+------+| 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 |+------+3.根据2生成每分钟一条记录的心跳记录,心跳记录change_cnt= 0 ,代表没有主播上播,也没有主播下播。
SELECT0 as user_id,from_unixtime(unix_timestamp('2024-04-29','yyyy-MM-dd')+item*60,'yyyy-MM-dd HH:mm:ss') as action_time,0 as change_cntfrom (select posexplode(split(space(24*60),' ')) as (item,value)) tlimit 10;查询结果,这里仅显示前10行数据
++++| user_id | action_time | change_cnt |++++| 0 | 2024-04-29 00:00:00 | 0 || 0 | 2024-04-29 00:01:00 | 0 || 0 | 2024-04-29 00:02:00 | 0 || 0 | 2024-04-29 00:03:00 | 0 || 0 | 2024-04-29 00:04:00 | 0 || 0 | 2024-04-29 00:05:00 | 0 || 0 | 2024-04-29 00:06:00 | 0 || 0 | 2024-04-29 00:07:00 | 0 || 0 | 2024-04-29 00:08:00 | 0 || 0 | 2024-04-29 00:09:00 | 0 |++++4.汇总所有数据之后,对change_cnt累积求和,然后求出每分钟的最大值即可
with t_all as(--开播记录selectuser_id,start_time as action_time,1 as change_cntfrom t2_livestream_logunion all--下播记录selectuser_id,end_time as action_time,-1 as change_cntfrom t2_livestream_log--心跳数据union allSELECT0 as user_id,from_unixtime(unix_timestamp('2024-04-29','yyyy-MM-dd')+item*60,'yyyy-MM-dd HH:mm:ss') as action_time,0 as change_cntfrom (select posexplode(split(space(24*60),' ')) as (item,value)) t)selectdate_format(action_time,'yyyy-MM-dd HH:mm') as act_minute,max(online_cnt) as minute_max_cntfrom (select user_id,action_time,change_cnt,sum(change_cnt) over (order by action_time asc) online_cntfrom t_all)t1group by date_format(action_time,'yyyy-MM-dd HH:mm');查询结果,截取了部分有直播数据进行展示
+++| act_minute | minute_max_cnt |+++| 2024-04-29 16:40 | 1 || 2024-04-29 16:41 | 1 || 2024-04-29 16:42 | 1 || 2024-04-29 16:43 | 1 || 2024-04-29 16:44 | 1 || 2024-04-29 16:45 | 1 || 2024-04-29 16:46 | 1 || 2024-04-29 16:47 | 1 || 2024-04-29 16:48 | 1 || 2024-04-29 16:49 | 1 || 2024-04-29 16:50 | 1 || 2024-04-29 16:51 | 1 || 2024-04-29 16:52 | 1 || 2024-04-29 16:53 | 1 || 2024-04-29 16:54 | 1 || 2024-04-29 16:55 | 1 || 2024-04-29 16:56 | 1 || 2024-04-29 16:57 | 1 || 2024-04-29 16:58 | 1 |+++CREATE TABLE IF NOT EXISTS t2_livestream_log (user_id INT, -- 主播IDstart_time STRING, -- 开始时间end_time STRING -- 结束时间);insert into t2_livestream_log(user_id, start_time, end_time) values (1,'2024-04-29 01:00:00','2024-04-29 02:01:05'),(2,'2024-04-29 01:05:00','2024-04-29 02:03:18'),(3,'2024-04-29 02:00:00','2024-04-29 04:03:22'),(4,'2024-04-29 03:15:07','2024-04-29 04:33:21'),(5,'2024-04-29 03:34:16','2024-04-29 06:10:45'),(6,'2024-04-29 05:22:00','2024-04-29 07:01:08'),(7,'2024-04-29 06:11:03','2024-04-29 09:26:05'),(3,'2024-04-29 08:00:00','2024-04-29 12:34:27'),(1,'2024-04-29 11:00:00','2024-04-29 16:03:18'),(8,'2024-04-29 15:00:00','2024-04-29 17:01:05');来源:侦探神秘旅