• 1. Oracle Analytic Functions In PracticeBy dingjun123 2013.06http://blog.chinaunix.net/uid/7655508.html
  • 2. 主要内容
  • 3. 甲:我有个SQL,你能帮我用分析函数改写下吗? 乙:相关列有索引吗? 甲:owner有索引,选择性不错,我想用分析函数改写看看? 乙:哦,知道了,这是典型的top-n查询。 SELECT owner,object_type FROM demo2 WHERE owner='DINGJUN123' AND trunc(created,'dd') = (SELECT MAX(trunc(created,'dd')) FROM demo2 WHERE owner='DINGJUN123') SELECT owner,object_type FROM ( SELECT owner,object_type, dense_rank() over(ORDER BY trunc(created,'dd') DESC) rn FROM demo2 WHERE owner='DINGJUN123' ) WHERE rn=1分析函数作用欢迎进入今天的 分析函数学习之旅!
  • 4. 4子查询方法总行数:667827 返回9行 原始SQL : 逻辑读848,COST:1103 优点:最容易想到 缺点:多次访问表或索引分析函数方法分析SQL : 逻辑读423,COST:693 优点:减少表或索引的访问次数,逻辑读和COST 是常规方法的一半,SQL简单 缺点:需要排序操作进一步优化建立owner,trunc(created,’dd’) desc复合索引 作用对于原始SQL子查询可以快速扫描,分析函数消除排序 优化后原始SQL逻辑读/COST:111/171,分析函数:6/767 构建合适索引,消除 排序,是一种重要的 SQL优化手段------------------------------------------------------------ | Id | Operation | Name | Rows | ------------------------------------------------------------ | 0 | SELECT STATEMENT | | 202 | |* 1 | TABLE ACCESS BY INDEX ROWID | DEMO2 | 202 | |* 2 | INDEX RANGE SCAN | IDX_DEMO2 | 20237 | | 3 | SORT AGGREGATE | | 1 | | 4 | TABLE ACCESS BY INDEX ROWID| DEMO2 | 20237 | |* 5 | INDEX RANGE SCAN | IDX_DEMO2 | 20237 | Predicate Information (identified by operation id): ------------------------------------------------------------- 1 - filter(TRUNC(INTERNAL_FUNCTION("CREATED"),'fmdd')= (SELECT MAX(TRUNC(INTERNAL_FUNCTION("CREATED"),'fmdd')) FROM "DEMO2" "DEMO2" WHERE "OWNER"='DINGJUN123')) 2 - access("OWNER"='DINGJUN123') 5 - access("OWNER"='DINGJUN123')------------------------------------------------------------ | Id | Operation | Name | Rows | ------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20237 | |* 1 | VIEW | | 20237 | |* 2 | WINDOW SORT PUSHED RANK | | 20237 | | 3 | TABLE ACCESS BY INDEX ROWID| DEMO2 | 20237 | |* 4 | INDEX RANGE SCAN | IDX_DEMO2 | 20237 | ------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN"=1) 2 - filter(DENSE_RANK() OVER ( ORDER BY TRUNC(INTERNAL_FUNCTION("CREATED"),'fmdd') DESC )<=1) 4 - access("OWNER"='DINGJUN123')分析函数作用
  • 5. 分析函数的作用总结--------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------- | 0 | SELECT STATEMENT | | 213 | | 1 | TABLE ACCESS BY INDEX ROWID| DEMO2 | 213 | |* 2 | INDEX RANGE SCAN | IDX_DEMO2 | 1 | | 3 | SORT AGGREGATE | | 1 | |* 4 | INDEX RANGE SCAN | IDX_DEMO2 | 21334 | ---------------------------------------------------------SELECT owner,object_type FROM demo2 WHERE owner='DINGJUN123' AND trunc(created,'dd') = (SELECT MAX(trunc(created,'dd')) FROM demo2 WHERE owner='DINGJUN123')1.减少表或索引的访问次数SELECT owner,object_type FROM ( SELECT owner,object_type, dense_rank() over(ORDER BY trunc(created,'dd') DESC) rn FROM demo2 WHERE owner='DINGJUN123' ) WHERE rn=1----------------------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------------------- | 0 | SELECT STATEMENT | | 21334 | |* 1 | VIEW | | 21334 | |* 2 | WINDOW NOSORT STOPKEY | | 21334 | | 3 | TABLE ACCESS BY INDEX ROWID| DEMO2 | 21334 | |* 4 | INDEX RANGE SCAN | IDX_DEMO2 | 21334 | -----------------------------------------------------------2.实现复杂的行间计算,复杂聚合等SELECT empno,sal,deptno, SUM(sal) over (PARTITION BY deptno ORDER BY empno) sum_current FROM emp EMPNO SAL DEPTNO SUM_CURRENT ------ ---------- ---------- ----------- 7782 2450 10 2450 7839 5000 10 7450 7934 1300 10 8750 。。。
  • 6. 很多分析函数要求排序SELECT a.ID,a.sal,a.ext FROM t1 a, (SELECT ID,MAX(sal) max_sal FROM t1 GROUP BY ID ) b WHERE a.sal=b.max_sal AND a.ID=b.IDSELECT ID,sal,ext FROM ( SELECT ID,sal,ext,rank() over(PARTITION BY ID ORDER BY sal DESC) rn FROM t1 ) WHERE rn=1--------------------------------------------------------- | SELECT STATEMENT | | 1 | 65 | | | HASH JOIN | | 1 | 65 | 35M| | VIEW | | 990K| 24M| | | HASH GROUP BY | | 990K| 24M| | | TABLE ACCESS FULL| T1 | 990K| 24M| | | TABLE ACCESS FULL | T1 | 990K| 36M| | ---------------------------------------------------------Elapsed: 00:00:01.49------------------------------------------------------------- | SELECT STATEMENT | | 990K| 49M| | | VIEW | | 990K| 49M| | | WINDOW SORT PUSHED RANK| | 990K| 36M| 49M| | TABLE ACCESS FULL | T1 | 990K| 36M| | -------------------------------------------------------------Elapsed: 00:00:04.38CREATE TABLE t1 AS SELECT mod(LEVEL,1000) ID,LEVEL+1000 sal,MOD(LEVEL,10) ext FROM dual CONNECT BY LEVEL<1000000需要排序的分析函数,会消耗一定的资源,当然大多可以优化,对复杂的行间计算、累计值、移动平均等还是首选分析函数ID很多重复,inline view分组减少连接数量分析函数缺点
  • 7. 分析函数语法-图1
  • 8. 分析函数语法-图2
  • 9. PARTITION BY ORDER BY ROWS Vs RANGE Analytic Function UNBOUNDED PRECEDING FOLLOWING CURRENT ROW分析函数语法之关系文档注意点和限制
  • 10. 分析函数语法之partition by 通过partition by子句,将相同的行聚合到一起成为一组,之后当前行的分析函数计算 就是在这行对应的partition里。每个分析函数都可以使用partition by子句。 每行在对应的窗口内,应用分析函数,然后计算得到当前行对应的分析函数值。 partition by子句可以没有,如果也没有order by子句,那么表示当前行对应的窗口 范围是所有行。 11223SELECT deptno,empno,sal, SUM(sal) over(PARTITION BY deptno) sum_dept, SUM(sal) over() sum_all FROM emp DEPTNO EMPNO SAL SUM_DEPT SUM_ALL --------- ---------- ---------- ---------- ---------- 10 7782 2450 8750 32025 10 7934 1300 8750 32025 10 7839 5000 8750 32025 20 7902 3000 10875 32025 20 7566 2975 10875 32025 20 7876 1100 10875 32025 20 7369 800 10875 32025 20 7788 3000 10875 32025
  • 11. 分析函数语法之order by order by当前行默认窗口是当前行所属的partition第1行到当前行(根据order by顺序指定),无order by就是对应所属partition所有行。 order by默认是range窗口,对应逻辑窗口,保证分析函数值的唯一性,但是对排名分析函数特殊,因为排名函数不能带window。 order by如果有多个排序键且是range窗口,则必须要求对应的窗口是当前partition所有行、第1行到当前行、当前行到当前partition最后一行或当前行到当前行 123SELECT deptno, empno, sal, SUM(sal) over(PARTITION BY deptno ORDER BY sal) dept_current, SUM(sal) over(PARTITION BY deptno ORDER BY sal RANGE BETWEEN unbounded preceding AND CURRENT ROW) dept_current1 FROM emp DEPTNO EMPNO SAL DEPT_CURRENT DEPT_CURRENT1 ------ ---------- ---------- ------------ ------------- 10 7934 1300 1300 1300 10 7782 2450 3750 3750 10 7839 5000 8750 8750 20 7369 800 800 800 20 7876 1100 1900 1900 20 7566 2975 4875 4875 20 7788 3000 10875 10875 20 7902 3000 10875 10875 30 7900 950 950 950 30 7654 1250 3450 3450 30 7521 1250 3450 3450 30 7844 1500 4950 4950 30 7499 1600 6550 6550 30 7698 2850 9400 9400 逻辑当前行
  • 12. 分析函数语法之windowWITH t AS (SELECT (CASE WHEN LEVEL IN (1,2) THEN 1 WHEN LEVEL IN (4,5) THEN 6 ELSE LEVEL END) ID FROM dual CONNECT BY LEVEL<10) --默认排序,逻辑当前行 SELECT id,SUM(ID) over(ORDER BY ID) default_sum, --逻辑行上限到当前行,此当前行是逻辑当前行,和上面一样 SUM(ID) over(ORDER BY ID RANGE BETWEEN unbounded preceding AND CURRENT ROW) range_unbound_sum, --物理当前行,按排序后的行位置计算 SUM(ID) over(ORDER BY ID ROWS BETWEEN unbounded preceding AND CURRENT ROW) rows_unbound_sum, --下面两条与上面的比较,换成了对窗口有一定的限制,同样分为--逻辑行和物理行 SUM(ID) over(ORDER BY ID RANGE BETWEEN 1 preceding AND 2 following) range_sum, SUM(ID) over(ORDER BY ID ROWS BETWEEN 1 preceding AND 2 following) rows_sum FROM t; 有显式window子句,必须有order by。有的分析函数不能有显式window,如 row_number,rank,dense_rank等 window可以指定是逻辑窗口还是物理窗口,逻辑行用range,物理行用rows window窗口滑动的方向应该总是从上到下,特别注意range的计算方式和order by是升序还是降序有关。 123 ID DEFAULT_SUM RANGE_UNBOUND_SUM ROWS_UNBOUND_SUM RANGE_SUM ROWS_SUM --- ----------- ----------------- ---------------- ---------- ---------- 1 2 2 1 5 5 1 2 2 2 5 11 3 5 5 5 3 16 6 23 23 11 33 21 6 23 23 17 33 25 6 23 23 23 33 27 7 30 30 30 42 30 8 38 38 38 24 24 9 47 47 47 17 17
  • 13. 13语法综合之ROWS rows窗口是物理窗口,也就是排序后,按排序结果的行号对应确定窗口,指定rows窗口的话,必须要求排序键唯一,否则结果可能不稳定。 SELECT deptno, empno, sal, SUM(sal) over(PARTITION BY deptno ORDER BY sal ROWS BETWEEN unbounded preceding AND CURRENT ROW) dept_current FROM emp DEPTNO EMPNO SAL DEPT_CURRENT ------- ---------- ---------- ------------ 10 7934 1300 1300 10 7782 2450 3750 10 7839 5000 8750 20 7369 800 800 20 7876 1100 1900 20 7566 2975 4875 20 7788 3000 7875 20 7902 3000 10875 rows窗口既然是物理窗口,那么它的order by是允许带多个排序键的。后面讲解range与rows不同。rows窗口滑动顺序总是向下。它的含义就是当前行之前xx行为起点 ,当前行之后xx行为终点(其他类似)。 SELECT deptno, empno, sal, SUM(sal) over(PARTITION BY deptno ORDER BY sal DESC ROWS BETWEEN 1 preceding AND 2 preceding) sum_sal FROM emp DEPTNO EMPNO SAL SUM_SAL ------ ---------- ---------- ---------- 10 7839 5000 10 7782 2450 10 7934 1300 20 7788 3000 20 7902 3000 20 7566 2975 20 7876 1100 20 7369 800 30 7698 2850 30 7499 1600 30 7844 1500 30 7654 1250 30 7521 1250 30 7900 950终点 起点换成1 follwing
  • 14. 14语法综合之RANGE range窗口是默认窗口,它不管排序键是否是唯一,都能保证诸如聚合分析函数值的唯一(排名分析函数?必须唯一),range窗口反应的是行与行之间的逻辑关系(当前行的排序键的value加或减逻辑偏移量得到当前行对应的逻辑窗口的范围)。 range+order by带多个排序键,必须要求对应的窗口是当前partition所有行、第1行到当前行、当前行到当前partition最后一行或当前行到当前行。(Why?),升序range窗口preceding含义是比当前行小xx值,following含义是比当前行大xx值,降序range窗口preceding含义是比当前行大xx值,following含义是比当前行小xx值。 SELECT department_id, employee_id, salary, SUM(salary) over(PARTITION BY department_id ORDER BY salary RANGE BETWEEN 100 preceding AND 500 following) sum_sal FROM hr.employees;DEPARTMENT_ID EMPLOYEE_ID SALARY SUM_SAL ------------- ----------- ---------- ---------- 10 200 4400 4400 20 202 6000 6000 20 201 13000 13000 30 119 2500 10800 30 118 2600 13900 30 117 2800 8800 30 116 2900 8800 30 115 3100 3100 30 114 11000 11000换成rows含义变化
  • 15. 15语法综合之RANGESELECT department_id, employee_id, salary, SUM(salary) over(PARTITION BY department_id ORDER BY salary DESC RANGE BETWEEN 100 preceding AND 500 following) sum_sal FROM hr.employees;DEPARTMENT_ID EMPLOYEE_ID SALARY SUM_SAL ------------- ----------- ---------- ---------- 10 200 4400 4400 20 201 13000 13000 20 202 6000 6000 30 114 11000 11000 30 115 3100 11400 30 116 2900 10800 30 117 2800 10800 30 118 2600 5100 30 119 2500 5100 40 203 6500 6500 50 121 8200 24100 50 120 8000 15900 50 122 7900 15900 50 123 6500 6500 50 124 5800 5800 因为salary desc,所以100 preceding含义是比当前行的salary大100,以employee_id=117为例,也就是起点是2800+100=2900,终点是比当前行小500,2800-500=2300。那么employee_id=117行分析函数计算的窗口范围是employee_id between 116 and 119。
  • 16. 16语法综合之窗口范围 当前行对应的分析函数结果是根据当前行对应的窗口(逻辑OR物理)范围计算的。理解窗口的含义对分析函数的使用至关重要。窗口只有起点的,起点必须是CURRENT ROW之前(包含CURRENT ROW)。窗口内数据移动顺序必须是按照排序顺序,从上到下,否则要么报错,要么结果有问题。示例说明ROWS/RANGE UNBOUNDED PRECEDING终点CURRENT ROW,相当于BETWEEN 起点 BTEWEEN CURRENT ROWROWS/RANGE CURRENT ROW相当于BETWEEN CURRENT ROW AND CURRENT ROWROWS/RANGE value_expr PRECEDING相当于BETWEEN value_expr PRECEDING AND CURRENT ROWvalue_expr FOLLOWING报错,无效窗口表1-只指定起点的表2-有起点和终点示例说明ROWS/RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING和无ORDER BY一样,等价于没有写ORDER By和WINDOW子句ROWS/RANGE BETWEEN CURRENT ROW AND value_expr FOLLOWING当前行到对应value_exp偏移量的行ROWS/RANGE BETWEEN value_expr FOLLOWING AND value_expr1 FOLLOWING 起点是FOLLOWING,终点必须是FOLLOWINGROWS/RANGE BETWEEN value_expr PRECEDING AND value_expr1 FOLLOWING 终点是PRECEDING,起点必须是PRECEDINGROWS/RANGE BETWEEN CURRENT ROW AND value_expr PRECEDING报错,无效窗口
  • 17. 17分析函数分类分类内容说明使用频率排名分析函数ROW_NUMBER、RANK、DENSE_RANK、FIRST、LAST、 LAST_VALUE,FIRST_VALUEFIRST,LAST不应该算做函数,它主要与聚合函数联合使用强聚合分析函数AVG、COUNT、MAX,MIN,SUM强行比较分析函数LEAD、LAG解决当前行与前后行之间的关系中统计分析函数RATIO_TO_REPORT解决报表占比问题中行连接分析函数LISTAGG11g新特性中其他分析函数CORR、CORVAR_POP、CORVAR_SAMP、CUME_DIST、NTH_VALUE、NTILE、PERCENT_RANK、PERCENTILE_CONT、PERCENTILE_DISC、REGR_ (Linear Regression) Functions、STDDEV、STDDEV_POP、STDDEV_SAMP 、VAR_POP,VAR_SAMP、VARIANCE弱 按照各种分析函数使用频率以及业务特点,将分析函数分为排名分析函数、聚合分析函数、行比较分析函数、统计分析函数、行连接分析函数、其他分析函数(基本都是数学统计函数,很少使用)。很多分析函数,同时也可以作为普通的组函数使用,有的分析函数,比如FIRST,LAST、LISTAGG,很多时候,主要用它的普通分组函数功能,而不是它的分析函数功能。
  • 18. 排名分析函数row_number,dense_rank,rank 排名分析函数有row_number,dense_rank,rank,first,last,first_value,last_value等,其中row_number,dense_rank,rank,first,last都需要order by,first_value和last_value可以不指定。 SELECT empno,sal,deptno, row_number() over(PARTITION BY deptno ORDER BY sal) row_rn , rank() over(PARTITION BY deptno ORDER BY sal) rank_rn, dense_rank() over(PARTITION BY deptno ORDER BY sal) dense_rn FROM emp EMPNO SAL DEPTNO ROW_RN RANK_RN DENSE_RN ------ ---------- ---------- ---------- ---------- ---------- 7934 1300 10 1 1 1 7782 2450 10 2 2 2 7839 5000 10 3 3 3 7369 800 20 1 1 1 7876 1100 20 2 2 2 7566 2975 20 3 3 3 7902 3000 20 4 4 4 7788 3000 20 5 4 4 7900 950 30 1 1 1 7521 1250 30 2 2 2 7654 1250 30 3 2 2 7844 1500 30 4 4 3 7845 1500 30 5 4 3 7846 1500 30 6 4 3 7499 1600 30 7 7 4 7698 2850 30 8 8 5
  • 19. 19排名分析函数row_number,dense_rank,rank新特性 在9i以及之前的版本中使用row_number做分页,会有一定的问题,它没有做到谓词推进,10g之后已经做了优化。 SELECT empno,ename,job,mgr,deptno FROM ( SELECT empno,ename,job,mgr,deptno FROM scott.emp ORDER BY empno ) WHERE ROWNUM<10; ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9 | 468 | 2 (0)| 00:00:01 | |* 1 | COUNT STOPKEY | | | | | | | 2 | VIEW | | 9 | 468 | 2 (0)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| EMP | 12 | 300 | 2 (0)| 00:00:01 | | 4 | INDEX FULL SCAN | PK_EMP | 9 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<10) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent getsSELECT empno,ename,job,mgr,deptno FROM ( SELECT a.*,row_number() over(ORDER BY empno) rn FROM scott.emp a ) WHERE rn<10; ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 12 | 780 | 2 (0)| 00:00:01 | |* 1 | VIEW | | 12 | 780 | 2 (0)| 00:00:01 | |* 2 | WINDOW NOSORT STOPKEY | | 12 | 300 | 2 (0)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| EMP | 12 | 300 | 2 (0)| 00:00:01 | | 4 | INDEX FULL SCAN | PK_EMP | 12 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN"<10) 2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMPNO")<10) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets
  • 20. 排名分析函数first、last FIRST/LAST可以作为组函数和分析函数,组函数不带OVER,分析函数带OVER,它必须结合KEEP关键字,KEEP关键字就是起到一个语义的作用:说明按照指定的排序规则找到TOP 1或BOTTOM 1,因为是DENSE_RANK,所以TOP1和BOTTOM1是个集合,然后外层用组函数对TOP 1或BOTTOM 1的行再处理 获得唯一结果。因此它的作用就是对某些列排名,对其他列计算。
  • 21. 排名分析函数first、lastSELECT yy,mm,NAME,s_qty,avg_price, (SELECT t.invent FROM sale t WHERE t.yy=s.yy AND t.mm=s.mm AND t.dd=s.mdd ) last_invent FROM ( SELECT yy,mm,NAME,SUM(qty) s_qty,AVG(price) avg_price,MAX(dd) mdd FROM sale GROUP BY yy,mm,NAME ) s ORDER BY yy,mm,NAMESELECT yy,mm,NAME,SUM(qty) s_qty,AVG(price) avg_price, MAX(invent) KEEP (dense_rank LAST ORDER BY dd) last_invent FROM sale GROUP BY yy,mm,NAME
  • 22. 22排名分析函数first_value、last_value FIRST_VALUE/LAST_VALUE只能用作分析函数,它返回的是按照指定排序,得到对应窗口集合的第1个值或最后一个值。它特别有用的是可以忽略NULL,常用于填补缺失数据。
  • 23. 23排名分析函数first_value、last_valueSELECT deptno, sal, ename, last_value(ename) over(PARTITION BY deptno ORDER BY sal ) last_ename FROM scott.emp;DEPTNO SAL ENAME LAST_ENAME ------ ---------- ---------- ---------- 10 1300 MILLER MILLER 10 2450 CLARK CLARK 10 5000 KING KING 20 800 SMITH SMITH 20 2975 JONES JONES 20 3000 FORD FORD 30 950 JAMES JAMES 30 1250 MARTIN WARD 30 1250 WARD WARD 30 1500 TURNER TURNER 30 1600 ALLEN ALLEN 30 2850 BLAKE BLAKESELECT deptno, sal, ename, last_value(ename) over(PARTITION BY deptno ORDER BY sal ROWS unbounded preceding) last_ename FROM scott.emp; DEPTNO SAL ENAME LAST_ENAME ------- ---------- ---------- ---------- 10 1300 MILLER MILLER 10 2450 CLARK CLARK 10 5000 KING KING 20 800 SMITH SMITH 20 2975 JONES JONES 20 3000 FORD FORD 30 950 JAMES JAMES 30 1250 WARD WARD 30 1250 MARTIN MARTIN 30 1500 TURNER TURNER 30 1600 ALLEN ALLEN 30 2850 BLAKE BLAKE
  • 24. 聚集分析函数 聚集分析函数有sum,max,min,avg,count等,很多组函数同时可以作为分析函数使用。 SELECT ename, hiredate, sal, SUM(sal) OVER (ORDER BY hiredate RANGE NUMTOYMINTERVAL(2,'month') PRECEDING) AS t_sal FROM scott.emp ENAME HIREDATE SAL T_SAL ---------- ------------------- ---------- ---------- SMITH 1980-12-17 00:00:00 800 800 ALLEN 1981-02-20 00:00:00 1600 1600 WARD 1981-02-22 00:00:00 1250 2850 JONES 1981-04-02 00:00:00 2975 5825 BLAKE 1981-05-01 00:00:00 2850 5825 CLARK 1981-06-09 00:00:00 2450 5300 TURNER 1981-09-08 00:00:00 1500 1500 MARTIN 1981-09-28 00:00:00 1250 2750 KING 1981-11-17 00:00:00 5000 6250 JAMES 1981-12-03 00:00:00 950 8950 FORD 1981-12-03 00:00:00 3000 8950 MILLER 1982-01-23 00:00:00 1300 5250 SCOTT 1987-04-19 00:00:00 3000 3000 ADAMS 1987-05-23 00:00:00 1100 4100 distinct限制 不能有order by
  • 25. 行比较分析函数 行比较分析函数有LEAD/LAG。无window子句,分别得到当前行物理偏移n行的值,默认偏移1,比较特殊,这里没有逻辑窗口的概念。 SELECT empno,hiredate, lead(hiredate,1) over(ORDER BY hiredate) lead_hiredate, lag(hiredate,1) over(ORDER BY hiredate) lag_hiredate FROM scott.emp EMPNO HIREDATE LEAD_HIREDA LAG_HIREDA ------ ----------- ----------- ----------- 7369 1980-12-17 1981-02-20 7499 1981-02-20 1981-02-22 1980-12-17 7521 1981-02-22 1981-04-02 1981-02-20 7566 1981-04-02 1981-05-01 1981-02-22 7698 1981-05-01 1981-06-09 1981-04-02 7782 1981-06-09 1981-09-08 1981-05-01 7844 1981-09-08 1981-09-28 1981-06-09 7654 1981-09-28 1981-11-17 1981-09-08 7839 1981-11-17 1981-12-03 1981-09-28 7900 1981-12-03 1981-12-03 1981-11-17 7902 1981-12-03 1982-01-23 1981-12-03 7934 1982-01-23 1987-04-19 1981-12-03 7788 1987-04-19 1987-05-23 1982-01-23 7876 1987-05-23 1987-04-19
  • 26. 统计分析函数 统计分析函数最常用的是ratio_to_report。可以有window子句。 SELECT department_id ,sum(salary) dept_sum, SUM(SUM(salary)) over() all_sum, round(SUM(salary)/(SUM(SUM(salary)) over()),2)*100||'%' ratio FROM hr.employees GROUP BY department_id ORDER BY 1SELECT department_id ,sum(salary) dept_sum, SUM(SUM(salary)) over() all_sum, round(ratio_to_report(SUM(salary)) over(),2)*100||'%' ratio FROM hr.employees GROUP BY department_id ORDER BY 1 DEPARTMENT_ID DEPT_SUM ALL_SUM RATIO ------------- ---------- ---------- ------- 10 4400 691416 1% 20 19000 691416 3% 30 24900 691416 4% 40 6500 691416 1% 50 156400 691416 23% 60 28800 691416 4% 70 10000 691416 1% 80 304500 691416 44% 90 58000 691416 8% 100 51608 691416 7% 110 20308 691416 3% 7000 691416 1%
  • 27. 27行连接分析函数 11g引入了一个新的函数LISTAGG,它可以作为组函数和分析函数,主要作用就是按照指定分组或者指定窗口范围,并且按照order by的行顺序,将列按照指定连接符连成一串。 SELECT deptno,listagg(ename,'||') within GROUP(ORDER BY empno) list_ename FROM scott.emp GROUP BY deptno; DEPTNO LIST_ENAME ------ ---------------------------------------- 10 CLARK||KING||MILLER 20 SMITH||JONES||FORD 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMESSELECT deptno,listagg(ename,'||') within GROUP(ORDER BY empno) over(PARTITION BY deptno) list_ename FROM scott.emp; DEPTNO LIST_ENAME ------- ------------------------------------------ 10 CLARK||KING||MILLER 10 CLARK||KING||MILLER 10 CLARK||KING||MILLER 20 SMITH||JONES||FORD 20 SMITH||JONES||FORD 20 SMITH||JONES||FORD 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMES 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMES 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMES 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMES 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMES 30 ALLEN||WARD||MARTIN||BLAKE||TURNER||JAMES
  • 28. 28分析函数专题案例专题案例说明连续数问题用分析函数解决经典连续数问题填充缺失行问题使用LAST_VALUE分析函数解决填充行问题模拟SQL*PLUS BREAK使用LAG模拟BREAK COLUMN重复行问题用分析函数解决重复行问题行列转换问题使用排名分析函数/LEAD解决行列转换金额摊派问题使用ROW_NUMBER分析函数解决经典金额均分问题自定义分析函数自定义一个连乘分析函数
  • 29. 连续数问题select id,num,val from test_tab; ID NUM VAL ---------- ---------- ---------- 1 1 50 1 2 100 1 3 150 1 5 250 2 1 100 2 3 400 3 1 100 3 2 200 SELECT ID,MIN(num),SUM(val) FROM ( SELECT ID,num,val, num-row_number() over(PARTITION BY ID ORDER BY num) rn FROM test_tab ) GROUP BY ID,rn ORDER BY 1,2 要求对ID相同,num连续的,查找最小num以及val求和。 ID MIN(NUM) SUM(VAL) ---- ---------- ---------- 1 1 300 1 5 250 2 1 100 2 3 400 3 1 300
  • 30. 填充缺失行问题 将num的值按id相同的,按月升序,如果当前行为空,将前面最近非空的num填充到当前行,否则找最近的后面行。 SELECT ID,mm,num, nvl(last_value(num IGNORE NULLS) over(PARTITION BY ID ORDER BY mm) , last_value(num IGNORE NULLS) over(PARTITION BY ID ORDER BY mm DESC) ) new_num FROM demo5 ORDER BY ID,mm ID MM NUM NEW_NUM -- ---------- ---------- ---------- 1 201001 3 3 1 201002 2 2 1 201003 2 1 201004 2 1 201005 1 1 2 201001 2 2 201002 2 2 2 201003 3 3 2 201004 3详细参考:http://blog.chinaunix.net/uid-7655508-id-3736949.html
  • 31. 模拟SQL*PLUS BREAK 问题 SQL*PLUS提供BREAK命令,就是当前列值与前面相同,则置NULL,这是报表常用的一种手段。 SQL> break ON department_id SQL> SELECT department_id,first_name 2 FROM hr.employees 3 WHERE department_id<40 4 ORDER BY 1,2; DEPARTMENT_ID FIRST_NAME ------------- -------------------- 10 Jennifer 20 Michael Pat 30 Alexander Den Guy Karen Shelli Sigal SELECT decode(lag(department_id,1) over(PARTITION BY department_id ORDER BY first_name), department_id,NULL,department_id ) newdepartment_id, first_name FROM hr.employees WHERE department_id<40 ORDER BY department_id,first_name
  • 32. 重复行问题 DELETE FROM duprows a WHERE a.ROWID <> (SELECT MIN(b.ROWID) FROM duprows b WHERE a.ext=b.ext) DELETE FROM duprows a WHERE a.ROWID IN (SELECT ROWID FROM (SELECT row_number() over(PARTITION BY b.ext ORDER BY b.ROWID) rn FROM duprows b ) c WHERE c.rn>1 )----------------------------------------------------- DELETE STATEMENT | | 2 DELETE | DUPROWS | HASH JOIN | | 2 VIEW | VW_SQ_1 | 3 SORT GROUP BY | | 3 TABLE ACCESS FULL| DUPROWS | 3 TABLE ACCESS FULL | DUPROWS | 3------------------------------------------------ DELETE STATEMENT | | 1 DELETE | DUPROWS | NESTED LOOPS | | 1 VIEW | VW_NSO_1 | 3 SORT UNIQUE | | 1 VIEW | | 3 WINDOW SORT | | 3 TABLE ACCESS FULL | DUPROWS | 3 TABLE ACCESS BY USER ROWID| DUPROWS | 1 ------------------------------------------------
  • 33. 33行列转换问题1 SQL> select * from tab; A B ----- ----- A 1 A 2 A 3 A 4 A 5 A 6 A 7SQL> set null null SQL> SELECT a,b,LEAD(b) OVER(PARTITION BY a ORDER BY b) c, 2 LEAD(b,2) OVER(PARTITION BY a ORDER BY b) d, 3 LEAD(b,3) OVER(PARTITION BY a ORDER BY b) e, 4 LEAD(b,4) OVER(PARTITION BY a ORDER BY b) f 5 FROM tab; A B C D E F ----- ----- ----- ----- ----- ----- A 1 2 3 4 5 A 2 3 4 5 6 A 3 4 5 6 7 A 4 5 6 7 null A 5 6 7 null null A 6 7 null null null A 7 null null null null
  • 34. 行列转换问题2 SQL> SELECT INDEX_NAME, COLUMN_NAME 2 FROM user_ind_columns 3 WHERE INDEX_NAME LIKE '%PK' 4 AND rownum < 10; INDEX_NAME COLUMN_NAME ------------------------------ -------------------------------------------------------------------------------- ALL_ORDERS_PK YEAR ALL_ORDERS_PK MONTH ALL_ORDERS_PK CUST_NBR ALL_ORDERS_PK REGION_ID ALL_ORDERS_PK SALESPERSON_ID ASSEMBLY_PK ASSEMBLY_TYPE ASSEMBLY_PK ASSEMBLY_ID A_ID_PK ID CUSTOMER_PK CUST_NBRselect INDEX_NAME, max(decode(rn, 1, COLUMN_NAME)) c1, max(decode(rn, 2, COLUMN_NAME)) c2, max(decode(rn, 3, COLUMN_NAME)) c3, max(decode(rn, 4, COLUMN_NAME)) c4, max(decode(rn, 4, COLUMN_NAME)) c5 from (select INDEX_NAME, TABLE_NAME, COLUMN_NAME, row_number() over(partition by INDEX_NAME order by COLUMN_NAME) rn from user_ind_columns where INDEX_NAME like '%PK' and rownum<10) t1 group by INDEX_NAMEINDEX_NAME C1 C2 C3 C4 C5 ------------------------------ --------------- --------------- ------------- ALL_ORDERS_PK CUST_NBR MONTH REGION_ID SALESPERSON_ID SALESPERSON_ID ASSEMBLY_PK ASSEMBLY_ID ASSEMBLY_TYPE A_ID_PK ID CUSTOMER_PK CUST_NBR
  • 35. 金额摊派问题SQL> select * from demo7_1; ID AMOUNT ---------- ---------- 1 100 2 50 已用时间: 00: 00: 00.01 SQL> select * from demo7_2; ID PERSONS ---------- ---------- 1 3 2 2 平均分派问题,如何将金额平均分摊,并且小数也分摊掉,避免误差。 SELECT ID, persons,(CASE WHEN rn <= (amount - amount2) * 100 THEN 0.01 ELSE 0 END) + je AS je,amount --然后排序,与总金额有差额的补0.01 FROM (SELECT t.*, SUM(je) OVER(PARTITION BY id) AS amount2, ROW_NUMBER() OVER(PARTITION BY id ORDER BY je DESC) rn FROM ( --先展开记录数,用trunc先平均,只舍不入 SELECT tt.* FROM (SELECT t2.id, t2.persons, TRUNC(t1.amount /t2.persons, 2) je, t1.amount amount FROM demo7_1 t1, demo7_2 t2 WHERE t1.id = t2.id ) tt, --构造最大的人数序列 (SELECT LEVEL rn FROM dual CONNECT BY LEVEL <= (SELECT MAX(persons) max_num FROM demo7_2) ) tm WHERE tt.persons >= tm.rn) t ) ID PERSONS JE AMOUNT --- ---------- ---------- ---------- 1 3 33.34 100 1 3 33.33 100 1 3 33.33 100 2 2 25 50 2 2 25 50
  • 36. 36自定义分析函数 自定义分析函数就是需要实现ORACLE规定的对象的接口方法,然后自定义函数使用这个对象即可(红色为必须实现的,其他可选)。返回值: ODCIConst.Success成功, ODCIConst.Error失败。 接口说明ODCIAggregateDelete()从当前组中删除输入值ODCIAggregateInitialize()初始化对象的aggregation context和实例,并将此context作为OUT参数返回ODCIAggregateIterate()核心逻辑,通过外部输入的行,迭代处理对应的值,并返回给context,忽略传入的值是NULL的ODCIAggregateMerge()合并两个context给单个对象实例ODCIAggregateTerminate()聚合计算,并做必要的清理工作,比如释放内存ODCIAggregateWrapContext()Integrates all external pieces of the current aggregation context to make the context self-contained
  • 37. 37自定义分析函数--1.定义对象 CREATE OR REPLACE TYPE TYPE_MULTI_AGG AS OBJECT ( results NUMBER, STATIC FUNCTION ODCIAGGREGATEINITIALIZE(sctx IN OUT TYPE_MULTI_AGG) RETURN NUMBER, MEMBER FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT TYPE_MULTI_AGG, VALUE IN NUMBER) RETURN NUMBER, MEMBER FUNCTION ODCIAGGREGATETERMINATE(SELF IN TYPE_MULTI_AGG, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER, MEMBER FUNCTION ODCIAGGREGATEMERGE(SELF IN OUT TYPE_MULTI_AGG, ctx2 IN TYPE_MULTI_AGG) RETURN NUMBER ); / --2.实现对象类型方法 CREATE OR REPLACE TYPE BODY TYPE_MULTI_AGG IS STATIC FUNCTION ODCIAGGREGATEINITIALIZE(sctx IN OUT TYPE_MULTI_AGG) RETURN NUMBER IS BEGIN --初始化返回值为1 sctx :=TYPE_MULTI_AGG(1); RETURN ODCICONST.SUCCESS; END; MEMBER FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT TYPE_MULTI_AGG, VALUE IN NUMBER) RETURN NUMBER IS BEGIN --核心逻辑 SELF.results :=SELF.results * VALUE; RETURN ODCICONST.SUCCESS; END; MEMBER FUNCTION ODCIAGGREGATETERMINATE(SELF IN TYPE_MULTI_AGG, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS BEGIN --返回最终值 returnValue := SELF.results; RETURN ODCICONST.SUCCESS; END; MEMBER FUNCTION ODCIAGGREGATEMERGE(SELF IN OUT TYPE_MULTI_AGG, ctx2 IN TYPE_MULTI_AGG) RETURN NUMBER IS BEGIN NULL; RETURN ODCICONST.SUCCESS; END; END; /--3.定义函数 CREATE OR REPLACE FUNCTION F_MULTI_AGG(val NUMBER) RETURN NUMBER AGGREGATE USING TYPE_MULTI_AGG; / SQL> SELECT ID,val,F_MULTI_AGG(val) over(PARTITION BY ID) multi_value 2 FROM agg_test; ID VAL MULTI_VALUE ---------- ---------- ----------- 1 10 200 1 20 200 2 10 3000 2 15 3000 2 20 3000 3 10 10 4 0 0 4 1000 0
  • 38. 分析函数注意点 1.搞清楚partition、order by、window之间的关系,特别是range,rows的区别。 2.搞清楚每种分析函数的功能和限制等。 3.搞清楚每种分析函数的使用场合、缺点等。
  • 39. 如何学习分析函数 分析函数是对标准SQL的有力补充,专门用于计算复杂的累积计算、移动平均、行间计算、统计报表等。 ORACLE不仅提供了官方分析函数,而且还提供了自定义聚集函数的功能,比如wmsys.wm_concat函数就是一个自定义聚集函数,详细可以参考ORACLE cartridge developer guide文档中的user-defined aggregate function部分。 学习分析函数以及ORACLE其他内容,要详细研究文档,特别注意文档中的注意点,自己总结和梳理属于自己的文档,在解决实际问题过程中自然就有意识想起自己学的知识。
  • 40. 谢 谢