• 1. 阿里巴巴分布式数据库 ——原理、实现和应用2012.7集团共享技术平台 分布式数据库 邱硕
  • 2. 分布式数据库中间件AppAppAppAppCobarOracleMySQLMySQLMySQLErosa OracleErosa MySQLErosa MySQLErosa MySQLEromangaOtterOracleMySQLMySQLOtterErosa OracleErosa MySQLErosa MySQLEromanga性能 容量 高可用 数据消费时效性 跨机房数据同步ASCDW...
  • 3. 分布式数据库中间件AppAppAppAppCobarOracleMySQLMySQLMySQLErosa OracleErosa MySQLErosa MySQLErosa MySQLEromangaOracleMySQLMySQLOtter性能 容量 高可用 数据消费时效性 跨机房数据同步Erosa OracleErosa MySQLErosa MySQLEromangaOtterASCDW...
  • 4. 分布式数据库中间件AppAppAppAppCobarOracleMySQLMySQLMySQLOracleMySQLMySQLOtterErosa OracleErosa MySQLErosa MySQLEromanga性能 容量 高可用 数据消费时效性 跨机房数据同步Erosa OracleErosa MySQLErosa MySQLErosa MySQLEromangaOtterASCDW...
  • 5. 分布式数据库中间件AppAppAppAppCobarOracleMySQLMySQLMySQLErosa OracleErosa MySQLErosa MySQLErosa MySQLEromangaOracleMySQLMySQLErosa OracleErosa MySQLErosa MySQLEromanga性能 容量 高可用 数据消费时效性 跨机房数据同步OtterASCDW...Otter
  • 6. 分布式数据库中间件AppAppAppAppCobarOracleMySQLMySQLMySQLErosa OracleErosa MySQLErosa MySQLErosa MySQLEromangaOracleMySQLMySQLOtter性能 容量 高可用 数据消费时效性 跨机房数据同步Erosa OracleErosa MySQLErosa MySQLEromangaOtterASCDW...
  • 7. 大纲中间件引入 Cobar策略 系统实现 实施应用
  • 8. Cobar之前Oracle单点数据库 性能问题 中文站offer总数:2008年1亿 -> 2011年3亿 高峰时:load 30、cpu使用率90% 数据库连接过多 可用性问题 Standby切换故障 成本和伸缩性问题 依赖高成本的硬件设备
  • 9. OracleMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQL单点:MySQL集群替换OracleMySQLMySQLMySQL
  • 10. Oracle单点数据库 性能问题 中文站offer总数:08年1亿 -> 今天3亿 高峰时:load 30、cpu使用率90% 数据库连接过多 可用性问题 Standby切换故障 成本和伸缩性问题 依赖高成本的硬件设备 Cobar引入 水平拆分IDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQLIDMEMBE_IDINFO4test1234…5test1234…IDMEMBE_IDINFO3abcd…9abcd…20abcd…MySQLMySQLCobarApp
  • 11. Oracle单点数据库 性能问题 中文站offer总数:08年1亿 -> 今天3亿 高峰时:load 30、cpu使用率90% 数据库连接过多 可用性问题 Standby切换故障 成本和伸缩性问题 依赖高成本的硬件设备 AppIDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQLIDMEMBE_IDINFO4test1234…5test1234…IDMEMBE_IDINFO3abcd…9abcd…20abcd…MySQLMySQLAppAppAppAppAppAppApp
  • 12. Cobar引入 连接复用AppIDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQLIDMEMBE_IDINFO4test1234…5test1234…IDMEMBE_IDINFO3abcd…9abcd…20abcd…MySQLMySQLCobar ProxyAppAppAppAppAppAppApp
  • 13. Oracle单点数据库 性能问题 中文站offer总数:08年1亿 -> 今天3亿 高峰时:load 30、cpu使用率90% 数据库连接过多 可用性问题 Standby切换故障 成本和伸缩性问题 依赖高成本的硬件设备 Cobar引入IDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQLCobarAppX
  • 14. Cobar引入 failoverIDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQL Master1CobarAppIDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQL Master2XMySQL Replication
  • 15. Cobar引入 failoverIDMEMBE_IDINFO1pavarotti17…11pavarotti17…CobarAppIDMEMBE_IDINFO1pavarotti17…11pavarotti17…MySQL ReplicationMySQL Master1MySQL Master2
  • 16. 大纲中间件引入 Cobar策略 系统实现 实施应用
  • 17. OracleMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQLMySQL拆分数据表
  • 18. 水平拆分IDMEMBE_IDINFO1pavarotti17…3abcd…4test1234…5test1234…9abcd…11pavarotti17…20abcd…
  • 19. 水平拆分IDMEMBE_IDINFO1pavarotti17…3abcd…4test1234…5test1234…9abcd…11pavarotti17…20abcd…拆分字段
  • 20. 水平拆分IDMEMBE_IDINFO1pavarotti17…3abcd…4test1234…5test1234…9abcd…11pavarotti17…20abcd…IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…f(pavarotti17)=库1f(test1234)=库1f(test1234)=库1f(pavarotti17)=库1f(abcd)=库2f(abcd)=库2f(abcd)=库2库1库2拆分字段
  • 21. 水平拆分IDMEMBE_IDINFO1pavarotti17…3abcd…4test1234…5test1234…9abcd…11pavarotti17…20abcd…IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…f(abcd)=库2库1库2拆分字段f(pavarotti17)=库1f(test1234)=库1f(test1234)=库1f(pavarotti17)=库1f(abcd)=库2f(abcd)=库2路由算法
  • 22. 路由算法pavarotti17f(pavarotti17)=库1
  • 23. 路由算法pavarotti17部分截取hash( ) = 3170972965401
  • 24. 路由算法hash(pavarott) = 3170972965401 % 1024 = 5370 1023
  • 25. 路由算法hash(pavarott) = 3170972965401 % 1024 = 5370 255256 511512 767768 1023256256256256分库1分库2分库3分库4
  • 26. 路由算法hash(pavarott) = 3170972965401 % 1024 = 5370 255256 511512 767768 1023256256256256分库1分库2分库3分库4
  • 27. 路由算法——扩容hash(pavarott) = 3170972965401 % 1024 = 5370 127 128 255256 383 384 511512 639 640 767768 895 896 1023128128128128128128128128分库1分库2分库3分库4
  • 28. 路由算法——扩容hash(pavarott) = 3170972965401 % 1024 = 5370 127 128 255256 383 384 511512 639 640 767768 895 896 1023128128128128128128128128分库1分库2分库3分库4分库5分库6分库7分库8 原分库1分库2分库3分库4 原 原 原
  • 29. 路由算法——非均匀分布hash(pavarott) = 3170972965401 % 1024 = 5370 511512 767768 895 896 1023512256128128分库1分库2分库3分库4
  • 30. 拆分表的数据访问——SQL转发IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…CobarAppselect * from tb1 where member_id=‘test1234’
  • 31. 拆分表的数据访问——SQL转发IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…CobarAppselect * from tb1 where member_id=‘test1234’
  • 32. IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…CobarAppSELECT * FROM tb1 WHERE member_id IN (‘test1234’,’pavarotti17’,’abcd’)拆分表的数据访问——SQL转发
  • 33. IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…CobarAppselect * from tb1 where member_id in (‘test1234’,’pavarotti17’)select * from tb1 where member_id in (‘abcd’)拆分表的数据访问——SQL转发
  • 34. IDMEMBE_IDINFO1pavarotti17…4test1234…5test1234…11pavarotti17…IDMEMBE_IDINFO3abcd…9abcd…20abcd…Result Merger前台 通信ResultSet:row1 row2ResultSet:row3 row4 row5ResultSet:row3 row1 row4 row5 row2拆分表的数据访问——结果返回
  • 35. 多维水平拆分visit表productuserinfoCoca-ColaA…pepsiC…FantaD…Coca-ColaA…Coca-ColaC…FantaB…7UpD…pepsiA…productuserinfoCoca-ColaA…Coca-ColaA…Coca-ColaC…productuserinfopepsiC…pepsiA…productuserinfoFantaD…FantaB…productuserinfo7UpD…SELECT * FROM visit WHERE user=‘A’
  • 36. 多维水平拆分visit表productuserinfoCoca-ColaA…pepsiC…FantaD…Coca-ColaA…Coca-ColaC…FantaB…7UpD…pepsiA…productuserinfoCoca-ColaA…Coca-ColaA…pepsiA…productuserinfopepsiC…Coca-ColaC…productuserinfoFantaD…7UpD…productuserinfoFantaB…SELECT * FROM visit WHERE product = ‘Coca-Cola’
  • 37. 分库1分库2分库3分库4分库5分库6分库7分库8分库9分库10分库11分库12分库13分库14分库15分库16product值 Hash取模user值 Hash取模01230123visit表USERPRODUCT一张表的多个字段同时作为拆分字段
  • 38. Hash(“A”)%4 = 分库1分库2分库3分库4分库5分库6分库7分库8分库9分库10分库11分库12分库13分库14分库15分库16product值 Hash取模user值 Hash取模01230123CocaCola AHash(“CocaCola”)%4 = SELECT * FROM visit WHERE product=‘ColaCola’ AND user=‘A’
  • 39. 分库1分库2分库3分库4分库5分库6分库7分库8分库9分库10分库11分库12分库13分库14分库15分库16product值 Hash取模user值 Hash取模01230123CocaColaHash(“CocaCola”)%4 = SELECT * FROM visit WHERE product=‘ColaCola’
  • 40. Hash(“A”)%4 = A分库1分库2分库3分库4分库5分库6分库7分库8分库9分库10分库11分库12分库13分库14分库15分库16product值 Hash取模user值 Hash取模01230123SELECT * FROM visit WHERE product=‘ColaCola’ AND user=‘A’
  • 41. Cobar的策略MySQL集群替代Oracle单点 基于表的水平拆分和分布 根据字段值的一致性Hash分布 多维拆分 数据查询方式 根据where中的拆分字段分发 SQL语句其他元素的处理 将Cobar收到的SQL语句做变换 分发到各个分库执行 对执行结果合并、处理 保证返回前端的内容满足语义
  • 42. JOIN有限的处理跨库JOIN问题 SELECT * FROM tb1 INNER JOIN tb2 ON t1.MEMBER_ID=t2.NAMEIDMEMBE_ID2zzzz4xyzxyzIDNAME4efghijk5aaaa6abcdIDMEMBE_ID1efghijk3xxxxxx5abcdIDNAME2zzzz3xyzxyztb1tb1tb2tb2IDMEMBER_IDIDNAME1efghijk4efghijk5abcd6abcd2zzzz2zzzz4xyzxyz3xyzxyz
  • 43. 迭代查询SELECT * FROM tb1 INNER JOIN tb2 ON t1.MEMBER_ID=t2.NAMEIDMEMBE_ID2zzzz4xyzxyzIDNAME4efghijk5aaaa6abcdIDMEMBE_ID1efghijk3xxxxxx5abcdIDNAME2zzzz3xyzxyztb1tb1tb2tb2FOR row1 IN select * FROM tb1{ ADD( SELECT * FROM tb2 WHERE tb2.name = row1.member_id )TO RESULT }
  • 44. 跨库索引IDMEMBE_ID2zzzz4xyzxyzIDNAME4efghijk5aaaa6abcdIDMEMBE_ID1efghijk3xxxxxx5abcdIDNAME2zzzz3xyzxyztb1tb1tb2tb2ID1ID2JOIN_COL22zzzz43xyzxyzidxID1ID2JOIN_COL14efghijk56abcdidx扫描idx,再根据每一行的id1,id2查到最终结果
  • 45. 跨库索引IDMEMBE_ID2zzzz4xyzxyzIDNAME4efghijk5aaaa6abcdIDMEMBE_ID1efghijk3xxxxxx5abcdIDNAME2zzzz3xyzxyztb1tb1tb2tb2SELECT * FROM tb1 INNER JOIN tb2 ON t1.MEMBER_ID=t2.NAME WHERE t1.id = 5SELECT * FROM idx WHERE id1 = 5再根据id1,id2查到最终结果
  • 46. 跨库索引IDMEMBE_ID2zzzz4xyzxyzIDNAME4efghijk5aaaa6abcdIDMEMBE_ID1efghijk3xxxxxx5abcdIDNAME2zzzz3xyzxyztb1tb1tb2tb2ID1ID2JOIN_COL22zzzz43xyzxyzidxID1ID2JOIN_COL14efghijk56abcdidx一定以 JOIN_COL 为索引的拆分字段吗?
  • 47. 跨库索引SELECT * FROM tb1 INNER JOIN tb2 ON t1.MEMBER_ID=t2.NAME WHERE t1.gmt>600IDMEMBE_IDGMT2zzzz5254xyzxyz1010IDNAME4efghijk5aaaa6abcdIDMEMBE_IDGMT1efghijk12053xxxxxx1315abcd604IDNAME2zzzz3xyzxyztb1tb1tb2tb2
  • 48. 跨库索引IDMEMBE_IDGMT2zzzz5254xyzxyz1010IDNAME4efghijk5aaaa6abcdIDMEMBE_IDGMT1efghijk12053xxxxxx1315abcd604IDNAME2zzzz3xyzxyztb1tb1tb2tb2ID1ID2JOIN_COL22zzzz43xyzxyzidxID1ID2JOIN_COL14efghijk56abcdidx
  • 49. IDMEMBE_ID2zzzz4xyzxyzIDNAME4efghijk5aaaa6abcdIDMEMBE_ID1efghijk3xxxxxx5abcdIDNAME2zzzz3xyzxyztb1tb1tb2tb2ID1ID2JOIN_COL22zzzz43xyzxyzidxID1ID2JOIN_COL14efghijk56abcdidxSELECT idx.id2, tb1.* FROM idx INNER JOIN tb1 ON idx.id1=tb1.id WHERE t1.gmt>600SELECT idx.id2, tb1.* FROM idx INNER JOIN tb1 ON idx.id1=tb1.id WHERE t1.gmt>600SELECT * FROM tb1 INNER JOIN tb2 ON t1.MEMBER_ID=t2.NAME WHERE t1.gmt>600跨库索引
  • 50. 跨库索引IDMEMBE_IDGMT2zzzz5254xyzxyz1010IDNAMETIME4efghijk1235aaaa9226abcd222IDMEMBE_IDGMT1efghijk12053xxxxxx1315abcd604IDNAMETIME2zzzz12013xyzxyz1111tb1tb1tb2tb2SELECT * FROM tb1 INNER JOIN tb2 ON t1.MEMBER_ID=t2.NAME WHERE t1.gmt>600 AND t2.time>600ID1ID2JOIN_COLTIME22zzzz120143xyzxyz1111idx
  • 51. 跨库索引ID1ID2JOIN_COLTIME22zzzz120143xyzxyz1111idxID1ID2JOIN_COLTIME14efghijk12356abcd222idx索引表的拆分 WHERE条件中的字段所在表的拆分字段,作为索引拆分字段 索引包含 两张表的主键 JOIN字段 WHERE中的其他字段 索引的更新 分布式事务的支持
  • 52. SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2select ... order by c1 limit 0, 6select ... order by c1 limit 0, 6select ... order by c1 limit 0, 6分库1分库2分库3cobarOrder By/Limit
  • 53. 分库1分库2分库3Order By/Limit456327810651111314973返回结果返回结果返回结果8SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 54. 分库1分库2分库3Order By/Limit456327810651111314973返回结果返回结果返回结果80最终结果集SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 55. 分库1分库2分库3Order By/Limit45632781065111314973返回结果返回结果返回结果81最终结果集SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 56. 分库1分库2分库3Order By/Limit4563781065111314973返回结果返回结果返回结果82最终结果集SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 57. 分库1分库2分库3Order By/Limit456781065111314973返回结果返回结果返回结果83最终结果集SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 58. 分库1分库2分库3Order By/Limit45678106511131497返回结果返回结果返回结果84最终结果集SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 59. 最终结果集分库1分库2分库3Order By/Limit5678106511131497返回结果返回结果返回结果844SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 60. 最终结果集分库1分库2分库3Order By/Limit4678106511131497返回结果返回结果返回结果845SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2
  • 61. Order By/Limit 方案总结select c1 from tb1 order by c1 limit 100000000, 2所有分库都要查询100000002条数据 Cobar需要遍历100000002条数据对如下SQL一次交互得到结果 Offset大小有限制
  • 62. Order By / Limit 优化目标:解决 查询量大问题 遍历量大问题 前提 各个分库数据分布大致一样
  • 63. select c1 from tb1 order by c1 limit 9999999, 4select ... order by c1 limit 33333333, 4select ... order by c1 limit 33333333, 4select ... order by c1 limit 33333333, 4分库1分库2分库3step1:分成3条语句发给分库
  • 64. 分库1分库2分库374538691076119返回结果返回结果返回结果select c1 from tb1 order by c1 limit 9999999, 4找出查询结果中最小和最大值
  • 65. 分库1分库2分库374538691076119返回结果返回结果返回结果step2:以最小值和最大值为界再查询select c1 from tb1 order by c1 limit 9999999, 43115311
  • 66. 分库1分库2分库374538691076119返回结果返回结果返回结果step3:反查出每一个返回结果的offsetselect c1 from tb1 order by c1 limit 9999999, 4311531133333331条33333333条33333332条
  • 67. 分库1分库2分库374538691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 431153119999996
  • 68. 分库1分库2分库374538691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 41153119999997
  • 69. 分库1分库2分库37458691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 41153119999998
  • 70. 分库1分库2分库37458691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 4115119999999最终结果集
  • 71. 最终结果集分库1分库2分库37458691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 4115119999999
  • 72. 最终结果集分库1分库2分库37458691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 4115119999999
  • 73. 最终结果集分库1分库2分库37458691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 4115119999999
  • 74. 最终结果集分库1分库2分库37458691076119返回结果返回结果返回结果类似于原始方案select c1 from tb1 order by c1 limit 9999999, 4115119999999
  • 75. Order By / Limit 再优化Step1不必得到全部结果 select min(c1) mi, max(c1) ma from (select c1 from tb1 order by c1 limit 3333333,4) t Step2和Step3合并 select * from (select * from tb1 where c1 between mi and ma) t1, (select count(*) from tb1 where c1
  • 76. Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956
  • 77. Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956SELECT sum(price), c1 FROM tb1 GROUP BY c1 ORDER BY c112.3 22227.9 1318.8 60499.9 567.6 1314.4 604
  • 78. Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956SELECT sum(price), c1 FROM tb1 GROUP BY c1 ORDER BY c112.3 22227.9 1318.8 6047.6 1314.4 604最终结果集 99.9 56
  • 79. Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956SELECT sum(price), c1 FROM tb1 GROUP BY c1 ORDER BY c112.3 22227.9 1318.8 6047.6 1314.4 604最终结果集 99.9 56
  • 80. 最终结果集 Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956SELECT sum(price), c1 FROM tb1 GROUP BY c1 ORDER BY c112.3 222215.5 1318.8 6044.4 60499.9 56
  • 81. 最终结果集 Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956SELECT sum(price), c1 FROM tb1 GROUP BY c1 ORDER BY c112.3 222215.5 13113.2 60499.9 56
  • 82. 最终结果集 Group BySELECT sum(price) FROM tb1 GROUP BY c1IDPRICEC1112.3222231.613158.860476.3131IDPRICEC124.460447.6131699.956SELECT sum(price), c1 FROM tb1 GROUP BY c1 ORDER BY c112.3 222215.5 13113.2 60499.9 56
  • 83. SQL执行策略总结WHERE - 基于SQL转发 JOIN - 迭代 分布式索引 ORDER BY/LIMIT - 多次查询减小数据量 GROUP BY - 增加ORDER BY
  • 84. Cobar 事务支持前端连接sql1sql2commit
  • 85. Cobar 1.2 事务支持前端连接分库1连接sql1sql2commit
  • 86. Cobar 1.2 事务支持前端连接分库1连接sql1commit分库2连接sql2分库3连接sql2
  • 87. Cobar 事务支持前端连接分库1连接sql1commit分库2连接sql2分库3连接sql2Commit有先后:隔离性问题 Commit有失败:一致性问题
  • 88. 大纲中间件引入 Cobar策略 水平拆分的数据分布 几种SQL元素的执行策略 事务策略 系统实现 实施应用
  • 89. schematableSpacedataNodedata sourcecndbpc2主备主备defaultoffer[0]主备offer[1]主备detail[0]主备defaultdefaultofferdetaildetail逻辑层次 —— 接口同MySQLjdbc:mysql://cobarIp:8066/cndb?user=foo&password=bar
  • 90. MySQLMySQLMySQLCobar结构Front-end CommunicationMySQL Protcol Adaptor (BIO)Application1MySQLMySQL ProtocolMySQLMySQLData NodesMonitor ConfigureHA PoolMySQLMySQLProcessor(1)Processor(n)SQL ExecutorSQL RouterSQL ParserResult MergerSQL ExecutorSQL RouterSQL ParserResult Merger... ManagerMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolManagement Protocol
  • 91. MySQLMySQLMySQLCobar结构Front-end CommunicationMySQL Protcol Adaptor (BIO)Application1MySQLMySQL ProtocolMySQLMySQLData NodesMonitor ConfigureHA PoolMySQLMySQLProcessor(1)Processor(n)SQL ExecutorSQL RouterSQL ParserResult MergerSQL ExecutorSQL RouterSQL ParserResult Merger... ManagerMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolManagement Protocol
  • 92. Application1JDBC和Server的通信协议MySQL JDBC DriverMySQL ServerMySQL Protocol
  • 93. JDBC和Server的通信协议PreparedStatement ps = conn.prepareStatement( "select * from tb1 where id=?"); ps.setLong(1, 12345); ResultSet rs = ps.executeQuery();MySQL ServerApplication1MySQL JDBC Driver
  • 94. JDBC和Server的通信协议MySQL ServerApplication1MySQL JDBC DriverPreparedStatement ps = conn.prepareStatement( "select * from tb1 where id=?"); ps.setLong(1, 12345); ResultSet rs = ps.executeQuery(); select * from tb1 where id=? stmt_id/param_num/columm_num parameter_type column_type
  • 95. JDBC和Server的通信协议PreparedStatement ps = conn.prepareStatement( "select * from tb1 where id=?"); ps.setLong(1, 12345); ResultSet rs = ps.executeQuery();MySQL ServerApplication1MySQL JDBC Driver
  • 96. JDBC和Server的通信协议PreparedStatement ps = conn.prepareStatement( "select * from tb1 where id=?"); ps.setLong(1, 12345); ResultSet rs = ps.executeQuery();MySQL ServerApplication1MySQL JDBC Driver stmt_id {param}+
    field_count column_type {column_val}+ {column_val}+
  • 97. JDBC和Server的通信协议PreparedStatement ps = conn.prepareStatement( "select * from tb1 where id=?"); ps.setLong(1, 12345); ResultSet rs = ps.executeQuery();Application1MySQL JDBC Driver stmt_id {param}+
    field_count column_type {column_val}+ {column_val}+MySQL Server
  • 98. JDBC和Server的通信协议Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery( “select * from tb1 where id=12345”);Application1MySQL JDBC Driver sql
    field_count column_type {column_val}+ {column_val}+MySQL Server
  • 99. JDBC和Server的通信协议Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery( “select * from tb1 where id=12345”);Application1MySQL JDBC Driver sql
    field_count column_type {column_val}+ {column_val}+MySQL ServerCobar Server
  • 100. Processor(1)Processor(n)MySQLMySQLMySQLCobar结构Front-end CommunicationMySQL Protcol Adaptor (BIO)Application1MySQLMySQL ProtocolMySQLMySQLData NodesMonitor ConfigureHA PoolMySQLMySQLSQL ExecutorSQL RouterSQL ParserResult MergerSQL ExecutorSQL RouterSQL ParserResult Merger... ManagerMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolManagement ProtocolSQL +ParametersResultSetMetaData ResultSet(Rows)
  • 101. Cobar 通信层统一管理NIO的Buffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer共16MBBuffer Pool
  • 102. Cobar 通信层统一管理NIO的Buffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer共16MBBuffer Pool select * from tb1 where id=?
  • 103. Cobar 通信层统一管理NIO的Buffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer共16MBBuffer Pool select * from tb1 where id=?
  • 104. Cobar 通信层统一管理NIO的Buffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer4KB ByteBuffer共16MBBuffer Pool15万TPS
  • 105. ... MySQLMySQLMySQLCobar结构Front-end CommunicationMySQL Protcol Adaptor (BIO)Application1MySQLMySQL ProtocolMySQLMySQLData NodesMonitor ConfigureHA PoolMySQLMySQLProcessor(1)Processor(n)SQL ExecutorSQL RouterResult MergerSQL ExecutorSQL RouterResult MergerManagerMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolManagement ProtocolSQL ParserSQL Parser
  • 106. SELECT id, member_id FROM wp_image WHERE member_id = ‘123’SQL Parserselectidmember_idwp_image=member_id‘123’exprListfrom
  • 107. 架构演变基于JavaCC生成SQL Parser 性能较差,优化不方便 仿照ANTLR生成的Parser结构手写 基于LL(*)的识别器,中间对象过多 基于LL(2)识别器的手写 第一版第二版第三版
  • 108. 架构调整LexertokentokentokentokenChar ReaderExpression ParserDML ParserSQL StringLexertokenExpression ParserDML ParserSQL char[]第三版LL(2)第二版LL(*)cache
  • 109. 功能对比第一版第二版第三版‘don\’t’NOYESYES.123e-2YESYESYES7ElevenNOYESYES_latin 0x123NOYESYES‘abc’ ‘def’NOYESYESINSERT('Quadratic', 3, 4, 'What')NOYESYESTRIM(LEADING 'x' FROM 'xxxbarxxx')NOYESYESUnion selectYESYESYEStable joinPARTPARTYES…………
  • 110. 基本元素语法元素MySQL 5.5 语法结构IdentifierKeywordLiteralPunctuation表达式Table ReferencesLimitOrderBy...DeleteUpdateInsertReplaceSelectCallSet/ShowSQL语句Variable
  • 111. 基本元素Keyword 227个关键字,不区分大小写 Identifier schema、table、column、index、alias 、keyword… SELECT aNy FROM t1 WHERE id IN AnY (SELECT id FROM t2) `select`、table1 ./*spam*/id、7Up Punctuation ,、:=、>、!、~ … Variable User-Def SystemVariable:@@version
  • 112. 基本元素 - LiteralString Literal N’abc’、_latin1’abc’、’abc’ ’def’ ”abc””d”、’abc\’d’ Number Literal 123、.123、123.、123.e4、.123E-4 123e4 vs 123e4f Hex/Bit Literal 0x89af、x’89af’、0b101011、b’101011’ _latin1 0x89af
  • 113. 基本元素语法元素MySQL 5.5 语法结构IdentifierKeywordLiteralPunctuation表达式Table ReferencesLimitOrderBy...DeleteUpdateInsertReplaceSelectCallSet/ShowSQL语句Variable
  • 114. 语法元素 – Table Referencestable_references: table_reference {',' table_reference} table_reference: table_factor { ['INNER'|'CROSS‘] 'JOIN' table_factor [join_condition] | 'STRAIGHT_JOIN' table_factor ['ON' conditional_expr] | ('LEFT'|'RIGHT') ['OUTER‘] 'JOIN' table_reference join_condition | 'NATURAL' [('LEFT'|'RIGHT') ['OUTER‘] ] 'JOIN' table_factor } table_factor: tbl_name [[’AS’] alias] [index_hint_list] | table_subquery [’AS’] alias | '(' table_references ')' join_condition: 'ON' conditional_expr | 'USING' '('column_list')' index_hint_list: index_hint {index_hint} index_hint: 'USE' ('INDEX'|'KEY') [ 'FOR' ('JOIN'|'ORDER' 'BY'|'GROUP' 'BY') ] '(' [index_list] ')' | 'IGNORE' ('INDEX'|'KEY') [ 'FOR' ('JOIN'|'ORDER' 'BY'|'GROUP' 'BY') ] '(' index_list ')' | 'FORCE' ('INDEX'|'KEY') [ 'FOR' ('JOIN'|'ORDER' 'BY'|'GROUP' 'BY') ] '(' index_list ')' index_list: index_name {',' index_name}
  • 115. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)规则1规则2
  • 116. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)table_factor规则1规则2
  • 117. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)table_factor规则1规则2table_referencetable_reference规则2table_factor
  • 118. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)table_factortable_factortable_factorjoin_condition规则1规则2table_referencetable_reference规则1规则2
  • 119. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)table_factortable_factortable_factorjoin_condition规则1规则2table_referencetable_referenceERROR!规则1规则2
  • 120. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)table_factortable_factortable_factortable_referencejoin_conditiontable_referencetable_factortable_factortable_factorjoin_condition规则1规则2table_referencetable_referenceERROR!规则1规则1规则2规则2
  • 121. MySQL不支持的MySQL语法table_reference: table_factor 'JOIN' table_factor [join_condition] table_reference: table_factor 'LEFT' 'JOIN' table_reference join_conditionLEFT JOINtb1t2JOINt3USING (id)table_factortable_factortable_factortable_referencejoin_conditiontable_referencetable_factortable_factortable_factorjoin_condition规则1规则2table_referencetable_referenceERROR!规则1规则1规则2规则2
  • 122. Table References的表达式特性table_references: table_reference {',' table_reference} table_reference : ’(’ table_references ')' | ’(’ query { ’UNION’ query} ’)’ [’AS’] alias query : ’(’ query ’)’ | ’SELECT ...’( ( select ...) UNOIN (select...) ) AS t1( ( t1, t2), t3 )LL(1)规则1规则2规则2规则1expr: ’(’ expr ’)’ | tableRefs | subquery类似于表达式:
  • 123. 基本元素语法元素MySQL 5.5 语法结构IdentifierKeywordLiteralPunctuation表达式Table ReferencesLimitOrderBy...DeleteUpdateInsertReplaceSelectCallSet/ShowSQL语句Variable
  • 124. 表达式优先级和结合型 MySQL Manual文档不精确 文档+实验 特殊函数 Keyword作为函数名:23个 INSERT('Quadratic', 3, 4, 'What') 非规则参数列表:13个 SELECT TRIM(BOTH 'x' FROM 'xxxbarx') 特殊Identifier 1 >= any + 2 1 >= any (select …)
  • 125. simple: "select id from t1“ short: " seLEcT id, member_id , image_path \t , image_size , STATUS, gmt_modified from wp_image wheRe \t\t\n id = ? AND member_id\t=\t-123.456" short2: "select count(*) from MESSAGE_REC_RECORD where RECEIVER_VACOUNT =? and RECEIVER_ID in ( ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? ) and RECEIVER_STATUS in ( ? , ? ) and SPAM_STATUS in ( ? ) and DELETE_STATUS = ?" long: "select ID, GMT_CREATE, GMT_MODIFIED, INBOX_FOLDER_ID, MESSAGE_ID, FEEDBACK_TYPE, TARGET_ID, TRADE_ID, SUBJECT, SENDER_ID, SENDER_TYPE, S_DISPLAY_NAME, SENDER_STATUS, RECEIVER_ID, RECEIVER_TYPE, R_DISPLAY_NAME, RECEIVER_STATUS, SPAM_STATUS, REPLY_STATUS, ATTACHMENT_STATUS, SENDER_COUNTRY, RECEIVER_COUNTRY,APP_FROM,APP_TO,APP_SOURCE,SENDER_VACOUNT,RECEIVER_VACOUNT, DISTRIBUTE_STATUS,ORG_RECEIVER_ID,CUSTOMER_ID,OPERATOR_ID,OPERATOR_NAME,FOLLOW_STATUS,DELETE_STATUS,FOLLOW_TIME,BATCH_COUNT from MESSAGE_REC_RECORD where RECEIVER_VACOUNT =? and ID = ?" long2: "select A.ID , A.GMT_CREATE , A.GMT_MODIFIED , A.MESSAGE_ID , A.SENDER_COMPANY_ID, A.SENDER_ID, A.SENDER_IP, A.SENDER_TYPE, A.SENDER_COMPANY , A.SENDER_ADDRESS , A.SENDER_COUNTRY , A.SENDER_ZIP , A.SENDER_PHONE , A.SENDER_FAX , A.SENDER_EMAIL , A.RECEIVER_COMPANY_ID, A.RECEIVER_ID, A.RECEIVER_TYPE, A.RECEIVER_COMPANY , A.RECEIVER_ADDRESS , A.RECEIVER_COUNTRY , A.RECEIVER_ZIP , A.RECEIVER_PHONE , A.RECEIVER_FAX , A.RECEIVER_EMAIL , A.TARGET_ID, A.FEEDBACK_TYPE, A.FEEDBACK_CATEGORY, A.SUBJECT, A.APP_SOURCE , A.ATTACHMENT_IDS , B.REMARK , A.S_DISPLAY_NAME , A.R_DISPLAY_NAME , A.RELATED_FEEDBACK_IDS , A.NOTICE_MESSAGE , B.PROPERTIES, A.APP_FROM,APP_TO,A.SENDER_VACOUNT,A.RECEIVER_VACOUNT from MESSAGE_DETAIL A,MESSAGE_DETAIL_REMARK_PROPERTIES B where A.ID = ? and A.ID = B.ID"
  • 126. 性能对比单位 微秒第一版第二版手写v2手写v2 sqlgenSimple25.61.760.9Short43.28.64.15.2Short263.417.69.8Long75.225.914.6Long294.435.422.3
  • 127. GC对比YGCYGCT第一版28380.803第二版4100.116第三版2290.057第三版+sql生成2630.074
  • 128. Processor(1)Processor(n)MySQLMySQLMySQLCobar结构Front-end CommunicationApplication1MySQLMySQL ProtocolMySQLMySQLMonitor ConfigureMySQLMySQLSQL ExecutorSQL RouterSQL ParserResult MergerSQL ExecutorSQL RouterSQL ParserResult Merger... ManagerMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolManagement ProtocolMySQL Protcol Adaptor (BIO)Data NodesHA Pool
  • 129. 后台数据访问逻辑层次HA PoolMySQL Protocol AdapterData NodeSMSMSMSMSMip:port/offerip:port/offerip:port/ibankip:port/ibankip:port/ibank数据库连接 基于协议数据包 与MySQL交互
  • 130. 基于MySQL协议 MySQL新Cobarselect * from offerselect * from offer
  • 131. 基于MySQL协议 MySQL新CobarHEADERFIELDEOFROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWROWEOFselect * from offerResult 2GBselect * from offer
  • 132. 后台数据访问逻辑层次SMHA PoolMySQL Protocol AdapterData Node主备连接池SMSMSMSMSM
  • 133. 后台数据访问逻辑层次SM心跳检测后端连接SMSMSMSMSMHA PoolMySQL Protocol AdapterData NodeX
  • 134. 后台数据访问逻辑层次SM心跳检测后端连接 主库失效自动切换 至备库 重置池中连接SMSMSMSMSMHA PoolMySQL Protocol AdapterData Node
  • 135. 分库2分库3分库4分库5分库6分库1后台数据访问逻辑层次SM水平拆分的分库SMSMSMSMSMHA PoolMySQL Protocol AdapterData Node
  • 136. 分库2分库3分库4分库5分库6分库1物理机灵活的层间对应关系SMSMSMSMSMSM物理机物理机物理机物理机物理机
  • 137. 大纲中间件引入 Cobar策略 系统实现 MySQL协议 通信 解析 后端连接 实施应用
  • 138. Cobar 的部署Cobar集群Cobar集群Cobar集群Cobar集群Cobar集群Cobar集群Cobar集群... 青岛杭州美国中文站国际站offer/ibank/snapshot ……message centerhermesproduct... ... Cobar ManagerCobar ManagerCobar Manager
  • 139. Cobar 的部署Cobar集群德胜机房 Cobar集群兴义机房 Cobar集群Cobar集群... 青岛杭州美国... ... Cobar ManagerCobar ManagerCobar Manager
  • 140. Cobar集群MySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolMySQL ProtocolHTTP
  • 141. 数据迁移场景 MySQL数据库节点扩容 拆分规则更改 目标 迁移过程中应用保持可用 数据不能丢失、多余或者不一致
  • 142. 迁移步骤数据的全量dump 变更数据的增量dump 路由规则切换 清理 分库1分库2分库1分库2分库3数据分片 数量变更
  • 143. 分库2分库1全量dump分库3分库1分库2分库3
  • 144. 分库2分库1全量dump分库3分库1分库2分库3dumpdump
  • 145. 此时仍使用两份分库的拆分规则 分库2分库1全量dump分库3invisibleinvisible让绿色数据对SQL语句不可见… WHERE ( … ) AND 非绿色数据
  • 146. 全量dump过程中,数据会持续变更 将binlog同步到新库中 分库2分库1增量dump分库3invisibleinvisiblebinlogbinlogApplyApply
  • 147. 暂停写入,等待binlog全部同步到新库 将路由规则更新为三个分库 分库2分库1路由规则切换分库3invisibleinvisiblebinlogbinlogApplyApply
  • 148. 删除老数据 清理分库2分库1分库3
  • 149. 大纲中间件引入 Cobar策略 系统实现 实施应用 部署结构 平滑迁移
  • 150. 联系我们http://code.alibabatech.com/wiki/display/cobar http://weibo.com/alicobar https://groups.google.com/group/ali-cobar