© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Apache Kylin on AWS
Extreme OLAP on Cloud
shaofengshi@apache.org
Sep 7, 2016

About Me
史少锋 | Shaofeng Shi
Kyligence Inc., Technical Partner, Senior Architect
Apache Kylin committer & PMC
shaofeng@kyligence.io

About Kyligence Inc.
• A leading Big Data middleware company
• Founded by Apache Kylin core members
• http://kyligence.io
Offerings:
• Enterprise Product
• Professional Services
• Management and Automation
• Kylin on Cloud Solutions
• Contributing to the world-class open source community
All rights reserved ©Kyligence Inc. http://kyligence.io

Agenda
• About Apache Kylin
• Core Technology
• Case Studies
• Running Apache Kylin on AWS
• Q&A

Why
[Chart: Happiness vs. Latency, with a marker at 10s]

O(N) vs O(1)
[Chart: x-axis Data Volume, y-axis Query Time; curves O(N) and O(1)]
• Precomputation

Programming vs SQL
• SQL

Apache Kylin
http://kylin.apache.org
Extreme OLAP Engine for Big Data
Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets with sub-second response times.

kylin / ˈkiːˈlɪn / 麒麟
n. (in Chinese art) a mythical animal of composite form

Apache, Apache Kylin, the Kylin logo, and the Feather logo are trademarks of the ASF. Kyligence and the Kyligence logo are trademarks of Kyligence.

Apache Kylin Journey
• Sept 2013: Project kickoff
• Oct 2014: Go live at eBay & open source on GitHub
• Nov 2014: Apache Incubator
• Sept 2015: InfoWorld Bossie Award (Best Open Source Big Data Tool); Apache release v1.0
• Nov 2015: Apache Top Level Project
• Mar 2016: Kyligence founded

The Missing Middleware
[Diagram layers: Presentation / Visualization; Data Mart / OLAP; Big Data Platform / Data Lake; Source Data]

Analytics Query Taxonomy
[Pyramid, top to bottom; side labels: Strategy, Operation, Transaction]
• High Level Aggregation: very high level, e.g. GMV by site by vertical by week
• Analysis Query: middle level, e.g. GMV by site by vertical by category (level x) over the past 12 weeks
• Drill Down to Detail: detail level (summary table)
• Low Level Aggregation: first-level aggregation
• Transaction Level: transaction data
80+% of queries are analytics queries. Kylin is designed to accelerate analytics query performance on Hadoop.

Agenda: Core Technology

OLAP Cube
[Cuboid lattice over the dimensions time, item, location, supplier]
• 0-D (apex) cuboid: all
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
• 3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
• 4-D (base) cuboid: time, item, location, supplier

• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
  1. (9/15, milk, Urbana, Dairy_land)
  2. (9/15, milk, Urbana, *)
  3. (*, milk, Urbana, *)
  4. (*, milk, Chicago, *)
  5. (*, milk, *, *)
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)

Cube: Balance Between Space and Time

Architecture
[Diagram: BI tools, web apps, … query Kylin through ANSI SQL; Kylin builds cubes with MapReduce / Spark / Streaming …]

[Screenshots]
• Define Data Model
• Manage Jobs
• Explore the Data
• Interact with BI Tools: Tableau
• Integration with Excel / Power BI

Agenda: Case Studies

Use Case: Baidu Map

Use Case: JD.com
• Powered by Apache Kylin
• JCloud JOS: API statistics
• JCloud Yunhai (云海): low-latency query engine
• JCloud Data Cloud: OLAP platform for the online analytics tool

Use Case: VIP.com

Use Case: NetEase

Agenda: Running Apache Kylin on AWS

Why Kylin on AWS
• Data on AWS
  • Move compute to data
• Fast delivery
  • Set up a cluster in days
• Unlimited capacity
  • Easy to expand
• Worldwide access
• Controllable cost
• …
Cloud is the future

Run Kylin on AWS
Solution 1: EC2 + Hadoop + Kylin
[Diagram: Amazon EC2 running a Hadoop cluster (HDFS, MR, Hive, HBase) inside a VPC]
• EC2: 10+ m1.xlarge instances for Hadoop; 3 c3.xlarge instances for Kylin and others
• Hadoop: HDP 2.2 managed with Ambari
• OLAP & Visualization: Apache Kylin 1.5.3 + Zeppelin + Saiku
• Cost: around $3k/month

Run Kylin on AWS
Solution 2: EMR + Kylin
[Diagram: Amazon EMR and an Amazon S3 bucket inside a VPC]

Run Kylin on AWS
Solution 3: EMR + Container
[Diagram: Amazon EMR and an Amazon S3 bucket]

Q&A
• Contact: shaofeng@kyligence.io
• Join the community: user@kylin.apache.org
• Websites: http://kyligence.io, http://kylin.apache.org
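
Appendix: cuboid sketch
The OLAP Cube and O(N) vs O(1) slides can be illustrated with a small Python sketch. It is not Kylin's implementation, only a toy model of the idea: enumerate every cuboid over the four dimensions from the slide, pre-aggregate a measure for each one, and answer a query with a dictionary lookup instead of a scan. The fact rows, the `sales` measure, and the helper names (`cuboids`, `build_cube`, `query`) are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Dimensions taken from the OLAP Cube slide.
DIMENSIONS = ("time", "item", "location", "supplier")

# Hypothetical fact rows; the values are invented for this example.
ROWS = [
    {"time": "9/15", "item": "milk", "location": "Urbana", "supplier": "Dairy_land", "sales": 10},
    {"time": "9/15", "item": "milk", "location": "Chicago", "supplier": "Dairy_land", "sales": 5},
    {"time": "9/16", "item": "milk", "location": "Urbana", "supplier": "Other_farm", "sales": 2},
    {"time": "9/16", "item": "bread", "location": "Urbana", "supplier": "Bakery", "sales": 4},
]


def cuboids(dims):
    """Yield every cuboid (subset of dimensions), from the 0-D apex to the base."""
    for k in range(len(dims) + 1):
        for combo in combinations(dims, k):
            yield combo


def build_cube(rows, dims, measure):
    """Pre-aggregate SUM(measure) for every cuboid: {cuboid: {cell: total}}."""
    cube = {c: defaultdict(int) for c in cuboids(dims)}
    for row in rows:
        for cuboid, cells in cube.items():
            cell = tuple(row[d] for d in cuboid)
            cells[cell] += row[measure]
    return cube


def query(cube, dims, **filters):
    """Look up one cell; dimensions not named in filters are treated as '*'."""
    cuboid = tuple(d for d in dims if d in filters)
    cell = tuple(filters[d] for d in cuboid)
    return cube[cuboid].get(cell, 0)


CUBE = build_cube(ROWS, DIMENSIONS, "sales")

# Number of cuboids per level: 0-D, 1-D, 2-D, 3-D, 4-D.
LEVELS = [sum(1 for c in cuboids(DIMENSIONS) if len(c) == k) for k in range(5)]

if __name__ == "__main__":
    print("cuboids per level:", LEVELS)
    # Cell 3 from the slide: (*, milk, Urbana, *)
    print(query(CUBE, DIMENSIONS, item="milk", location="Urbana"))
    # Cell 5 from the slide: (*, milk, *, *)
    print(query(CUBE, DIMENSIONS, item="milk"))
```

The build step touches every row once per cuboid, so the cube costs extra space and build time; in exchange, each query is a single lookup whose cost does not grow with the number of fact rows. That is the space/time balance the Cube slide refers to.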




