Revolutionizing Big Data in the Enterprise with Spark


Ion Stoica
October 28, 2015

We Have Seen a Lot
• Worked with hundreds of companies over five years to run Spark in production
• Collaborate with all major Hadoop and Big Data vendors

How Does Spark Change Enterprise Big Data?
• Unifying data sources
• Unifying data processing

Unifying Data Sources
Need to process data from
• Multiple sources
• Different data stores and locations
• Different formats
Traditional solutions: ETL data into data warehouse, …

Traditional Data Warehouses
[Diagram: ETL into a Data Warehouse]
Slow to access and combine data

Just-In-Time (JIT) Data Warehouse
[Diagram: ETL and Data Warehouse]
Process data in place or stream it
• No need to wait for data to be ETLed
Cache data in memory or SSDs
Low latency and easy to combine data: value!

Analogy
• Download & Play
• Stream/cache & Play

Analogy
• ETL & Query [Diagram: Data Source A, Data Source B, ETL, Data Warehouse]
• Stream/Cache + Query [Diagram: Data Source A, Data Source B]

Top-3 Media Company
Data sources
• Traditional data warehouse: Customer transaction and profile data
• S3: Clickstream and historical logs
• Elasticsearch: User-submitted reviews and comments
• Kafka: Streaming online event data
Build Spark-based JIT Data Warehouse to perform real-time analytics

Unifying Data Processing
Unified support for
• Batch
• Streaming
• ML/Graphs
• …

Spark: Unified Engine
[Diagram: Spark SQL, Spark Streaming, MLlib, GraphX, SparkR, Spark Core]
Easy to manage, learn, and combine functionality

Analogy
• First cellular phones
• Unified device (smartphone)
• Specialized devices
• Better Games, Better GPS, Better Phone

Analogy
• Batch processing
• Unified system
• Specialized systems
• Real-time analytics, Instant fraud detection, Better Apps

Large Online Service Company
Leverages
• Interactive query processing
• ML
and combines data from S3, Redshift, and HBase to provide
• data analytics for the product management team
• advanced predictive analytics to deliver new services (e.g., customized inventory displays tailored to each user)

Demo

Demo Setting
[Diagram: Spark SQL, Spark Streaming, MLlib, Spark Core, HDFS, Redshift]
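
The JIT data warehouse idea from the slides (process data in place or stream it, cache it in memory, and combine sources at query time) can be illustrated with a minimal sketch. This is not Spark code and not taken from the talk: it uses only the Python standard library, and the two in-memory strings, the function names, and the sample records are all hypothetical stand-ins for real data stores.

```python
import csv
import io
import json
from functools import lru_cache

# Hypothetical stand-ins for two stores that hold data in different formats.
TRANSACTIONS_CSV = "user_id,amount\n1,30\n2,12\n1,8\n"
CLICKS_JSONL = (
    '{"user_id": 1, "page": "home"}\n'
    '{"user_id": 2, "page": "cart"}\n'
    '{"user_id": 1, "page": "cart"}\n'
)


@lru_cache(maxsize=None)
def load_transactions():
    """Parse the CSV source on first use and keep the result cached in memory."""
    rows = csv.DictReader(io.StringIO(TRANSACTIONS_CSV))
    return tuple((int(r["user_id"]), int(r["amount"])) for r in rows)


def stream_clicks():
    """Yield click events one at a time, the way a stream would deliver them."""
    for line in io.StringIO(CLICKS_JSONL):
        yield json.loads(line)


def spend_and_clicks_per_user():
    """Combine both sources at query time, without a staging (ETL) copy."""
    summary = {}
    for user_id, amount in load_transactions():
        entry = summary.setdefault(user_id, {"spend": 0, "clicks": 0})
        entry["spend"] += amount
    for event in stream_clicks():
        entry = summary.setdefault(event["user_id"], {"spend": 0, "clicks": 0})
        entry["clicks"] += 1
    return summary


result = spend_and_clicks_per_user()
print(result)
```

The cached loader plays the role of "cache data in memory", and the generator plays the role of "stream it"; in a real deployment an engine such as Spark would read each store through its own connector instead of these toy strings.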