Introduction to SparkR


SparkR

1.  The orange button
2.  Audio Type
3.  Close apps
4.  Enlarge my screen
5.  Headphones
6.  Questions Pane

•  Lecture – slides and/or video will be made available within one week
•  Live Demonstration
•  Q & A

To attend a hands-on Spark training course which runs every Saturday, please visit: liondatasystems.com/courses

•  This event has attracted nearly 900 registrants from various parts of the world.
•  Thank you everyone for your support!

•  Shivaram Venkataraman
•  Co-Author of SparkR
•  PhD Student @ UC Berkeley
•  Former Google Engineer

Introduction to SparkR
Shivaram Venkataraman

Big Data & R
[Diagram: DataFrames, Visualization, Libraries + Data]

Background
•  Engine for large-scale data processing
•  Fast, Easy to Use
•  Runs Everywhere – EC2, YARN, Mesos

SparkR
•  Interactive Shell
•  Batch Scripts

Outline
•  SparkR DataFrames
•  Architecture
•  Demo
•  SparkR Roadmap

Big Data Processing + R
[Diagram: Data Cleaning, Filtering, Aggregation; Collect Subset; DataFrames, Visualization, Libraries]

SparkR DataFrames
•  High-level API for data manipulation
•  Read in CSV, JSON, JDBC etc.
•  dplyr-like syntax

Example

    {"name":"Michael", "age":29}
    {"name":"Andy", "age":30}
    {"name":"Justin", "age":19}
    {"name":"Bob", "age":22}
    {"name":"Chris", "age":28}
    {"name":"Garth", "age":36}
    {"name":"Tasha", "age":24}
    {"name":"Mac", "age":30}
    {"name":"Neil", "age":32}

Example

    people [...]

[...] 1.4.0)
•  ./bin/sparkR or RStudio
•  Useful for learning SparkR, demonstrations

SparkR on EC2
•  Launch cluster with Spark's EC2 scripts

    ./spark-ec2 -s 2 -t r3.xlarge -i [...] -k [...] sparkr

•  Follow r-bloggers.com/spark-1-4-for-rstudio/
•  Thanks Vincent Warmerdam!

SparkR Future

Big Data & R
•  Big Data Small Learning
•  Partition Aggregate
•  Large Scale Machine Learning

Big Data Processing + R
[Diagram: Data Cleaning, Filtering, Aggregation; Collect Subset; DataFrames, Visualization, Libraries]

2(a). Partition Aggregate
[Diagram: Data; Collect Subset; Best Model Params]
Parameter Tuning

2(b). Partition Aggregate
[Diagram: Data; Combine Models]
Model Averaging

3. Large Scale Machine Learning
[Diagram: Data; Featurize; Learning; Model]

Big Data & R
•  Big Data Small Learning
•  Partition Aggregate
•  Large Scale Machine Learning
SparkR: Unified approach

Partition Aggregate
•  Upcoming feature: Simple, parallel API for SparkR
•  Ex: Parameter tuning, Model Averaging
•  Integrated with DataFrames
•  Use existing R packages

Large Scale Machine Learning
•  Integration with MLLib
•  Support for GLM, KMeans etc.

    model [...]

•  [...] 20 contributors including AMPLab, Databricks, Alteryx, Intel
•  New contributions welcome!

SparkR
•  Big data processing from R
•  DataFrames in Spark 1.4
•  Future: Large Scale ML & more

    # Local Demo

    Sys.setenv(SPARK_HOME="/Users/shivaram/spark-1.4.1")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)
    sc [...] %>%
      group_by(flights$dest) %>%
      summarize(count = n(flights$dest))

    top_dests [...]
PDF contributed by dargun on 2016-02-19

下载需要 8 金币 [金币充值 ]
亲,您也可以通过 分享原创pdf 来获得金币奖励!
下载pdf