h2oai/h2o-sparkling

This repository is DEPRECATED! Please use the new Sparkling Water repository https://github.com/h2oai/sparkling-water!


h2o-sparkling

Makes interoperability between H2O and Spark trivial.

Requirements

  • Spark 1.0.0 (SQL component required)
  • Tachyon 0.4.1
  • Java 1.6+
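
The Java requirement can be checked before building. The following is a minimal sketch, not part of this repository: java_version_ok is a hypothetical helper that accepts both legacy version strings (1.6.0_45) and modern ones (11.0.2). This README does not state whether the demo actually builds on recent JDKs, so a passing check only means the stated "1.6+" bound is met.

```shell
# Hypothetical helper: succeed if a Java version string satisfies "Java 1.6+".
# Legacy releases report themselves as 1.x (e.g. 1.6.0_45); Java 9 and later
# report the major version first (e.g. 11.0.2).
java_version_ok() {
    version="$1"
    major="${version%%.*}"            # text before the first dot
    if [ "$major" = "1" ]; then
        rest="${version#*.}"          # text after the first dot
        minor="${rest%%[!0-9]*}"      # leading digits of the remainder
        [ "${minor:-0}" -ge 6 ]
    else
        case "$major" in
            ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
        esac
        [ "$major" -ge 9 ]
    fi
}
```

One possible way to feed it the installed version, assuming the first line of the java -version output carries the version in double quotes: java_version_ok "$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')"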

Installation

  • First, compile the latest version of Spark with the SQL component and publish it locally:
git clone spark
cd spark
sbt/sbt assembly publish-local
  • Then build the h2o-sparkling-demo assembly:
cd h2o-sparkling-demo
sbt assembly

Note: The assembly step is required because the demo acts as a Spark driver and sends a jar file containing the job implementation to the Spark cloud.

Run demo

Run local version

For this run no Spark cloud is required:

  • Start an H2O instance that embeds the Spark driver:
cd h2o-sparkling-demo
sbt "run --local"

Run distributed version

For this run a Spark cloud is required:

  • Start a master and one worker on the local node:
cd spark/sbin
./start-master.sh
./start-slave.sh 1 "spark://localhost:7077"
  • Assemble the h2o-sparkling-demo jar file, which the driver sends to the Spark cloud, and run the demo:
cd h2o-sparkling-demo
sbt assembly
sbt "run --remote"
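
The master started in the first step may need a moment before it accepts connections. As a hedged sketch, a small retry helper can delay the demo until a readiness check passes. The retry function below is hypothetical and not part of this repository, and the readiness check itself is left to the caller.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts, sleeping between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0                  # command succeeded, stop retrying
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1                          # all attempts failed
}
```

For example, assuming a netcat build that supports the -z flag is installed, the demo could be launched with: retry 10 1 nc -z localhost 7077 && sbt "run --remote"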

Run additional H2O node

cd h2o-sparkling-demo
sbt runH2O

Select different RDD2Frame extractor

The demo currently supports three extractors:

  • dummy - pulls all data into the driver and creates a frame
  • file - asks Spark to save the RDD as a file on the local filesystem, then parses the stored file
  • tachyon - asks Spark to save the RDD to the Tachyon filesystem, then H2O loads the file from Tachyon

The extractor can be selected via the --extractor command-line parameter, e.g., --extractor=tachyon
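
To avoid typos in the mode and extractor names, the argument string can be assembled by a small wrapper. This is only a sketch: build_run_args is a hypothetical helper, not part of this repository, and it knows only the --local, --remote, and --extractor options and the three extractor names listed in this README.

```shell
# Hypothetical helper: build the argument string passed to sbt "run ...".
# Usage: build_run_args MODE [EXTRACTOR]
#   MODE:      local | remote
#   EXTRACTOR: dummy | file | tachyon (optional)
build_run_args() {
    mode="${1:-}"
    extractor="${2:-}"

    case "$mode" in
        local|remote) args="--$mode" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac

    # Append the extractor option only when one was requested.
    if [ -n "$extractor" ]; then
        case "$extractor" in
            dummy|file|tachyon) args="$args --extractor=$extractor" ;;
            *) echo "unknown extractor: $extractor" >&2; return 1 ;;
        esac
    fi

    echo "$args"
}
```

From the h2o-sparkling-demo directory it could be used as: sbt "run $(build_run_args remote tachyon)"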

Running with Tachyon

  • Start Tachyon
cd tachyon/bin
./tachyon-start.sh

Example

Run a demo with the Tachyon-based extractor against a remote Spark cloud:

cd h2o-sparkling-demo
sbt assembly
sbt "run --remote --extractor=tachyon"

Run the airlines demo with the file-based extractor against a remote Spark cloud running at a non-default location:

sbt "run --remote --sparkMaster=spark://localhost:17077 --noshutdown --demo=airlines --extractor=file"

Doc

About

DEPRECATED! Use https://github.com/h2oai/sparkling-water repository! H2O and Spark interoperability based on Tachyon.
