SlideShare a Scribd company logo
1 of 23
Hadoop 1.x vs Hadoop 2
Rommel Garcia
Solutions Engineer - Big Data
Hortonworks
Transition To Big Data
Relational Dimensional
(EDW)
Big Data
Data Explosion
3 Design Dimensions
Key Hadoop Data Types
Sentiment
Clickstream
Sensor/Machine
Geographic
Server Logs
Text
Hadoop is NOT
ESB
NoSQL
HPC
Relational
Real-time
The “Jack of all Trades”
Hadoop 1
Limited up to 4,000 nodes per cluster
O(# of tasks in a cluster)
JobTracker bottleneck - resource
management, job scheduling and monitoring
Only has one namespace for managing HDFS
Map and Reduce slots are static
Only job to run is MapReduce
Hadoop 1 - Basics
BBBB CCCC AAAA AAAA AAAA
AAAA BBBB CCCC CCCC BBBB
MapReduce (Computation Framework)
HDFS (Storage Framework)
Hadoop 1 - Reading
Files
Rack1 Rack2 Rack3 RackN
read file (fsimage/edit)
Hadoop Client
NameNode SNameNode
return DNs,
block ids, etc.
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
checkpoint
heartbeat/
block reportread blocks
Hadoop 1 - Writing Files
Rack1 Rack2 Rack3 RackN
request write (fsimage/edit)
Hadoop Client
NameNode SNameNode
return DNs, etc.
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
checkpoint
block report
write blocks
replication pipelining
Hadoop 1 - Running
Jobs
Rack1 Rack2 Rack3 RackN
Hadoop Client
JobTracker
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
DN | TT
submit job
deploy job
part 0part 0part 0part 0
map
reduce
shuffle
Hadoop 1 - Security
UsersUsersUsersUsers
FF
II
RR
EE
WW
AA
LL
LL
LDAP/AD
Client Node/
Spoke Server
KDC
Hadoop Cluster
authN/authZ
service request
block token
delegate token
* block token is for accessing data
* delegate token is for running jobs
Encryption PluginEncryption Plugin
Hadoop 1 - APIs
org.apache.hadoop.mapreduce.Partitioner
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Job
Hadoop 2
Potentially up to 10,000 nodes per cluster
O(cluster size)
Supports multiple namespace for managing
HDFS
Efficient cluster utilization (YARN)
MRv1 backward and forward compatible
Any apps can integrate with Hadoop
Beyond Java
Hadoop 2 - Basics
Hadoop 2 - Reading Files
(w/ NN Federation)
(w/ NN Federation)
Rack1 Rack2 Rack3 RackN
read file
fsimage/edit copy
Hadoop Client NN1/ns1
SNameNode
per NN
return DNs,
block ids, etc.
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
checkpoint
register/
heartbeat/
block report
read blocks
fs sync Backup NN
per NN
checkpoint
NN2/ns2 NN3/ns3 NN4/ns4
or
ns1 ns2 ns3 ns4
dn1, dn2
dn1, dn3
dn4, dn5 dn4, dn5
Block Pools
Hadoop 2 - Writing Files
Rack1 Rack2 Rack3 RackN
request write
Hadoop Client
return DNs, etc.
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
DN | NM
write blocks
replication pipelining
fsimage/edit copy
NN1/ns1
SNameNode
per NN
checkpoint
block report
fs sync Backup NN
per NN
checkpoint
NN2/ns2 NN3/ns3 NN4/ns4
or
Hadoop 2 - Running Jobs
RackN
NodeManager
NodeManager
NodeManager
Rack2
NodeManager
NodeManager
NodeManager
Rack1
NodeManager
NodeManager
NodeManager
C2.1
C1.4
AM2
C2.2 C2.3
AM1
C1.3
C1.2
C1.1
Hadoop Client 1
Hadoop Client 2
create app2
submit app1
submit app2
create app1
ASM Scheduler
queues
ASM Containers
NM ASM
Scheduler Resources
.......negotiates.......
.......reports to.......
.......partitions.......
ResourceManager
status report
Hadoop 2 - Security
FF
II
RR
EE
WW
AA
LL
LL
LDAP/AD
Knox Gateway Cluster
KDC
Hadoop Cluster
Enterprise/
Cloud SSO
Provider
JDBC ClientJDBC Client
REST ClientREST Client
FF
II
RR
EE
WW
AA
LL
LL
DMZ
Browser(HUE)Browser(HUE) Native Hive/HBase EncryptionNative Hive/HBase Encryption
Hadoop 2 - APIs
org.apache.hadoop.yarn.api.ApplicationClientProtocol
org.apache.hadoop.yarn.api.ApplicationMasterProtocol
org.apache.hadoop.yarn.api.ContainerManagementProtoc
ol
Resources
http://hortonworks.com/products/hortonworks-sandbox/
http://hortonworks.com/products/hdp-2/
http://hortonworks.com/resources/
http://hadoopsummit.org/san-jose/
Hadoop Summit 2014
Thank you!
www.linkedin.com/in/rommelgarcia
twitter.com/rommelgarcia
rgarcia@hortonworks.com
Hortonworks

More Related Content

What's hot

Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDr. C.V. Suresh Babu
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaEdureka!
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
Building Rich Domain Models
Building Rich Domain ModelsBuilding Rich Domain Models
Building Rich Domain ModelsChris Richardson
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 

What's hot (20)

Hbase
HbaseHbase
Hbase
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Voldemort
VoldemortVoldemort
Voldemort
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | Edureka
 
Apache hive
Apache hiveApache hive
Apache hive
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Building Rich Domain Models
Building Rich Domain ModelsBuilding Rich Domain Models
Building Rich Domain Models
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 

Viewers also liked

Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 
Understanding The Gist
Understanding The GistUnderstanding The Gist
Understanding The Gistebenimzo
 
Top 10 lead engineer interview questions and answers
Top 10 lead engineer interview questions and answersTop 10 lead engineer interview questions and answers
Top 10 lead engineer interview questions and answersjomgori
 
Use of glass powder as fine aggregate in high strength concrete
Use of glass powder as fine aggregate in high strength concreteUse of glass powder as fine aggregate in high strength concrete
Use of glass powder as fine aggregate in high strength concreteJostin P Jose
 
Software Product Development - Simple Process flow
Software Product Development - Simple Process flowSoftware Product Development - Simple Process flow
Software Product Development - Simple Process flowSabina Siddiqi
 
Ecommerce and internet marketing
Ecommerce and internet marketingEcommerce and internet marketing
Ecommerce and internet marketingakkapeddi
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestionVinod Nayal
 
Bài 20: Mạng máy tính
Bài 20: Mạng máy tínhBài 20: Mạng máy tính
Bài 20: Mạng máy tínhChâu Trần
 
7. The Software Development Process - Maintenance
7. The Software Development Process - Maintenance7. The Software Development Process - Maintenance
7. The Software Development Process - MaintenanceForrester High School
 
Analysis of working capital management shriram piston finance
Analysis of working capital management  shriram piston  financeAnalysis of working capital management  shriram piston  finance
Analysis of working capital management shriram piston financeanuragmaurya
 
Online supply inventory system
Online supply inventory systemOnline supply inventory system
Online supply inventory systemrokista
 
Cold water supply system & Components
Cold water supply system & ComponentsCold water supply system & Components
Cold water supply system & Componentsashikin
 
Financial planning & forecasting
Financial planning & forecastingFinancial planning & forecasting
Financial planning & forecastingDavid thugu
 
Basic Photography 101
Basic Photography 101Basic Photography 101
Basic Photography 101Bas Olthoff
 

Viewers also liked (20)

Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Understanding The Gist
Understanding The GistUnderstanding The Gist
Understanding The Gist
 
Top 10 lead engineer interview questions and answers
Top 10 lead engineer interview questions and answersTop 10 lead engineer interview questions and answers
Top 10 lead engineer interview questions and answers
 
Manualtesting
ManualtestingManualtesting
Manualtesting
 
Use of glass powder as fine aggregate in high strength concrete
Use of glass powder as fine aggregate in high strength concreteUse of glass powder as fine aggregate in high strength concrete
Use of glass powder as fine aggregate in high strength concrete
 
Industrial housing
Industrial housingIndustrial housing
Industrial housing
 
Software Product Development - Simple Process flow
Software Product Development - Simple Process flowSoftware Product Development - Simple Process flow
Software Product Development - Simple Process flow
 
How Hedge Funds Are Structured
How Hedge Funds Are StructuredHow Hedge Funds Are Structured
How Hedge Funds Are Structured
 
Ecommerce and internet marketing
Ecommerce and internet marketingEcommerce and internet marketing
Ecommerce and internet marketing
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestion
 
Bài 20: Mạng máy tính
Bài 20: Mạng máy tínhBài 20: Mạng máy tính
Bài 20: Mạng máy tính
 
Surgical Bleeding
Surgical BleedingSurgical Bleeding
Surgical Bleeding
 
7. The Software Development Process - Maintenance
7. The Software Development Process - Maintenance7. The Software Development Process - Maintenance
7. The Software Development Process - Maintenance
 
Analysis of working capital management shriram piston finance
Analysis of working capital management  shriram piston  financeAnalysis of working capital management  shriram piston  finance
Analysis of working capital management shriram piston finance
 
Online supply inventory system
Online supply inventory systemOnline supply inventory system
Online supply inventory system
 
Enterprise Analysis
Enterprise AnalysisEnterprise Analysis
Enterprise Analysis
 
Cold water supply system & Components
Cold water supply system & ComponentsCold water supply system & Components
Cold water supply system & Components
 
Financial planning & forecasting
Financial planning & forecastingFinancial planning & forecasting
Financial planning & forecasting
 
Basic Photography 101
Basic Photography 101Basic Photography 101
Basic Photography 101
 

Similar to Hadoop 1.x vs 2

Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in DepthSyed Hadoop
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name NodeAaron Cordova
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabsSiva Sankar
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
HDFS presented by VIJAY
HDFS presented by VIJAYHDFS presented by VIJAY
HDFS presented by VIJAYthevijayps
 

Similar to Hadoop 1.x vs 2 (20)

Hadoop Architecture in Depth
Hadoop Architecture in DepthHadoop Architecture in Depth
Hadoop Architecture in Depth
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name Node
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Huhadoop - v1.1
Huhadoop - v1.1Huhadoop - v1.1
Huhadoop - v1.1
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop training by keylabs
Hadoop training by keylabsHadoop training by keylabs
Hadoop training by keylabs
 
Unit 1
Unit 1Unit 1
Unit 1
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Apache hadoop
Apache hadoopApache hadoop
Apache hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
HDFS presented by VIJAY
HDFS presented by VIJAYHDFS presented by VIJAY
HDFS presented by VIJAY
 

More from Rommel Garcia

The of Operational Analytics Data Store
The of Operational Analytics Data StoreThe of Operational Analytics Data Store
The of Operational Analytics Data StoreRommel Garcia
 
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"Rommel Garcia
 
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.Rommel Garcia
 
GPU 101: The Beast In Data Centers
GPU 101: The Beast In Data CentersGPU 101: The Beast In Data Centers
GPU 101: The Beast In Data CentersRommel Garcia
 
PCI Compliane With Hadoop
PCI Compliane With HadoopPCI Compliane With Hadoop
PCI Compliane With HadoopRommel Garcia
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Rommel Garcia
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupRommel Garcia
 

More from Rommel Garcia (12)

The of Operational Analytics Data Store
The of Operational Analytics Data StoreThe of Operational Analytics Data Store
The of Operational Analytics Data Store
 
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
 
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
 
GPU 101: The Beast In Data Centers
GPU 101: The Beast In Data CentersGPU 101: The Beast In Data Centers
GPU 101: The Beast In Data Centers
 
PCI Compliane With Hadoop
PCI Compliane With HadoopPCI Compliane With Hadoop
PCI Compliane With Hadoop
 
Virtualizing Hadoop
Virtualizing HadoopVirtualizing Hadoop
Virtualizing Hadoop
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Hadoop Meets Scrum
Hadoop Meets ScrumHadoop Meets Scrum
Hadoop Meets Scrum
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User Group
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Hadoop 1.x vs 2

  • 1. Hadoop 1.x vs Hadoop 2 Rommel Garcia Solutions Engineer - Big Data Hortonworks
  • 2. Transition To Big Data Relational Dimensional (EDW) Big Data
  • 5. Key Hadoop Data Types Sentiment Clickstream Sensor/Machine Geographic Server Logs Text
  • 7. Hadoop 1 Limited up to 4,000 nodes per cluster O(# of tasks in a cluster) JobTracker bottleneck - resource management, job scheduling and monitoring Only has one namespace for managing HDFS Map and Reduce slots are static Only job to run is MapReduce
  • 8. Hadoop 1 - Basics BBBB CCCC AAAA AAAA AAAA AAAA BBBB CCCC CCCC BBBB MapReduce (Computation Framework) HDFS (Storage Framework)
  • 9. Hadoop 1 - Reading Files Rack1 Rack2 Rack3 RackN read file (fsimage/edit) Hadoop Client NameNode SNameNode return DNs, block ids, etc. DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT checkpoint heartbeat/ block reportread blocks
  • 10. Hadoop 1 - Writing Files Rack1 Rack2 Rack3 RackN request write (fsimage/edit) Hadoop Client NameNode SNameNode return DNs, etc. DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT checkpoint block report write blocks replication pipelining
  • 11. Hadoop 1 - Running Jobs Rack1 Rack2 Rack3 RackN Hadoop Client JobTracker DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT DN | TT submit job deploy job part 0part 0part 0part 0 map reduce shuffle
  • 12. Hadoop 1 - Security UsersUsersUsersUsers FF II RR EE WW AA LL LL LDAP/AD Client Node/ Spoke Server KDC Hadoop Cluster authN/authZ service request block token delegate token * block token is for accessing data * delegate token is for running jobs Encryption PluginEncryption Plugin
  • 13. Hadoop 1 - APIs org.apache.hadoop.mapreduce.Partitioner org.apache.hadoop.mapreduce.Mapper org.apache.hadoop.mapreduce.Reducer org.apache.hadoop.mapreduce.Job
  • 14. Hadoop 2 Potentially up to 10,000 nodes per cluster O(cluster size) Supports multiple namespace for managing HDFS Efficient cluster utilization (YARN) MRv1 backward and forward compatible Any apps can integrate with Hadoop Beyond Java
  • 15. Hadoop 2 - Basics
  • 16. Hadoop 2 - Reading Files (w/ NN Federation) (w/ NN Federation) Rack1 Rack2 Rack3 RackN read file fsimage/edit copy Hadoop Client NN1/ns1 SNameNode per NN return DNs, block ids, etc. DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM checkpoint register/ heartbeat/ block report read blocks fs sync Backup NN per NN checkpoint NN2/ns2 NN3/ns3 NN4/ns4 or ns1 ns2 ns3 ns4 dn1, dn2 dn1, dn3 dn4, dn5 dn4, dn5 Block Pools
  • 17. Hadoop 2 - Writing Files Rack1 Rack2 Rack3 RackN request write Hadoop Client return DNs, etc. DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM DN | NM write blocks replication pipelining fsimage/edit copy NN1/ns1 SNameNode per NN checkpoint block report fs sync Backup NN per NN checkpoint NN2/ns2 NN3/ns3 NN4/ns4 or
  • 18. Hadoop 2 - Running Jobs RackN NodeManager NodeManager NodeManager Rack2 NodeManager NodeManager NodeManager Rack1 NodeManager NodeManager NodeManager C2.1 C1.4 AM2 C2.2 C2.3 AM1 C1.3 C1.2 C1.1 Hadoop Client 1 Hadoop Client 2 create app2 submit app1 submit app2 create app1 ASM Scheduler queues ASM Containers NM ASM Scheduler Resources .......negotiates....... .......reports to....... .......partitions....... ResourceManager status report
  • 19. Hadoop 2 - Security FF II RR EE WW AA LL LL LDAP/AD Knox Gateway Cluster KDC Hadoop Cluster Enterprise/ Cloud SSO Provider JDBC ClientJDBC Client REST ClientREST Client FF II RR EE WW AA LL LL DMZ Browser(HUE)Browser(HUE) Native Hive/HBase EncryptionNative Hive/HBase Encryption
  • 20. Hadoop 2 - APIs org.apache.hadoop.yarn.api.ApplicationClientProtocol org.apache.hadoop.yarn.api.ApplicationMasterProtocol org.apache.hadoop.yarn.api.ContainerManagementProtoc ol