What is the point of Hadoop?
Matthew Aslett
Research Director, 451 Research




                              © 2013 by The 451 Group. All rights reserved
 Matthew Aslett
• Research Director, Data Management and Analytics
• matthew.aslett@451research.com
• www.twitter.com/maslett

• Responsible for data management and analytics research agenda
• Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop
• With 451 Research since 2007

Unique combination of research, analysis & data
Emerging tech market segment focus
Daily qualitative & quantitative insight
Analyst advisory & Go-to-market support
Global events




Company Overview

• One company with 3 operating divisions
• Syndicated research, advisory, professional services, datacenter certification, and events
• Global focus
• 200+ staff
• 1,300+ client organizations: enterprises, vendors, service providers, and investment firms
• Organic growth and growth through acquisition

What is the point of Hadoop?

Hadoop’s greatest asset is its
flexibility: it can be used for
multiple roles and use-cases

But that is also a challenge,
and can lead to confusion
and disillusionment

Each user and vendor has
their own perspective on
Hadoop’s role




The Blind Men and the Elephant

“It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.”

John Godfrey Saxe (1872)




The Blind Men and the Elephant

“After Hadoop finishes
filtering the data, the place
you want to put that data
is in Oracle Database.”

Larry Ellison (2011)




Oracle Big Data Appliance
Apache Hadoop
NoSQL Database
Oracle Tools
Data Integrator for Oracle Database
Data Loader
R distribution
Oracle Database

[Diagram labels: Big data processing/integration, Big data analytics]

What is the point of Hadoop?

[Diagram: Big data storage | Big data processing/integration | Big data analytics]

Big Data
“Big data”: the realization of greater business intelligence by
storing, processing and analyzing data that was previously ignored because of the
limitations of traditional data management technologies, summarized as the three Vs:

Volume: The volume of data is too large for traditional database software tools to cope with
Velocity: The data is being produced at a rate that is beyond the performance limits of traditional systems
Variety: The data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses

Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.
• Inspired by ‘Total Football’, a new approach to soccer that emerged in Amsterdam in the late 1960s
• Total Data is making the most efficient use of existing and new data management resources to deliver value
• Not another name for Big Data: if your data is big, the way you manage it should be total

Big Data and Total Data

Big Data:
The growing volume, velocity and variety of data

Big Data Technologies:
New technologies being adopted to store and process that data

Total Data:
The user trends driving the adoption of Big Data Technologies to store and process Big Data, and their management alongside existing data management technologies.

[Diagram: overlapping BIG DATA, BIG DATA TECHNOLOGY and TOTAL DATA regions; other surviving labels: DQ, Volume, Predictive analytics]

Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.




Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
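
The contrast between analyzing data in its entirety and extrapolating from a sample can be made concrete with a small Python sketch. Everything in it is a hypothetical illustration: the log records, the `count_errors` helper and the 1-in-10 sampling rule are assumptions, and the data is deliberately contrived so that the sample misses a rare event.

```python
# Hypothetical log: 1,000 records, of which only 7 are errors.
# The data set and the sampling rule are illustrative assumptions.
records = ["ok"] * 1000
for i in (13, 127, 256, 391, 512, 777, 903):
    records[i] = "error"


def count_errors(rows):
    """Count rows marked as errors."""
    return sum(1 for row in rows if row == "error")


# Totality: scan every record.
full_count = count_errors(records)

# Sampling: inspect every 10th record and scale the result up.
sample = records[::10]
extrapolated = count_errors(sample) * 10

print("full scan:", full_count)
print("sample extrapolation:", extrapolated)
```

In this contrived case the sample contains none of the error records, so the extrapolation misses them entirely while the full scan finds all seven. A different sample could just as easily overestimate.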




Totality

[Diagram: Big data storage | Big data processing/integration]

• Prior to adopting Hadoop, the company only had transactional and summarized non-transactional data stored in its EDW
• The vast majority of its log data was discarded as not valuable enough to be efficiently processed in an enterprise data warehouse
• Now using Hadoop to process hundreds of GBs of log data produced by the millions of searches and transactions performed on its site each day
• Creating data exports to R, and aggregating data to its existing data warehouse for analysis

Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.




Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.

Exploration
Traditional data warehouses:

Schema on write

    Application → Schema → RDBMS → SQL

Hadoop:

Schema on read

    Application → Hadoop → Schema → MapReduce
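
The difference between the two flows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`write_with_schema`, `read_with_schema`), the field names and the JSON-lines storage format are hypothetical and are not taken from Hadoop or from any RDBMS.

```python
import json

# Schema on write: the structure is fixed before any data is stored,
# so records that do not fit are rejected at load time.
WAREHOUSE_COLUMNS = ("user_id", "city")  # hypothetical schema


def write_with_schema(table, record):
    missing = [c for c in WAREHOUSE_COLUMNS if c not in record]
    if missing:
        raise ValueError(f"record lacks columns: {missing}")
    # Fields outside the schema are dropped here and cannot be queried later.
    table.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))


# Schema on read: raw lines are stored untouched, and each query
# decides which fields it wants when it reads the data.
def read_with_schema(raw_lines, fields):
    for line in raw_lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)


events = [
    {"user_id": 1, "city": "Chicago", "device": "mobile"},
    {"user_id": 2, "city": "Boston", "device": "desktop"},
]

table = []
raw_store = []
for event in events:
    write_with_schema(table, event)      # "device" is discarded
    raw_store.append(json.dumps(event))  # everything is kept

# A later question about devices can only be answered from the raw store.
by_device = list(read_with_schema(raw_store, ("city", "device")))
print(table)
print(by_device)
```

The trade-off in this sketch is that the schema-on-write table is compact and validated up front, while the raw store defers all interpretation to query time and so can answer questions nobody planned for.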




Exploration

[Diagram: Big data storage | Big data processing/integration | Big data analytics]

• The company wanted to perform analysis on customer data in order to create geo-targeted advertising
• The required data was already present in its data warehouse but was modeled in a way that would not allow Orbitz to efficiently process the query
• Extracting the data into Hadoop enabled the company to query it in a way the data warehouse was never designed for

Hadoop adoption process

Big data storage: Google File System research paper published 2003
Big data processing/integration: Google MapReduce research paper published 2004
Big data analytics: Google Dremel research paper published 2010; Google Tenzing research paper published 2011

[Chart: diffusion of innovation curve with STORAGE, PROCESSING and ANALYTICS marked; visible segments: INNOVATORS, EARLY ADOPTERS]
Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png
Licensed under the Creative Commons Attribution 2.5 License.




Crossing the Chasm
• Hadoop as (just) a low cost storage option is not fulfilling its potential
• Processing and integration is not the complete picture
• Hadoop-based analytics unlocks the value of previously ignored data
• Attempting to fast forward to analytics, missing out the processing/integration stage, creates silos and will result in disillusionment

[Chart: diffusion of innovation curve with STORAGE, PROCESSING and ANALYTICS marked; segments: INNOVATORS, EARLY ADOPTERS, EARLY MAJORITY, LATE MAJORITY, LAGGARDS]
Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png
Licensed under the Creative Commons Attribution 2.5 License.




Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.




Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
Frequency: The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.

Frequency

• Formerly AT&T Advertising Solutions and AT&T Interactive
• Faced with increasing volume of traffic through distribution network
• Wanted to provide intra-day reporting, but faced days of report-lag due to loading multiple databases
• Moved data processing to Hadoop, enabling the creation of a single common data layer for all applications
• Report-lag reduced to hours, rather than days
• New insight enabled by more frequent analysis and being able to process all the data

Total Data
The adoption of non-traditional data processing technologies
is also driven by the user’s particular data processing requirements.




Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
Frequency: The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
Dependency: The reliance on existing technologies and skills, and the need to balance investment in those existing technologies and skills with the adoption of new techniques.

SQL meets Hadoop

SQL on Hadoop
• Hive
  • Project Stinger
  • Apache Tez (proposed)
• Impala
  • Cloudera Enterprise RTQ
• Apache Drill
  • (incubating)
• Phoenix project
  • For HBase
• Lingual
  • For Cascading and Hadoop

RDBMS and Hadoop co-processing
• Hadapt Adaptive Analytic Platform
• Teradata Aster SQL-H
• Rainstor Big Data Analytics on Hadoop
• EMC Greenplum HAWQ
• Microsoft PolyBase
• Citus Data CitusDB
• IBM Big SQL

Operational SQL on Hadoop
• Drawn to Scale
  • Spire
• Splice Machine
  • Splice SQL Engine

Crossing the Chasm
• Project maturity
• Vendor ecosystem
• Mainstream interest
• Geographic adoption

[Chart: diffusion of innovation curve with STORAGE, PROCESSING and ANALYTICS marked; segments: INNOVATORS, EARLY ADOPTERS, EARLY MAJORITY, LATE MAJORITY, LAGGARDS]
Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png
Licensed under the Creative Commons Attribution 2.5 License.




Project maturity




[Figure labels: Feb 2006, Dec 2012]

Vendor ecosystem

Contributors by lines of code by current employer

                 Hadoop core     All Hadoop projects
Companies        70+             120+
Individuals      200+            750+
Hortonworks      37%             27%
The rest         29%             31%
Cloudera         15%             15%
Yahoo!           12%             16%
Facebook         7%              11%

Vendor ecosystem

Contributors by lines of code by current employer and contributor type (all Hadoop projects)

Hadoop vendors          51%
Users                   38%
Other vendors           6%
Unknown/individuals     4%
Academia                1%

Mainstream interest




Source: Indeed.com February, 2013


Largest employers of Hadoop skills

[Bar chart: current employer by % of total LinkedIn profiles mentioning Hadoop, axis 0.0 to 3.5; bar values not recoverable]
Employers in chart order: Yahoo, Microsoft, Google, eBay, Amazon, IBM, LinkedIn, Oracle, EMC, Cisco, Cloudera
Source: LinkedIn: August 2012

Largest employers of Hadoop skills

[Bar chart: current employer by % of total LinkedIn profiles mentioning Hadoop, axis 0.0 to 3.5; bar values not recoverable]
Employers in chart order: Yahoo, Microsoft, Google, Amazon, IBM, eBay, Oracle, LinkedIn, Tata, HP, Cisco
Source: LinkedIn: February 2013

Geographic adoption
LinkedIn search result, December 2011

Bay area    28.2%
India       9.7%
NYC         4.8%
Seattle     3.7%
China       3.6%
LA          3.5%
DC          3.0%
UK          3.0%

Geographic adoption
LinkedIn search result, August 2012

Bay area    24.9%
India       11.2%
NYC         4.7%
China       4.4%
Seattle     3.9%
UK          3.4%
DC          3.1%
LA          2.8%

Geographic adoption
LinkedIn search result, February 2013

Bay area    22.9%
India       13.5%
China       4.8%
NYC         4.6%
Seattle     3.9%
UK          3.4%
DC          3.1%
LA          2.7%

Geographic adoption

Period           Total     USA      ROW
December 2011    9,079     64.4%    35.6%
August 2012      22,178    60.4%    39.6%
February 2013    38,049    58.3%    41.7%

Source: LinkedIn search result
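
As a worked example, the totals and percentage shares on this slide can be turned into approximate absolute counts. This is a rough sketch in Python: it assumes the larger share in each period belongs to the USA and the smaller to the rest of the world (ROW), which the city-level figures on the earlier slides suggest, and the rounded counts are derived here, not taken from the source.

```python
# (period, total results, USA share in %) as read from the slide.
# Assumption: the larger share in each bar is the USA segment.
periods = [
    ("December 2011", 9079, 64.4),
    ("August 2012", 22178, 60.4),
    ("February 2013", 38049, 58.3),
]

counts = {}
for period, total, usa_pct in periods:
    usa = round(total * usa_pct / 100)
    row = total - usa  # rest of world is whatever remains
    counts[period] = (usa, row)
    print(f"{period}: USA ~{usa}, ROW ~{row}")

# Growth multiples between the first and last snapshot.
usa_growth = counts["February 2013"][0] / counts["December 2011"][0]
row_growth = counts["February 2013"][1] / counts["December 2011"][1]
print(f"USA grew about {usa_growth:.1f}x, ROW about {row_growth:.1f}x")
```

Under that assumption both segments grow in absolute terms, with the ROW segment growing faster, which is what the shrinking USA share reflects.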


Conclusions
• Hadoop’s greatest asset is its flexibility, but that is also a challenge, and can lead to confusion and disillusionment among later adopters
• Hadoop is enabling greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies
• Storing, processing and analyzing data is a process that has enabled early adopters to understand Hadoop’s role in the wider landscape
• Attempting to fast forward to analytics, missing out the processing/integration stage, creates silos and will result in disillusionment
• The Hadoop ecosystem is vibrant, with strength in depth and breadth
• Growing mainstream interest and geographic adoption mean Hadoop is well-positioned to cross the chasm into mainstream adoption

Questions? Comments?




                       © 2013 by The 451 Group. All rights reserved

More Related Content

What's hot

Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...
Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...
Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...Cloudera, Inc.
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data WarehousingThomas Kejser
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsKaniska Mandal
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop SampleAlan Quayle
 
Big Data Marketing Analytics
Big Data Marketing AnalyticsBig Data Marketing Analytics
Big Data Marketing AnalyticsAkash Tyagi
 
Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive OverviewRCG Global Services
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)SiamAhmed16
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Surveyijeei-iaes
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinarCloudera, Inc.
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 

What's hot (20)

Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...
Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...
Hadoop World 2011: The Blind Men and the Elephant - Matthew Aslett - The 451 ...
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data Analytics
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Big Data Marketing Analytics
Big Data Marketing AnalyticsBig Data Marketing Analytics
Big Data Marketing Analytics
 
Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive Overview
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Survey
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Big Data simplified
Big Data simplifiedBig Data simplified
Big Data simplified
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Big data
Big dataBig data
Big data
 

What is the Point of Hadoop

  • 1. What is the point of Hadoop? Matthew Aslett Research Director, 451 Research © 2013 by The 451 Group. All rights reserved
  • 2. Matthew Aslett: Research Director, Data Management and Analytics; matthew.aslett@451research.com; www.twitter.com/maslett; Responsible for data management and analytics research agenda; Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop; With 451 Research since 2007
  • 3. Unique combination of research, analysis & data; Emerging tech market segment focus; Daily qualitative & quantitative insight; Analyst advisory & Go-to-market support; Global events
  • 4. Company Overview: One company with 3 operating divisions; Syndicated research, advisory, professional services, datacenter certification, and events; Global focus; 200+ staff; 1,300+ client organizations: enterprises, vendors, service providers, and investment firms; Organic and growth through acquisition
  • 5. What is the point of Hadoop? Hadoop’s greatest asset is its flexibility: it can be used for multiple roles and use-cases. But that is also a challenge, and can lead to confusion and disillusionment. Each user and vendor has their own perspective on Hadoop’s role.
  • 6. The Blind Men and the Elephant: “It was six men of Indostan / To learning much inclined, / Who went to see the Elephant / (Though all of them were blind), / That each by observation / Might satisfy his mind.” John Godfrey Saxe (1872)
  • 7. The Blind Men and the Elephant: “After Hadoop finishes filtering the data, the place you want to put that data is in Oracle Database.” Larry Ellison (2011)
  • 8. Oracle Big Data Appliance [diagram labels: Apache Hadoop, NoSQL Database, Oracle Tools, Oracle Database, Data Integrator, Data Loader, R distribution]. Roles shown: big data processing/integration; big data analytics
  • 9. What is the point of Hadoop? Big data storage; big data processing/integration; big data analytics
  • 10. Big Data: “Big data” is the realization of greater business intelligence by storing, processing and analyzing data that was previously ignored because traditional data management technologies could not cope with the three Vs. Volume: The volume of data is too large for traditional database software tools to cope with. Velocity: The data is being produced at a rate that is beyond the performance limits of traditional systems. Variety: The data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses.
  • 11. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Inspired by ‘Total Football’ – a new approach to soccer that emerged in the late 1960s, in Amsterdam; Total Data is making the most efficient use of existing and new data management resources to deliver value; Not another name for Big Data: if your data is big, the way you manage it should be total
  • 12. Big Data and Total Data. Big Data: The growing volume, velocity and variety of data. Big Data Technologies: New technologies being adopted to store and process that data. Total Data: The user trends driving the adoption of Big Data Technologies to store and process Big Data, and their management alongside existing data management technologies. [Diagram labels: BIG DATA, BIG DATA TECHNOLOGY, TOTAL DATA, Volume, Predictive analytics, DQ]
  • 13. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
  • 14. Totality [roles: big data storage; big data processing/integration]. Prior to adopting Hadoop, only had transactional and summarized non-transactional data stored in its EDW; The vast majority of its log data was discarded as not valuable enough to be efficiently processed in an enterprise data warehouse; Now using Hadoop to process hundreds of GBs of log data produced by the millions of searches and transactions performed on its site each day; Creating data exports to R, and aggregating data to its existing data warehouse for analysis
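A minimal sketch of the kind of log processing slide 14 describes: count events per day across every log line, then hand the small aggregate on to a warehouse. It imitates the map, shuffle and reduce phases in plain Python rather than using Hadoop itself; the log format, `LOG_LINES`, and all function names are assumptions made for illustration and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical log format (an assumption): "<date> <event> <product>"
LOG_LINES = [
    "2013-01-01 search hotel",
    "2013-01-01 search flight",
    "2013-01-01 booking hotel",
    "2013-01-02 search hotel",
    "malformed",
]


def map_phase(line):
    """Emit ((day, event), 1) for a well-formed line, nothing otherwise."""
    parts = line.split()
    if len(parts) != 3:
        return []
    day, event, _product = parts
    return [((day, event), 1)]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Process every line (no sampling) and return the aggregated counts."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    for (day, event), count in sorted(run_job(LOG_LINES).items()):
        print(day, event, count)
```

The aggregate is far smaller than the raw logs, which is why it can reasonably be loaded into an existing data warehouse while the full detail stays in the cluster.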
  • 15. Total Data: The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements. Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results. Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
  • 16. Exploration
      • Traditional data warehouses: Schema on write [Diagram labels: Application, Schema, RDBMS, SQL]
      • Hadoop: Schema on read [Diagram labels: Application, Hadoop, Schema, MapReduce]
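    The schema-on-write versus schema-on-read contrast on slide 16 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the JSON log records, the field names and both helper functions are hypothetical, and ordinary Python lists stand in for the RDBMS and for Hadoop; no Hadoop or SQL API is used.

```python
import json

# Hypothetical raw log records; field names are assumptions for illustration.
RAW_EVENTS = [
    '{"user": "a1", "action": "search", "city": "Chicago", "ms": 120}',
    '{"user": "b2", "action": "booking", "city": "Boston"}',
    '{"user": "c3", "action": "search", "region": "EMEA", "ms": 95}',
]

# Schema on write: the columns are fixed before any data is loaded.
WAREHOUSE_COLUMNS = ("user", "action", "city")


def load_schema_on_write(lines):
    """Check each record against the fixed schema at load time."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if all(col in record for col in WAREHOUSE_COLUMNS):
            # Only the modeled columns survive; extra fields are dropped.
            table.append(tuple(record[col] for col in WAREHOUSE_COLUMNS))
        else:
            rejected.append(line)
    return table, rejected


def query_schema_on_read(lines, fields):
    """Keep raw lines untouched and apply the query's own schema when reading."""
    for line in lines:
        record = json.loads(line)
        # Missing fields become None instead of causing a load failure.
        yield tuple(record.get(field) for field in fields)


table, rejected = load_schema_on_write(RAW_EVENTS)
latencies = list(query_schema_on_read(RAW_EVENTS, ("action", "ms")))
```

    In this sketch the write-time schema drops the `ms` field and rejects the record that has no `city`, while the read-time query can still ask for `ms` because the raw lines were never reshaped.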
  • 17. Exploration
    [Stages shown: Big data storage; Big data processing/integration; Big data analytics]
      • The company wanted to perform analysis on customer data in order to create geo-targeted advertising
      • The required data was already present in its data warehouse but was modeled in a way that would not allow Orbitz to efficiently process the query
      • Extracting the data into Hadoop enabled the company to query it in a way the data warehouse was never designed for
  • 18. Hadoop adoption process
      • Big data storage: Google File System (research paper published 2003)
      • Big data processing/integration: Google MapReduce (research paper published 2004)
      • Big data analytics: Google Dremel (research paper published 2010); Google Tenzing (research paper published 2011)
    [Diagram: storage, processing and analytics stages on the adoption curve, covering innovators and early adopters]
    Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png (licensed under the Creative Commons Attribution 2.5 License)
  • 19. Crossing the Chasm
      • Hadoop as (just) a low cost storage option is not fulfilling its potential
      • Processing and integration is not the complete picture
      • Hadoop-based analytics unlocks the value of previously ignored data
      • Attempting to fast forward to analytics, missing out the processing/integration stage, creates silos and will result in disillusionment
    [Diagram: storage, processing and analytics stages on the adoption curve: innovators, early adopters, early majority, late majority, laggards]
    Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png (licensed under the Creative Commons Attribution 2.5 License)
  • 20. Total Data
    The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
      • Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
      • Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
      • Frequency: The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
  • 21. Frequency
      • Formerly AT&T Advertising Solutions and AT&T Interactive
      • Faced with an increasing volume of traffic through its distribution network
      • Wanted to provide intra-day reporting, but faced days of report-lag due to loading multiple databases
      • Moved data processing to Hadoop, enabling the creation of a single common data layer for all applications
      • Report-lag reduced to hours, rather than days
      • New insight enabled by more frequent analysis and being able to process all the data
  • 22. Total Data
    The adoption of non-traditional data processing technologies is also driven by the user’s particular data processing requirements.
      • Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
      • Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
      • Frequency: The desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
      • Dependency: The reliance on existing technologies and skills, and the need to balance investment in those existing technologies and skills with the adoption of new techniques.
  • 23. SQL meets Hadoop
      • SQL on Hadoop
          • Hive: Project Stinger; Apache Tez (proposed)
          • Impala: Cloudera Enterprise RTQ
          • Apache Drill (incubating)
          • Phoenix project (for HBase)
          • Lingual (for Cascading and Hadoop)
      • RDBMS and Hadoop co-processing
          • Hadapt Adaptive Analytic Platform
          • Teradata Aster SQL-H
          • Rainstor Big Data Analytics on Hadoop
          • EMC Greenplum HAWQ
          • Microsoft PolyBase
          • Citus Data CitusDB
          • IBM Big SQL
      • Operational SQL on Hadoop
          • Drawn to Scale: Spire
          • Splice Machine: Splice SQL Engine
  • 24. Crossing the Chasm
      • Project maturity
      • Vendor ecosystem
      • Mainstream interest
      • Geographic adoption
    [Diagram: storage, processing and analytics stages on the adoption curve: innovators, early adopters, early majority, late majority, laggards]
    Image source: http://en.wikipedia.org/wiki/File:DiffusionOfInnovation.png (licensed under the Creative Commons Attribution 2.5 License)
  • 25. Project maturity
    [Figure labels: Feb 2006; Dec 2012]
  • 26. Vendor ecosystem
    Contributors by lines of code, by current employer
      • Hadoop Core (70+ different companies, 200+ individuals): Hortonworks 37%, Cloudera 15%, Yahoo! 12%, Facebook 7%, the rest 29%
      • All Hadoop projects (120+ different companies, 750+ individuals): Hortonworks 27%, Yahoo 16%, Cloudera 15%, Facebook 11%, the rest 31%
  • 27. Vendor ecosystem
    All Hadoop projects: contributors by lines of code, by current employer and contributor type
      • Hadoop vendors: 51%
      • Users: 38%
      • Other vendors: 6%
      • Unknown/individuals: 4%
      • Academia: 1%
  • 28. Mainstream interest
    Source: Indeed.com, February 2013
  • 29. Mainstream interest
    Source: Indeed.com, February 2013
  • 30. Largest employers of Hadoop skills
    [Bar chart: % of total LinkedIn profiles mentioning Hadoop (axis 0.0 to 3.5), by current employer, in the order shown: Yahoo, Microsoft, Google, eBay, Amazon, IBM, LinkedIn, Oracle, EMC, Cisco, Cloudera]
    Source: LinkedIn, August 2012
  • 31. Largest employers of Hadoop skills
    [Bar chart: % of total LinkedIn profiles mentioning Hadoop (axis 0.0 to 3.5), by current employer, in the order shown: Yahoo, Microsoft, Google, Amazon, IBM, eBay, Oracle, LinkedIn, Tata, HP, Cisco]
    Source: LinkedIn, February 2013
  • 32. Geographic adoption
    LinkedIn search result, December 2011: Bay area 28.2%, India 9.7%, NYC 4.8%, Seattle 3.7%, China 3.6%, DC 3.5%, UK 3.0%, LA 3.0%
  • 33. Geographic adoption
    LinkedIn search result, August 2012: Bay area 24.9%, India 11.2%, NYC 4.7%, China 4.4%, Seattle 3.9%, UK 3.4%, DC 3.1%, LA 2.8%
  • 34. Geographic adoption
    LinkedIn search result, February 2013: Bay area 22.9%, India 13.5%, China 4.8%, NYC 4.6%, Seattle 3.9%, UK 3.4%, DC 3.1%, LA 2.7%
  • 35. Geographic adoption
    LinkedIn search result, USA versus rest of world (ROW)
      • December 2011: total 9,079 (USA 64.4%, ROW 35.6%)
      • August 2012: total 22,178 (USA 60.4%, ROW 39.6%)
      • February 2013: total 38,049 (USA 58.3%, ROW 41.7%)
  • 36. Conclusions
      • Hadoop’s greatest asset is its flexibility, but that is also a challenge, and can lead to confusion and disillusionment among later adopters
      • Hadoop is enabling greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies
      • Storing, processing and analyzing data is a staged process that has enabled early adopters to understand Hadoop’s role in the wider landscape
      • Attempting to fast forward to analytics, missing out the processing/integration stage, creates silos and will result in disillusionment
      • The Hadoop ecosystem is vibrant, with strength in depth and breadth
      • Growing mainstream interest and geographic adoption mean Hadoop is well-positioned to cross the chasm into mainstream adoption
  • 37. Questions? Comments?