Inside Hive (for beginners)
Takeshi NAKANO / Recruit Co. Ltd.
Why?
Hive is a good tool for non-specialists. The number of M/R jobs largely determines Hive's processing time.
↓
How can we reduce that number? What can we do about it when writing HiveQL?
↓
How does Hive convert HiveQL into M/R jobs? Which optimizations are applied along the way?
7/6/2011 HIVE - A warehouse solution over Map Reduce Framework
Don't you have it? This paper from Facebook has a lot of information, but it is a little old.
Component Level Analysis
Hive Architecture / Exec Flow
[Diagram: Client, Driver, Compiler, Metastore, Hadoop]
Hive Workflow
Hive has operators, which are its minimum processing units.
Each operator's work is done as an HDFS operation or as M/R jobs.
The compiler converts HiveQL into a set of operators.
Hive Workflow: Operators
Hive Workflow
For M/R processing, Hive uses ExecMapper and ExecReducer.
There are 2 processing modes:
1. Local processing mode
2. Distributed processing mode
Hive Workflow
In mode 1 (local mode), Hive forks a process with the hadoop command. The plan.xml is created on just that one node, and the single node processes it.
In mode 2 (distributed mode), Hive sends the job to the existing JobTracker. The information is stored in the DistributedCache and processed on multiple nodes.
Compiler: How to Process HiveQL
“Plumbing” of HIVE compiler
Compiler Overview
Parser → Semantic Analyzer → Logical Plan Gen. → Logical Optimizer → Physical Plan Gen. → Physical Optimizer
Compiler Overview
HiveQL → [Parser] → AST → [Semantic Analyzer] → QB → [Logical Plan Gen.] → Operator Tree → [Logical Optimizer] → Operator Tree → [Physical Plan Gen.] → Task Tree → [Physical Optimizer] → Task Tree
Parser: HiveQL → AST
HiveQL:
INSERT OVERWRITE TABLE access_log_temp2
  SELECT a.user, a.prono, p.maker, p.price
  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
AST:
TOK_QUERY
  + TOK_FROM
      + TOK_JOIN
          + TOK_TABREF
              + TOK_TABNAME
                  + "access_log_hbase"
              + "a"
          + TOK_TABREF
              + TOK_TABNAME
                  + "product_hbase"
              + "p"
          + "="
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "prono"
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "prono"
  + TOK_INSERT
      + TOK_DESTINATION
          + TOK_TAB
              + TOK_TABNAME
                  + "access_log_temp2"
      + TOK_SELECT
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "user"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "prono"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "maker"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "price"
Parser: SQL → AST
The same query and AST as on the previous slide, with three subtrees of the AST marked:
1. the TOK_FROM subtree
2. the TOK_DESTINATION subtree (under TOK_INSERT)
3. the TOK_SELECT subtree (under TOK_INSERT)
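To make the shape of the AST concrete, here is a minimal sketch in Python (Hive itself is written in Java, so this is an illustration, not Hive's parser). It models an AST node as a nested tuple and renders it in the parenthesized form that EXPLAIN prints. The tuple representation and the names `to_sexpr` and `col` are assumptions made for this example.

```python
# Hypothetical representation: an AST node is a tuple (token, child, ...),
# and a leaf is a plain string.
def to_sexpr(node):
    """Render a nested-tuple AST in a parenthesized, EXPLAIN-like form."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"


def col(alias, name):
    """Build the subtree for a qualified column reference such as a.prono."""
    return (".", ("TOK_TABLE_OR_COL", alias), name)


# The TOK_FROM subtree of the example query (part 1 of the AST).
from_clause = (
    "TOK_FROM",
    ("TOK_JOIN",
     ("TOK_TABREF", ("TOK_TABNAME", "access_log_hbase"), "a"),
     ("TOK_TABREF", ("TOK_TABNAME", "product_hbase"), "p"),
     ("=", col("a", "prono"), col("p", "prono"))),
)

if __name__ == "__main__":
    print(to_sexpr(from_clause))
```

Printing `from_clause` gives a string that can be compared by eye with the TOK_FROM part of the ABSTRACT SYNTAX TREE line in the appendix.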
Semantic Analyzer (1/2): AST → QB
AST (part 1):
+ TOK_FROM
    + TOK_JOIN
        + TOK_TABREF
            + TOK_TABNAME
                + "access_log_hbase"
            + "a"
        + TOK_TABREF
            + TOK_TABNAME
                + "product_hbase"
            + "p"
        + "="
            + "."
                + TOK_TABLE_OR_COL
                    + "a"
                + "prono"
            + "."
                + TOK_TABLE_OR_COL
                    + "p"
                + "prono"
QB:
  MetaData
    Alias To Table Info
      "a" = Table Info("access_log_hbase")
      "p" = Table Info("product_hbase")
  ParseInfo
    Join Node
      + TOK_JOIN
          + TOK_TABREF …
          + TOK_TABREF …
          + "=" …
Semantic Analyzer (2/2): AST → QB
AST (part 2):
+ TOK_DESTINATION
    + TOK_TAB
        + TOK_TABNAME
            + "access_log_temp2"
QB:
  ParseInfo
    Name To Destination Node
      + TOK_TAB
          + TOK_TABNAME
              + "access_log_temp2"
Semantic Analyzer (2/2): AST → QB
AST (part 3):
+ TOK_SELECT
    + TOK_SELEXPR
        + "."
            + TOK_TABLE_OR_COL
                + "a"
            + "user"
    + TOK_SELEXPR
        + "."
            + TOK_TABLE_OR_COL
                + "a"
            + "prono"
    + TOK_SELEXPR
        + "."
            + TOK_TABLE_OR_COL
                + "p"
            + "maker"
    + TOK_SELEXPR
        + "."
            + TOK_TABLE_OR_COL
                + "p"
            + "price"
QB:
  ParseInfo
    Name To Select Node
      + TOK_SELECT
          + TOK_SELEXPR …
          + TOK_SELEXPR …
          + TOK_SELEXPR …
          + TOK_SELEXPR …
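The three Semantic Analyzer slides can be summarized with a small sketch. It is written in Python purely as an illustration and is not Hive's implementation: the nested-tuple AST, the dictionary layout of the QB, and the names `find` and `build_qb` are all assumptions chosen to mirror the labels on the slides.

```python
def find(node, token):
    """Yield every subtree whose first element is `token` (depth-first)."""
    if isinstance(node, tuple):
        if node[0] == token:
            yield node
        for child in node[1:]:
            yield from find(child, token)


def col(alias, name):
    """Subtree for a qualified column reference such as a.prono."""
    return (".", ("TOK_TABLE_OR_COL", alias), name)


# The three marked parts of the AST, as nested tuples (hypothetical encoding).
FROM_PART = ("TOK_FROM",
             ("TOK_JOIN",
              ("TOK_TABREF", ("TOK_TABNAME", "access_log_hbase"), "a"),
              ("TOK_TABREF", ("TOK_TABNAME", "product_hbase"), "p"),
              ("=", col("a", "prono"), col("p", "prono"))))
DEST_PART = ("TOK_DESTINATION",
             ("TOK_TAB", ("TOK_TABNAME", "access_log_temp2")))
SELECT_PART = ("TOK_SELECT",
               ("TOK_SELEXPR", col("a", "user")),
               ("TOK_SELEXPR", col("a", "prono")),
               ("TOK_SELEXPR", col("p", "maker")),
               ("TOK_SELEXPR", col("p", "price")))


def build_qb(from_part, dest_part, select_part):
    """Collect alias metadata and the join/destination/select nodes."""
    qb = {"MetaData": {"AliasToTableInfo": {}}, "ParseInfo": {}}
    # Part 1: each TOK_TABREF maps an alias to a table name.
    for _, tabname, alias in find(from_part, "TOK_TABREF"):
        qb["MetaData"]["AliasToTableInfo"][alias] = tabname[1]
    qb["ParseInfo"]["JoinNode"] = next(find(from_part, "TOK_JOIN"))
    # Part 2: the destination table node.
    qb["ParseInfo"]["NameToDestinationNode"] = next(find(dest_part, "TOK_TAB"))
    # Part 3: the select list is kept as a subtree.
    qb["ParseInfo"]["NameToSelectNode"] = select_part
    return qb


if __name__ == "__main__":
    qb = build_qb(FROM_PART, DEST_PART, SELECT_PART)
    print(qb["MetaData"]["AliasToTableInfo"])
```

The point of the sketch is that the QB holds resolved metadata (alias to table) next to pointers back into the AST, which is what the later plan-generation slides consume.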
Logical Plan Generator (1/4): QB → OP Tree
QB:
  MetaData
    Alias To Table Info
      "a" = Table Info("access_log_hbase")
      "p" = Table Info("product_hbase")
OP Tree:
  TableScanOperator("access_log_hbase")
  TableScanOperator("product_hbase")
Logical Plan Generator (2/4): QB → OP Tree
QB:
  ParseInfo
    + TOK_JOIN
        + TOK_TABREF
            + TOK_TABNAME
                + "access_log_hbase"
            + "a"
        + TOK_TABREF
            + TOK_TABNAME
                + "product_hbase"
            + "p"
        + "="
            + "."
                + TOK_TABLE_OR_COL
                    + "a"
                + "prono"
            + "."
                + TOK_TABLE_OR_COL
                    + "p"
                + "prono"
OP Tree:
  ReduceSinkOperator("access_log_hbase")
  ReduceSinkOperator("product_hbase")
  JoinOperator
Logical Plan Generator (3/4): QB → OP Tree
QB:
  ParseInfo
    Name To Select Node
      + TOK_SELECT
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "user"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "a"
                  + "prono"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "maker"
          + TOK_SELEXPR
              + "."
                  + TOK_TABLE_OR_COL
                      + "p"
                  + "price"
OP Tree:
  SelectOperator
Logical Plan Generator (4/4): QB → OP Tree
QB:
  MetaData
    Name To Destination Table Info
      "insclause-0" = Table Info("access_log_temp2")
OP Tree:
  FileSinkOperator
Logical Plan Generator (result)
OP Tree:
  TableScanOperator TS_1    TableScanOperator TS_0
  ReduceSinkOperator RS_2   ReduceSinkOperator RS_3
  JoinOperator JOIN_4
  SelectOperator SEL_5
  FileSinkOperator FS_6
Logical Optimizer
Logical Optimizer (Predicate Push Down)
Query without a filter:
INSERT OVERWRITE TABLE access_log_temp2
  SELECT a.user, a.prono, p.maker, p.price
  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
Query with a filter:
INSERT OVERWRITE TABLE access_log_temp2
  SELECT a.user, a.prono, p.maker, p.price
  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
  WHERE p.maker = 'honda';
Logical Optimizer (Predicate Push Down)
OP Tree:
  TableScanOperator TS_1    TableScanOperator TS_0
  ReduceSinkOperator RS_3   ReduceSinkOperator RS_2
  JoinOperator JOIN_4
  SelectOperator SEL_6
  FileSinkOperator FS_7
Logical Optimizer (Predicate Push Down)
OP Tree for the query with WHERE p.maker = 'honda':
  TableScanOperator TS_1    TableScanOperator TS_0
  ReduceSinkOperator RS_3   ReduceSinkOperator RS_2
  JoinOperator JOIN_4
  FilterOperator FIL_5 (_col8 = 'honda')
  SelectOperator SEL_6
  FileSinkOperator FS_7
Logical Optimizer (Predicate Push Down)
OP Tree for the query with WHERE p.maker = 'honda', after push down:
  TableScanOperator TS_1    TableScanOperator TS_0
  FilterOperator FIL_8 (maker = 'honda')   [added above the ReduceSinkOperator, on the product_hbase side]
  ReduceSinkOperator RS_2   ReduceSinkOperator RS_3
  JoinOperator JOIN_4
  FilterOperator FIL_5 (_col8 = 'honda')
  SelectOperator SEL_6
  FileSinkOperator FS_7
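The rewrite shown on the predicate push down slides can be sketched on a toy plan. This is a simplified Python illustration, not Hive's optimizer: the plan layout (one operator list per table alias plus a list of operators after the join) and the function name `push_down` are assumptions. As on the slide, the original filter after the join is left in place and a new filter is added before the ReduceSink.

```python
def push_down(plan, alias, predicate):
    """Return a new plan with a filter added on `alias`'s branch,
    immediately before its ReduceSink. The input plan is not modified."""
    branch = plan["branches"][alias]
    rs_index = next(i for i, op in enumerate(branch)
                    if op.startswith("ReduceSink"))
    new_branch = branch[:rs_index] + [f"Filter({predicate})"] + branch[rs_index:]
    return {
        "branches": {**plan["branches"], alias: new_branch},
        "after_join": list(plan["after_join"]),
    }


# Toy plan for the query with WHERE p.maker = 'honda' (before push down).
plan = {
    "branches": {
        "a": ["TableScan(access_log_hbase)", "ReduceSink"],
        "p": ["TableScan(product_hbase)", "ReduceSink"],
    },
    "after_join": ["Join", "Filter(p.maker = 'honda')", "Select", "FileSink"],
}

optimized = push_down(plan, "p", "maker = 'honda'")

if __name__ == "__main__":
    for alias, ops in optimized["branches"].items():
        print(alias, "->", " -> ".join(ops))
    print("then", " -> ".join(optimized["after_join"]))
```

Because the filter now runs before the ReduceSink, rows of product_hbase with a different maker are dropped on the map side and never reach the shuffle or the join.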
Physical Plan Generator: OP Tree → Task Tree
Task Tree:
  MapRedTask (Stage-1/root)
    Ope Tree
      TableScanOperator(TS_0)
      TableScanOperator(TS_1)
      ReduceSinkOperator(RS_2)
      ReduceSinkOperator(RS_3)
      JoinOperator(JOIN_4)
      SelectOperator(SEL_5)
      FileSinkOperator(FS_6)
  MoveTask (Stage-0)
    LoadTableDesc
  StatsTask (Stage-2)
Physical Plan Generator (result): OP Tree → Task Tree
MapRedTask (Stage-1/root)
  Mapper
    TableScanOperator TS_1    TableScanOperator TS_0
    ReduceSinkOperator RS_2   ReduceSinkOperator RS_3
  Reducer
    JoinOperator JOIN_4
    SelectOperator SEL_5
    FileSinkOperator FS_6
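The result above places everything up to and including each ReduceSinkOperator in the mapper and everything after it in the reducer. A minimal Python sketch of that split follows. It is an illustration only: the function names are hypothetical, and the pairing of TS_0 with RS_2 and TS_1 with RS_3 is an assumption, since the slides do not show which scan feeds which ReduceSink.

```python
def split_at_reduce_sink(chain):
    """Split one operator chain into (map side, reduce side).
    The ReduceSink operator is the last operator on the map side."""
    for i, op in enumerate(chain):
        if op.startswith("RS_"):
            return chain[: i + 1], chain[i + 1:]
    return chain, []  # no ReduceSink: a map-only chain


def build_map_red_task(chains):
    """Merge the per-chain splits into one Mapper list and one Reducer list."""
    mapper, reducer = [], []
    for chain in chains:
        map_side, reduce_side = split_at_reduce_sink(chain)
        for op in map_side:
            if op not in mapper:
                mapper.append(op)
        for op in reduce_side:
            if op not in reducer:  # both chains share the same reduce side
                reducer.append(op)
    return {"Mapper": mapper, "Reducer": reducer}


# One chain per input table; the TS/RS pairing is assumed for illustration.
chains = [
    ["TS_0", "RS_2", "JOIN_4", "SEL_5", "FS_6"],
    ["TS_1", "RS_3", "JOIN_4", "SEL_5", "FS_6"],
]

if __name__ == "__main__":
    print(build_map_red_task(chains))
```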
Physical Optimizer: Task Tree → Task Tree
See the classes under java/org/apache/hadoop/hive/ql/optimizer/physical/
Physical Optimizer (MapJoinResolver): Task Tree → Task Tree
MapRedTask (Stage-1)
  Mapper
    TableScanOperator TS_1    TableScanOperator TS_0
    MapJoinOperator MAPJOIN_7
    SelectOperator SEL_8
    SelectOperator SEL_5
    FileSinkOperator FS_6
Physical Optimizer (MapJoinResolver): Task Tree → Task Tree
Before:
  MapRedTask (Stage-1)
    Mapper
      TableScanOperator TS_1    TableScanOperator TS_0
      MapJoinOperator MAPJOIN_7
      SelectOperator SEL_8
      SelectOperator SEL_5
      FileSinkOperator FS_6
After:
  MapredLocalTask (Stage-7)
    TableScanOperator TS_0
    HashTableSinkOperator HASHTABLESINK_11
  MapRedTask (Stage-1)
    Mapper
      TableScanOperator TS_1
      MapJoinOperator MAPJOIN_7
      SelectOperator SEL_8
      SelectOperator SEL_5
      FileSinkOperator FS_6
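The task tree above splits the join into a local task that builds a hash table from one input and a map-only task that streams the other input past it. The idea can be sketched in Python as a plain in-memory hash join. This is an illustration of the technique, not Hive's code: the function name `map_join`, the sample rows, and the choice of the product table as the hashed side are all assumptions.

```python
def map_join(small_rows, big_rows, key):
    """Hash join: build a table from the small input, probe with the big one."""
    # Build phase: roughly the role of the local task with HashTableSink.
    hash_table = {}
    for row in small_rows:
        hash_table.setdefault(row[key], []).append(row)
    # Probe phase: roughly the role of MapJoin inside the mapper.
    joined = []
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Made-up sample data for illustration.
products = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]
access_log = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 2},
    {"user": "u3", "prono": 3},  # no matching product: dropped by the inner join
]

if __name__ == "__main__":
    for row in map_join(products, access_log, "prono"):
        print(row)
```

Since every mapper can probe the hash table locally, no reduce phase is needed for the join, which is why the Reducer disappears from Stage-1 in the task tree.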
In the end
[Diagram: Client, Driver, Compiler, Metastore, Hadoop]
In the end
HiveQL → [Parser] → AST → [Semantic Analyzer] → QB → [Logical Plan Gen.] → Operator Tree → [Logical Optimizer] → Operator Tree → [Physical Plan Gen.] → Task Tree → [Physical Optimizer] → Task Tree
End
Appendix: What does Explain show?
Appendix: What does Explain show?
hive> explain INSERT OVERWRITE TABLE access_log_temp2
    >  SELECT a.user, a.prono, p.maker, p.price
    >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: prono
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: prono
                    type: int
              tag: 0
              value expressions:
                    expr: user
                    type: string
                    expr: prono
                    type: int
        p
          TableScan
            alias: p
            Reduce Output Operator
              key expressions:
                    expr: prono
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: prono
                    type: int
              tag: 1
              value expressions:
                    expr: maker
                    type: string
                    expr: price
                    type: int
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col2}
            1 {VALUE._col1} {VALUE._col2}
          handleSkewJoin: false
          outputColumnNames: _col0, _col2, _col6, _col7
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col2
                  type: int
                  expr: _col6
                  type: string
                  expr: _col7
                  type: int
            outputColumnNames: _col0, _col1, _col2, _col3
            File Output Operator
              compressed: false
              GlobalTableId: 1
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: default.access_log_temp2
  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.access_log_temp2
  Stage: Stage-2
    Stats-Aggr Operator
Time taken: 0.1 seconds
hive>
Appendix: What does Explain show?
Explain output (outline):
ABSTRACT SYNTAX TREE:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
        TableScan
          Reduce Output Operator
        TableScan
          Reduce Output Operator
      Reduce Operator Tree:
        Join Operator
          Select Operator
            File Output Operator
  Stage: Stage-0
    Move Operator
  Stage: Stage-2
    Stats-Aggr Operator
≒ Task Tree:
MapRedTask (Stage-1/root)
  Mapper
    TableScanOperator TS_1    TableScanOperator TS_0
    ReduceSinkOperator RS_2   ReduceSinkOperator RS_3
  Reducer
    JoinOperator JOIN_4
    SelectOperator SEL_5
    FileSinkOperator FS_6
MoveTask (Stage-0)
Stats Task (Stage-2)
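The STAGE DEPENDENCIES section is enough to recover the order in which the tasks run. As a small illustration, the following Python sketch parses those three lines and orders the stages so that each one comes after the stages it depends on. The function names `parse_dependencies` and `execution_order` are made up for this example, and the parser only handles the two line shapes that appear in the output above.

```python
import re


def parse_dependencies(lines):
    """Map each stage name to the list of stages it depends on."""
    deps = {}
    for line in lines:
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if root:
            deps[root.group(1)] = []
        elif dep:
            deps[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return deps


def execution_order(deps):
    """Order stages so that every stage follows its dependencies."""
    order, done = [], set()

    def visit(stage):
        if stage in done:
            return
        for parent in deps.get(stage, []):
            visit(parent)
        done.add(stage)
        order.append(stage)

    for stage in deps:
        visit(stage)
    return order


explain_lines = [
    "  Stage-1 is a root stage",
    "  Stage-0 depends on stages: Stage-1",
    "  Stage-2 depends on stages: Stage-0",
]

if __name__ == "__main__":
    print(execution_order(parse_dependencies(explain_lines)))
```

For the example query this gives Stage-1 (the MapRedTask), then Stage-0 (the MoveTask), then Stage-2 (the Stats Task), which matches the task tree on the slide.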

More Related Content

What's hot

The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache HiveDataWorks Summit
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3DataWorks Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High PerformanceInderaj (Raj) Bains
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Sandesh Rao
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 

What's hot (20)

The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 

Similar to Internal Hive

Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Internal Hive

  • 1. Inside Hive (for beginners). Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why? Hive is a good tool for non-specialists! The number of M/R jobs determines Hive's processing time. ↓ How can we reduce that number? What can we do about it when writing HiveQL? ↓ How does Hive convert HiveQL into M/R jobs? Which optimizations does it apply along the way? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
  • 3. Don't you have this? Facebook's paper has a lot of information, but it is a little old.
  • 4. Component Level Analysis
  • 5. Hive Architecture / Exec Flow. Components: Client, Driver, Compiler, Metastore, Hadoop.
  • 6. Hive Workflow: Hive has operators, which are its minimum processing units. Each operator does its work through HDFS operations or M/R jobs. The compiler converts HiveQL into sets of operators.
  • 7. Hive Workflow: Operators
  • 8. Hive Workflow: For M/R processing, Hive uses ExecMapper and ExecReducer. Processing runs in one of two modes:
      1. Local processing mode
      2. Distributed processing mode
  • 9. Hive Workflow:
      In mode 1 (local mode), Hive forks a process with the hadoop command. The plan.xml is created locally, and the single node processes it.
      In mode 2 (distributed mode), Hive sends the job to the existing JobTracker. The information is placed in the DistributedCache and processed on multiple nodes.
  • 10. Compiler: How to Process HiveQL
  • 11. “Plumbing” of HIVE compiler
  • 12. “Plumbing” of HIVE compiler
  • 13. Compiler Overview: Parser → Semantic Analyzer → Logical Plan Gen. → Logical Optimizer → Physical Plan Gen. → Physical Optimizer
  • 14. Compiler Overview: Hive QL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  • 15. Parser (Hive QL → AST)
      Hive QL:
        INSERT OVERWRITE TABLE access_log_temp2
        SELECT a.user, a.prono, p.maker, p.price
        FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      AST:
        TOK_QUERY
          + TOK_FROM
            + TOK_JOIN
              + TOK_TABREF
                + TOK_TABNAME
                  + "access_log_hbase"
                + "a"
              + TOK_TABREF
                + TOK_TABNAME
                  + "product_hbase"
                + "p"
              + "="
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "prono"
          + TOK_INSERT
            + TOK_DESTINATION
              + TOK_TAB
                + TOK_TABNAME
                  + "access_log_temp2"
            + TOK_SELECT
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "user"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "maker"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "price"
  • 16. Parser (SQL → AST): the same query and AST as on slide 15, with three parts of the AST numbered: 1 = the TOK_FROM subtree, 2 = the TOK_DESTINATION subtree, 3 = the TOK_SELECT subtree.
  • 17. Semantic Analyzer (1/2) (AST → QB)
      AST (part 1):
        + TOK_FROM
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
        ParseInfo
          Join Node
            + TOK_JOIN
              + TOK_TABREF …
              + TOK_TABREF …
              + "=" …
  • 18. Semantic Analyzer (2/2) (AST → QB)
      AST (part 2):
        + TOK_DESTINATION
          + TOK_TAB
            + TOK_TABNAME
              + "access_log_temp2"
      QB:
        ParseInfo
          Name To Destination Node
            + TOK_TAB
              + TOK_TABNAME
                + "access_log_temp2"
  • 19. Semantic Analyzer (2/2) (AST → QB)
      AST (part 3):
        + TOK_SELECT
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "user"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "a"
              + "prono"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "maker"
          + TOK_SELEXPR
            + "."
              + TOK_TABLE_OR_COL
                + "p"
              + "price"
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR …
              + TOK_SELEXPR …
              + TOK_SELEXPR …
              + TOK_SELEXPR …
  • 20. Logical Plan Generator (1/4) (QB → OP Tree)
      QB:
        MetaData
          Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
      OP Tree:
        TableScanOperator("access_log_hbase")
        TableScanOperator("product_hbase")
  • 21. Logical Plan Generator (2/4) (QB → OP Tree)
      QB:
        ParseInfo
          + TOK_JOIN
            + TOK_TABREF
              + TOK_TABNAME
                + "access_log_hbase"
              + "a"
            + TOK_TABREF
              + TOK_TABNAME
                + "product_hbase"
              + "p"
            + "="
              + "."
                + TOK_TABLE_OR_COL
                  + "a"
                + "prono"
              + "."
                + TOK_TABLE_OR_COL
                  + "p"
                + "prono"
      OP Tree:
        ReduceSinkOperator("access_log_hbase")
        ReduceSinkOperator("product_hbase")
        JoinOperator
  • 22. Logical Plan Generator (3/4) (QB → OP Tree)
      QB:
        ParseInfo
          Name To Select Node
            + TOK_SELECT
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "user"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "a"
                  + "prono"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "maker"
              + TOK_SELEXPR
                + "."
                  + TOK_TABLE_OR_COL
                    + "p"
                  + "price"
      OP Tree:
        SelectOperator
  • 23. Logical Plan Generator (4/4) (QB → OP Tree)
      QB:
        MetaData
          Name To Destination Table Info
            "insclause-0" = Table Info("access_log_temp2")
      OP Tree:
        FileSinkOperator
  • 24. Logical Plan Generator (result)
      LCF
      OP Tree (top to bottom):
        TableScanOperator TS_1, TableScanOperator TS_0
        ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        JoinOperator JOIN_4
        SelectOperator SEL_5
        FileSinkOperator FS_6
  • 25. Logical Optimizer
  • 26. Logical Optimizer (Predicate Push Down)
      Query without a predicate:
        INSERT OVERWRITE TABLE access_log_temp2
        SELECT a.user, a.prono, p.maker, p.price
        FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      Query with a predicate:
        INSERT OVERWRITE TABLE access_log_temp2
        SELECT a.user, a.prono, p.maker, p.price
        FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
        WHERE p.maker = 'honda';
  • 27. Logical Optimizer (Predicate Push Down). The two queries from slide 26 are shown again, next to this OP Tree (top to bottom):
        TableScanOperator TS_1, TableScanOperator TS_0
        ReduceSinkOperator RS_3, ReduceSinkOperator RS_2
        JoinOperator JOIN_4
        SelectOperator SEL_6
        FileSinkOperator FS_7
  • 28. Logical Optimizer (Predicate Push Down). The same queries, with the filter for the WHERE clause placed after the join:
        TableScanOperator TS_1, TableScanOperator TS_0
        ReduceSinkOperator RS_3, ReduceSinkOperator RS_2
        JoinOperator JOIN_4
        FilterOperator FIL_5 (_col8 = 'honda')
        SelectOperator SEL_6
        FileSinkOperator FS_7
  • 29. Logical Optimizer (Predicate Push Down). The same queries, with a second filter added between the table scans and the reduce sinks:
        TableScanOperator TS_1, TableScanOperator TS_0
        FilterOperator FIL_8 (maker = 'honda')
        ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        JoinOperator JOIN_4
        FilterOperator FIL_5 (_col8 = 'honda')
        SelectOperator SEL_6
        FileSinkOperator FS_7
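The effect of the push-down on slides 26 to 29 can be sketched with a toy model in Python. This is not Hive code: the row dictionaries, the `join` helper, and the sample data are all hypothetical. It only illustrates that filtering on `maker = 'honda'` before the join (like FIL_8) gives the same rows as filtering after it (like FIL_5), while fewer rows enter the join.

```python
# Toy model of predicate push down (not Hive code; all names are hypothetical).

access_log = [  # stands in for access_log_hbase (alias a)
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 2},
    {"user": "u3", "prono": 1},
]
product = [  # stands in for product_hbase (alias p)
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
    {"prono": 3, "maker": "honda", "price": 300},
]


def join(left, right, key):
    """Inner join of two row lists on `key`; returns merged rows."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def is_honda(row):
    return row["maker"] == "honda"


# Plan without push down: join first, filter afterwards (like FIL_5).
joined = join(access_log, product, "prono")
late = [row for row in joined if is_honda(row)]

# Plan with push down: filter the product rows first (like FIL_8), then join.
filtered_product = [row for row in product if is_honda(row)]
early = join(access_log, filtered_product, "prono")

# Number of rows that have to be fed into the join in each plan.
rows_into_join_late = len(access_log) + len(product)
rows_into_join_early = len(access_log) + len(filtered_product)
```

In a real M/R job the rows entering the join are the rows that are shuffled to the reducers, so filtering them early reduces the data moved between the map and reduce phases.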
  • 30. Physical Plan Generator (OP Tree → Task Tree)
      Task Tree:
        MapRedTask (Stage-1/root)
          Ope Tree:
            TableScanOperator(TS_0)
            TableScanOperator(TS_1)
            ReduceSinkOperator(RS_2)
            ReduceSinkOperator(RS_3)
            JoinOperator(JOIN_4)
            SelectOperator(SEL_5)
            FileSinkOperator(FS_6)
        MoveTask (Stage-0)
          LoadTableDesc
        StatsTask (Stage-2)
  • 31. Physical Plan Generator (result) (OP Tree → Task Tree)
      LCF
      MapRedTask (Stage-1/root)
        Mapper:
          TableScanOperator TS_1, TableScanOperator TS_0
          ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        Reducer:
          JoinOperator JOIN_4
          SelectOperator SEL_5
          FileSinkOperator FS_6
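The Mapper/Reducer split on slide 31 follows the ReduceSink boundary: operators up to and including a ReduceSinkOperator run on the map side, and everything below them runs on the reduce side. A minimal Python sketch of that split follows. It is not Hive's implementation: the dictionary-based graph and the `split_at_reduce_sink` function are hypothetical, and the TS_0 → RS_2 / TS_1 → RS_3 pairing is an assumption.

```python
# Toy sketch (not Hive code): split an operator tree into map-side and
# reduce-side operators at the ReduceSink boundary.

# Children of each operator. The TS_0 -> RS_2 and TS_1 -> RS_3 pairing
# is assumed for illustration.
children = {
    "TS_0": ["RS_2"],
    "TS_1": ["RS_3"],
    "RS_2": ["JOIN_4"],
    "RS_3": ["JOIN_4"],
    "JOIN_4": ["SEL_5"],
    "SEL_5": ["FS_6"],
    "FS_6": [],
}
roots = ["TS_0", "TS_1"]


def split_at_reduce_sink(roots, children):
    """Return (map_ops, reduce_ops) for a tree given as child lists."""
    map_ops, reduce_ops = [], []
    seen = set()

    def visit(op, on_reduce_side):
        if op in seen:
            return
        seen.add(op)
        (reduce_ops if on_reduce_side else map_ops).append(op)
        # Everything below a ReduceSink runs after the shuffle.
        below_shuffle = on_reduce_side or op.startswith("RS_")
        for child in children[op]:
            visit(child, below_shuffle)

    for root in roots:
        visit(root, False)
    return map_ops, reduce_ops


map_ops, reduce_ops = split_at_reduce_sink(roots, children)
```

With this input, the table scans and reduce sinks land in `map_ops`, and the join, select, and file sink land in `reduce_ops`, matching the Mapper and Reducer groups on the slide.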
  • 32. Physical Optimizer (Task Tree → Task Tree). See the code under java/org/apache/hadoop/hive/ql/optimizer/physical/.
  • 33. Physical Optimizer (MapJoinResolver) (Task Tree → Task Tree)
      MapRedTask (Stage-1)
        Mapper:
          TableScanOperator TS_1, TableScanOperator TS_0
          MapJoinOperator MAPJOIN_7
          SelectOperator SEL_8
          SelectOperator SEL_5
          FileSinkOperator FS_6
  • 34. Physical Optimizer (MapJoinResolver) (Task Tree → Task Tree)
      Before:
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator TS_1, TableScanOperator TS_0
            MapJoinOperator MAPJOIN_7
            SelectOperator SEL_8
            SelectOperator SEL_5
            FileSinkOperator FS_6
      After:
        MapredLocalTask (Stage-7)
          TableScanOperator TS_0
          HashTableSinkOperator HASHTABLESINK_11
        MapRedTask (Stage-1)
          Mapper:
            TableScanOperator TS_1
            MapJoinOperator MAPJOIN_7
            SelectOperator SEL_8
            SelectOperator SEL_5
            FileSinkOperator FS_6
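The idea behind the plan on slide 34 can be sketched in Python: a local step loads one table into an in-memory hash table (the role of HashTableSinkOperator), and the mapper then streams the other table and probes that hash table (the role of MapJoinOperator), so the join needs no reduce phase. This is a toy model, not Hive code; the function names and sample rows are hypothetical.

```python
# Toy sketch of a map-side (hash) join; not Hive code, all names hypothetical.
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Local step: load the small table into memory, grouped by join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Map step: stream the big table and probe the in-memory hash table."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**row, **match}


small = [{"prono": 1, "user": "u1"}, {"prono": 2, "user": "u2"}]
big = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 3, "maker": "toyota", "price": 200},
]

result = list(map_join(big, build_hash_table(small, "prono"), "prono"))
```

The approach only works when the hashed table fits in memory, which is why it is applied as an optimization rather than as the default join plan.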
  • 35. In the end
  • 36. In the end: Hive QL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  • 38. Appendix: What does Explain show?
  • 39. Appendix: What does Explain show?
      hive> explain INSERT OVERWRITE TABLE access_log_temp2
          > SELECT a.user, a.prono, p.maker, p.price
          > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      OK
      ABSTRACT SYNTAX TREE:
        (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              a
                TableScan
                  alias: a
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 0
                    value expressions:
                      expr: user
                      type: string
                      expr: prono
                      type: int
              p
                TableScan
                  alias: p
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 1
                    value expressions:
                      expr: maker
                      type: string
                      expr: price
                      type: int
            Reduce Operator Tree:
              Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col2}
                  1 {VALUE._col1} {VALUE._col2}
                handleSkewJoin: false
                outputColumnNames: _col0, _col2, _col6, _col7
                Select Operator
                  expressions:
                    expr: _col0
                    type: string
                    expr: _col2
                    type: int
                    expr: _col6
                    type: string
                    expr: _col7
                    type: int
                  outputColumnNames: _col0, _col1, _col2, _col3
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.access_log_temp2
        Stage: Stage-0
          Move Operator
            tables:
              replace: true
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.access_log_temp2
        Stage: Stage-2
          Stats-Aggr Operator
      Time taken: 0.1 seconds
      hive>
  • 40. Appendix: What does Explain show? (The same EXPLAIN output as slide 39, overlaid with its outline.)
    ABSTRACT SYNTAX TREE:
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
      Stage-2 depends on stages: Stage-0
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
            TableScan
              Reduce Output Operator
            TableScan
              Reduce Output Operator
          Reduce Operator Tree:
            Join Operator
              Select Operator
                File Output Operator
      Stage: Stage-0
        Move Operator
      Stage: Stage-2
        Stats-Aggr Operator
  • 41. Appendix: What does Explain show?
    ABSTRACT SYNTAX TREE:
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1
      Stage-2 depends on stages: Stage-0
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
            TableScan
              Reduce Output Operator
            TableScan
              Reduce Output Operator
          Reduce Operator Tree:
            Join Operator
              Select Operator
                File Output Operator
      Stage: Stage-0
        Move Operator
      Stage: Stage-2
        Stats-Aggr Operator
    ≒
    MapRedTask (Stage-1/root)
      Mapper
        TableScanOperator TS_1
        TableScanOperator TS_0
        ReduceSinkOperator RS_2
        ReduceSinkOperator RS_3
      Reducer
        JoinOperator JOIN_4
        SelectOperator SEL_5
        FileSinkOperator FS_6
    MoveTask (Stage-0)
    Stats Task (Stage-2)
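The STAGE DEPENDENCIES section above can be read as a small dependency graph: a stage may run only after the stages it depends on have finished. The following is a minimal Python sketch of that reading, not Hive's actual Driver code. The names `stage_deps`, `task_of`, and `execution_order` are hypothetical and introduced only for illustration; the stage names and task labels are copied from the slide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Stage dependencies as printed by EXPLAIN on the slide.
# Each key maps to the set of stages that must finish before it.
stage_deps = {
    "Stage-1": set(),        # "Stage-1 is a root stage"
    "Stage-0": {"Stage-1"},  # "Stage-0 depends on stages: Stage-1"
    "Stage-2": {"Stage-0"},  # "Stage-2 depends on stages: Stage-0"
}

# Task label for each stage, as shown on the slide (illustrative mapping).
task_of = {
    "Stage-1": "MapRedTask",
    "Stage-0": "MoveTask",
    "Stage-2": "Stats Task",
}


def execution_order(deps):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(stage_deps)
for stage in order:
    print(f"{stage}: {task_of[stage]}")
```

For this query the graph is a simple chain, so only one order is possible: the map-reduce job for the join, then the move of its output into access_log_temp2, then the statistics aggregation.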