Hive / HIVE-10880

The bucket number is not respected in insert overwrite.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: None
    • Labels: None

    Description

      When hive.enforce.bucketing is true, the number of buckets defined in the table DDL is no longer respected by insert overwrite in current master and in 1.2.
      Reproduce:

      CREATE TABLE IF NOT EXISTS buckettestinput( 
      data string 
      ) 
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      
      CREATE TABLE IF NOT EXISTS buckettestoutput1( 
      data string 
      )CLUSTERED BY(data) 
      INTO 2 BUCKETS 
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      
      CREATE TABLE IF NOT EXISTS buckettestoutput2( 
      data string 
      )CLUSTERED BY(data) 
      INTO 2 BUCKETS 
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
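
      To confirm the DDL took effect, the bucket count can be checked in the table metadata, for example:

      DESCRIBE FORMATTED buckettestoutput1;
      -- the storage information section should report "Num Buckets: 2"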
      

      Then I inserted the following data into the "buckettestinput" table (a possible load statement is sketched after the data):

      firstinsert1 
      firstinsert2 
      firstinsert3 
      firstinsert4 
      firstinsert5 
      firstinsert6 
      firstinsert7 
      firstinsert8 
      secondinsert1 
      secondinsert2 
      secondinsert3 
      secondinsert4 
      secondinsert5 
      secondinsert6 
      secondinsert7 
      secondinsert8
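
      The report does not show how these rows were loaded; a minimal way to do it (assuming the rows are saved in a local file, here the hypothetical path /tmp/buckettestinput.txt) is:

      LOAD DATA LOCAL INPATH '/tmp/buckettestinput.txt'
      OVERWRITE INTO TABLE buckettestinput;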
      
      set hive.enforce.bucketing = true; 
      set hive.enforce.sorting=true;
      insert overwrite table buckettestoutput1 
      select * from buckettestinput where data like 'first%';
      set hive.auto.convert.sortmerge.join=true; 
      set hive.optimize.bucketmapjoin = true; 
      set hive.optimize.bucketmapjoin.sortedmerge = true; 
      select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
      
      Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 1 (state=42000,code=10141)
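
      One way to confirm the mismatch (a sketch, assuming the default warehouse location shown in the log below) is to list the table directory from beeline:

      dfs -ls /user/hive/warehouse/buckettestoutput1;
      -- with 2 buckets the insert should leave two files (e.g. 000000_0 and 000001_0),
      -- but only a single file is produced here, matching numFiles=1 in the table stats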
      

      The debug output related to the insert overwrite:

      0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1
      . . . . . . . . . . . . . . . .> select * from buckettestinput where data like 'first%';
      INFO  : Number of reduce tasks determined at compile time: 2
      INFO  : In order to change the average load for a reducer (in bytes):
      INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
      INFO  : In order to limit the maximum number of reducers:
      INFO  :   set hive.exec.reducers.max=<number>
      INFO  : In order to set a constant number of reducers:
      INFO  :   set mapred.reduce.tasks=<number>
      INFO  : Job running in-process (local Hadoop)
      INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
      INFO  : Ended Job = job_local107155352_0001
      INFO  : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000
      INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52, rawDataSize=48]
      No rows affected (1.692 seconds)
      

      An insert that uses dynamic partitions does not have this issue (see the sketch below).
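
      For comparison, a dynamic-partition variant of the reproduction (with a hypothetical partitioned table buckettestoutput_part) would look like this:

      CREATE TABLE IF NOT EXISTS buckettestoutput_part(
      data string
      ) PARTITIONED BY (part string)
      CLUSTERED BY(data) INTO 2 BUCKETS
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      set hive.exec.dynamic.partition=true;
      set hive.exec.dynamic.partition.mode=nonstrict;
      set hive.enforce.bucketing=true;

      insert overwrite table buckettestoutput_part partition (part)
      select data, 'p1' as part from buckettestinput where data like 'first%';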

      Attachments

        1. HIVE-10880.1.patch
          9 kB
          Yongzhi Chen
        2. HIVE-10880.2.patch
          62 kB
          Yongzhi Chen
        3. HIVE-10880.3.patch
          63 kB
          Yongzhi Chen
        4. HIVE-10880.4.patch
          62 kB
          Yongzhi Chen

            People

              Assignee: Yongzhi Chen
              Reporter: Yongzhi Chen
              Votes: 0
              Watchers: 4
