
Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name


New Contributor

Versions:

HDP 2.5.0.0

Hive 1.2.1.2.5

Hive LLAP 2.1

Hi,

I migrated the cluster to HDP 2.5 and then switched my HiveQL queries to run on Hive LLAP. Now I have a problem with the Hive Compactor and SparkSQL: they cannot read a Hive transactional table loaded with INSERT statements run on Hive LLAP (2.1). The table is the same, but new inserts produce a different delta directory name. The delta directory name now has the suffix _0000, and the Hive Compactor and SparkSQL cannot parse the number xxxxx_0000 that is part of the delta directory name.

These are the table properties:

PARTITIONED BY (
  content_id int,
  analytic_date date
)
CLUSTERED BY (document_id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"="NONE",
  "transactional"="true"
);

This is the Compactor error from the Hive Metastore:

2017-08-01 00:01:37,186 ERROR [Thread-14]: compactor.Initiator (Initiator.java:run(153)) - Caught exception while trying to determine if we should compact id:0,dbname:jupiter_one,tableName:j1_doc,partName:content_id=107/analytic_date=2017-07-31,state:^@,type:null,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures, java.lang.NumberFormatException: For input string: "23569546_0000"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:348)
        at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:437)
        at org.apache.hadoop.hive.ql.txn.compactor.Initiator.determineCompactionType(Initiator.java:256)
        at org.apache.hadoop.hive.ql.txn.compactor.Initiator.checkForCompaction(Initiator.java:229)
        at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:150)
  • This is the table directory structure when I run the INSERT statement on Hive Interactive LLAP (2.1) (does not work):

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00001
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00002
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00003

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569548_23569548_0000

  • This is the table directory structure when I run the INSERT statement on Hive 1.2 (works):

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00001
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00002
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00003

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569542_23569542

Is there a possibility to avoid the _0000 suffix in the delta directory name? In delta_23569546_23569546_0000 the first two numbers are the transaction id. What does _0000 mean in the name of the delta directory?

Is this a bug or a feature? Can you advise me how to deal with this problem?

We bulk-insert 250 MB of data from a staging table every 15 minutes. If I change some properties, will Hive LLAP write the buckets normally?
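For reference, the two directory listings above differ only in the trailing suffix. Here is a small sketch of a parser that accepts both forms, the Hive 1.2 name delta_<minTxn>_<maxTxn> and the LLAP name with the extra numeric suffix. This is an illustration only, not Hive's actual AcidUtils code, and the interpretation of the suffix as a per-statement id is an assumption:

```python
import re

# Hive 1.2 writes delta_<minTxn>_<maxTxn>; Hive 2.1 (LLAP) appends an
# extra numeric suffix (assumed here to be a per-statement id), giving
# delta_<minTxn>_<maxTxn>_<stmtId>.  A parser that expects exactly two
# numbers ends up feeding "23569546_0000" to a long parser, which is
# the NumberFormatException in the stack trace above.  This regex
# tolerates both forms.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")

def parse_delta(dirname):
    """Return (min_txn, max_txn, stmt_id or None) for a delta directory name."""
    m = DELTA_RE.match(dirname)
    if m is None:
        raise ValueError("not a delta directory: %s" % dirname)
    min_txn, max_txn, stmt = m.groups()
    return int(min_txn), int(max_txn), int(stmt) if stmt is not None else None
```

Both of the names shown in the listings parse cleanly with this sketch, which is essentially what a suffix-aware compactor has to do.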

1 ACCEPTED SOLUTION


Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

Expert Contributor

This suffix is a feature when you are using LLAP, and there is no way to avoid it. Is upgrading to HDP 2.6 an option? The Compactor in 2.6 is able to handle it. If you make the target table transactional=false, it won't create any delta directories. If you use transactional=true but don't go through LLAP on 2.5, you won't see this suffix.


6 REPLIES

Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

New Contributor

For now, an upgrade is not an option. OK, I will use Hive LLAP only for SELECT queries.

Today we tried that flow with transactional=true on Hive 1.2 (not LLAP), but 70% of the compactions failed. There is another problem: Spark 1.6.2 returns the wrong result. It is a bit different from the results from Hive (we tried 1.2 and LLAP).

Thanks for the answer.


Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

Expert Contributor

Since you already created directories in the delta_23569546_23569546_0000 format, the compactor can't understand them. If for each X in delta_X_X you have only one directory (which should be the case), you can just rename it by stripping the suffix. This should let the compactor proceed. Of course, this will interfere with ongoing queries.

Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

New Contributor

Yes, OK. I already solved this problem. I renamed the delta directories from delta_23569546_23569546_0000 to delta_23569546_23569546. I created a new table and inserted the data from the old table into the new one. Now my inserts run on Hive 1.2.


Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

New Contributor

Hi,

we are facing the same issue. Can you help us with how to rename the delta file names so we can work around this?

Thank you very much.

