Created 08-01-2017 01:22 PM
Versions:
HDP 2.5.0.0
Hive 1.2.1.2.5
Hive LLAP 2.1
Hi,
I migrated our cluster to HDP 2.5 and then switched our HiveQL queries to run on Hive LLAP. Now I have a problem with the Hive Compactor and SparkSQL: neither can read a Hive transactional table loaded with INSERT statements run on Hive LLAP (2.1). The table is the same, but new inserts produce a different delta directory name, which now has the suffix _0000. The Hive Compactor and SparkSQL can't parse the number xxxxx_0000 that is part of the delta directory name.
These are the table properties:
PARTITIONED BY (content_id int, analytic_date date) CLUSTERED BY (document_id) INTO 2 BUCKETS stored as orc tblproperties ("orc.compress"="NONE", "transactional"="true");
This is the Compactor error from the Hive Metastore log:
2017-08-01 00:01:37,186 ERROR [Thread-14]: compactor.Initiator (Initiator.java:run(153)) - Caught exception while trying to determine if we should compact id:0,dbname:jupiter_one,tableName:j1_doc,partName:content_id=107/analytic_date=2017-07-31,state:^@,type:null,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. Marking failed to avoid repeated failures, java.lang.NumberFormatException: For input string: "23569546_0000"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:348)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:437)
    at org.apache.hadoop.hive.ql.txn.compactor.Initiator.determineCompactionType(Initiator.java:256)
    at org.apache.hadoop.hive.ql.txn.compactor.Initiator.checkForCompaction(Initiator.java:229)
    at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:150)
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00001
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00002
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00003
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569548_23569548_0000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00001
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00002
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00003
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569542_23569542
Is there any way to avoid the _0000 suffix in the delta directory name? In delta_23569546_23569546_0000 the first two numbers are transaction IDs. What does _0000 mean in the name of the delta directory?
Is this a bug or a feature? Can you advise me how to deal with this problem?
We run a bulk insert of 250 MB of data from a staging table every 15 minutes. If I change some properties, will Hive LLAP write the buckets normally?
Created 08-01-2017 03:54 PM
This suffix is a feature when you are using LLAP, and there is no way to avoid it. Is upgrading to HDP 2.6 an option? The Compactor in 2.6 is able to handle it. If you make the target table transactional=false, it won't create any delta directories. If you use transactional=true but don't go through LLAP on 2.5, you won't see this suffix.
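To make the two naming schemes concrete, here is a minimal Python sketch of a parser that accepts both delta_<min>_<max> and delta_<min>_<max>_<suffix>. It is not Hive's AcidUtils code; the regular expression and the parse_delta name are made up for illustration, and the suffix is kept as an opaque string (as far as I understand it is a per-statement number within the transaction, but treat that as an assumption).

```python
import re

# Assumed layout, based only on the directory names in this thread:
#   delta_<minTxn>_<maxTxn>            (written by Hive 1.2)
#   delta_<minTxn>_<maxTxn>_<suffix>   (written through LLAP / Hive 2.1)
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def parse_delta(name):
    """Return (min_txn, max_txn, suffix); suffix is None when absent."""
    match = DELTA_RE.match(name)
    if match is None:
        raise ValueError(f"not a delta directory name: {name!r}")
    min_txn, max_txn, suffix = match.groups()
    # The suffix stays a string so leading zeros such as "0000" survive.
    return int(min_txn), int(max_txn), suffix


print(parse_delta("delta_23569541_23569541"))
print(parse_delta("delta_23569546_23569546_0000"))
```

A reader that only expects the two-number form would hand "23569546_0000" to its number parser, which matches the NumberFormatException in the log above.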
Created 08-02-2017 09:30 AM
For now, an upgrade is not an option. OK, I will use Hive LLAP only for SELECT queries.
Today we tried that flow with transactional=true on Hive 1.2 (not LLAP), but 70% of the compactions failed. There is another problem: Spark 1.6.2 returns the wrong result. It's a bit different from the results from Hive (we tried both 1.2 and LLAP).
Thanks for the answer.
Created 08-02-2017 06:29 PM
Since you already created directories in the delta_23569546_23569546_0000 format, the compactor can't understand them. If for each X in delta_X_X you only have 1 directory (which should be the case), you can just rename it by stripping the suffix. This should let the compactor proceed. This will interfere with ongoing queries, of course.
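A rough way to plan those renames safely is sketched below in Python. It only builds a list of (source, destination) paths from directory names you pass in and refuses to continue if any delta_X_Y would end up with more than one directory. The plan_renames helper is hypothetical, not part of Hive or Hadoop; the paths are the ones from the listing earlier in this thread, and the printed hdfs dfs -mv commands are meant to be reviewed by hand before anything is run.

```python
import posixpath
import re
from collections import defaultdict

# Names with a trailing suffix, e.g. delta_23569546_23569546_0000.
SUFFIXED = re.compile(r"^(delta_\d+_\d+)_\d+$")
# Names already in the form the 2.5 compactor understands.
PLAIN = re.compile(r"^delta_\d+_\d+$")


def plan_renames(partition_dir, names):
    """Return (src, dst) path pairs that strip the suffix.

    Raises ValueError if two directories would map to the same target,
    since the advice above assumes one directory per delta_X_X.
    """
    by_base = defaultdict(list)
    for name in names:
        suffixed = SUFFIXED.match(name)
        if suffixed:
            by_base[suffixed.group(1)].append(name)
        elif PLAIN.match(name):
            by_base[name].append(name)
        # Anything else in the partition directory is left alone.

    plan = []
    for base, group in sorted(by_base.items()):
        if len(group) > 1:
            raise ValueError(f"more than one directory for {base}: {group}")
        src = group[0]
        if src != base:
            plan.append((posixpath.join(partition_dir, src),
                         posixpath.join(partition_dir, base)))
    return plan


partition = ("/apps/hive/warehouse/jupiter_one.db/j1_doc/"
             "content_id=100/analytic_date=2017-07-31")
listing = [
    "delta_23569546_23569546_0000",
    "delta_23569548_23569548_0000",
    "delta_23569541_23569541",
    "delta_23569542_23569542",
]
for src, dst in plan_renames(partition, listing):
    print(f"hdfs dfs -mv {src} {dst}")
```

Since the rename interferes with running queries, it is probably best done while nothing is reading or writing the partition.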
Created 08-03-2017 09:51 AM
Yeah, OK. I have already solved this problem. I renamed the delta directories from delta_23569546_23569546_0000 to delta_23569546_23569546. I created a new table and inserted the data from the old table into the new one. Now my inserts run on Hive 1.2.
Created 08-20-2018 07:36 PM
Hi,
We are facing the same issue. Can you help us with how to rename the delta directories, as a workaround for this?
Thank you very much.