
Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name


New Contributor

Versions:

HDP 2.5.0.0

Hive 1.2.1.2.5

Hive LLAP 2.1

Hi,

I migrated the cluster to HDP 2.5 and then switched my HiveQL queries to run on Hive LLAP. Now I have a problem with the Hive Compactor and SparkSQL: they cannot read a Hive transactional table loaded with INSERT statements run on Hive LLAP (2.1). The table is the same, but new inserts produce a different delta directory name. The delta directory name now has the suffix _0000, and the Hive Compactor and SparkSQL cannot parse the number xxxxx_0000 that is part of the delta directory name.

These are the table properties:

PARTITIONED BY (
  content_id int,
  analytic_date date
)
CLUSTERED BY (document_id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"="NONE",
  "transactional"="true"
);

This is the Compactor error from the Hive Metastore:

2017-08-01 00:01:37,186 ERROR [Thread-14]: compactor.Initiator (Initiator.java:run(153)) - Caught exception while trying to determine if we should compact id:0,dbname:jupiter_one,tableName:j1_doc,partName:content_id=107/analytic_date=2017-07-31,state:^@,type:null,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures, java.lang.NumberFormatException: For input string: "23569546_0000"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:348)
        at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:437)
        at org.apache.hadoop.hive.ql.txn.compactor.Initiator.determineCompactionType(Initiator.java:256)
        at org.apache.hadoop.hive.ql.txn.compactor.Initiator.checkForCompaction(Initiator.java:229)
        at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:150)
  • This is the table directory structure when I run the INSERT statement on Hive Interactive LLAP (2.1) (does not work):

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00001
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00002
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569546_23569546_0000/bucket_00003

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569548_23569548_0000

  • This is the table directory structure when I run the INSERT statement on Hive 1.2 (works):

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00000
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00001
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00002
/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569541_23569541/bucket_00003

/apps/hive/warehouse/jupiter_one.db/j1_doc/content_id=100/analytic_date=2017-07-31/delta_23569542_23569542

Is there a possibility to avoid the _0000 suffix in the delta directory name? In delta_23569546_23569546_0000 the first two numbers are the transaction id. What does _0000 mean in the name of the delta directory?

Is this a bug or a feature? Can you advise me how to deal with this problem?

We bulk-insert 250 MB of data from a staging table every 15 minutes. If I change some properties, will Hive LLAP write the buckets normally?
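For reference, the two directory listings above differ only in the trailing suffix. Here is a small sketch of a parser that accepts both forms, the Hive 1.2 name delta_<minTxn>_<maxTxn> and the LLAP name with the extra numeric suffix. This is an illustration only, not Hive's actual AcidUtils code, and the interpretation of the suffix as a per-statement id is an assumption:

```python
import re

# Hive 1.2 writes delta_<minTxn>_<maxTxn>; Hive 2.1 (LLAP) appends an
# extra numeric suffix (assumed here to be a per-statement id), giving
# delta_<minTxn>_<maxTxn>_<stmtId>.  A parser that expects exactly two
# numbers ends up feeding "23569546_0000" to a long parser, which is
# the NumberFormatException in the stack trace above.  This regex
# tolerates both forms.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")

def parse_delta(dirname):
    """Return (min_txn, max_txn, stmt_id or None) for a delta directory name."""
    m = DELTA_RE.match(dirname)
    if m is None:
        raise ValueError("not a delta directory: %s" % dirname)
    min_txn, max_txn, stmt = m.groups()
    return int(min_txn), int(max_txn), int(stmt) if stmt is not None else None
```

Both of the names shown in the listings parse cleanly with this sketch, which is essentially what a suffix-aware compactor has to do.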

1 ACCEPTED SOLUTION


Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

Expert Contributor

This suffix is a feature when you are using LLAP, and there is no way to avoid it. Is upgrading to HDP 2.6 an option? The Compactor in 2.6 is able to handle it. If you make the target table transactional=false, it won't create any delta directories. If you use transactional=true but don't go through LLAP on 2.5, you won't see this suffix.


6 REPLIES

Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

New Contributor

For now, an upgrade is not an option. OK, I will use Hive LLAP only for SELECT queries.

Today we tried that flow with transactional=true on Hive 1.2 (not LLAP), but 70% of the compactions failed. There is another problem: Spark 1.6.2 returns the wrong result. It is a bit different from the results from Hive (we tried 1.2 and LLAP).

Thanks for the answer.


Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

Expert Contributor

Since you already created directories in the delta_23569546_23569546_0000 format, the compactor can't understand them. If for each X in delta_X_X you have only one directory (which should be the case), you can just rename it by stripping the suffix. This should let the compactor proceed. Of course, this will interfere with ongoing queries.

Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

New Contributor

Yes, OK. I already solved this problem. I renamed the delta directories from delta_23569546_23569546_0000 to delta_23569546_23569546. I created a new table and inserted the data from the old table into the new one. Now my inserts run on Hive 1.2.


Re: Compactor and SparkSQL do not work after data is inserted into a transactional table with an INSERT statement run on Hive LLAP! NumberFormatException in delta directory name

New Contributor

Hi,

we are facing the same issue. Can you help us with how to rename the delta file names so we can work around this?

Thank you very much.

