
HIVE ACID table - Not enough history available for (0,x) Oldest available base


Rising Star

I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).

I set these ACID settings on the dev cluster:

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 
hive.support.concurrency=true 
hive.enforce.bucketing=true 
hive.exec.dynamic.partition.mode=nonstrict 
hive.compactor.initiator.on=true 
hive.compactor.worker.threads=3

Then I export the table from prod and import it into dev:

hive> export table hana.easy_check to 'export/easy_check'; 
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/ 
hive> import from 'export/easy_check';

However, when I run any SQL query on this table on the dev cluster, I get an error:

2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
	at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
	at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
	at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
	at java.util.concurrent.FutureTask.run(FutureTask.java:271)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
	... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
	... 4 more

What is wrong?

Both clusters run Hive 1.2.1.

# Detailed Table Information
Database:               hana
Owner:                  hive
CreateTime:             Wed Apr 19 13:27:00 MSK 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type:             MANAGED_TABLE
Table Parameters:
        NO_AUTO_COMPACTION      false
        compactor.mapreduce.map.memory.mb       2048
        compactorthreshold.hive.compactor.delta.num.threshold   4
        compactorthreshold.hive.compactor.delta.pct.threshold   0.3
        last_modified_by        hive
        last_modified_time      1489647024
        orc.bloom.filter.columns        calday, request, material
        orc.compress            ZLIB
        orc.compress.size       262144
        orc.create.index        true
        orc.row.index.stride    5000
        orc.stripe.size         67108864
        transactional           true
        transient_lastDdlTime   1492597620

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            1
Bucket Columns:         [material]
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
1 ACCEPTED SOLUTION

Accepted Solutions

Re: HIVE ACID table - Not enough history available for (0,x) Oldest available base

Expert Contributor

This is not supported. Transactional table data cannot simply be copied from cluster to cluster. Each cluster maintains a global transaction ID sequence, which is embedded in the data files and file names of transactional tables; copying the data files confuses the target system. The only way to do this right now is to copy the data into a non-ACID table on the source cluster using "INSERT ... SELECT ...", and then use export/import to transfer it to the target side.
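For this particular table (partitioned by ym), that workaround might look roughly like the sketch below. The staging table name easy_check_flat is illustrative, and CTAS is used because it produces an unpartitioned, non-transactional copy (Hive does not allow CTAS directly into a partitioned table); ym simply becomes an ordinary column in the staging copy.

```sql
-- On the source cluster: stage the data in a plain, non-transactional ORC table.
-- easy_check_flat is an illustrative name, not part of the original thread.
CREATE TABLE hana.easy_check_flat STORED AS ORC
AS SELECT * FROM hana.easy_check;

-- Export the non-ACID copy, then move it with distcp as before.
EXPORT TABLE hana.easy_check_flat TO 'export/easy_check_flat';

-- On the target cluster, after distcp of the export directory:
IMPORT TABLE hana.easy_check_flat FROM 'export/easy_check_flat';

-- Reload into the ACID table so the rows get valid local transaction IDs.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE hana.easy_check PARTITION (ym)
SELECT * FROM hana.easy_check_flat;
```

Because SELECT * on the partitioned source emits the partition column last, the dynamic-partition INSERT on the target side can consume it directly.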

6 REPLIES


Re: HIVE ACID table - Not enough history available for (0,x) Oldest available base

Rising Star
@Eugene Koifman

Thanks for the clarification!


Re: HIVE ACID table - Not enough history available for (0,x) Oldest available base

New Contributor

@Eugene Koifman is there any other workaround that could cut down the time needed to replicate an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?

Re: HIVE ACID table - Not enough history available for (0,x) Oldest available base

Expert Contributor

There isn't. Perhaps @thejas has a recommendation.

Re: HIVE ACID table - Not enough history available for (0,x) Oldest available base

New Contributor

The import/export can be time consuming. You could try distcp'ing the non-transactional partitions over to the DR cluster's non-transactional table and using MSCK REPAIR TABLE to pick them up. You'd still need to run the copy from non-transactional to transactional again.
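Assuming the DR side keeps a non-transactional staging copy of the table (easy_check_flat here is an illustrative name), that suggestion might be sketched as:

```sql
-- On the DR cluster, after distcp has landed the new partition directories
-- under the non-transactional table's HDFS location:
MSCK REPAIR TABLE hana.easy_check_flat;

-- Then re-run the copy into the ACID table so the rows get
-- transaction IDs that are valid on this cluster.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE hana.easy_check PARTITION (ym)
SELECT * FROM hana.easy_check_flat;
```

MSCK REPAIR TABLE only registers partition directories that already follow the ym=... naming convention under the table's location; it does not validate the file contents.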

Re: HIVE ACID table - Not enough history available for (0,x) Oldest available base

Contributor

Hi all!

Where can I find information about the limitations of Hadoop DistCp (transactional vs. non-transactional tables, etc.)?
