HIVE ACID table - Not enough history available for (0,x) Oldest available base

Expert Contributor

I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).

I set these ACID settings on the dev cluster:

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 
hive.support.concurrency=true 
hive.enforce.bucketing=true 
hive.exec.dynamic.partition.mode=nonstrict 
hive.compactor.initiator.on=true 
hive.compactor.worker.threads=3
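
For reference, the first four settings can also be applied per session from the Hive CLI (a sketch; the two compactor settings take effect only in the metastore service, so they have to be set there rather than per session):

hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.support.concurrency=true;
hive> set hive.enforce.bucketing=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;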

Then I export the table from prod, copy it over with distcp, and import it into dev:

hive> export table hana.easy_check to 'export/easy_check'; 
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/ 
hive> import from 'export/easy_check';

However, any SQL query I run against this table on the dev cluster fails. For example (any simple statement will do):
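
hive> select count(*) from hana.easy_check;   -- illustrative query; any statement triggers it

The query fails during ORC split generation with this error: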

2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
	at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
	at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
	at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
	at java.util.concurrent.FutureTask.run(FutureTask.java:271)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
	... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
	... 4 more

What is wrong?

Both clusters run Hive 1.2.1. Here is the table definition on the dev cluster (DESCRIBE FORMATTED output):

# Detailed Table Information
Database:               hana
Owner:                  hive
CreateTime:             Wed Apr 19 13:27:00 MSK 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type:             MANAGED_TABLE
Table Parameters:
        NO_AUTO_COMPACTION      false
        compactor.mapreduce.map.memory.mb       2048
        compactorthreshold.hive.compactor.delta.num.threshold   4
        compactorthreshold.hive.compactor.delta.pct.threshold   0.3
        last_modified_by        hive
        last_modified_time      1489647024
        orc.bloom.filter.columns        calday, request, material
        orc.compress            ZLIB
        orc.compress.size       262144
        orc.create.index        true
        orc.row.index.stride    5000
        orc.stripe.size         67108864
        transactional           true
        transient_lastDdlTime   1492597620

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            1
Bucket Columns:         [material]
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
6 REPLIES


Expert Contributor
@Eugene Koifman

Thanks for the clarification!

Contributor

@Eugene Koifman is there any other workaround that could cut down the time needed to replicate an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?

Super Collaborator

There isn't. Perhaps @thejas has a recommendation.

Explorer

The import/export can be time-consuming. You could try distcp'ing the non-transactional partitions over to a non-transactional table on the DR cluster and using MSCK REPAIR TABLE to pick them up. You'd still need to run the copy from non-transactional to transactional again. A rough sketch of that flow follows.
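
A sketch of the idea (easy_check_plain is a hypothetical staging table; the paths and NameNode addresses are assumptions based on the clusters mentioned above, and both clusters are assumed to already have a non-transactional ORC table with the same schema):

# copy a non-transactional partition directory to the DR cluster
hadoop distcp hdfs://hdp-nn1:8020/apps/hive/warehouse/hana.db/easy_check_plain/ym=2017-01 hdfs://dev-nn2:8020/apps/hive/warehouse/hana.db/easy_check_plain/

# on the DR cluster: register the copied partition, then reload the ACID table
hive> msck repair table hana.easy_check_plain;
hive> insert into table hana.easy_check partition (ym) select * from hana.easy_check_plain where ym='2017-01';

Dynamic partitioning (hive.exec.dynamic.partition.mode=nonstrict, already set above) lets the insert pick up the ym partition automatically.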

Expert Contributor

Hi all!

Where can I find information about the limitations of Hadoop DistCp (transactional vs. non-transactional tables, etc.)?