
HIVE ACID table - Not enough history available for (0,x) Oldest available base

Expert Contributor

I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).

I set these ACID settings on the dev cluster:

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 
hive.support.concurrency=true 
hive.enforce.bucketing=true 
hive.exec.dynamic.partition.mode=nonstrict 
hive.compactor.initiator.on=true 
hive.compactor.worker.threads=3

Then I export the table from prod and import it into dev:

hive> export table hana.easy_check to 'export/easy_check'; 
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/ 
hive> import from 'export/easy_check';

However, when I run any SQL query on this table on the dev cluster, I get an error:

2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
	at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
	at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
	at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
	at java.util.concurrent.FutureTask.run(FutureTask.java:271)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
	... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
	... 4 more

What is wrong?

Both clusters run Hive 1.2.1.

# Detailed Table Information
Database:               hana
Owner:                  hive
CreateTime:             Wed Apr 19 13:27:00 MSK 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type:             MANAGED_TABLE
Table Parameters:
        NO_AUTO_COMPACTION      false
        compactor.mapreduce.map.memory.mb       2048
        compactorthreshold.hive.compactor.delta.num.threshold   4
        compactorthreshold.hive.compactor.delta.pct.threshold   0.3
        last_modified_by        hive
        last_modified_time      1489647024
        orc.bloom.filter.columns        calday, request, material
        orc.compress            ZLIB
        orc.compress.size       262144
        orc.create.index        true
        orc.row.index.stride    5000
        orc.stripe.size         67108864
        transactional           true
        transient_lastDdlTime   1492597620

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            1
Bucket Columns:         [material]
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
1 ACCEPTED SOLUTION

Super Collaborator

This is not supported. Transactional table data cannot simply be copied from one cluster to another. Each cluster maintains a global transaction ID sequence that is embedded in the data files and file names of transactional tables, so copying the data files confuses the target system. The only way to do this right now is to copy the data into a non-ACID table on the source cluster using "INSERT ... SELECT ...", and then use export/import to transfer that table to the target side.
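
For illustration, a minimal sketch of that workaround applied to the table from this question could look like the following. The staging table name easy_check_stage is hypothetical, and the ym partition column is only inferred from the paths in the error message; adjust names, paths, and DDL to your environment.

-- on the prod cluster: stage the ACID data into a plain (non-transactional) ORC table
hive> create table hana.easy_check_stage stored as orc as select * from hana.easy_check;
hive> export table hana.easy_check_stage to 'export/easy_check_stage';

# copy the exported files to the dev cluster (same distcp flags as in the question)
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check_stage/ hdfs://dev-nn2:8020/user/hive/export/

-- on the dev cluster: import the staging table, then load it into the transactional
-- table (assumed to be created beforehand with the original DDL); the dynamic
-- partition insert relies on hive.exec.dynamic.partition.mode=nonstrict shown above
hive> import from 'export/easy_check_stage';
hive> insert into table hana.easy_check partition (ym) select * from hana.easy_check_stage;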


6 REPLIES


Expert Contributor
@Eugene Koifman

Thanks for the clarification!

Contributor

@Eugene Koifman is there any other workaround that could cut down the time needed to replicate an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?

Super Collaborator

There isn't. Perhaps @thejas has a recommendation.

Explorer

Import/export can be time consuming. You could try distcp'ing the non-transactional partitions over to a non-transactional table on the DR cluster and running MSCK REPAIR TABLE to pick them up. You'd still need to run the copy from the non-transactional into the transactional table again on that side.
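
As a rough sketch of that approach (the hostnames, table names, and paths here are hypothetical, and it assumes a non-transactional staging table with the same partition layout already exists on both clusters):

# copy a partition of the non-transactional staging table to the DR cluster
hadoop distcp -prbugp hdfs://hdp-nn1:8020/apps/hive/warehouse/hana.db/easy_check_stage/ym=2017-01 hdfs://dr-nn1:8020/apps/hive/warehouse/hana.db/easy_check_stage/

-- on the DR cluster: register the copied partition, then reload the ACID table from it
hive> msck repair table hana.easy_check_stage;
hive> insert into table hana.easy_check partition (ym) select * from hana.easy_check_stage where ym = '2017-01';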

Expert Contributor

Hi all!

Where can I find information about the limitations of Hadoop DistCp?

(transactional / non-transactional tables, etc.)