Created 04-19-2017 11:04 AM
I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).
I set these ACID settings on the dev cluster:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
hive.compactor.worker.threads=3
then I export the table from prod and import it into dev:
hive> export table hana.easy_check to 'export/easy_check';
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/
hive> import from 'export/easy_check';
However, when I run any SQL query on this table in the dev cluster I get an error:
2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
	at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
	at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
	at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
	at java.util.concurrent.FutureTask.run(FutureTask.java:271)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
	... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x).  Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
	... 4 more
What is wrong?
Both clusters run Hive 1.2.1.
# Detailed Table Information
Database:               hana
Owner:                  hive
CreateTime:             Wed Apr 19 13:27:00 MSK 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type:             MANAGED_TABLE
Table Parameters:
        NO_AUTO_COMPACTION      false
        compactor.mapreduce.map.memory.mb       2048
        compactorthreshold.hive.compactor.delta.num.threshold   4
        compactorthreshold.hive.compactor.delta.pct.threshold   0.3
        last_modified_by        hive
        last_modified_time      1489647024
        orc.bloom.filter.columns        calday, request, material
        orc.compress            ZLIB
        orc.compress.size       262144
        orc.create.index        true
        orc.row.index.stride    5000
        orc.stripe.size         67108864
        transactional           true
        transient_lastDdlTime   1492597620
# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            1
Bucket Columns:         [material]
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
		Created 07-10-2017 05:21 PM
This is not supported. Transactional table data cannot simply be copied from cluster to cluster. Each cluster maintains a global transaction ID sequence that is embedded in the data files and file names of transactional tables, so copying the data files confuses the target system. The only way to do this right now is to copy the data into a non-ACID table on the source cluster using "Insert ... Select ..." and then use export/import to transfer that table to the target side.
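For reference, here is a minimal sketch of that workaround. The staging table name hana.easy_check_stage is hypothetical, and the real table's ym partitioning and bucketing are omitted for brevity:

-- On the source cluster: stage the ACID data in a plain (non-transactional) ORC table, then export it
hive> create table hana.easy_check_stage stored as orc as select * from hana.easy_check;
hive> export table hana.easy_check_stage to 'export/easy_check_stage';

# Copy the export directory to the target cluster
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check_stage/ hdfs://dev-nn2:8020/user/hive/export/

-- On the target cluster: import the staging table, then reload the transactional table from it
hive> import from 'export/easy_check_stage';
hive> insert into table hana.easy_check select * from hana.easy_check_stage;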
Created 07-11-2017 12:11 AM
Thanks for the clarification!
Created 09-14-2017 04:03 AM
@Eugene Koifman is there any other workaround that could cut down the time needed to replicate an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?
Created 09-14-2017 04:10 PM
There isn't. Perhaps @thejas has a recommendation.
Created 11-08-2017 09:43 AM
Import/export can be time consuming. You could try distcp'ing the non-transactional partitions over to a non-transactional table on the DR cluster and using MSCK REPAIR TABLE to pick them up. You'd still need to run the copy from the non-transactional table into the transactional one again on the DR side.
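A rough sketch of that flow, assuming a hypothetical non-transactional staging table hana.easy_check_stage with the same ym partitioning exists on both clusters (paths and namenode addresses are illustrative):

# Copy a partition directory of the non-transactional staging table to the DR cluster
hadoop distcp -prbugp hdfs://hdp-nn1:8020/apps/hive/warehouse/hana.db/easy_check_stage/ym=2017-01 hdfs://dev-nn2:8020/apps/hive/warehouse/hana.db/easy_check_stage/

-- On the DR cluster: register the copied partitions, then reload the transactional table
hive> msck repair table hana.easy_check_stage;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table hana.easy_check partition (ym) select * from hana.easy_check_stage;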
Created 11-15-2017 07:28 AM
Hi all!
Where can I find information about the limitations of Hadoop DistCp
(transactional vs. non-transactional tables, etc.)?