Created 04-19-2017 11:04 AM
I'm trying to copy a transactional table from a production cluster (HDP 2.5) to a dev cluster (HDP 2.6).
I set these ACID settings on the dev cluster:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.compactor.initiator.on=true
hive.compactor.worker.threads=3
Then I export the table from prod and import it into dev:
hive> export table hana.easy_check to 'export/easy_check';
hadoop distcp -prbugp hdfs://hdp-nn1:8020/user/hive/export/easy_check/ hdfs://dev-nn2:8020/user/hive/export/
hive> import from 'export/easy_check';
However, when I run any SQL query on this table in the dev cluster, I get an error:
2017-04-19 11:08:33,879 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: easy_check initializer failed, vertex=vertex_1492584180580_0005_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: serious problem
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:319)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1140)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:150)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:135)
at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
at java.util.concurrent.FutureTask.setException(FutureTask.java:251)
at java.util.concurrent.FutureTask.run(FutureTask.java:271)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Not enough history available for (0,x). Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1235)
... 15 more
Caused by: java.io.IOException: Not enough history available for (0,x). Oldest available base: hdfs://development/apps/hive/warehouse/hana.db/easy_check/ym=2017-01/base_0001497
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:594)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:773)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$600(OrcInputFormat.java:738)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:763)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:760)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:760)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:738)
... 4 more
What is wrong?
Both clusters run Hive 1.2.1. The table description on the dev cluster:
# Detailed Table Information
Database: hana
Owner: hive
CreateTime: Wed Apr 19 13:27:00 MSK 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://development/apps/hive/warehouse/hana.db/easy_check
Table Type: MANAGED_TABLE
Table Parameters:
NO_AUTO_COMPACTION false
compactor.mapreduce.map.memory.mb 2048
compactorthreshold.hive.compactor.delta.num.threshold 4
compactorthreshold.hive.compactor.delta.pct.threshold 0.3
last_modified_by hive
last_modified_time 1489647024
orc.bloom.filter.columns calday, request, material
orc.compress ZLIB
orc.compress.size 262144
orc.create.index true
orc.row.index.stride 5000
orc.stripe.size 67108864
transactional true
transient_lastDdlTime 1492597620
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 1
Bucket Columns: [material]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Created 07-10-2017 05:21 PM
This is not supported. Transactional table data cannot simply be copied from cluster to cluster. Each cluster maintains a global transaction ID sequence that is embedded in the data files and file names of transactional tables, so copying the data files confuses the target system. The only way to do this right now is to copy the data into a non-ACID table on the source cluster using "Insert ... Select ...", and then use export/import to transfer it to the target side.
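A rough sketch of that workaround for the table in this thread might look like the following. The staging table name easy_check_flat and the short column list (borrowed from the bloom filter and bucketing settings above) are only placeholders for the real schema:
-- On the source cluster: stage the rows into a plain (non-ACID) copy
CREATE TABLE hana.easy_check_flat (
    calday   string,
    request  string,
    material string
)
PARTITIONED BY (ym string)
STORED AS ORC;                       -- note: no 'transactional'='true'
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE hana.easy_check_flat PARTITION (ym)
SELECT calday, request, material, ym FROM hana.easy_check;
-- Export the non-ACID copy and move it across with distcp as before
EXPORT TABLE hana.easy_check_flat TO 'export/easy_check_flat';
-- On the target cluster, after distcp of the export directory:
IMPORT FROM 'export/easy_check_flat';
-- Finally, load the rows into the target ACID table
INSERT INTO TABLE hana.easy_check PARTITION (ym)
SELECT calday, request, material, ym FROM hana.easy_check_flat;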
Created 07-11-2017 12:11 AM
Thanks for the clarification!
Created 09-14-2017 04:03 AM
@Eugene Koifman, is there any other workaround that could cut down the time needed to replicate an ACID table to a secondary cluster? What is the recommendation for DR on ACID tables?
Created 09-14-2017 04:10 PM
There isn't. Perhaps @thejas has a recommendation.
Created 11-08-2017 09:43 AM
The import/export can be time consuming; you could try distcp'ing the non-transactional partitions over to a non-transactional table on the DR cluster and using MSCK REPAIR TABLE to pick them up. You would still need to run the copy from the non-transactional table into the transactional one again.
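Roughly like this, assuming a non-transactional staging table easy_check_flat already exists on both clusters with the same (hypothetical) columns and ym partitioning as in the earlier sketch; the warehouse paths and partition value are examples only:
hadoop distcp hdfs://hdp-nn1:8020/apps/hive/warehouse/hana.db/easy_check_flat/ym=2017-01 hdfs://dev-nn2:8020/apps/hive/warehouse/hana.db/easy_check_flat/
hive> MSCK REPAIR TABLE hana.easy_check_flat;    -- register the copied partitions in the DR metastore
hive> SET hive.exec.dynamic.partition.mode=nonstrict;
hive> INSERT INTO TABLE hana.easy_check PARTITION (ym) SELECT calday, request, material, ym FROM hana.easy_check_flat;    -- reload into the ACID table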
Created 11-15-2017 07:28 AM
Hi all!
Where can I find information about the limitations of Hadoop DistCp (transactional vs. non-transactional tables, etc.)?