Created 05-13-2019 10:53 AM
We have a table in Hive which is partitioned on one column . It is holding over 2 TBs of data .
We want to create a new table which has 3 partitions and we want add data from the older table to new one.
What approach should we take ?
P.S.- We are using HDP 2.3
Created 05-14-2019 02:29 AM
I was assuming that your new table have 3 partition columns.
Best option would be Using Insert statement to dynamically create partitions in new table
Ex:
Insert into new_table partition(partition_col1,partition_col2,partition_col3) select non_partition_cols,partition_col1,partition_col2,partition_col3 from old_table;
Use distributeBy,sortBy to control number of files created in the directory.
Refer to this link for more details regards to distributeBy,sortBy.
Created 05-15-2019 08:59 AM
Hi @Shu
we tried the approach and got following error
1)
hive> Insert into new_table partition( new_partition1,new_partition2, new_partition3)
> select col1,
> col2 ,
> col3 ,
> colX ,
> new_partition1 ,
> new_partition2 ,
> new_partition3 from old_table where createdate='2016-11-09' ;
FAILED: ArrayIndexOutOfBoundsException -1
Created 05-15-2019 04:58 PM
2)
hive> Insert into new_table partition( new_partition1,new_partition2, new_partition3)
> select col1,
> col2 ,
> col3 ,
> colX ,
> new_partition1 ,
> new_partition2 ,
> new_partition3 from old_table;
Query ID = hdfs_20190514114256_81a413f7-49eb-4460-a16f-4bef38f7954a
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1538560024513_0256)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED 10 0 0 10 4 0
Reducer 2 KILLED 302 0 0 302 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 5.35 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1538560024513_0256_1_00, diagnostics=[Task failed, taskId=task_1538560024513_0256_1_00_000002, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000002 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835643652 found 1557834788261
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000009 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835644617 found 1557834789325
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000010 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835645700 found 1557834790432
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000013 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835646991 found 1557834791443
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:9, Vertex vertex_1538560024513_0256_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1538560024513_0256_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:302, Vertex vertex_1538560024513_0256_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1538560024513_0256_1_00, diagnostics=[Task failed, taskId=task_1538560024513_0256_1_00_000002, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000002 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835643652 found 1557834788261
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000009 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835644617 found 1557834789325
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000010 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835645700 found 1557834790432
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000013 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1557835646991 found 1557834791443
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)
at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:9, Vertex vertex_1538560024513_0256_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1538560024513_0256_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:302, Vertex vertex_1538560024513_0256_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1