Support Questions
Find answers, ask questions, and share your expertise

Creating new table in Hive using data of the older table

We have a table in Hive which is partitioned on one column . It is holding over 2 TBs of data .

We want to create a new table which has 3 partitions and we want add data from the older table to new one.

What approach should we take ?

P.S.- We are using HDP 2.3

3 REPLIES 3

Re: Creating new table in Hive using data of the older table

Super Guru

@Krupal Jagtap

I was assuming that your new table have 3 partition columns.


Best option would be Using Insert statement to dynamically create partitions in new table

Ex:

Insert into new_table partition(partition_col1,partition_col2,partition_col3) 
select non_partition_cols,partition_col1,partition_col2,partition_col3 from old_table;

Use distributeBy,sortBy to control number of files created in the directory.

Refer to this link for more details regards to distributeBy,sortBy.

Re: Creating new table in Hive using data of the older table

Hi @Shu


we tried the approach and got following error


1)

hive> Insert into new_table partition( new_partition1,new_partition2, new_partition3)

> select col1,

> col2 ,

> col3 ,

> colX ,

> new_partition1 ,

> new_partition2 ,

> new_partition3 from old_table where createdate='2016-11-09' ;

FAILED: ArrayIndexOutOfBoundsException -1

Re: Creating new table in Hive using data of the older table

2)

hive> Insert into new_table partition( new_partition1,new_partition2, new_partition3)

> select col1,

> col2 ,

> col3 ,

> colX ,

> new_partition1 ,

> new_partition2 ,

> new_partition3 from old_table;


Query ID = hdfs_20190514114256_81a413f7-49eb-4460-a16f-4bef38f7954a

Total jobs = 1

Launching Job 1 out of 1

Tez session was closed. Reopening...

Session re-established.



Status: Running (Executing on YARN cluster with App id application_1538560024513_0256)


--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 FAILED 10 0 0 10 4 0

Reducer 2 KILLED 302 0 0 302 0 0

--------------------------------------------------------------------------------

VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 5.35 s

--------------------------------------------------------------------------------

Status: Failed

Vertex failed, vertexName=Map 1, vertexId=vertex_1538560024513_0256_1_00, diagnostics=[Task failed, taskId=task_1538560024513_0256_1_00_000002, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000002 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835643652 found 1557834788261

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

], TaskAttempt 1 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000009 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835644617 found 1557834789325

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

], TaskAttempt 2 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000010 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835645700 found 1557834790432

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

], TaskAttempt 3 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000013 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835646991 found 1557834791443

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:9, Vertex vertex_1538560024513_0256_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]

Vertex killed, vertexName=Reducer 2, vertexId=vertex_1538560024513_0256_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:302, Vertex vertex_1538560024513_0256_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]

DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1538560024513_0256_1_00, diagnostics=[Task failed, taskId=task_1538560024513_0256_1_00_000002, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000002 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835643652 found 1557834788261

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

], TaskAttempt 1 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000009 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835644617 found 1557834789325

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

], TaskAttempt 2 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000010 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835645700 found 1557834790432

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

], TaskAttempt 3 failed, info=[Container launch failed for container_e486_1538560024513_0256_02_000013 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1557835646991 found 1557834791443

Note: System times on machines may be out of sync. Check system time and time zones.

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:422)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)

at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:180)

at org.apache.tez.dag.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:384)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:9, Vertex vertex_1538560024513_0256_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1538560024513_0256_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:302, Vertex vertex_1538560024513_0256_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1