Support Questions

Find answers, ask questions, and share your expertise

Add an auto-increment ID to a Hive table

Explorer

Hello everybody,

 

I have a Hive table and am trying to find a way to add an incremental primary key to it. Here is my approach:

 

 create table new_table as select row_number() over () as ID, * from old_table;

 

This creates a new table with an incremental column (ID). It works well on small tables, but when I run it on a bigger table (20M records / 500 columns), it fails with this message:

 

Examining task ID: task_1444013233108_0091_r_000558 (and more) from job job_1444013233108_0091
Examining task ID: task_1444013233108_0091_r_000000 (and more) from job job_1444013233108_0091

Task with the most failures(4):
-----
Task ID:
  task_1444013233108_0091_r_000000

URL:
  http://chd.moneyball.guru:8088/taskdetails.jsp?jobid=job_1444013233108_0091&tipid=task_1444013233108...
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_1444013233108_0091_01_000715
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 255


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 142  Reduce: 568   Cumulative CPU: 9791.07 sec   HDFS Read: 38198769932 HDFS Write: 54432 FAIL
Total MapReduce CPU Time Spent: 0 days 2 hours 43 minutes 11 seconds 70 msec

 

-----------------------------

I also tried limiting the number of records in the SELECT statement:

 create table new_table as select row_number() over () as ID, * from old_table limit 1000;

 

Do you have any idea what is causing this error?

 

Thanks

2 REPLIES

Explorer

With an empty OVER() clause, every row is routed through a single reducer, which tends to fail on large tables. Can you try using CLUSTER BY:

 

 create table new_table as select row_number() over () as ID, * from old_table cluster by ID;
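Written out in full, the suggestion above might look like the sketch below. This is only a sketch: `new_table` and `old_table` are placeholders for your actual table names, and whether CLUSTER BY fully relieves the reducer pressure should be verified on your data.

```sql
-- Sketch of the CLUSTER BY variant of the original CTAS statement.
-- row_number() still numbers all rows in one windowing stage, but
-- CLUSTER BY ID distributes the final sort/write across reducers
-- instead of funneling everything through a single output reducer.
CREATE TABLE new_table AS
SELECT row_number() OVER () AS id,
       t.*                       -- all columns of the source table
FROM old_table t
CLUSTER BY id;
```

If the job still fails, it may also help to raise reducer memory (e.g. `hive.exec.reducer.bytes.per.reducer` or the container memory settings) before retrying, since exit code 255 here points at a container-level failure rather than a syntax problem.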

New Contributor

Did you find a solution? I have the same problem.