Created on 10-07-2015 09:15 AM - edited 09-16-2022 02:43 AM
Hello everybody,
I have a Hive table and I'm trying to find a way to add an incremental primary key to it. Here is my approach:
create table new_table as select row_number() over () as ID, * from old_table;
It creates a new table with a new incremental column (ID). It works well on small tables, but when I run it on a bigger table (20M records / 500 columns), it fails with this message:
Examining task ID: task_1444013233108_0091_r_000558 (and more) from job job_1444013233108_0091
Examining task ID: task_1444013233108_0091_r_000000 (and more) from job job_1444013233108_0091
Task with the most failures(4):
-----
Task ID:
task_1444013233108_0091_r_000000
URL:
http://chd.moneyball.guru:8088/taskdetails.jsp?jobid=job_1444013233108_0091&tipid=task_1444013233108...
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_1444013233108_0091_01_000715
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 142 Reduce: 568 Cumulative CPU: 9791.07 sec HDFS Read: 38198769932 HDFS Write: 54432 FAIL
Total MapReduce CPU Time Spent: 0 days 2 hours 43 minutes 11 seconds 70 msec
-----------------------------
I also tried limiting the number of records in the select:
create table new_table as select row_number() over () as ID, * from old_table limit 1000;
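One thing worth noting about this attempt: in Hive, LIMIT is applied to the query result, so row_number() is still evaluated over the whole table before any rows are discarded, and the job hits the same bottleneck. If the goal is just to test on a small sample, pushing the LIMIT into a subquery restricts the input first (a quick sketch reusing the same table names, not verified at full scale):

create table new_table as
select row_number() over () as ID, t.*
from (select * from old_table limit 1000) t;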
Do you have any idea about this error?
Thanks
Created 10-07-2015 02:16 PM
Can you try using CLUSTER BY? For example:
create table new_table as select row_number() over () as ID, * from old_table cluster by ID;
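Some background on why this query struggles: with an empty OVER() clause there is no PARTITION BY, so Hive has nothing to split the windowing work on and typically pushes every row through a single reducer. On a 20M-row / 500-column table that one reducer can blow past its container limits, and a killed container is a common source of exit code 255. If the failing reducer turns out to be memory-bound, raising the reducer container size may also be worth a try (illustrative values, assuming standard MRv2 settings apply on your cluster):

set mapreduce.reduce.memory.mb=4096;       -- reducer container size
set mapreduce.reduce.java.opts=-Xmx3276m;  -- JVM heap inside that container

As far as I can tell, CLUSTER BY ID mainly redistributes the output rows across reducers; the row_number() computation itself may still funnel through a single windowing stage, so the memory settings can matter either way.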
Created 03-21-2017 07:46 AM
Did you find a solution? I have the same problem.