Member since: 08-29-2015 | Posts: 5 | Kudos Received: 0 | Solutions: 0
10-07-2015
09:15 AM
Hello everybody, I have a Hive table and I'm trying to find a way to add an incremental primary key to it. Here is my solution:

create table new_table as
select row_number() over () as ID, * from old_table;

It creates a new table with a new incremental column (ID). It works well on small tables, but when I run it on a bigger table (20M records / 500 columns) it fails with this message:

Examining task ID: task_1444013233108_0091_r_000558 (and more) from job job_1444013233108_0091
Examining task ID: task_1444013233108_0091_r_000000 (and more) from job job_1444013233108_0091
Task with the most failures(4):
-----
Task ID: task_1444013233108_0091_r_000000
URL: http://chd.moneyball.guru:8088/taskdetails.jsp?jobid=job_1444013233108_0091&tipid=task_1444013233108_0091_r_000000
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_1444013233108_0091_01_000715
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 142 Reduce: 568 Cumulative CPU: 9791.07 sec HDFS Read: 38198769932 HDFS Write: 54432 FAIL
Total MapReduce CPU Time Spent: 0 days 2 hours 43 minutes 11 seconds 70 msec

I also tried limiting the number of records in the select:

create table new_table as
select row_number() over () as ID, * from old_table limit 1000;

Do you have any idea about this error? Thanks
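For context on why this hits a wall: row_number() over () with an empty OVER clause puts every row in a single window partition, so all 20M rows are funneled through one reducer. A scalable alternative is to number rows per partition and add each partition's starting offset. Here is that idea sketched in plain Python (an illustration of the technique, not Hive syntax; assign_ids is a hypothetical name):

```python
from itertools import accumulate

def assign_ids(partitions):
    """Assign globally unique, consecutive 1-based IDs across partitions.

    Each partition numbers its own rows independently; the only global
    coordination needed is the per-partition row counts, from which each
    partition's starting offset is derived.
    """
    sizes = [len(p) for p in partitions]
    # Offset of partition k = total number of rows in partitions 0..k-1.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        [(offset + i + 1, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
print(assign_ids(parts))
# [[(1, 'a'), (2, 'b')], [(3, 'c')], [(4, 'd'), (5, 'e'), (6, 'f')]]
```

This is the same scheme Spark's zipWithIndex uses: one cheap pass to collect partition sizes, then fully parallel ID assignment, instead of a single reducer seeing every row.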
- Tags:
- auto_increment
- Hive
09-16-2015
04:45 AM
Hi, we are trying to run a Spark job on YARN. The problem is that the usercache directory (yarn/nm/usercache/) is growing too fast and will fill the whole disk. The Hive table is around 70 GB and total free disk space is around 300 GB. In the usercache directory there are a lot of big folders like blockmgr-b5b55c6f-ef8a-4359-93e4-9935f2390367. My questions are:
- Is this normal, or is something wrong with my Spark or YARN setup?
- If these files are just cached files, is there any way to limit the cache size?
Thanks for your help.
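For reference, from what I've read the blockmgr-* folders are Spark block-manager/shuffle data under the application's appcache; YARN deletes them when the application finishes, and while it runs their size is bounded only by what the job shuffles or spills. What the NodeManager can cap is the localized-resource cache (a different part of usercache). A sketch of those knobs for yarn-site.xml, with illustrative values:

```xml
<!-- Cap the NodeManager's localized-file cache at ~10 GB -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>
<!-- Run the cache cleaner every 10 minutes -->
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>
```

These settings will not shrink live shuffle data from a running job; they only bound the cache of localized jars and files.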
09-13-2015
11:59 AM
Thanks for your reply. You are right. I saw this in the executor logs:

Exception in thread "qtp1529675476-45" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Spark Context Cleaner" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError: GC overhead limit exceeded

What can I do to fix this? I'm using Spark on YARN and Spark memory allocation is dynamic. Also, my Hive table is around 70 GB. Does that mean I need 70 GB of memory for Spark to process it?
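For context: "GC overhead limit exceeded" means the JVM is spending almost all its time in garbage collection, so the usual first step is giving executors more heap and off-heap overhead. A sketch of the relevant spark-defaults.conf entries for Spark 1.x on YARN (values illustrative, not tuned for any particular cluster):

```
spark.executor.memory               4g
spark.yarn.executor.memoryOverhead  768
```

Note that Spark processes data partition by partition, so total executor memory does not need to match the 70 GB table size unless the job explicitly caches the whole dataset.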
09-12-2015
06:26 PM
Hi, I'm trying to run a Spark application on YARN on a single-node instance with 32 GB RAM. It works well for a small dataset, but for a bigger table it fails with this error:

Application application_1442094222971_0008 failed 2 times due to AM Container for appattempt_1442094222971_0008_000002 exited with exitCode: 11
For more detailed output, check the application tracking page: http://chd.moneyball.guru:8088/proxy/application_1442094222971_0008/ Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1442094222971_0008_02_000001
Exit code: 11
Stack trace: ExitCodeException exitCode=11:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 11
Failing this attempt. Failing the application.

Here is the stdout of the container:

[2015-09-12T20:53:28.368-04:00] [DataProcessing] [WARN] [] [org.apache.spark.Logging$class] [tid:Driver Heartbeater] [userID:yarn] Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@2c3b1696,BlockManagerId(2, chd.moneyball.guru, 60663))] in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)
[2015-09-12T20:53:32.314-04:00] [DataProcessing] [WARN] [] [org.apache.spark.Logging$class] [tid:Executor task launch worker-0] [userID:yarn] Error sending message [message = GetLocations(rdd_4_1839)] in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221)
at org.apache.spark.storage.BlockManagerMaster.getLocations(BlockManagerMaster.scala:70)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:591)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:578)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:622)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-09-12T20:53:42.123-04:00] [DataProcessing] [ERROR] [] [org.apache.spark.Logging$class] [tid:sparkExecutor-akka.actor.default-dispatcher-3] [userID:yarn] Driver Disassociated [akka.tcp://sparkExecutor@chd.moneyball.guru:38443] -> [akka.tcp://sparkDriver@chd.moneyball.guru:43977] disassociated! Shutting down.

Any help? Thanks
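For context: the pattern of heartbeat timeouts followed by "Driver Disassociated" usually means the driver (which runs inside the AM container in cluster mode) is unresponsive, often because it is starved for memory. A sketch of a first attempt at spark-defaults.conf settings for Spark 1.x (values illustrative, not tuned for this machine):

```
spark.driver.memory  4g
spark.akka.timeout   300
```

Raising spark.driver.memory gives the AM-side driver more heap; spark.akka.timeout (seconds, Spark 1.x) loosens the 30-second-style RPC timeouts seen in the log, though it only masks the problem if the driver is genuinely out of memory.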
08-29-2015
10:01 AM
I have installed CDH 5.4 on Ubuntu 14.04 successfully. All services are OK, and only HDFS shows "Under-Replicated Blocks". When I try to run one of the Spark samples it gets stuck in state ACCEPTED and prints the same message in an infinite loop.

Command:

sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/spark-examples-1.3.0-cdh5.4.5-hadoop2.6.0-cdh5.4.5.jar

Output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/avro-tools-1.7.6-cdh5.4.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/08/29 12:07:56 INFO RMProxy: Connecting to ResourceManager at chd2.moneyball.guru/104.131.78.0:8032
15/08/29 12:07:56 INFO Client: Requesting a new application from cluster with 1 NodeManagers
15/08/29 12:07:56 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1750 MB per container)
15/08/29 12:07:56 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/29 12:07:56 INFO Client: Setting up container launch context for our AM
15/08/29 12:07:56 INFO Client: Preparing resources for our AM container
15/08/29 12:07:57 INFO Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/spark-examples-1.3.0-cdh5.4.5-hadoop2.6.0-cdh5.4.5.jar -> hdfs://chd2.moneyball.guru:8020/user/hdfs/.sparkStaging/application_1440861466017_0007/spark-examples-1.3.0-cdh5.4.5-hadoop2.6.0-cdh5.4.5.jar
15/08/29 12:07:57 INFO Client: Setting up the launch environment for our AM container
15/08/29 12:07:57 INFO SecurityManager: Changing view acls to: hdfs
15/08/29 12:07:57 INFO SecurityManager: Changing modify acls to: hdfs
15/08/29 12:07:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
15/08/29 12:07:57 INFO Client: Submitting application 7 to ResourceManager
15/08/29 12:07:57 INFO YarnClientImpl: Submitted application application_1440861466017_0007
15/08/29 12:07:58 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:07:58 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: root.hdfs start time: 1440864477580 final status: UNDEFINED tracking URL: http://chd2.moneyball.guru:8088/proxy/application_1440861466017_0007/ user: hdfs
15/08/29 12:07:59 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:00 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:01 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:02 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:03 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:04 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:05 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:06 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)

I tried it on several servers but always get the same error. Let me know if you need anything else.
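For context: an application that stays in ACCEPTED forever usually means the scheduler has accepted it but cannot find capacity to launch the AM container (not enough NodeManager memory or vcores, or a queue limit). Since this cluster reports only 1 NodeManager and a 1750 MB per-container maximum, checking the NodeManager's advertised capacity in yarn-site.xml would be a first step; a sketch with illustrative values:

```xml
<!-- Memory this NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<!-- Largest single container the scheduler may grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
```

The Fair Scheduler queue (root.hdfs here) can also cap resources below these values, which would produce the same symptom.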