Member since
01-27-2017
6
Posts
0
Kudos Received
0
Solutions
02-20-2018
03:30 PM
Still facing this error on HDP 2.6.4 with Livy 0.4.0. Steps:
1. Created a Livy interactive session of kind spark.
2. Added the Scala jar containing my function.
3. livyClient.run(livyJob()) gives this error:
18/02/20 14:53:44 INFO InteractiveSession: Interactive session 12 created [appid: application_1518698685392_0040, owner: null, proxyUser: Some(hive), state: idle, kind: spark, info: {driverLogUrl=http://hdp04d03.fuzzyl.com:8042/node/containerlogs/container_e04_1518698685392_0040_01_000001/hive, sparkUiUrl=http://ambari04.fuzzyl.com:8088/proxy/application_1518698685392_0040/}]
18/02/20 14:53:44 INFO RSCClient: Received result for 51f4a85f-a499-4dc4-ad9f-9a0e7a42ea64
18/02/20 14:53:44 ERROR SessionServlet$: internal error
java.util.concurrent.ExecutionException: java.lang.RuntimeException: py4j.Py4JException: Error while obtaining a new communication channel
at py4j.CallbackClient.getConnectionLock(CallbackClient.java:218)
at py4j.CallbackClient.sendCommand(CallbackClient.java:337)
at py4j.CallbackClient.sendCommand(CallbackClient.java:316)
at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:103)
at com.sun.proxy.$Proxy24.getLocalTmpDirPath(Unknown Source)
at org.apache.livy.repl.PythonInterpreter.addPyFile(PythonInterpreter.scala:264)
at org.apache.livy.repl.ReplDriver$anonfun$addJarOrPyFile$1.apply(ReplDriver.scala:110)
at org.apache.livy.repl.ReplDriver$anonfun$addJarOrPyFile$1.apply(ReplDriver.scala:110)
at scala.Option.foreach(Option.scala:257)
at org.apache.livy.repl.ReplDriver.addJarOrPyFile(ReplDriver.scala:110)
at org.apache.livy.rsc.driver.JobContextImpl.addJarOrPyFile(JobContextImpl.java:100)
at org.apache.livy.rsc.driver.AddJarJob.call(AddJarJob.java:39)
at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:57)
at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:34)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at org.apache.livy.rsc.JobHandleImpl.get(JobHandleImpl.java:60)
at org.apache.livy.server.interactive.InteractiveSession.addJar(InteractiveSession.scala:542)
at org.apache.livy.server.interactive.InteractiveSessionServlet.org$apache$livy$server$interactive$InteractiveSessionServlet$addJarOrPyFile(InteractiveSessionServlet.scala:241)
at org.apache.livy.server.interactive.InteractiveSessionServlet$anonfun$19$anonfun$apply$16.apply(InteractiveSessionServlet.scala:208)
at org.apache.livy.server.interactive.InteractiveSessionServlet$anonfun$19$anonfun$apply$16.apply(InteractiveSessionServlet.scala:207)
at org.apache.livy.server.interactive.SessionHeartbeatNotifier$anonfun$withModifyAccessSession$1.apply(SessionHeartbeat.scala:76)
at org.apache.livy.server.interactive.SessionHeartbeatNotifier$anonfun$withModifyAccessSession$1.apply(SessionHeartbeat.scala:74)
at org.apache.livy.server.SessionServlet.doWithSession(SessionServlet.scala:221)
at org.apache.livy.server.SessionServlet.withModifyAccessSession(SessionServlet.scala:212)
at org.apache.livy.server.interactive.InteractiveSessionServlet.org$apache$livy$server$interactive$SessionHeartbeatNotifier$super$withModifyAccessSession(InteractiveSessionServlet.scala:40)
at org.apache.livy.server.interactive.SessionHeartbeatNotifier$class.withModifyAccessSession(SessionHeartbeat.scala:74)
at org.apache.livy.server.interactive.InteractiveSessionServlet.withModifyAccessSession(InteractiveSessionServlet.scala:40)
at org.apache.livy.server.interactive.InteractiveSessionServlet$anonfun$19.apply(InteractiveSessionServlet.scala:207)
at org.apache.livy.server.interactive.InteractiveSessionServlet$anonfun$19.apply(InteractiveSessionServlet.scala:206)
at org.apache.livy.server.JsonServlet.org$apache$livy$server$JsonServlet$doAction(JsonServlet.scala:113)
at org.apache.livy.server.JsonServlet$anonfun$jpost$1.apply(JsonServlet.scala:75)
at org.scalatra.ScalatraBase$class.org$scalatra$ScalatraBase$liftAction(ScalatraBase.scala:270)
at org.scalatra.ScalatraBase$anonfun$invoke$1.apply(ScalatraBase.scala:265)
at org.scalatra.ScalatraBase$anonfun$invoke$1.apply(ScalatraBase.scala:265)
at org.scalatra.ApiFormats$class.withRouteMultiParams(ApiFormats.scala:178)
at org.apache.livy.server.JsonServlet.withRouteMultiParams(JsonServlet.scala:39)
at org.scalatra.ScalatraBase$class.invoke(ScalatraBase.scala:264)
at org.scalatra.ScalatraServlet.invoke(ScalatraServlet.scala:49)
at org.scalatra.ScalatraBase$anonfun$runRoutes$1$anonfun$apply$8.apply(ScalatraBase.scala:240)
at org.scalatra.ScalatraBase$anonfun$runRoutes$1$anonfun$apply$8.apply(ScalatraBase.scala:238)
at scala.Option.flatMap(Option.scala:170)
at org.scalatra.ScalatraBase$anonfun$runRoutes$1.apply(ScalatraBase.scala:238)
at org.scalatra.ScalatraBase$anonfun$runRoutes$1.apply(ScalatraBase.scala:237)
at scala.collection.immutable.Stream.flatMap(Stream.scala:446)
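For reference, the client-side flow in steps 1-3 corresponds roughly to the sketch below. This is a minimal outline against the Livy 0.4 Java client API; the Livy URL, jar path, job class name, and table name are placeholders, and jc.sparkSession() is assumed to be available in a Spark 2 session.

import java.io.File;
import java.net.URI;
import org.apache.livy.Job;
import org.apache.livy.JobContext;
import org.apache.livy.LivyClient;
import org.apache.livy.LivyClientBuilder;
import org.apache.spark.sql.SparkSession;

public class LivyCountJob implements Job<Long> {
    private final String table;
    public LivyCountJob(String table) { this.table = table; }

    @Override
    public Long call(JobContext jc) throws Exception {
        // Runs inside the driver of the interactive session created in step 1.
        SparkSession spark = jc.sparkSession();
        return spark.read().table(table).count();
    }

    public static void main(String[] args) throws Exception {
        LivyClient client = new LivyClientBuilder()
                .setURI(new URI("http://livy-host:8999")) // placeholder Livy endpoint
                .build();
        try {
            // Step 2: ship the jar containing this Job class to the session.
            client.uploadJar(new File("/path/to/my-function.jar")).get();
            // Step 3: submit the job; the addJar path in the trace above is where it fails.
            System.out.println("rows = " + client.submit(new LivyCountJob("TestTable")).get());
        } finally {
            client.stop(true);
        }
    }
}

Note that the server-side trace goes through PythonInterpreter.addPyFile even though the session kind is spark, which may indicate the uploaded jar is being routed to the wrong interpreter.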
07-06-2017
05:30 AM
While testing like this, the job does not read the cluster's hive-site.xml or spark-env.sh. Is there a way to make it read the Spark configuration already present on the cluster?
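One hedged workaround sketch, assuming a typical HDP layout: an embedded SparkSession only discovers hive-site.xml if the file is on the host JVM's classpath, and spark-env.sh is only sourced by the spark-submit launch scripts, not by a session created inside another process (here, HiveServer2). The relevant settings can be passed explicitly instead; the metastore URI and paths below are placeholders.

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .enableHiveSupport()
    .master("yarn-client")
    .appName("SampleSparkUDTF_yarnV1")
    // Assumption: your metastore URI; normally read from hive-site.xml.
    .config("hive.metastore.uris", "thrift://metastore-host:9083")
    // Ship the cluster's hive-site.xml to the YARN containers as well.
    .config("spark.yarn.dist.files", "/usr/hdp/current/spark2-client/conf/hive-site.xml")
    .getOrCreate();

Alternatively, adding /usr/hdp/current/spark2-client/conf to the classpath of the host JVM should let Spark pick the files up without hard-coding values.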
07-06-2017
02:58 AM
On the Spark Thrift Server, CREATE FUNCTION succeeds, but calling the function (SELECT SparkUDTF('txt','table')) fails with an error that the function is not recognized.
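As a hedged check, registering the function inside the same Thrift Server session (rather than relying on the previously created permanent function) isolates whether the jar is visible to that session. The connection URL, port, credentials, and jar path below are placeholders; hive-jdbc must be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerUdtfCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumption: the Spark Thrift Server port, not HiveServer2's 10000.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10016", "hive", "");
             Statement st = conn.createStatement()) {
            // Make the jar visible to this session, then register the UDTF temporarily.
            st.execute("ADD JAR hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar");
            st.execute("CREATE TEMPORARY FUNCTION SparkUDTF AS 'SparkHiveUDTF'");
            try (ResultSet rs = st.executeQuery("SELECT SparkUDTF('txt','table')")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}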
07-05-2017
01:35 PM
After setting the config in the SparkSession source code:

SparkSession spark = SparkSession.builder()
    .enableHiveSupport()
    .master("yarn-client")
    .appName("SampleSparkUDTF_yarnV1")
    .config("spark.yarn.jars", "hdfs:///hdp/apps/2.6.1.0-129/spark2")
    .config("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
    .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
    .config("spark.executor.memory", "4g")
    .getOrCreate();

testing via HiveServer2 gives this error:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000> ……

], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more
Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:102)
at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 18 more
07-05-2017
12:22 PM
Steps:
1. Created a Java class extending Hive's GenericUDTF and created a SparkSession inside it:

public class SparkHiveUDTF extends GenericUDTF {
... // initialize/process/close elided
static Long sparkJob(String tableName) {
    SparkSession spark = SparkSession.builder()
        .enableHiveSupport()
        .master("yarn-client")
        .appName("SampleSparkUDTF_yarnV1")
        .getOrCreate();
    Dataset<Row> inputData = spark.read().table(tableName); // inputs to the function: "text", "hive table"
    Long countRows = inputData.count(); // access the hive table
    return countRows;
}
}

2. Copied this custom UDTF jar into HDFS and also into auxlib.
3. Copied /usr/hdp/<2.6.x>/spark2/jars/*.jar into /usr/hdp/<2.6.x>/hive/auxlib/.
4. Connected to HiveServer2 using beeline to run this Spark UDTF:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver

CREATE TABLE TestTable (i int);
INSERT INTO TestTable VALUES (1);
CREATE FUNCTION SparkUDT AS 'SparkHiveUDTF' USING JAR 'hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar';
SELECT SparkUDT('tbl','TestTable');

On an HDP 2.6 cluster with Spark 2.1, this causes:

Caused by: java.lang.IllegalStateException: Library directory '/hadoop/yarn/local/usercache/hive/appcache/application_1499162780176_0014/container_e03_1499162780176_0014_01_000005/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:260)
at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:380)
at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:570)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97)
at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 18 more
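The IllegalStateException above is the Spark-on-YARN launcher failing to find a local jars directory under its assumed SPARK_HOME: when neither spark.yarn.jars nor spark.yarn.archive is set, the YARN client falls back to looking for a local assembly build. A minimal sketch of the usual remedy, assuming HDP 2.6.1 paths (spark.yarn.jars takes jar paths or globs, hence the trailing /*):

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .enableHiveSupport()
    .master("yarn-client")
    .appName("SampleSparkUDTF_yarnV1")
    // Point the YARN client at the Spark 2 jars already on HDFS.
    .config("spark.yarn.jars", "hdfs:///hdp/apps/2.6.1.0-129/spark2/*")
    // HDP substitutes ${hdp.version} into classpaths; pass it explicitly.
    .config("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
    .config("spark.driver.extraJavaOptions", "-Dhdp.version=2.6.1.0-129")
    .getOrCreate();

This matches the configuration attempted in the 01:35 PM follow-up above, which moved the failure past the launcher but into "Yarn application has already ended".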
07-05-2017
12:21 PM
Error while trying to access a table in a Hive database from a Hive GenericUDTF that builds a SparkSession with .enableHiveSupport(), running via HiveServer2 on HDP 2.5 with Spark2. The stack trace is:
2017-07-03 11:13:43,623 ERROR [HiveServer2-Background-Pool: Thread-1061]: SessionState (SessionState.java:printError(989)) - Status: Failed
2017-07-03 11:13:43,623 ERROR [HiveServer2-Background-Pool: Thread-1061]: SessionState (SessionState.java:printError(989)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1499067308783_0051_1_00, diagnostics=[Task failed, taskId=task_1499067308783_0051_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more
Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
at org.apache.spark.sql.SparkSession$Builder$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
at com.fuzzylogix.experiments.udf.hiveSparkUDF.SampleSparkUDTF_yarnV1.sparkJob(SampleSparkUDTF_yarnV1.java:97)
at com.fuzzylogix.experiments.udf.hiveSparkUDF.SampleSparkUDTF_yarnV1.process(SampleSparkUDTF_yarnV1.java:78)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)