Member since
04-03-2019
962
Posts
1743
Kudos Received
146
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 17739 | 03-08-2019 06:33 PM |
| | 7166 | 02-15-2019 08:47 PM |
06-11-2018
11:58 PM
Because of a conflict between Jackson jar versions, an Oozie job with a spark2 action (a Spark action that uses the spark2 sharelib) may fail with the error below:

2018-06-05 16:53:04,567 [Thread-20] INFO org.apache.spark.SparkContext - Created broadcast 0 from showString at NativeMethodAccessorImpl.java:0
Traceback (most recent call last):
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/stg_gl_account_classification_master.py", line 9, in <module>
gacm.show()
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 318, in show
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.showString.
: java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset$anonfun$org$apache$spark$sql$Dataset$execute$1$1.apply(Dataset.scala:2386)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$execute$1(Dataset.scala:2385)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2128)
at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.4.4
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:56)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:549)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 27 more

Why does this error occur? By default, the 'oozie' directory in the Oozie sharelib contains Jackson jars at version 2.4.4, while the spark2 sharelib contains newer versions of the Jackson jars.

To fix this error, follow the steps below:

Step 1: Move the older Jackson jars from the default oozie sharelib to another directory:
hadoop fs -mv /user/oozie/share/lib/lib_<ts>/oozie/jackson* /user/oozie/share/lib/lib_<ts>/oozie.old

Step 2: Update the Oozie sharelib:
oozie admin -oozie http://<oozie-server-hostname>:11000/oozie -sharelibupdate

Please check this article for more details about the Oozie spark2 action.

Please comment if you have any feedback, questions, or suggestions. Happy Hadooping!!
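Before moving any jars, it can help to confirm which artifacts actually differ between the two sharelib directories. The following is a minimal sketch, not part of the original fix: it assumes the jar file names from each directory have already been collected (for example from `hadoop fs -ls` output). The helper names `split_jar_name` and `find_version_conflicts` are hypothetical, and the file names in the example are illustrative only.

```python
import re

# Matches "<artifact>-<version>.jar", where the version starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(jar_name):
    """Split 'jackson-core-2.4.4.jar' into ('jackson-core', '2.4.4').

    Returns None when the name does not look like '<artifact>-<version>.jar'.
    """
    match = JAR_PATTERN.match(jar_name)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def find_version_conflicts(oozie_jars, spark2_jars):
    """Return {artifact: (oozie_version, spark2_version)} for mismatched versions."""
    oozie_versions = dict(filter(None, map(split_jar_name, oozie_jars)))
    spark2_versions = dict(filter(None, map(split_jar_name, spark2_jars)))
    return {
        artifact: (oozie_versions[artifact], spark2_versions[artifact])
        for artifact in sorted(oozie_versions.keys() & spark2_versions.keys())
        if oozie_versions[artifact] != spark2_versions[artifact]
    }


if __name__ == "__main__":
    # Illustrative file names only; real listings come from your cluster.
    oozie = ["jackson-core-2.4.4.jar", "jackson-databind-2.4.4.jar"]
    spark2 = ["jackson-core-2.6.5.jar", "jackson-databind-2.6.5.jar"]
    for artifact, (old, new) in find_version_conflicts(oozie, spark2).items():
        print(f"{artifact}: oozie has {old}, spark2 has {new}")
```

Any artifact this reports is a candidate for the move in Step 1; artifacts with identical versions in both directories are not listed.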
06-08-2018
12:09 AM
Please follow the steps in the link below to run a spark2 action via Oozie on HDP clusters:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/ch_oozie-spark-action.html

Your Oozie job may fail with the error below because of jar conflicts between the 'oozie' sharelib and the 'spark2' sharelib.

Error:
2018-06-04 13:27:32,652 WARN SparkActionExecutor:523 - SERVER[XXXX] USER[XXXX] GROUP[-] TOKEN[] APP[XXXX] JOB[0000000-<XXXXX>-oozie-oozi-W] ACTION[0000000-<XXXXXX>-oozie-oozi-W@spark2] Launcher exception: Attempt to add (hdfs://XXXX/user/oozie/share/lib/lib_XXXXX/oozie/aws-java-sdk-kms-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://XXXXX/user/oozie/share/lib/lib_20170727191559/oozie/aws-java-sdk-kms-1.10.6.jar) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:632)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:623)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:623)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:622)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:622)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1231)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1290)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:750)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:237)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

Please run the commands below to fix this error.

Note - You may want to take a backup before running the rm commands.

hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/aws*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/azure*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/hadoop-aws*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/hadoop-azure*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/ok*
hadoop fs -mv /user/oozie/share/lib/lib_<ts>/oozie/jackson* /user/oozie/share/lib/lib_<ts>/oozie.old

Then run the command below to update the Oozie sharelib:
oozie admin -oozie http://<oozie-server-hostname>:11000/oozie -sharelibupdate

Please comment if you have any feedback, questions, or suggestions. Happy Hadooping!!
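The note about taking a backup can be made concrete. Here is a minimal sketch, assuming a backup directory of your own choosing and the real sharelib timestamp substituted for `<ts>`. It only builds and prints the commands (a dry run) so they can be reviewed before anything is deleted. The helper name `build_cleanup_commands` and the backup path in the example are hypothetical, not part of the original steps.

```python
# Glob prefixes removed from the spark2 sharelib in the commands above.
SPARK2_PREFIXES = ["aws", "azure", "hadoop-aws", "hadoop-azure", "ok"]


def build_cleanup_commands(sharelib_dir, backup_dir):
    """Return shell commands that back up, then remove, the conflicting jars.

    sharelib_dir: e.g. /user/oozie/share/lib/lib_<ts>
    backup_dir:   hypothetical HDFS directory that receives the copies
    """
    commands = [f"hadoop fs -mkdir -p {backup_dir}"]
    for prefix in SPARK2_PREFIXES:
        pattern = f"{sharelib_dir}/spark2/{prefix}*"
        # Copy first, so the remove that follows is reversible.
        commands.append(f"hadoop fs -cp {pattern} {backup_dir}")
        commands.append(f"hadoop fs -rm {pattern}")
    return commands


if __name__ == "__main__":
    # Both paths are placeholders; substitute your own values.
    for command in build_cleanup_commands(
        "/user/oozie/share/lib/lib_<ts>", "/tmp/spark2-sharelib-backup"
    ):
        print(command)
```

Printing instead of executing is a deliberate choice here: the glob patterns are broad, so reviewing the list before running it is safer than deleting directly.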
10-16-2017
10:02 PM
1 Kudo
Please follow the steps below to run a SparkR script via Oozie.

1. Install the R packages on all the NodeManager hosts:
yum -y install R R-devel libcurl-devel openssl-devel

2. Keep your R script ready. Here is a sample script:
library(SparkR)
sc <- sparkR.init(appName="SparkR-sample")
sqlContext <- sparkRSQL.init(sc)
localDF <- data.frame(name=c("ABC", "blah", "blah"), age=c(39, 32, 81))
df <- createDataFrame(sqlContext, localDF)
printSchema(df)
sparkR.stop()

3. Create workflow.xml. Here is a working example:
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkFileCopy'>
<global>
<configuration>
<property>
<name>oozie.launcher.yarn.app.mapreduce.am.env</name>
<value>SPARK_HOME=/usr/hdp/2.5.3.0-37/spark</value>
</property>
<property>
<name>oozie.launcher.mapred.child.env</name>
<value>SPARK_HOME=/usr/hdp/2.5.3.0-37/spark</value>
</property>
</configuration>
</global>
<start to='spark-node' />
<action name='spark-node'>
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
</prepare>
<master>${master}</master>
<name>SparkR</name>
<jar>${nameNode}/user/${wf:user()}/spark.R</jar>
<spark-opts>--driver-memory 512m --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.3.0</spark-opts>
</spark>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name='end' />
</workflow-app>

4. Make sure that sparkr.zip is not present in the workflow/lib directory, in the Oozie sharelib, or in a <file> tag in the workflow; otherwise it will cause conflicts.

Upload the workflow to HDFS and run it. It should work. This has been successfully tested on HDP-2.5.X and HDP-2.6.X.

Please comment if you have any feedback, questions, or suggestions. Happy Hadooping!!

Reference - https://developer.ibm.com/hadoop/2017/06/30/scheduling-spark-job-written-pyspark-sparkr-yarn-oozie
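The check in step 4 for a stray <file> tag can be automated. The following is a rough sketch under stated assumptions: it only inspects the workflow XML text (it does not look at workflow/lib or the sharelib), and the helper names `local_name` and `find_sparkr_zip_files` as well as the sample workflow fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_sparkr_zip_files(workflow_xml):
    """Return the text of every <file> element that mentions sparkr.zip."""
    root = ET.fromstring(workflow_xml)
    return [
        element.text.strip()
        for element in root.iter()
        if local_name(element.tag) == "file"
        and element.text
        and "sparkr.zip" in element.text.lower()
    ]


if __name__ == "__main__":
    # Trimmed, hypothetical fragment used only to exercise the check.
    sample = """
    <workflow-app xmlns='uri:oozie:workflow:0.5' name='demo'>
      <action name='spark-node'>
        <spark xmlns='uri:oozie:spark-action:0.1'>
          <jar>${nameNode}/user/demo/spark.R</jar>
          <file>${nameNode}/user/demo/lib/sparkr.zip#sparkr</file>
        </spark>
      </action>
    </workflow-app>
    """
    for entry in find_sparkr_zip_files(sample):
        print("Remove this <file> entry:", entry)
```

Matching on the local tag name rather than a fixed namespace keeps the check independent of which spark-action schema version the workflow declares.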
10-06-2017
10:31 PM
Please follow the steps below to modify the quicklinks for the Oozie service in Ambari.

Note - This tutorial has been successfully tried and tested on Ambari 2.4.2.0 and Ambari 2.5.2.0.

1. Make sure that your /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/OOZIE/metainfo.xml looks like the following:
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<metainfo>
<schemaVersion>2.0</schemaVersion>
<services>
<service>
<name>OOZIE</name>
<extends>common-services/OOZIE/4.0.0.2.0</extends>
<quickLinksConfigurations>
<quickLinksConfiguration>
<fileName>quicklinks.json</fileName>
<default>true</default>
</quickLinksConfiguration>
</quickLinksConfigurations>
</service>
</services>
</metainfo>

2. Edit /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/OOZIE/quicklinks/quicklinks.json and change the "url" field to your load balancer's URL, e.g.
"url" : "https://<load-balancer-hostname>:<port-number>/oozie",

3. Run the command below:
cp /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/OOZIE/quicklinks/quicklinks.json /var/lib/ambari-server/resources/common-services/OOZIE/4.2.0.2.3/quicklinks/quicklinks.json
Note - Modify the Oozie version numbers if required.

4. Restart the Ambari server.

5. Open the quicklinks for Oozie; they should now point to the load balancer URL.

Please comment if you have any feedback, questions, or suggestions. Happy Hadooping!!
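The edit in step 2 can also be scripted. This is a minimal sketch, assuming quicklinks.json is ordinary JSON in which each link carries a "url" key; rather than rely on the exact layout, it walks the whole document and replaces every "url" string it finds. The helper name `set_quicklink_urls`, the trimmed sample structure, and the load balancer address are all hypothetical.

```python
import json


def set_quicklink_urls(node, new_url):
    """Recursively replace the value of every "url" key; return the count."""
    replaced = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                node[key] = new_url
                replaced += 1
            else:
                replaced += set_quicklink_urls(value, new_url)
    elif isinstance(node, list):
        for item in node:
            replaced += set_quicklink_urls(item, new_url)
    return replaced


if __name__ == "__main__":
    # Trimmed, illustrative structure; the real file has more fields.
    sample = json.loads(
        '{"configuration": {"links": [{"name": "oozie_dashboard", "url": "old"}]}}'
    )
    count = set_quicklink_urls(sample, "https://lb.example.com:11443/oozie")
    print(count, "url field(s) updated")
    print(json.dumps(sample, indent=2))
```

To apply it to the real file, load it with `json.load`, call the function, and write the result back with `json.dump`; keeping a copy of the original file first is advisable.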
08-13-2018
12:22 PM
Hello Kuldeep, can you let me know how to set the log4j properties based on file size and MaxBackupIndex? Thanks, Rakesh
05-31-2017
12:15 AM
Is the screenshot out of date? I have Ambari 2.4.2 and the screens are different. I tried to configure it as close to your steps as I could, but I got "Service Hive check failed: Server Error".
07-03-2018
02:34 PM
How do I run a Sqoop job? If my Sqoop job name is Inc_dat, how do I run it using Oozie?
01-24-2018
03:14 AM
The doc https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_spark-component-guide/content/config-sts-user-imp.html doesn't say Kerberos is required in Prerequisites, but do you know if Spark 1.6 impersonation requires Kerberos (unlike Hive)?
03-17-2017
12:50 AM
Hello Artem,
thanks, adding an interpreter line worked. I don't know how I could forget
that...? I think I'm doing a lot of multitasking. Also, I don't have
Python 3 installed, so I was running on Python 2. Once again, thank you for the
quick response. Really appreciate it. Sam