Member since: 04-03-2019
Posts: 962
Kudos Received: 1743
Solutions: 146
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 11061 | 03-08-2019 06:33 PM |
| 4780 | 02-15-2019 08:47 PM |
| 4099 | 09-26-2018 06:02 PM |
| 10424 | 09-07-2018 10:33 PM |
| 5502 | 04-25-2018 01:55 AM |
09-07-2018
10:33 PM
We got it working by tagging the centos image with the command below:

docker tag centos local/centos

Here is the modified distributed shell command to run:

yarn jar $DJAR -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos -shell_command "sleep 120" -jar $DJAR -num_containers 1

Note - On a multi-node cluster, you have to run the docker tag command on every NodeManager host as the root user. Please also make sure that you have added the "local" registry as a trusted registry in the YARN configuration.

Hope this helps! Special thanks to @rmaruthiyodan
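If you need to submit the same job with a different image or container count, the command above can be assembled programmatically. The following is only a minimal sketch: the function name, its defaults, and the jar path are hypothetical, and the flags are exactly the ones used in the post.

```python
import shlex


def build_distributed_shell_command(djar, image="local/centos",
                                    shell_command="sleep 120",
                                    num_containers=1):
    """Return the argument list for the distributed shell command from the post."""
    return [
        "yarn", "jar", djar,
        "-shell_env", "YARN_CONTAINER_RUNTIME_TYPE=docker",
        "-shell_env", f"YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}",
        "-shell_command", shell_command,
        "-jar", djar,
        "-num_containers", str(num_containers),
    ]


if __name__ == "__main__":
    # Placeholder path (assumption): point this at your distributed shell jar.
    args = build_distributed_shell_command("/path/to/distributed-shell.jar")
    # Print a copy-pasteable command line instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in args))
```

Keeping the command as a list avoids quoting mistakes such as the missing space after the shell command; the list can be handed to subprocess.run unchanged if you want to execute it.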
06-11-2018
11:58 PM
Due to a conflict between Jackson jar versions, an Oozie job with a spark2 action (a Spark action that uses the spark2 sharelib) may fail with the error below:

2018-06-05 16:53:04,567 [Thread-20] INFO org.apache.spark.SparkContext - Created broadcast 0 from showString at NativeMethodAccessorImpl.java:0
Traceback (most recent call last):
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/stg_gl_account_classification_master.py", line 9, in <module>
gacm.show()
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 318, in show
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/grid/9/hadoop/yarn/local/usercache/XXXX/appcache/application_1528131553123_0280/container_e81_1528131553123_0280_01_000002/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.showString.
: java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset$anonfun$org$apache$spark$sql$Dataset$execute$1$1.apply(Dataset.scala:2386)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$execute$1(Dataset.scala:2385)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2128)
at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.4.4
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:56)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:549)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 27 more

Why this error?

By default, the 'oozie' directory in the Oozie sharelib contains Jackson jars at version 2.4.4, while the spark2 sharelib contains newer Jackson jars. When the 2.4.4 jars are loaded, the Jackson Scala module used by Spark rejects them as too old, as the "Caused by" line above shows.

To fix this error, please follow the steps below:

Step 1: Move the older Jackson jars from the default oozie sharelib directory to another directory. If the oozie.old directory does not exist yet, create it first with hadoop fs -mkdir.

hadoop fs -mv /user/oozie/share/lib/lib_<ts>/oozie/jackson* /user/oozie/share/lib/lib_<ts>/oozie.old

Step 2: Update the Oozie sharelib:

oozie admin -oozie http://<oozie-server-hostname>:11000/oozie -sharelibupdate

Please check this article for more details about the Oozie spark2 action.

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!!
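Before moving anything, it can help to confirm which Jackson artifacts are actually older in the 'oozie' directory than in 'spark2'. The sketch below compares two lists of jar file names, for example copied from hadoop fs -ls output. The helper names are hypothetical, and the sample file names (other than the 2.4.4 version reported in the stack trace) are made up for illustration.

```python
import re

# Matches names such as "jackson-databind-2.4.4.jar": artifact, then a dotted version.
JACKSON_JAR = re.compile(r"^(jackson-[a-z0-9.\-_]+?)-(\d+(?:\.\d+)*)\.jar$")


def jackson_versions(jar_names):
    """Map each Jackson artifact name to its version as a tuple of integers."""
    versions = {}
    for name in jar_names:
        match = JACKSON_JAR.match(name)
        if match:
            artifact, version = match.groups()
            versions[artifact] = tuple(int(part) for part in version.split("."))
    return versions


def older_in_first(first_jars, second_jars):
    """Return artifacts whose version in first_jars is lower than in second_jars."""
    first = jackson_versions(first_jars)
    second = jackson_versions(second_jars)
    return sorted(a for a in first if a in second and first[a] < second[a])


if __name__ == "__main__":
    # Hypothetical listings; replace them with the real directory contents.
    oozie_dir = ["jackson-databind-2.4.4.jar", "jackson-core-2.4.4.jar"]
    spark2_dir = ["jackson-databind-2.6.5.jar", "jackson-core-2.6.5.jar"]
    print(older_in_first(oozie_dir, spark2_dir))
```

Any artifact reported here is a candidate for the move in Step 1; non-Jackson jars are ignored by the pattern.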
06-08-2018
12:09 AM
Please follow the steps in the link below to run a spark2 action via Oozie on HDP clusters.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/ch_oozie-spark-action.html

Your Oozie job may fail with the error below because of jar conflicts between the 'oozie' sharelib and the 'spark2' sharelib.

Error:

2018-06-04 13:27:32,652 WARN SparkActionExecutor:523 - SERVER[XXXX] USER[XXXX] GROUP[-] TOKEN[] APP[XXXX] JOB[0000000-<XXXXX>-oozie-oozi-W] ACTION[0000000-<XXXXXX>-oozie-oozi-W@spark2] Launcher exception: Attempt to add (hdfs://XXXX/user/oozie/share/lib/lib_XXXXX/oozie/aws-java-sdk-kms-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://XXXXX/user/oozie/share/lib/lib_20170727191559/oozie/aws-java-sdk-kms-1.10.6.jar) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:632)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:623)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:623)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:622)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:622)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:895)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1231)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1290)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:750)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:237)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

Please run the commands below to fix this error.

Note - Take a backup of the jars before running the rm commands. If the oozie.old directory does not exist yet, create it with hadoop fs -mkdir before running the mv command.

hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/aws*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/azure*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/hadoop-aws*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/hadoop-azure*
hadoop fs -rm /user/oozie/share/lib/lib_<ts>/spark2/ok*
hadoop fs -mv /user/oozie/share/lib/lib_<ts>/oozie/jackson* /user/oozie/share/lib/lib_<ts>/oozie.old

Then run the command below to update the Oozie sharelib:

oozie admin -oozie http://<oozie-server-hostname>:11000/oozie -sharelibupdate

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!!
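The "multiple times to the distributed cache" error is raised when the same jar file name is present in both sharelib directories. A rough way to see which jars collide before deleting anything is to compare the two directory listings by file name. The sketch below assumes you have already collected the paths from hadoop fs -ls. The function name is hypothetical, and only aws-java-sdk-kms-1.10.6.jar comes from the log above; the other sample names are invented.

```python
import posixpath


def duplicate_jars(oozie_listing, spark2_listing):
    """Return the jar file names that appear in both sharelib directory listings."""
    oozie_names = {posixpath.basename(path) for path in oozie_listing}
    spark2_names = {posixpath.basename(path) for path in spark2_listing}
    return sorted(oozie_names & spark2_names)


if __name__ == "__main__":
    base = "/user/oozie/share/lib/lib_<ts>"
    # Hypothetical listings; replace them with the real HDFS paths.
    oozie_listing = [
        base + "/oozie/aws-java-sdk-kms-1.10.6.jar",
        base + "/oozie/oozie-sharelib-oozie.jar",
    ]
    spark2_listing = [
        base + "/spark2/aws-java-sdk-kms-1.10.6.jar",
        base + "/spark2/spark-core.jar",
    ]
    for name in duplicate_jars(oozie_listing, spark2_listing):
        print(name)
```

The names it prints should fall under the aws*, azure*, hadoop-aws*, hadoop-azure* and ok* patterns removed above; anything else it reports would need the same treatment.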
05-16-2018
12:59 PM
I had a similar issue and fixed it by using the following Ambari repo:

wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.1.5/ambari.repo -O /etc/yum.repos.d/ambari.repo
10-16-2017
10:02 PM
1 Kudo
Please follow the steps below to run a SparkR script via Oozie.

1. Install the R packages on all the NodeManager hosts:

yum -y install R R-devel libcurl-devel openssl-devel

2. Keep your R script ready. Here is a sample script:

library(SparkR)
sc <- sparkR.init(appName="SparkR-sample")
sqlContext <- sparkRSQL.init(sc)
localDF <- data.frame(name=c("ABC", "blah", "blah"), age=c(39, 32, 81))
df <- createDataFrame(sqlContext, localDF)
printSchema(df)
sparkR.stop()

3. Create workflow.xml. Here is a working example:

<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkFileCopy'>
<global>
<configuration>
<property>
<name>oozie.launcher.yarn.app.mapreduce.am.env</name>
<value>SPARK_HOME=/usr/hdp/2.5.3.0-37/spark</value>
</property>
<property>
<name>oozie.launcher.mapred.child.env</name>
<value>SPARK_HOME=/usr/hdp/2.5.3.0-37/spark</value>
</property>
</configuration>
</global>
<start to='spark-node' />
<action name='spark-node'>
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
</prepare>
<master>${master}</master>
<name>SparkR</name>
<jar>${nameNode}/user/${wf:user()}/spark.R</jar>
<spark-opts>--driver-memory 512m --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.3.0</spark-opts>
</spark>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name='end' />
</workflow-app>

4. Make sure that sparkr.zip is not present in the workflow/lib directory, in the Oozie sharelib, or in a <file> tag in the workflow; otherwise it will cause conflicts.

5. Upload the workflow to HDFS and run it. It should work.

This has been successfully tested on HDP-2.5.X and HDP-2.6.X.

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!!

Reference - https://developer.ibm.com/hadoop/2017/06/30/scheduling-spark-job-written-pyspark-sparkr-yarn-oozie
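The sparkr.zip check in step 4 is easy to miss in a long workflow, so it can be automated for the workflow file itself. The sketch below is one possible check, not part of Oozie: the function name is hypothetical, and the sample workflow is a trimmed, made-up example that has not been validated against the Oozie schema. It does not inspect the workflow/lib directory or the sharelib; those still need a manual look.

```python
import xml.etree.ElementTree as ET


def sparkr_zip_references(workflow_xml):
    """Return the text of every <file> or <archive> element that mentions sparkr.zip."""
    root = ET.fromstring(workflow_xml)
    hits = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}file"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if local_name in ("file", "archive") and "sparkr.zip" in text.lower():
            hits.append(text)
    return hits


# Hypothetical workflow fragment used only to exercise the check.
SAMPLE_WORKFLOW = """<workflow-app xmlns='uri:oozie:workflow:0.5' name='demo'>
  <action name='spark-node'>
    <spark xmlns='uri:oozie:spark-action:0.1'>
      <jar>${nameNode}/user/demo/spark.R</jar>
      <file>${nameNode}/user/demo/lib/sparkr.zip#sparkr</file>
    </spark>
  </action>
</workflow-app>"""

if __name__ == "__main__":
    for reference in sparkr_zip_references(SAMPLE_WORKFLOW):
        print("Remove this reference:", reference)
```

An empty result means the workflow XML itself is clean.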
10-06-2017
10:31 PM
Please follow the steps below to modify the quick links for the Oozie service in Ambari.

Note - This tutorial has been successfully tried and tested on Ambari 2.4.2.0 and Ambari 2.5.2.0.

1. Please make sure that your /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/OOZIE/metainfo.xml looks like the one below:

<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<metainfo>
<schemaVersion>2.0</schemaVersion>
<services>
<service>
<name>OOZIE</name>
<extends>common-services/OOZIE/4.0.0.2.0</extends>
<quickLinksConfigurations>
<quickLinksConfiguration>
<fileName>quicklinks.json</fileName>
<default>true</default>
</quickLinksConfiguration>
</quickLinksConfigurations>
</service>
</services>
</metainfo>
2. Edit /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/OOZIE/quicklinks/quicklinks.json and change the "url" field to your load balancer's URL, e.g.

"url" : "https://<load-balancer-hostname>:<port-number>/oozie",

3. Execute the command below:

cp /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/OOZIE/quicklinks/quicklinks.json /var/lib/ambari-server/resources/common-services/OOZIE/4.2.0.2.3/quicklinks/quicklinks.json

Note - Modify the Oozie version numbers in the paths if required.

4. Restart the Ambari server.

5. Open the quick links for Oozie; they should now point to the load balancer URL.

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!!
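If you prefer to script the edit from step 2 rather than change quicklinks.json by hand, something along these lines could work. This is only a sketch: the function name, the load balancer URL, and the sample JSON layout (a "links" list under "configuration") are assumptions, so check them against your own quicklinks.json. The function simply rewrites every string-valued "url" key it finds.

```python
import json


def set_quicklink_urls(node, new_url):
    """Recursively replace every string-valued "url" key; return how many changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                node[key] = new_url
                changed += 1
            else:
                changed += set_quicklink_urls(value, new_url)
    elif isinstance(node, list):
        for item in node:
            changed += set_quicklink_urls(item, new_url)
    return changed


# Assumed layout, for illustration only; load your real file with json.load instead.
SAMPLE_QUICKLINKS = json.loads("""
{
  "configuration": {
    "links": [
      {"name": "oozie_server_ui", "label": "Oozie Web UI", "url": "placeholder"}
    ]
  }
}
""")

if __name__ == "__main__":
    # Hypothetical load balancer address.
    count = set_quicklink_urls(SAMPLE_QUICKLINKS, "https://lb.example.com:11443/oozie")
    print(count, "url field(s) updated")
    print(json.dumps(SAMPLE_QUICKLINKS, indent=2))
```

After writing the result back, the copy in step 3 and the Ambari server restart in step 4 are still required.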
06-14-2018
01:08 AM
@Krishna Srinivas Glad to know that this helped! 🙂
08-02-2017
02:22 PM
@Kuldeep Kulkarni

/var/lib/ambari-server/resources/scripts/configs.sh -u <ambari-admin-username> -p <ambari-admin-password> set <ambari-server-hostname> <cluster-name> oozie-env oozie_user_nofile_limit 32000

and oozie_user_nproc_limit 16000. I had to remove the word "advanced"; it's just oozie-env 🙂 But it's all working now! Thank you so much!!! And thank you @Geoffrey Shelton Okot for the help as well!!! You guys are awesome, I'm very grateful for the help.