Member since: 04-29-2016
Posts: 14
Kudos Received: 0
Solutions: 0
07-21-2016
04:23 PM
Hi Kanwar, did you place the *-site.xml files in the Oozie share lib? You need to copy them into the share lib, then update the share lib and restart Oozie. Hope it helps.
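As a rough sketch, assuming the share lib sits directly under /user/oozie/share/lib/spark (on some clusters it lives under a lib_<timestamp> directory) and that Oozie runs on its default port; adjust paths and the host for your cluster:
# copy the client *-site.xml files into the Spark share lib on HDFS
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
  hdfs dfs -put -f /etc/hadoop/conf/$f /user/oozie/share/lib/spark/
done
# tell Oozie to pick up the updated share lib, then verify
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist spark
# finally restart Oozie (e.g. from Ambari)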
06-17-2016
04:57 PM
Thanks @Sunile Manjee for the details, now I have a pretty good idea of what's going on there. I will look into those options.
06-17-2016
01:49 AM
Hi, I'm trying to install a small HDP cluster on AWS EC2. Everything is working as expected except for one painful issue related to hostnames. In my /etc/hosts I have this (example):
10.0.0.1 hdpmaster.emp.net hdpmaster ip-10-0-0-1.compute.internal
10.0.0.2 hdprm.emp.net hdprm ip-10-0-0-2.compute.internal
10.0.0.3 hdp1.emp.net hdp1 ip-10-0-0-3.compute.internal
10.0.0.4 hdp2.emp.net hdp2 ip-10-0-0-4.compute.internal
I used these names when setting up the cluster through Ambari:
hdpmaster.emp.net
hdprm.emp.net
hdp1.emp.net
hdp2.emp.net
All services are running as expected, but I have to change the URL every time I want to browse them. Ambari redirects everything based on the hostname (which is expected), but is there any way to get the actual IP address in the redirect link instead of the hostname? For example, to track an app I get this link from the ResourceManager:
http://hdprm.emp.net:8088/cluster/app/application_1466120609594_0002
Unless I change it to this, I'm unable to browse it:
http://10.0.0.2:8088/cluster/app/application_1466120609594_0002
I don't want to assign an Elastic (public) IP to each of those machines (not feasible) and use public DNS names during installation, so what can I do with only private IP addresses? Did I miss something? Do I need some browser plug-in to rewrite the URLs? Or do I need to use the exact IP addresses (e.g. 10.0.0.2) during installation from Ambari? Thanks in advance.
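P.S. One more idea I'm toying with, as a rough sketch only (assuming my workstation keeps reaching the private IPs directly, the way it does today): repeat the same mappings in the workstation's own /etc/hosts so the redirect links resolve there too, e.g.:
10.0.0.1 hdpmaster.emp.net hdpmaster
10.0.0.2 hdprm.emp.net hdprm
10.0.0.3 hdp1.emp.net hdp1
10.0.0.4 hdp2.emp.net hdp2
But maintaining that by hand on every client doesn't feel right either.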
Labels: Apache Ambari
05-05-2016
05:20 PM
That did the trick! Thanks a lot @Predrag Minovic. Should I consider that as a work-around for now? I never had to do that for other Oozie actions. Thanks again.
05-05-2016
01:08 AM
Thanks again @Predrag Minovic. I tried that, and then I came across these:
https://community.hortonworks.com/questions/11599/how-to-change-resourcemanager-port-on-oozie.html
https://community.hortonworks.com/questions/21248/spark-action-always-submit-to-00008032.html
Looks like it's a known issue. As a workaround, I changed my ResourceManager port to 8032 and restarted again, but it didn't help; it is still pointing to 0.0.0.0/0.0.0.0:8032. At this point the port is right but the hostname is not; it should be hdp1.cidev/hdp1.cidev:8032. So it looks like the Oozie Spark action is not picking up the right configuration. Any clue? Thanks again.
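In case it helps anyone hitting the same thing, the next thing I plan to try is forcing the ResourceManager address through the action itself. This is only an untested sketch: add a Hadoop config override to spark-opts (keeping whatever else is already there), e.g.:
<spark-opts>--conf spark.hadoop.yarn.resourcemanager.address=hdp1.cidev:8032</spark-opts>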
05-03-2016
09:15 PM
Thanks @Predrag Minovic, I got past that error, but now it's timing out. It looks like when the job is launched, it points to an invalid ResourceManager URL. I'm not sure why it's not picking up the right configuration.
2016-05-03 17:04:09,439 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2016-05-03 17:04:10,448 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:11,449 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:12,450 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:13,450 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:14,451 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:15,452 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
By the way, should we ignore this warning? org.apache.spark.SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
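If it does turn out to matter, my assumption (untested) is that the new key can be passed like any other Spark setting in the action, e.g.:
--conf spark.yarn.am.waitTime=100s
added to the <spark-opts> element.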
Thanks again.
05-01-2016
05:45 AM
Thanks @Benjamin Leonhardi, appreciate it. I was under the impression that the Spark action is now supported on HDP 2.4. My bad 😞 I checked the classpath but couldn't figure out why some old jars are still there. In the current cluster we initially installed HDP 2.3.4 and then upgraded to HDP 2.4. Even HDP 2.3.4 shipped Spark 1.5.2, so it should have found that method. So it looks like something is still pointing to an old version of Spark.
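My rough plan for hunting that down is sketched below; the paths are the ones from the sharelib listing in my original post and the Oozie host is a placeholder:
# list what the Spark sharelib actually serves to the action
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist spark | grep -i spark
# the old 1.1.0 jars stand out next to the 1.6.0 assembly
hdfs dfs -ls /user/oozie/share/lib/spark | grep '_2.10-1.1.0'
# check whether a given jar actually contains the method the launcher can't find
hdfs dfs -get /user/oozie/share/lib/spark/spark-core_2.10-1.1.0.jar /tmp/
javap -classpath /tmp/spark-core_2.10-1.1.0.jar 'org.apache.spark.util.Utils$' | grep DEFAULT_DRIVER_MEM_MB || echo "method not found"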
04-28-2016
11:42 PM
Hello everyone,
Has anyone tried to use the Oozie Spark action on HDP 2.4? I believe it's now supported, and I'm trying to schedule a job through Oozie using the Spark action, but I'm facing several issues.
I have already followed the Hortonworks KB article "How-To_ How to run Spark Action in oozie of HDP" step by step. I'm also able to submit my jar through spark-submit in "yarn-cluster" mode (a rough sketch of that command is included after the stack trace below), but when I submit it through Oozie I get the following error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:49)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1120)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
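For comparison, the direct submission that does work for me looks roughly like this (a sketch from memory, using the same class and jar as in the workflow below):
spark-submit --master yarn-cluster \
  --class com.example.PrintCount \
  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.4.0.0-169 \
  print-count.jar s3n://....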
I guess there is a library mismatch somewhere in the classpath, but I'm not sure which jar. My Oozie workflow is pretty simple, just reading some files from S3 and printing the count:
<workflow-app name="POC-ETL" xmlns="uri:oozie:workflow:0.5">
    <start to="spark"/>
    <action name="spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <!-- <mode>cluster</mode> -->
            <name>Sample</name>
            <class>com.example.PrintCount</class>
            <jar>${nameNode}/user/${wf:user()}/poc/workflow/lib/print-count.jar</jar>
            <spark-opts>spark.driver.extraJavaOptions=-Dhdp.version=2.4.0.0-169</spark-opts>
            <arg>s3n://....</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>ETL Ingest Failed: error [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Here is my job.properties file:
oozie.wf.application.path=${nameNode}/user/cidev/poc/workflow
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark = spark,hcatalog,hive
nameNode=hdfs://hdpmaster.cidev:8020
jobTracker=hdp1.cidev:8050
queueName=default
master=yarn-cluster
Here is the listing of the Spark sharelib:
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/akka-actor_2.10-2.2.3-shaded-protobuf.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/akka-remote_2.10-2.2.3-shaded-protobuf.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/aws-java-sdk-1.7.4.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/azure-storage-2.2.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/chill-java-0.3.6.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/chill_2.10-0.3.6.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/colt-1.2.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-codec-1.4.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-httpclient-3.1.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-lang-2.4.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-lang3-3.3.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-net-2.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/compress-lzf-1.0.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/concurrent-1.3.4.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/config-1.0.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/core-site.xml
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/curator-client-2.5.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/curator-framework-2.5.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/curator-recipes-2.5.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/guava-14.0.1.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/hadoop-aws-2.7.1.2.4.0.0-169.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/hadoop-azure-2.7.1.2.4.0.0-169.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/hdfs-site.xml
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jackson-annotations-2.2.3.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jackson-core-2.2.3.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jackson-databind-2.2.3.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.activation-1.1.0.v201105071233.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.mail.glassfish-1.4.1.v201005082020.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.servlet-3.0.0.v201112011016.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.transaction-1.1.1.v201105210645.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jblas-1.2.3.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jets3t-0.7.1.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-continuation-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-http-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-io-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-jndi-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-plus-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-security-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-server-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-servlet-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-util-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-webapp-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-xml-8.1.14.v20131031.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jline-2.12.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/joda-time-2.1.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/json4s-ast_2.10-3.2.10.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/json4s-core_2.10-3.2.10.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/json4s-jackson_2.10-3.2.10.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jsr305-1.3.9.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/kryo-2.21.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/log4j-1.2.16.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/lz4-1.2.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/mapred-site.xml
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-core-3.0.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-graphite-3.0.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-json-3.0.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-jvm-3.0.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/minlog-1.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/netty-3.6.6.Final.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/netty-all-4.0.23.Final.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/objenesis-1.2.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/paranamer-2.6.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/protobuf-java-2.4.1-shaded.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/py4j-0.8.2.1.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/pyrolite-2.0.1.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/reflectasm-1.07-shaded.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scala-compiler-2.10.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scala-library-2.10.4.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scala-reflect-2.10.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scalap-2.10.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/slf4j-api-1.6.6.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/slf4j-log4j12-1.6.6.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/snappy-java-1.0.5.3.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-core_2.10-1.1.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-graphx_2.10-1.1.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-streaming_2.10-1.1.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/stream-2.7.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/tachyon-0.5.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/tachyon-client-0.5.0.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/uncommons-maths-1.2.2a.jar
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/yarn-site.xml
hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/zookeeper-3.4.6.jar
Any help would be highly appreciated.
Thanks in advance.
Tishu
Labels: