Member since: 11-24-2017
Posts: 76
Kudos Received: 8
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2632 | 05-14-2018 10:28 AM
 | 4800 | 03-28-2018 12:19 AM
 | 2358 | 02-07-2018 02:54 AM
 | 2887 | 01-26-2018 03:41 AM
 | 4252 | 01-05-2018 02:06 AM
05-10-2018
03:06 AM
Hello everyone, I have a cluster with HDFS High Availability (HA) enabled. The cluster has two NameNodes, one active and one in standby, plus three JournalNodes, a Balancer, and Failover Controllers. My question: how should I configure the nameNode and jobTracker parameters in the job.properties file of my Oozie workflows so that they always point to the active NameNode and JobTracker (in case of a failover or a manual switch of the NameNode)? Thanks for any information.
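A minimal job.properties sketch of how this is commonly handled, assuming the HDFS nameservice ID is hanameservice (the ID that appears in the YARN logs later in this thread) and a placeholder ResourceManager host; the real values come from the cluster's hdfs-site.xml and yarn-site.xml:

# job.properties (sketch, hypothetical values)
# With HDFS HA, nameNode should reference the nameservice ID rather than a
# specific host; the HDFS client then resolves and fails over to the active
# NameNode on its own.
nameNode=hdfs://hanameservice
# jobTracker points at the YARN ResourceManager (default RPC port 8032);
# how failover behaves with ResourceManager HA depends on the yarn-site.xml
# that the Oozie server and launchers see.
jobTracker=rm-host.azcloud.local:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/icon0104/oozie/app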
Labels:
- Apache Oozie
- HDFS
04-17-2018
06:37 AM
@Harsh J Thank you, unfortunately I have access only to the edge node (I can't SSH to the masters and workers). I do have access to the web interfaces though (CM, Hue, YARN, etc.), so if there is anything I can check from there, let me know.
04-17-2018
04:23 AM
@Harsh J In Cloudera Manager I went to the Oozie Server instance and checked the logs from there, but there is nothing useful in stdout and stderr. Are these the logs you were talking about? Also, I am not sure where I can find the logs for the Hive Metastore (HMS); can you provide some details?
04-16-2018
07:58 AM
@Harsh J Thank you very much for the answer. I was able to generate the keytab file and put it in the lib folder of the Oozie application, but when I run the workflow I get the following error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Login failure for user: icon0104@AZCLOUD.LOCAL from keytab icon0104.keytab javax.security.auth.login.LoginException: Pre-authentication information was invalid (24)
org.apache.hadoop.security.KerberosAuthException: Login failure for user: icon0104@AZCLOUD.LOCAL from keytab icon0104.keytab javax.security.auth.login.LoginException: Pre-authentication information was invalid (24)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1130)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:562)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:178)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:90)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:81)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I am also a bit confused about this whole procedure: shouldn't Oozie be able to obtain a Kerberos delegation token on behalf of my user, without me having to provide a keytab file? I have also tried (again) to use HCat credentials, specifying the configuration you suggested in the other post, with the following workflow:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="oozie_spark_wf">
<credentials>
<credential name="hive2_credentials" type="hive2">
<property>
<name>hive2.jdbc.url</name>
<value>jdbc:hive2://trmas-fc2d552a.azcloud.local:10000/default;ssl=true</value>
</property>
<property>
<name>hive2.server.principal</name>
<value>hive/trmas-fc2d552a.azcloud.local@AZCLOUD.LOCAL</value>
</property>
</credential>
<credential name="hcat_cred" type="hcat">
<property>
<name>hcat.metastore.uri</name>
<value>thrift://trmas-fc2d552a.azcloud.local:9083</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>hive/trmas-fc2d552a.azcloud.local@AZCLOUD.LOCAL</value>
</property>
</credential>
</credentials>
<start to="spark_action"/>
<action cred="hcat_cred" name="spark_action">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="hdfs://trmas-6b8bc78c.azcloud.local:8020/user/icon0104/spark_hive_output"/>
</prepare>
<configuration>
<property>
<name>spark.yarn.security.tokens.hive.enabled</name>
<value>false</value>
</property>
</configuration>
<master>yarn-cluster</master>
<name>OozieSparkAction</name>
<class>my.Main</class>
<jar>/home/icon0104/oozie/ooziespark/lib/ooziespark-1.0.jar</jar>
<spark-opts>--files ${nameNode}/user/icon0104/oozie/ooziespark/hive-site.xml</spark-opts>
</spark>
<ok to="END_NODE"/>
<error to="KILL_NODE"/>
</action>
<kill name="KILL_NODE">
<message>${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="END_NODE"/>
</workflow-app>

But the Spark action goes into START_RETRY state with the same error:

JA009: org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : TException while getting delegation token. Cause : org.apache.thrift.transport.TTransportException

Thanks for the support!
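A hedged side note on the "Pre-authentication information was invalid (24)" failure above: it typically means the keytab's key version number (kvno) or encryption types no longer match what the KDC has, for example after a password change. One way to sanity-check the keytab from the edge node before dropping it into the workflow's lib folder (a sketch reusing the file and principal names from this post):

# List the principals, kvno and encryption types stored in the keytab
klist -kte icon0104.keytab
# Try an actual login with the keytab; if this fails here, the Oozie
# launcher will fail in the same way
kinit -kt icon0104.keytab icon0104@AZCLOUD.LOCAL
klist    # should now show a TGT for icon0104@AZCLOUD.LOCAL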
04-16-2018
05:12 AM
@saranvisa Thank you. If I understand correctly, I need to provide a keytab file on HDFS and pass it as a file in the Oozie Spark action. What I am missing is how I can generate this keytab file as a non-privileged user: I can kinit, but I have no privileges for the kadmin command. Do I need to contact an administrator, or are there other ways to get this keytab file?
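For what it's worth, one common way to build a user keytab without kadmin access is MIT Kerberos' ktutil, which derives the keys from the account password. A sketch follows; the -k key version number here is a guess, and both it and the encryption types must match what the KDC actually issues for the principal, otherwise logins from the keytab fail with exactly the pre-authentication error seen elsewhere in this thread:

$ ktutil
ktutil:  addent -password -p icon0104@AZCLOUD.LOCAL -k 1 -e aes256-cts-hmac-sha1-96
Password for icon0104@AZCLOUD.LOCAL:
ktutil:  addent -password -p icon0104@AZCLOUD.LOCAL -k 1 -e arcfour-hmac
Password for icon0104@AZCLOUD.LOCAL:
ktutil:  wkt icon0104.keytab
ktutil:  quit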
04-16-2018
04:07 AM
@saranvisa @suresh.sethu I've tried to kinit before launching the Oozie Spark action in yarn-cluster mode, but it fails anyway. In the logs I found a lot of the following warnings:

2018-04-16 10:59:05,874 [main] INFO org.apache.spark.deploy.yarn.YarnSparkHadoopUtil - getting token for namenode: hdfs://hanameservice/user/icon0104/.sparkStaging/application_1523441517429_3067
2018-04-16 10:59:06,004 [main] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:icon0104 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error

and the following exception:

diagnostics: User class threw exception: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I've also tried to run the Spark program directly from the shell with spark-submit using --master yarn-cluster, but got the following error:

Delegation Token can be issued only with kerberos or web authentication

Any idea how to solve this?
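Two hedged observations. The StandbyException warning, on its own, is often just the HA client probing the standby NameNode before failing over to the active one. For the direct spark-submit attempt, a sketch of what a Kerberos-aware submission usually looks like on a YARN cluster: either a fresh kinit before submitting, or --principal/--keytab so the application can obtain and renew its own delegation tokens. Paths, class name and principal below reuse the values from the workflow earlier in this thread and are assumptions:

# Make sure there is a valid TGT before submitting
kinit icon0104@AZCLOUD.LOCAL        # or: kinit -kt icon0104.keytab icon0104@AZCLOUD.LOCAL
spark-submit \
  --master yarn --deploy-mode cluster \
  --class my.Main \
  --files /home/icon0104/oozie/ooziespark/hive-site.xml \
  --principal icon0104@AZCLOUD.LOCAL \
  --keytab /home/icon0104/icon0104.keytab \
  /home/icon0104/oozie/ooziespark/lib/ooziespark-1.0.jar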
04-16-2018
01:55 AM
Thank you, exactly what I was thinking. With all queries aggregated in one script I gain speed (no overhead from extra YARN containers), but in case of error I lose granularity for debugging.
04-15-2018
12:51 AM
Hello everyone, when performing Hive commands inside Oozie, is it OK to aggregate them in one script, or is it better to split them up into separate Hive actions/scripts? For example, I need to create several views: should I put each view creation in a distinct Hive action/script, or can I put all the view creations in a single one? Which is the best practice, and why?
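As a point of reference, a sketch of the single-script variant with hypothetical database and view names: one Hive action runs one file containing all the CREATE VIEW statements, which means fewer launcher containers but coarser error reporting, as the 04-16-2018 reply above also notes. Splitting would mean one such statement per script, each wrapped in its own Hive action.

-- create_views.hql (sketch, hypothetical names)
-- All views in one script / one Hive action: fewer YARN launcher containers,
-- but if a statement fails you only know the script stopped at that point.
CREATE VIEW IF NOT EXISTS reporting.v_orders AS
SELECT order_id, customer_id, amount FROM staging.orders;

CREATE VIEW IF NOT EXISTS reporting.v_customers AS
SELECT customer_id, name, country FROM staging.customers;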
Labels:
- Apache Hive
- Apache Oozie
04-10-2018
11:45 AM
Thank you very much for the detailed answer @mzkdm. This is indeed a very interesting point. Do you think it could make sense to have daily partitions, since my main ingestion workflow runs once a day? And how can I force Hive or Impala users to use the latest point-in-time data? Thanks for the help!
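One common pattern for the point-in-time question, sketched here with hypothetical table and partition names: partition the table by load day and have the daily workflow repoint a view at the newest partition once the load completes, so Hive and Impala users who query the view always see the latest complete data.

-- Sketch, hypothetical names: table partitioned by load day
CREATE TABLE IF NOT EXISTS staging.events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (load_date STRING)
STORED AS PARQUET;

-- Re-run by the daily workflow after a successful load, e.g. for 2018-04-10
CREATE OR REPLACE VIEW reporting.v_events_latest AS
SELECT event_id, payload
FROM staging.events
WHERE load_date = '2018-04-10';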
04-09-2018
12:11 PM
Hi @saranvisa, thanks for the answer. Do I need to do this every time before running the Oozie Spark action? Because this is a coordinator-scheduled workflow that I need to run several times per day.