Member since: 11-24-2017
Posts: 76
Kudos Received: 8
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2632 | 05-14-2018 10:28 AM
 | 4800 | 03-28-2018 12:19 AM
 | 2358 | 02-07-2018 02:54 AM
 | 2887 | 01-26-2018 03:41 AM
 | 4252 | 01-05-2018 02:06 AM
05-10-2018
03:06 AM
Hello everyone, I have a cluster with HDFS High Availability (HA) enabled. The cluster has two NameNodes, one active and one in standby, plus three JournalNodes, a Balancer, and Failover Controllers. My question: how should I configure the nameNode and jobTracker parameters in the job.properties file of my Oozie workflows so that they always point to the active NameNode and JobTracker (in case of a failover or a manual switch of the NameNode)? Thanks for any information.
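A minimal job.properties sketch of how this is commonly handled, assuming the HDFS nameservice ID is hanameservice (the ID that appears in the YARN logs later in this thread) and a placeholder ResourceManager host; the real values come from the cluster's hdfs-site.xml and yarn-site.xml:

# job.properties (sketch, hypothetical values)
# With HDFS HA, nameNode should reference the nameservice ID rather than a
# specific host; the HDFS client then resolves and fails over to the active
# NameNode on its own.
nameNode=hdfs://hanameservice
# jobTracker points at the YARN ResourceManager (default RPC port 8032);
# how failover behaves with ResourceManager HA depends on the yarn-site.xml
# that the Oozie server and launchers see.
jobTracker=rm-host.azcloud.local:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/icon0104/oozie/app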
Labels:
- Apache Oozie
- HDFS
04-17-2018
06:37 AM
@Harsh J Thank you, unfortunately I have access only to the edge node (I can't SSH to the masters and workers). I do have access to the web interfaces though (CM, Hue, YARN, etc.), so if there is anything I can check from there, let me know.
04-17-2018
04:23 AM
@Harsh J In Cloudera Manager I went to the Oozie Server instance and checked the logs from there, but there is nothing useful in stdout and stderr. Are these the logs you were talking about? Also, I am not sure where I can find the logs for the Hive Metastore (HMS); can you provide some details?
04-16-2018
07:58 AM
@Harsh J Thank you very much for the answer. I was able to generate the keytab file and put it in the lib folder of the Oozie application, but when I run the workflow I get the following error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Login failure for user: icon0104@AZCLOUD.LOCAL from keytab icon0104.keytab javax.security.auth.login.LoginException: Pre-authentication information was invalid (24)
org.apache.hadoop.security.KerberosAuthException: Login failure for user: icon0104@AZCLOUD.LOCAL from keytab icon0104.keytab javax.security.auth.login.LoginException: Pre-authentication information was invalid (24)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1130)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:562)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:178)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:90)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:81)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I am also a bit confused about this whole procedure: shouldn't Oozie be able to obtain a Kerberos delegation token on behalf of my user, without me having to provide a keytab file? I have also tried (again) to use HCat credentials, specifying the configuration you suggested in the other post, with the following workflow:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="oozie_spark_wf">
<credentials>
<credential name="hive2_credentials" type="hive2">
<property>
<name>hive2.jdbc.url</name>
<value>jdbc:hive2://trmas-fc2d552a.azcloud.local:10000/default;ssl=true</value>
</property>
<property>
<name>hive2.server.principal</name>
<value>hive/trmas-fc2d552a.azcloud.local@AZCLOUD.LOCAL</value>
</property>
</credential>
<credential name="hcat_cred" type="hcat">
<property>
<name>hcat.metastore.uri</name>
<value>thrift://trmas-fc2d552a.azcloud.local:9083</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>hive/trmas-fc2d552a.azcloud.local@AZCLOUD.LOCAL</value>
</property>
</credential>
</credentials>
<start to="spark_action"/>
<action cred="hcat_cred" name="spark_action">
<spark xmlns="uri:oozie:spark-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="hdfs://trmas-6b8bc78c.azcloud.local:8020/user/icon0104/spark_hive_output"/>
</prepare>
<configuration>
<property>
<name>spark.yarn.security.tokens.hive.enabled</name>
<value>false</value>
</property>
</configuration>
<master>yarn-cluster</master>
<name>OozieSparkAction</name>
<class>my.Main</class>
<jar>/home/icon0104/oozie/ooziespark/lib/ooziespark-1.0.jar</jar>
<spark-opts>--files ${nameNode}/user/icon0104/oozie/ooziespark/hive-site.xml</spark-opts>
</spark>
<ok to="END_NODE"/>
<error to="KILL_NODE"/>
</action>
<kill name="KILL_NODE">
<message>${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="END_NODE"/>
</workflow-app>

But the Spark action goes into START_RETRY state with the same error:

JA009: org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : TException while getting delegation token. Cause : org.apache.thrift.transport.TTransportException

Thanks for the support!
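A hedged side note on the "Pre-authentication information was invalid (24)" failure above: it typically means the keytab's key version number (kvno) or encryption types no longer match what the KDC has, for example after a password change. One way to sanity-check the keytab from the edge node before dropping it into the workflow's lib folder (a sketch reusing the file and principal names from this post):

# List the principals, kvno and encryption types stored in the keytab
klist -kte icon0104.keytab
# Try an actual login with the keytab; if this fails here, the Oozie
# launcher will fail in the same way
kinit -kt icon0104.keytab icon0104@AZCLOUD.LOCAL
klist    # should now show a TGT for icon0104@AZCLOUD.LOCAL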
04-16-2018
05:12 AM
@saranvisa Thank you. If I understand correctly, I need to provide a keytab file on HDFS and pass it as a file in the Oozie Spark action. What I am missing is how I can generate this keytab file as a non-privileged user: I can kinit, but I have no privileges for the kadmin command. Do I need to contact an administrator, or are there other ways to get this keytab file?
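For what it's worth, one common way to build a user keytab without kadmin access is MIT Kerberos' ktutil, which derives the keys from the account password. A sketch follows; the -k key version number here is a guess, and both it and the encryption types must match what the KDC actually issues for the principal, otherwise logins from the keytab fail with exactly the pre-authentication error seen elsewhere in this thread:

$ ktutil
ktutil:  addent -password -p icon0104@AZCLOUD.LOCAL -k 1 -e aes256-cts-hmac-sha1-96
Password for icon0104@AZCLOUD.LOCAL:
ktutil:  addent -password -p icon0104@AZCLOUD.LOCAL -k 1 -e arcfour-hmac
Password for icon0104@AZCLOUD.LOCAL:
ktutil:  wkt icon0104.keytab
ktutil:  quit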
04-16-2018
04:07 AM
@saranvisa @suresh.sethu I've tried to kinit before launching the Oozie Spark action in yarn-cluster mode, but it fails anyway. In the logs I found a lot of the following warnings:

2018-04-16 10:59:05,874 [main] INFO org.apache.spark.deploy.yarn.YarnSparkHadoopUtil - getting token for namenode: hdfs://hanameservice/user/icon0104/.sparkStaging/application_1523441517429_3067
2018-04-16 10:59:06,004 [main] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:icon0104 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error

and the following exception:

diagnostics: User class threw exception: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I've also tried to run the Spark program directly from the shell with spark-submit using --master yarn-cluster, but got the following error:

Delegation Token can be issued only with kerberos or web authentication

Any idea how to solve this?
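Two hedged observations. The StandbyException warning, on its own, is often just the HA client probing the standby NameNode before failing over to the active one. For the direct spark-submit attempt, a sketch of what a Kerberos-aware submission usually looks like on a YARN cluster: either a fresh kinit before submitting, or --principal/--keytab so the application can obtain and renew its own delegation tokens. Paths, class name and principal below reuse the values from the workflow earlier in this thread and are assumptions:

# Make sure there is a valid TGT before submitting
kinit icon0104@AZCLOUD.LOCAL        # or: kinit -kt icon0104.keytab icon0104@AZCLOUD.LOCAL
spark-submit \
  --master yarn --deploy-mode cluster \
  --class my.Main \
  --files /home/icon0104/oozie/ooziespark/hive-site.xml \
  --principal icon0104@AZCLOUD.LOCAL \
  --keytab /home/icon0104/icon0104.keytab \
  /home/icon0104/oozie/ooziespark/lib/ooziespark-1.0.jar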
04-16-2018
01:55 AM
Thank you, exactly what I was thinking. With all queries aggregated in one script I gain speed (no overhead from extra YARN containers), but in case of error I lose granularity for debugging.
04-15-2018
12:51 AM
Hello everyone, when performing Hive commands inside Oozie, is it OK to aggregate them in one script, or is it better to split them up into separate Hive actions/scripts? For example, I need to create several views: should I put each view creation in a distinct Hive action/script, or can I put all the view creations in a single one? Which is the best practice, and why?
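As a point of reference, a sketch of the single-script variant with hypothetical database and view names: one Hive action runs one file containing all the CREATE VIEW statements, which means fewer launcher containers but coarser error reporting, as the 04-16-2018 reply above also notes. Splitting would mean one such statement per script, each wrapped in its own Hive action.

-- create_views.hql (sketch, hypothetical names)
-- All views in one script / one Hive action: fewer YARN launcher containers,
-- but if a statement fails you only know the script stopped at that point.
CREATE VIEW IF NOT EXISTS reporting.v_orders AS
SELECT order_id, customer_id, amount FROM staging.orders;

CREATE VIEW IF NOT EXISTS reporting.v_customers AS
SELECT customer_id, name, country FROM staging.customers;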
Labels:
- Apache Hive
- Apache Oozie
04-10-2018
11:45 AM
Thank you very much for the detailed answer @mzkdm. This is indeed a very interesting point. Do you think it could make sense to have daily partitions, since my main ingestion workflow runs once a day? And how can I force Hive or Impala users to use the latest point-in-time data? Thanks for the help!
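One common pattern for the point-in-time question, sketched here with hypothetical table and partition names: partition the table by load day and have the daily workflow repoint a view at the newest partition once the load completes, so Hive and Impala users who query the view always see the latest complete data.

-- Sketch, hypothetical names: table partitioned by load day
CREATE TABLE IF NOT EXISTS staging.events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (load_date STRING)
STORED AS PARQUET;

-- Re-run by the daily workflow after a successful load, e.g. for 2018-04-10
CREATE OR REPLACE VIEW reporting.v_events_latest AS
SELECT event_id, payload
FROM staging.events
WHERE load_date = '2018-04-10';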
04-09-2018
12:11 PM
Hi @saranvisa, thanks for the answer. Do I need to do this every time before running the Oozie Spark action? Because this is a coordinator-scheduled workflow that I need to run several times per day.