
Yarn job stuck at Accepted state in a kerberized cluster - oozie spark

I am using a spark-action which runs my Spark job to read data from HBase. I have made all the configurations described in https://community.hortonworks.com/content/supportkb/49407/how-to-set-up-oozie-to-connect-to-secured-hbase-cl-1.html. When I run the Oozie job it just keeps running indefinitely, and the stdout keeps repeating the lines below:

2018-03-15 17:07:56,235 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:57,237 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:58,242 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:59,247 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
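An application that loops in ACCEPTED has been queued by the ResourceManager but has not yet been given an ApplicationMaster container. A few CLI checks can narrow down whether the queue is simply out of resources (the application id is taken from the log above; the hostname is assumed from the sandbox workflow below):

```shell
# Show the application's current state and any scheduler diagnostics
yarn application -status application_1521130356618_0004

# List everything else still waiting in ACCEPTED state
yarn application -list -appStates ACCEPTED

# Cluster-wide memory/vcore usage via the ResourceManager REST API
curl -s http://sandbox.hortonworks.com:8088/ws/v1/cluster/metrics
```

If `availableMB` in the metrics output is smaller than what the launcher plus the Spark AM request, the application will sit in ACCEPTED indefinitely.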

My workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-workflow">

    <credentials>
        <credential name="hbaseauth" type="hbase">
            <property>
                <name>hadoop.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.master.kerberos.principal</name>
                <value>hbase/_HOST@HORTONWORKS.COM</value>
            </property>
            <property>
                <name>hbase.regionserver.kerberos.principal</name>
                <value>hbase/_HOST@HORTONWORKS.COM</value>
            </property>
            <property>
                <name>hbase.zookeeper.quorum</name>
                <value>sandbox.hortonworks.com</value>
            </property>
            <property>
                <name>hadoop.rpc.protection</name>
                <value>authentication</value>
            </property>
            <property>
                <name>hbase.rpc.protection</name>
                <value>authentication</value>
            </property>
            <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
            </property>
            <property>
                <name>zookeeper.znode.parent</name>
                <value>/hbase-secure</value>
            </property>
        </credential>
    </credentials>


    <start to="sparkjob"/>

    <action name="sparkjob" cred="hbaseauth">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>sandbox.hortonworks.com:8032</job-tracker>
            <name-node>hdfs://sandbox.hortonworks.com:8020</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.map.memory.mb</name>
                    <value>4096</value>
                </property>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                    <value>1024</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>default</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>yarn-client</master>
            <mode>client</mode>
            <name>oozie-sparkjob</name>
            <class>SparkJob</class>
            <jar>
                hdfs://sandbox.hortonworks.com/user/oozie/lib/ooziesparkjobhbase-1.0.0-1.0-SNAPSHOT.jar
            </jar>
            <spark-opts>--executor-memory 2G --num-executors 5 --queue default --conf spark.ui.port=44040 --files
                /usr/hdp/current/spark-client/conf/hive-site.xml --jars
                /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar
                --conf spark.yarn.historyServer.address=sandbox.hortonworks.com:18080 --conf spark.eventLog.dir=hdfs://sandbox.hortonworks.com:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true
            </spark-opts>
        </spark>

        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
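As a side note, structural problems in a workflow like the one above can be caught before submission with the Oozie CLI (the Oozie server URL here is an assumption based on sandbox defaults):

```shell
# Validate the workflow XML against the Oozie schema
oozie validate workflow.xml

# After submission, inspect the per-action states of a running job
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -info <job-id>
```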

My Spark job just counts the number of rows in the table and then prints the value.

I am not sure what is wrong. I would appreciate any help with this. Thanks.

1 ACCEPTED SOLUTION

Looks like there are no free resources available in YARN, so the job state is not promoted from ACCEPTED to RUNNING.
Can you please verify whether free containers are available in the queue?


4 REPLIES

@Balachandra Pai You may want to look into the YARN application logs to find the actual cause...

The stdout which I posted above was observed in the YARN logs. I also get this warning, though:

2018-03-16 08:17:57,832 [main] WARN  org.apache.hadoop.security.token.Token  - Cannot find class for token kind HBASE_AUTH_TOKEN
2018-03-16 08:17:57,832 [main] WARN  org.apache.hadoop.security.token.Token  - Cannot find class for token kind HBASE_AUTH_TOKEN

Could this be the problem?

Looks like there are no free resources available in YARN, so the job state is not promoted from ACCEPTED to RUNNING.
Can you please verify whether free containers are available in the queue?
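One way to check this, assuming the CapacityScheduler and the sandbox hostname from the workflow above:

```shell
# Per-queue capacity, used capacity, and pending applications
curl -s http://sandbox.hortonworks.com:8088/ws/v1/cluster/scheduler

# Find and kill any stale applications still holding containers
yarn application -list
yarn application -kill <application-id>
```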

Clearing the queues actually solved the issue for me. Thanks
