Yarn job stuck at Accepted state in a kerberized cluster - oozie spark

I am using a spark-action that runs my Spark job to read data from HBase. I have made all the configuration changes described in https://community.hortonworks.com/content/supportkb/49407/how-to-set-up-oozie-to-connect-to-secured-hbase-cl-1.html. When I run the Oozie job, it keeps running indefinitely. The stdout just keeps repeating the lines below:

2018-03-15 17:07:56,235 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:57,237 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:58,242 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:59,247 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)

My workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-workflow">

    <credentials>
        <credential name="hbaseauth" type="hbase">
            <property>
                <name>hadoop.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.master.kerberos.principal</name>
                <value>hbase/_HOST@HORTONWORKS.COM</value>
            </property>
            <property>
                <name>hbase.regionserver.kerberos.principal</name>
                <value>hbase/_HOST@HORTONWORKS.COM</value>
            </property>
            <property>
                <name>hbase.zookeeper.quorum</name>
                <value>sandbox.hortonworks.com</value>
            </property>
            <property>
                <name>hadoop.rpc.protection</name>
                <value>authentication</value>
            </property>
            <property>
                <name>hbase.rpc.protection</name>
                <value>authentication</value>
            </property>
            <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
            </property>
            <property>
                <name>zookeeper.znode.parent</name>
                <value>/hbase-secure</value>
            </property>
        </credential>
    </credentials>


    <start to="sparkjob"/>

    <action name="sparkjob" cred="hbaseauth">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>sandbox.hortonworks.com:8032</job-tracker>
            <name-node>hdfs://sandbox.hortonworks.com:8020</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.map.memory.mb</name>
                    <value>4096</value>
                </property>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                    <value>1024</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>default</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>yarn-client</master>
            <mode>client</mode>
            <name>oozie-sparkjob</name>
            <class>SparkJob</class>
            <jar>
                hdfs://sandbox.hortonworks.com/user/oozie/lib/ooziesparkjobhbase-1.0.0-1.0-SNAPSHOT.jar
            </jar>
            <spark-opts>--executor-memory 2G --num-executors 5 --queue default --conf spark.ui.port=44040 --files
                /usr/hdp/current/spark-client/conf/hive-site.xml --jars
                /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar
                --conf spark.yarn.historyServer.address=sandbox.hortonworks.com:18080 --conf spark.eventLog.dir=hdfs://sandbox.hortonworks.com:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true
            </spark-opts>
        </spark>

        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

My Spark job just counts the number of rows in the table and then prints the value.

I am not sure what is wrong. I would appreciate any help with this. Thanks.

Accepted solution

Re: Yarn job stuck at Accepted state in a kerberized cluster - oozie spark

It looks like there are no free resources available in YARN, so the job is not being promoted from ACCEPTED to RUNNING.
Can you please verify whether there are free containers available in the queue?
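
To make the resource pressure concrete: the workflow in the question asks for about 5 GB for the Oozie launcher (a 4096 MB map task plus a 1024 MB application master) and another 10 GB for five 2G executors, before Spark's own application master and per-container overhead are counted. On a small cluster such as the sandbox, that may be more than the queue can grant. Below is a minimal sketch of a scaled-down request. The values are illustrative assumptions, not tuned settings; in yarn-client mode the Spark driver typically runs inside the launcher's map task, so the launcher memory should still be large enough for the driver.

```xml
<!-- Fragment of the spark action from the question.
     All values below are illustrative assumptions, not recommendations. -->
<configuration>
    <property>
        <!-- Launcher map task; the original workflow requested 4096 -->
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>2048</value>
    </property>
    <property>
        <!-- Launcher application master; unchanged from the original -->
        <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
    </property>
</configuration>

<!-- Fewer and smaller executors than the original 5 x 2G;
     the remaining options from the original spark-opts are omitted here for brevity -->
<spark-opts>--executor-memory 1G --num-executors 2 --queue default</spark-opts>
```

If the job starts running with the smaller request, the original sizes were likely exceeding what the queue had free.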

Re: Yarn job stuck at Accepted state in a kerberized cluster - oozie spark

@Balachandra Pai You may want to look into the YARN application logs to find the actual cause.

Re: Yarn job stuck at Accepted state in a kerberized cluster - oozie spark

The stdout that I posted above was taken from the YARN logs. I also get this warning, though:

2018-03-16 08:17:57,832 [main] WARN  org.apache.hadoop.security.token.Token  - Cannot find class for token kind HBASE_AUTH_TOKEN
2018-03-16 08:17:57,832 [main] WARN  org.apache.hadoop.security.token.Token  - Cannot find class for token kind HBASE_AUTH_TOKEN

Could this be the problem?

Re: Yarn job stuck at Accepted state in a kerberized cluster - oozie spark

Clearing the queues actually solved the issue for me. Thanks.
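
For anyone who finds the queue filling up again: an Oozie Spark action needs two application masters at the same time, one for the Oozie launcher and one for Spark, and the Capacity Scheduler limits the share of resources that application masters may hold. If that limit is the bottleneck, applications can wait in ACCEPTED even when containers appear to be free. The fragment below is a sketch that assumes the Capacity Scheduler is in use (this thread does not confirm it); the value 0.5 is illustrative only.

```xml
<!-- capacity-scheduler.xml fragment.
     Assumes the Capacity Scheduler is the active scheduler; 0.5 is an illustrative value. -->
<property>
    <!-- Maximum fraction of resources that may be used to run application masters -->
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.5</value>
</property>
```

On a managed cluster this would normally be changed through the management UI rather than by editing the file, and the queue configuration then needs to be reloaded (for example with `yarn rmadmin -refreshQueues`) before the new limit applies.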