
Yarn job stuck at Accepted state in a kerberized cluster - oozie spark


I am using a spark-action that runs my Spark job to read data from HBase. I have made all the configurations described in https://community.hortonworks.com/content/supportkb/49407/how-to-set-up-oozie-to-connect-to-secured-hbase-cl-1.html. When I run the Oozie job it just keeps running indefinitely, and the stdout keeps repeating the lines below:

2018-03-15 17:07:56,235 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:57,237 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:58,242 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)
2018-03-15 17:07:59,247 [main] INFO  org.apache.spark.deploy.yarn.Client  - Application report for application_1521130356618_0004 (state: ACCEPTED)

My workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-workflow">

    <credentials>
        <credential name="hbaseauth" type="hbase">
            <property>
                <name>hadoop.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.security.authentication</name>
                <value>kerberos</value>
            </property>
            <property>
                <name>hbase.master.kerberos.principal</name>
                <value>hbase/_HOST@HORTONWORKS.COM</value>
            </property>
            <property>
                <name>hbase.regionserver.kerberos.principal</name>
                <value>hbase/_HOST@HORTONWORKS.COM</value>
            </property>
            <property>
                <name>hbase.zookeeper.quorum</name>
                <value>sandbox.hortonworks.com</value>
            </property>
            <property>
                <name>hadoop.rpc.protection</name>
                <value>authentication</value>
            </property>
            <property>
                <name>hbase.rpc.protection</name>
                <value>authentication</value>
            </property>
            <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
            </property>
            <property>
                <name>zookeeper.znode.parent</name>
                <value>/hbase-secure</value>
            </property>
        </credential>
    </credentials>


    <start to="sparkjob"/>

    <action name="sparkjob" cred="hbaseauth">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>sandbox.hortonworks.com:8032</job-tracker>
            <name-node>hdfs://sandbox.hortonworks.com:8020</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.map.memory.mb</name>
                    <value>4096</value>
                </property>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                    <value>1024</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>default</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>yarn-client</master>
            <mode>client</mode>
            <name>oozie-sparkjob</name>
            <class>SparkJob</class>
            <jar>
                hdfs://sandbox.hortonworks.com/user/oozie/lib/ooziesparkjobhbase-1.0.0-1.0-SNAPSHOT.jar
            </jar>
            <spark-opts>--executor-memory 2G --num-executors 5 --queue default --conf spark.ui.port=44040 --files
                /usr/hdp/current/spark-client/conf/hive-site.xml --jars
                /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar
                --conf spark.yarn.historyServer.address=sandbox.hortonworks.com:18080 --conf spark.eventLog.dir=hdfs://sandbox.hortonworks.com:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true
            </spark-opts>
        </spark>

        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

My Spark job just counts the number of rows in the table and then prints the value.
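
For reference, here is a minimal sketch of what such a row-count job could look like. This is my assumption of the approach, not the original code: the class name SparkJob comes from the workflow above, while the table name "my_table" and the use of newAPIHadoopRDD with HBase's TableInputFormat are placeholders.

// Hypothetical sketch of the row-count job; "my_table" is a placeholder table name.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SparkJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("oozie-sparkjob"))

    // HBase scan configuration; on a kerberized cluster the Oozie credential
    // ("hbaseauth" above) is expected to supply the HBase delegation token.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    val rows = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Count the rows and print the result
    println(s"Row count: ${rows.count()}")
    sc.stop()
  }
}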

I am not sure what is wrong. I would appreciate any help with this. Thanks

1 ACCEPTED SOLUTION


It looks like there are no free resources available in YARN, so the job state is not promoted from ACCEPTED to RUNNING.
Can you please verify whether free containers are available in the queue?


4 REPLIES


@Balachandra Pai You may want to look into the YARN application logs to find the actual cause...


The stdout I posted above was observed in the YARN logs. I also get this warning, though:

2018-03-16 08:17:57,832 [main] WARN  org.apache.hadoop.security.token.Token  - Cannot find class for token kind HBASE_AUTH_TOKEN
2018-03-16 08:17:57,832 [main] WARN  org.apache.hadoop.security.token.Token  - Cannot find class for token kind HBASE_AUTH_TOKEN

Could this be the problem?


It looks like there are no free resources available in YARN, so the job state is not promoted from ACCEPTED to RUNNING.
Can you please verify whether free containers are available in the queue?
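
If it helps, one quick way to check is the YARN ResourceManager REST API. This is only a sketch: it assumes the RM web UI is reachable at sandbox.hortonworks.com:8088 (the sandbox default) and that the endpoint does not require SPNEGO authentication; on a fully kerberized cluster the ResourceManager UI or Ambari may be the easier route.

// Sketch: query the ResourceManager REST API for free resources and per-queue usage.
// Assumes the RM web address sandbox.hortonworks.com:8088 and unauthenticated HTTP access.
import scala.io.Source

object CheckYarnCapacity {
  def main(args: Array[String]): Unit = {
    val rm = "http://sandbox.hortonworks.com:8088"

    // Cluster-wide metrics: look at availableMB, availableVirtualCores and appsPending
    println(Source.fromURL(s"$rm/ws/v1/cluster/metrics").mkString)

    // Capacity-scheduler view: compare capacity vs. usedCapacity for the "default" queue
    println(Source.fromURL(s"$rm/ws/v1/cluster/scheduler").mkString)
  }
}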


Clearing the queues actually solved the issue for me. Thanks