Support Questions
Find answers, ask questions, and share your expertise

Executing a Spark action in Oozie in yarn-cluster mode fails with the error java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation

Hi,

We have installed HDP-2.4.0.0. As per the requirement, I need to configure an Oozie job with a Spark action.

I have written the following code.

Workflow.xml:

<?xml version="1.0"?>
<workflow-app name="${OOZIE_WF_NAME}" xmlns="uri:oozie:workflow:0.5">
    <global>
        <configuration>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                <value>SPARK_HOME=/usr/hdp/2.4.0.0-169/spark/</value>
            </property>
        </configuration>
    </global>
    <start to="spark-mongo-ETL"/>
    <action name="spark-mongo-ETL">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <mode>cluster</mode>
            <name>SparkMongoLoading</name>
            <class>com.SparkSqlExample</class>
            <jar>${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar</jar>
        </spark>
        <ok to="End"/>
        <error to="killAction"/>
    </action>
    <kill name="killAction">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="End"/>
</workflow-app>

Job.properties:

nameNode=hdfs://nameNode1:8020
jobTracker=yarnNM:8050
queueName=default
user.name=hadoop
oozie.libpath=/user/oozie/share/lib/
oozie.use.system.libpath=true
WORKFLOW_HOME=/user/hadoop/SparkETL
OOZIE_WF_NAME=Spark-Mongo-ETL-wf
SPARK_MONGO_JAR=${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar
oozie.wf.application.path=${nameNode}/user/hadoop/SparkETL/

Under the lib folder, two jars are placed:

SparkParquetExample-0.0.1-SNAPSHOT.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

When I submit the Oozie job, the action is killed.

Error:

Error: java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation
  at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:217)
  at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2624)
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
  at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:342)
  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:270)
  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:432)
  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
  at org.apache.hadoop.mapred.YarnChild.configureLocalDirs(YarnChild.java:256)
  at org.apache.hadoop.mapred.YarnChild.configureTask(YarnChild.java:314)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:146)

Also, please let me know how to pass the jars and files explicitly in the workflow.

Command:

spark-submit --class com.SparkSqlExample --master yarn-cluster --num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/jackson-core-2.4.4.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-spark-1.5.2.jar,/usr/share/java/slf4j-simple-1.7.5.jar,/usr/hdp/current/spark-client/lib/spark-core_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/spark-hive_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/spark-sql_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-core-1.5.2.jar,/usr/hdp/current/spark-client/lib/spark-avro_2.10-2.0.1.jar,/usr/hdp/current/spark-client/lib/spark-csv_2.10-1.4.0.jar,/usr/hdp/current/spark-client/lib/spark-mongodb_2.10-0.11.2.jar,/usr/hdp/current/spark-client/lib/spark-streaming_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/commons-csv-1.1.jar,/usr/hdp/current/spark-client/lib/mongodb-driver-3.2.2.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-master-1.5.2.jar,/usr/hdp/current/spark-client/lib/mongo-java-driver-3.2.2.jar,/usr/hdp/current/spark-client/lib/spark-1.6.0.2.4.0.0-169-yarn-shuffle.jar --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar --conf spark.yarn.executor.memoryOverhead=512 /home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT.jar

The above command executes successfully when run directly with spark-submit.

Can anyone suggest a solution?

1 ACCEPTED SOLUTION

I don't know where the TFS bit comes from; it may be a dependency problem.

To include all dependencies in the workflow, I would recommend building a fat jar (assembly). In Scala with sbt, you can see the idea in "Creating fat jars with sbt"; the same works with Maven's "maven-assembly-plugin". You should then be able to call your code as:

spark-submit --master yarn-cluster \
--num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 \
--class com.SparkSqlExample \
/home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT-with-dependencies.jar
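For the Maven route, a minimal sketch of the maven-assembly-plugin configuration for the pom.xml — the plugin version here is an assumption; adjust to your build:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.6</version>
  <configuration>
    <descriptorRefs>
      <!-- produces a *-jar-with-dependencies.jar next to the normal artifact -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <!-- bind the assembly to the package phase so "mvn package" builds it -->
      <phase>package</phase>
      <goals><goal>single</goal></goals>
    </execution>
  </executions>
</plugin>
```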

If this works, the jar with dependencies should be the one in the oozie spark action.
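Regarding passing jars and files explicitly: the spark action schema provides a <spark-opts> element whose contents are handed through to spark-submit. A sketch against your workflow, assuming the uri:oozie:spark-action:0.1 schema you are already using (jar name and paths are illustrative):

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>SparkMongoLoading</name>
    <class>com.SparkSqlExample</class>
    <!-- point at the assembly (fat) jar instead of the thin one -->
    <jar>${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT-with-dependencies.jar</jar>
    <!-- extra spark-submit flags, e.g. --files / --jars / --conf, go here verbatim -->
    <spark-opts>--files /usr/hdp/current/spark-client/conf/hive-site.xml --conf spark.yarn.executor.memoryOverhead=512</spark-opts>
</spark>
```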


Hi @Bernhard Walter,

Thanks for the reply!

I followed your suggestion, but it is now throwing a different error.

Please help me.

diagnostics: Application application_1468279065782_0300 failed 2 times due to AM Container for appattempt_1468279065782_0300_000002 exited with  exitCode: -1000
  For more detailed output, check application tracking page:http://yarnNM:8088/cluster/app/application_1468279065782_0300Then, click on links to logs of each attempt.
  Diagnostics: Permission denied: user=hadoop, access=EXECUTE, inode="/user/yarn/.sparkStaging/application_1468279065782_0300/__spark_conf__1316069581048982381.zip":yarn:yarn:drwx------
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
  at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3866)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1076)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
  

It looks like you are executing the job as user hadoop; however, Spark wants to read staging data from /user/yarn (which can only be accessed by yarn). How did you start the job, and as which user?

I am surprised that Spark uses /user/yarn as the staging dir for user hadoop. Is there any staging-dir configuration on your system (SPARK_YARN_STAGING_DIR)?

Hi @Bernhard Walter,

In spite of creating the fat jar, the error below also occurred:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
	at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:49)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1120)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)