Executing Spark action in Oozie using yarn cluster mode but getting an error java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation

Rising Star

Hi,

We have installed HDP-2.4.0.0. As per the requirement, I need to configure an Oozie job with a Spark action.

I have written the following code.

Workflow.xml:

<?xml version="1.0"?>
<workflow-app name="${OOZIE_WF_NAME}" xmlns="uri:oozie:workflow:0.5">
<global>
        <configuration>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                <value>SPARK_HOME=/usr/hdp/2.4.0.0-169/spark/</value>
            </property>
        </configuration>
</global>
    <start to="spark-mongo-ETL"/>
    <action name="spark-mongo-ETL">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <mode>cluster</mode>
            <name>SparkMongoLoading</name>
            <class>com.SparkSqlExample</class>
            <jar>${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar</jar>
        </spark>
        <ok to="End"/>
        <error to="killAction"/>
    </action>
    <kill name="killAction">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="End"/>
</workflow-app>

Job.properties:

nameNode=hdfs://nameNode1:8020
jobTracker=yarnNM:8050
queueName=default
user.name=hadoop
oozie.libpath=/user/oozie/share/lib/
oozie.use.system.libpath=true
WORKFLOW_HOME=/user/hadoop/SparkETL
OOZIE_WF_NAME=Spark-Mongo-ETL-wf
SPARK_MONGO_JAR=${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar
oozie.wf.application.path=${nameNode}/user/hadoop/SparkETL/

Under the lib folder, two jars are placed:

SparkParquetExample-0.0.1-SNAPSHOT.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

When I submit the Oozie job, the action is killed.

Error:

Error: java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation
  at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:217)
  at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2624)
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
  at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:342)
  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:270)
  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:432)
  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
  at org.apache.hadoop.mapred.YarnChild.configureLocalDirs(YarnChild.java:256)
  at org.apache.hadoop.mapred.YarnChild.configureTask(YarnChild.java:314)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:146)

Also, please let me know how to pass jars and files explicitly in the workflow.

Command:

spark-submit --class com.SparkSqlExample --master yarn-cluster \
  --num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/jackson-core-2.4.4.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-spark-1.5.2.jar,/usr/share/java/slf4j-simple-1.7.5.jar,/usr/hdp/current/spark-client/lib/spark-core_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/spark-hive_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/spark-sql_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-core-1.5.2.jar,/usr/hdp/current/spark-client/lib/spark-avro_2.10-2.0.1.jar,/usr/hdp/current/spark-client/lib/spark-csv_2.10-1.4.0.jar,/usr/hdp/current/spark-client/lib/spark-mongodb_2.10-0.11.2.jar,/usr/hdp/current/spark-client/lib/spark-streaming_2.10-1.6.0.jar,/usr/hdp/current/spark-client/lib/commons-csv-1.1.jar,/usr/hdp/current/spark-client/lib/mongodb-driver-3.2.2.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-master-1.5.2.jar,/usr/hdp/current/spark-client/lib/mongo-java-driver-3.2.2.jar,/usr/hdp/current/spark-client/lib/spark-1.6.0.2.4.0.0-169-yarn-shuffle.jar \
  --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar \
  --conf spark.yarn.executor.memoryOverhead=512 \
  /home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT.jar

The above command executes successfully.
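
I assume the equivalent options would go into a <spark-opts> element in the Spark action (the spark-action schema provides one); a rough, unverified sketch of what I have in mind:

<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>SparkMongoLoading</name>
    <class>com.SparkSqlExample</class>
    <jar>${nameNode}${WORKFLOW_HOME}/lib/SparkParquetExample-0.0.1-SNAPSHOT.jar</jar>
    <!-- assumption: the same files/jars options as on the spark-submit command line, passed straight through -->
    <!-- only two of the jars from the command above are listed here to keep the sketch short -->
    <spark-opts>--num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/mongo-hadoop-core-1.5.2.jar --conf spark.yarn.executor.memoryOverhead=512</spark-opts>
</spark>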

Can anyone suggest a solution?

1 ACCEPTED SOLUTION


I don't know where the TFS bit comes from; it may be a dependency problem.

To include all dependencies in the workflow, I would recommend building a fat jar (assembly). In Scala with sbt, you can see the idea in "Creating fat jars with sbt"; the same works with Maven's maven-assembly-plugin. You should be able to call your code as:

spark-submit --master yarn-cluster \
--num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 \
--class com.SparkSqlExample \
/home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT-with-dependencies.jar

If this works, the jar with dependencies should be the one referenced in the Oozie Spark action.
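
For reference, if you build with Maven, a minimal maven-assembly-plugin configuration could look like the sketch below (an assumption on my side, not taken from your project; keep the Spark and Hadoop dependencies in provided scope so they are not bundled):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptorRefs>
            <!-- produces SparkParquetExample-0.0.1-SNAPSHOT-jar-with-dependencies.jar -->
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <!-- bind the assembly to the package phase so that mvn package builds the fat jar -->
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>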


5 REPLIES


Rising Star

Hi @Bernhard Walter,

Thanks for the reply!

I have followed your suggestion, but it now throws a different error.

Please help me.

diagnostics: Application application_1468279065782_0300 failed 2 times due to AM Container for appattempt_1468279065782_0300_000002 exited with  exitCode: -1000
  For more detailed output, check application tracking page:http://yarnNM:8088/cluster/app/application_1468279065782_0300Then, click on links to logs of each attempt.
  Diagnostics: Permission denied: user=hadoop, access=EXECUTE, inode="/user/yarn/.sparkStaging/application_1468279065782_0300/__spark_conf__1316069581048982381.zip":yarn:yarn:drwx------
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
  at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3866)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1076)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
  


It looks like you are executing the job as user hadoop; however, Spark wants to read its staging data from /user/yarn (which can only be accessed by yarn). How did you start the job, and with which user?

I am surprised that Spark uses /user/yarn as the staging directory for user hadoop. Is there any staging-directory configuration in your system (SPARK_YARN_STAGING_DIR)?

Rising Star

Hi @Bernhard Walter,

In spite of creating the fat jar, the error below also occurred:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
	at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:49)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1120)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
