Created 11-07-2017 01:40 PM
Hi,
I've set up a fresh cluster using the HDC console. When following the instructions:
> export SPARK_MAJOR_VERSION=2
> spark-shell --master yarn
[...]
AccessControlException: Permission denied: user=cloudbreak, access=WRITE, inode="/user/cloudbreak/.sparkStaging/application_1510057948417_0004":hdfs:hdfs:drwxr-xr-x
Same happens with pyspark.
It looks like the home directory is missing, but I'm unable to create one (I have no access to the hdfs account).
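For reference, a quick way to check the state of the home directory (paths taken from the error above):
> hdfs dfs -ls /user
> hdfs dfs -ls /user/cloudbreak
If the second command fails or shows hdfs:hdfs ownership, the home directory was never set up for the cloudbreak user.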
Is something missing in the template, or in the steps I follow?
I can try workarounds like pyspark --master yarn --conf spark.yarn.stagingDir=/tmp/, but I still end up with:
17/11/07 13:30:59 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Running example:
spark-submit --conf spark.yarn.stagingDir=/tmp/ --class org.apache.spark.examples.SparkPi --master yarn --executor-memory 2G --num-executors 5 /usr/hdp/current/spark2-client/examples/jars/spark-examples_2.11-2.1.1.2.6.1.4-2.jar 100
failed with the same issue; on the RM site I can find:
Application application_1510057948417_0022 failed 2 times due to AM Container for appattempt_1510057948417_0022_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://ip-172-30-12-239.example.com:8088/cluster/app/application_1510057948417_0022 Then click on links to logs of each attempt.
Diagnostics: Failing this attempt. Failing the application.
But there are no logs available for the attempt, and the yarn command doesn't provide logs either: "Can not find the logs for the application: application_1510057948417_0022 with the appOwner: cloudbreak"
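The command I tried was along the lines of:
> yarn logs -applicationId application_1510057948417_0022 -appOwner cloudbreak
and it only returns the "Can not find the logs" message above.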
Created 11-07-2017 02:06 PM
The following HDFS directory is owned by "hdfs:hdfs":
user=cloudbreak, access=WRITE, inode="/user/cloudbreak/.sparkStaging/application_1510057948417_0004":hdfs:hdfs:drwxr-xr-x
Can you please try the following as the "hdfs" user?
# sudo su - hdfs
# hdfs dfs -chown -R cloudbreak:hdfs /user/cloudbreak
# hdfs dfs -chmod -R 777 /user/cloudbreak
From Ambari 2.5 onwards there is a feature to auto-create the home directory for newly created users on HDFS. Maybe you can try creating a new user. https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-administration/content/create_use...
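If /user/cloudbreak does not exist at all yet, it may also need to be created before the chown/chmod (a guess based on the "home directory is missing" note above):
# sudo su - hdfs
# hdfs dfs -mkdir -p /user/cloudbreak
# hdfs dfs -chown -R cloudbreak:hdfs /user/cloudbreak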
Created 11-07-2017 02:13 PM
If Kerberos is not enabled in your cluster, then you can also make the "cloudbreak" user act as the "hdfs" user by setting "HADOOP_USER_NAME=hdfs" and running the following commands:
# export HADOOP_USER_NAME=hdfs
# hdfs dfs -chown -R cloudbreak:hdfs /user/cloudbreak
# hdfs dfs -chmod -R 777 /user/cloudbreak
Once the permissions on the directory have been changed, you can open a new terminal and run your commands as the "cloudbreak" user. Please do not forget to unset HADOOP_USER_NAME afterwards.
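For completeness, a quick way to switch back and verify the change (the grep is only for illustration):
# unset HADOOP_USER_NAME
# hdfs dfs -ls /user | grep cloudbreak
The listing should now show "cloudbreak hdfs" as the owner and group of /user/cloudbreak.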
Created 11-07-2017 02:18 PM
That's a nice trick, I will try that! I will also check user creation. The problematic part is: is it a feature or a bug that the home directory isn't set up after a fresh startup? I'm trying to automate cluster creation for ETL (cron based), and it may be difficult to explain that I need those three lines if this is the default cloudbreak user presented in every user guide 🙂
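If it helps with the cron-based automation, the fix could be wrapped in a small post-cluster-creation step; this is only a sketch, and the script name and the assumption that it runs as a user with sudo rights to the hdfs account are mine:
#!/bin/bash
# create-cloudbreak-home.sh (hypothetical): set up the cloudbreak HDFS home directory
# after the cluster comes up, so Spark jobs can write their .sparkStaging data.
sudo -u hdfs hdfs dfs -mkdir -p /user/cloudbreak
sudo -u hdfs hdfs dfs -chown cloudbreak:hdfs /user/cloudbreak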
Created 11-07-2017 03:46 PM
@Jay Kumar SenSharma, manual directory creation did the trick, and all the Spark apps are working correctly now. I still think it's a bug, but the workaround is good enough for me.