Created 05-07-2018 09:38 AM
I want to save a DataFrame to disk:
df.write.format("parquet").save("/home/centos/test/df.parquet")
I get the following error, which says that the user "centos" does not have write permissions:
18/05/07 09:18:08 ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user=centos, access=WRITE, inode="/home/centos/test/df.parquet/_temporary/0":hdfs:hdfs:drwxr-xr-x
This is how I run the spark-submit command:
spark-submit --master yarn --deploy-mode cluster --driver-memory 6g --executor-cores 2 --num-executors 2 --executor-memory 4g --class org.test.MyProcessor mytest.jar
Created 05-07-2018 10:34 AM
You are trying to save to a local filesystem path /home/centos/---/---/, and from the error stack above the owner and group are hdfs:hdfs. The user centos doesn't have the correct permissions and ownership for this directory. This has nothing to do with your earlier HDFS directory where you set the correct permissions.
Please do the following while logged in to the Linux CLI as centos:
centos@{host}$ id
This will give you the group to which centos belongs, to be used in the change-ownership syntax. Then, as the root user or a sudoer (where xxxxx is the group):
# chown -R centos:xxxxx /home/centos/---/---/
Hope that helps
Created 05-07-2018 12:37 PM
The output of "id":
uid=1000(centos) gid=1000(centos) groups=1000(centos),4(adm),10(wheel),190(systemd-journal)
I executed "chown -R centos:centos /home/centos/test" but still get the same error:
18/05/07 12:06:28 ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user=centos, access=WRITE, inode="/home/centos/test/df.parquet/_temporary/0":hdfs:hdfs:drwxr-xr-x
This is the output of "ls -la" executed in "/home/centos":
total 36236
drwx------.  4 centos centos  4096 May  7 12:34 .
drwxr-xr-x. 15 root   root    4096 Apr 16 18:41 ..
-rw-------.  1 centos centos 13781 May  7 11:26 .bash_history
-rw-r--r--.  1 centos centos    18 Mar  5  2015 .bash_logout
-rw-r--r--.  1 centos centos   193 Mar  5  2015 .bash_profile
-rw-r--r--.  1 centos centos   231 Mar  5  2015 .bashrc
-rw-rw-r--   1 centos centos    47 May  7 11:38 .scala_history
drwx------.  2 centos centos    46 May  2 07:57 .ssh
drwxrwxr-x   4 centos centos   144 May  7 11:42 test
Created 05-07-2018 12:43 PM
Maybe the problem is that I run the Spark program in YARN cluster mode? That means the driver can run on any machine in the cluster. So should I run "chown -R centos:centos ..." on each machine, or use ".coalesce(1)"?
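One way to see which filesystem an unqualified path actually resolves to is to ask Hadoop from inside the job. A minimal sketch, assuming a SparkSession named spark (the path is the one from the error above):

import org.apache.hadoop.fs.Path

val p = new Path("/home/centos/test/df.parquet")
// With no scheme in the path, getFileSystem resolves it against fs.defaultFS
val fs = p.getFileSystem(spark.sparkContext.hadoopConfiguration)
println(fs.getUri)  // typically hdfs://<namenode>:8020 on a cluster, not file:///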
Created 05-07-2018 12:51 PM
The .save action in Spark writes the data to HDFS, but you changed the permissions on the local file system.
Please change the permissions on the /home/centos directory in HDFS.
Log in as the HDFS user:
hdfs dfs -chown -R centos /home/centos/*
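To make the distinction concrete: an unqualified path is resolved against fs.defaultFS, which on a cluster normally points at HDFS. A sketch of the three cases (the namenode host is a placeholder):

// Writes to HDFS, because the path has no scheme and fs.defaultFS points at HDFS
df.write.format("parquet").save("/home/centos/test/df.parquet")

// Explicit HDFS target
df.write.format("parquet").save("hdfs://<namenode>:8020/home/centos/test/df.parquet")

// Explicit local target -- note that in cluster mode this lands on each
// executor's local disk, which is rarely what you want
df.write.format("parquet").save("file:///home/centos/test/df.parquet")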
Created 05-07-2018 01:05 PM
I think that this is the reason. If I log in as the HDFS user and run "hdfs dfs -chown -R centos /home/centos/test", it says that this directory does not exist. I created this directory as the HDFS user and then changed its ownership to centos. Should I write the parquet file to the full path?:
df.coalesce(1).write.format("parquet").save("hdfs://eureambarimaster1.local.eurecat.org:8020/user/hdfs/test")
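As a sanity check, the target's existence and ownership can also be inspected from inside the job with the Hadoop FileSystem API. A minimal sketch (the URI is copied from the line above; adjust as needed):

import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(new URI("hdfs://eureambarimaster1.local.eurecat.org:8020"), conf)
val target = new Path("/user/hdfs/test")
if (fs.exists(target)) {
  val st = fs.getFileStatus(target)
  // The owner/group/permission must allow the submitting user to write
  println(s"owner=${st.getOwner} group=${st.getGroup} perms=${st.getPermission}")
}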
Created 05-07-2018 01:19 PM
df.coalesce(1).write.mode("overwrite").format("parquet").save("/user/hdfs/test")
If we don't specify any mode, Spark will fail with a "directory already exists" error, because you have already created the test directory.
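For reference, a sketch of the four save modes (ErrorIfExists is the default, which is why the write fails when no mode is given; df and the path are from the line above):

import org.apache.spark.sql.SaveMode

df.coalesce(1).write
  .mode(SaveMode.Overwrite)   // alternatives: Append, Ignore, ErrorIfExists
  .format("parquet")
  .save("/user/hdfs/test")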
Created 05-07-2018 01:30 PM
Yes, sure. Sorry, I was actually referring to "hdfs://eureambarimaster1.local.eurecat.org:8020/user/hdfs/test/df.parquet"
Let me test it.
Created 05-07-2018 01:45 PM
I have just tested it. It worked fine! Thank you!