
:32: error: value format is not a member of org.apache.spark.sql.DataFrame

New Contributor

Hi,

I'm working through Lab 4 - Spark Risk Factor Analysis. At almost the last step, I executed this command:

risk_factor_spark.write.orc("risk_factor_spark")

It throws the following error:

:32: error: value format is not a member of org.apache.spark.sql.DataFrame

Could you please help me solve this?

Thanks!


5 REPLIES

Master Mentor

@Juan Manuel Perez

If you are using Spark >= 1.4, try the following command:

risk_factor_spark.write.format("orc").save("risk_factor_spark")
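For reference, a minimal sketch of the same call with an explicit save mode; the mode("overwrite") line is an optional addition (not part of the lab) that replaces any existing output directory instead of failing when one exists:

// Spark >= 1.4 DataFrameWriter API (Scala, spark-shell)
risk_factor_spark.write.format("orc").mode("overwrite").save("risk_factor_spark")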

New Contributor

Thanks for your answer, Geoffrey.

I'm using version 2.3.2.

After running the command

risk_factor_spark.write.format("orc").save("risk_factor_spark")

I can see several messages, but one of them says the following:

INFO DefaultWriterContainer: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/risk_factor_spark/_temporary/0":hdfs:hdfs:drwxr-xr-x

Reading through the comments, I saw something similar from Özgür Akdemirci and tried the answer from Peter Lasne:

there was no /user directory for my user in HDFS, so I also had to do this: “sudo -u admin hdfs dfs -mkdir /user/” and “sudo -u admin hdfs dfs -chown :hdfs /user/”.

But it didn't work. Do I need to set more permissions?

Regards.

Master Mentor

@Juan Manuel Perez

hdfs dfs -chown -R root:hdfs /user/root

New Contributor

Hi Neeraj,

When I try to assign the permissions, it returns an error saying that the /user/root folder doesn't exist. I'm a newbie with Hadoop and Linux commands, but if I understand correctly, this command assigns the permissions in HDFS, not in the Linux file system. Reviewing HDFS, the /user/root folder indeed doesn't exist; since I'm logged in with the admin user, the /user/admin folder is there instead.
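A quick way to confirm this is to list the HDFS home directories and their owners (a generic check, not a step from the lab):

hdfs dfs -ls /user

The owner and group shown here are exactly what the AccessControlException above is checking against.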

I ran the Spark commands again, but this time from the web interface at http://localhost:4200 (previously I was using a console), and it worked. It seems that every time I use the console with ssh root@localhost, Spark assumes that the Hadoop user is root, not admin.

I'm wondering how I can impersonate a user other than root in the Spark shell when running the scripts.

Thanks anyway! This helped me figure out what the problem was.

Master Mentor

@Juan Manuel Perez

Log in as root on your server, then:

su - hdfs                                  # switch to the hdfs superuser

hdfs dfs -mkdir -p /user/root              # create root's home directory in HDFS

hdfs dfs -chown -R root:hdfs /user/root    # make root the owner
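Alternatively, on a non-Kerberized sandbox you can avoid creating /user/root at all by telling the Hadoop client which user to act as before starting the shell. This relies on simple authentication (the sandbox default), where HDFS trusts the client-side username; it is a general Hadoop mechanism, not something from the lab:

export HADOOP_USER_NAME=admin   # Hadoop clients will act as "admin" (simple auth only)
spark-shell

With this set, the ORC files land under /user/admin, which already exists and is writable by the admin user.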