Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

:32: error: value format is not a member of org.apache.spark.sql.DataFrame


Hi,

I'm working on Lab 4 - Spark Risk Factor Analysis. At almost the last step, I executed this command:

risk_factor_spark.write.orc("risk_factor_spark")

It throws the following error:

:32: error: value format is not a member of org.apache.spark.sql.DataFrame

Please could you help me solve this?

Thanks!


5 REPLIES

Master Mentor

@Juan Manuel Perez

If you are using Spark >= 1.4, try the following command:

risk_factor_spark.write.format("orc").save("risk_factor_spark")


Thanks for your answer, Geoffrey.

I'm using version 2.3.2.

After running the command

risk_factor_spark.write.format("orc").save("risk_factor_spark")

I can see several messages; one of them says the following:

INFO DefaultWriterContainer: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/risk_factor_spark/_temporary/0":hdfs:hdfs:drwxr-xr-x

Reading the comments, I saw something similar from Özgür Akdemirci and tried the answer from Peter Lasne:

there was no /user directory for my user in HDFS, so I also had to do this: "sudo -u admin hdfs dfs -mkdir /user/" and "sudo -u admin hdfs dfs -chown :hdfs /user/".

But it didn't work. Do I need to set more permissions?

Regards.

Master Mentor

@Juan Manuel Perez

hdfs dfs -chown -R root:hdfs /user/root


Hi Neeraj,

When I try to assign the permissions, it returns an error saying that the /user/root folder doesn't exist. I'm a newbie in Hadoop and Linux commands, but if I understand correctly, this command assigns the permissions in HDFS, not in the Linux file system. Reviewing HDFS, the /user/root folder indeed doesn't exist; since I'm logged in with the admin user, only the /user/admin folder is there.

I ran the Spark commands again, but this time from the web interface http://localhost:4200 (previously I was using a console), and it worked. It seems that every time I use the console with ssh root@localhost, Spark assumes that the Hadoop user is root, not admin.

I'm wondering how I can impersonate a user other than root in the Spark shell when running the scripts.

Thanks anyway! This helped me figure out what the problem was.
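One common way to do this on a sandbox without Kerberos (an assumption here; on a Kerberized cluster the authenticated principal is used and this variable is ignored) is to set HADOOP_USER_NAME before launching spark-shell:

```shell
# Make HDFS client operations run as 'admin' instead of the login user (root).
# Only works on clusters with simple (non-Kerberos) authentication.
export HADOOP_USER_NAME=admin
# Then launch the shell as usual; writes will go to /user/admin with admin's permissions:
# spark-shell
```

With this set, the same risk_factor_spark.write.format("orc").save(...) call should no longer hit the /user/root permission error from the root console.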

Master Mentor
@Juan Manuel Perez

Log in as root on your server, then:

su - hdfs

hdfs dfs -mkdir -p /user/root

hdfs dfs -chown -R root:hdfs /user/root
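After running the commands above, a quick check on the same sandbox (a hypothetical verification step, not part of the lab) confirms the directory exists with the expected ownership:

```shell
# Confirm /user/root now exists and is owned by root:hdfs
hdfs dfs -ls /user
```

If the listing shows root hdfs for /user/root, the Spark write from the root console should succeed.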