Support Questions


Hive: INSERT OVERWRITE does not work

Expert Contributor

I cannot get the following query to run using the PutHiveQL processor; it fails with a permission denied exception. I see that the processor exhibits the same behavior as Beeline.

However, I have been able to run the query from the Hive CLI, and it writes to a file as expected. So we know the Hive shell is an option, but can you let me know if there is any specific setting that causes this behavior in Beeline (HiveServer2), preventing writes to the local filesystem?

insert overwrite local directory '/tmp' select current_date from dual

Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [xxxxx] does not have [WRITE] privilege on [/tmp] (state=42000,code=40000)
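A quick check that can isolate the problem (hypothetical HDFS path; assumes the querying user can write to it): the non-LOCAL form of the same statement targets an HDFS directory rather than a node's local filesystem, which is also the direction the accepted solution below takes.

insert overwrite directory '/tmp/hive_out' select current_date from dual;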

1 ACCEPTED SOLUTION

Super Collaborator

Regardless of the reason you've decided to use PutHiveQL for a select statement (I would use SelectHiveQL and then a PutFile processor to store the result of the select)...

But... back to the use case. Both PutHiveQL and Beeline are JDBC-based clients, so using "insert overwrite LOCAL directory" doesn't make sense: you never know which node's LOCAL directory will end up in the context. If temp files still need to be created, you can go with the following (see the sketch after the note below):

1. PutHiveQL (insert overwrite directory, not local).

2. GetHDFS

... continue your processes/transformations with FlowFiles as usual.

Please note that the HDFS files will be deleted after GetHDFS finishes.
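A minimal sketch of step 1, assuming a hypothetical staging path (/tmp/nifi_staging) and the table from the question; the ROW FORMAT clause, available for directory writes since Hive 0.11, is included because delimited output comes up later in the thread:

insert overwrite directory '/tmp/nifi_staging'
row format delimited fields terminated by ','
select current_date from dual;

GetHDFS would then be pointed at the same staging path; with its default Keep Source File = false setting it removes the files it pulls, which is the deletion mentioned in the note above.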


10 REPLIES


It seems the nifi user doesn't have permission to write to the /tmp dir. You have two options:

1. Change the permissions on the /tmp folder to allow everyone to write to it.

2. If you have configured Ranger, make sure that in the resource-based policy for HDFS the nifi user is allowed access to all paths, or to the specific paths you want to write to.


Expert Contributor

That Ranger policy is specific to HDFS. I am referring to the issue with Hive writing to the local file system; see the Hive statement in my summary above. Also, the /tmp directory is accessible by any user.

Master Guru
@milind pandit is right. If you do not have Ranger enabled, then add the nifi user to the Linux group that owns /tmp; this is ACL security. If you have Ranger enabled, don't do this.

Expert Contributor

Please see my response above.

Master Guru

Did you provide the hive user access to the local file system directory /tmp?

Expert Contributor

What I understand is that Hive temporarily writes into /tmp on HDFS and then copies the result over to the local directory. So, in Ranger, recursive access to the /tmp HDFS folder has been granted. But the issue still persists with NiFi.
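If it helps to verify where that staging happens, the scratch directory Hive uses for intermediate results can be printed from a Beeline session; hive.exec.scratchdir is a standard Hive property (its default is typically /tmp/hive on HDFS), so this is just a quick way to confirm the path in a given environment:

set hive.exec.scratchdir;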

Master Guru

Do you have Ranger audit enabled? If so, please provide what the log shows when NiFi tries to hit /tmp.

Expert Contributor

My use case requires writing in a delimited format, and INSERT OVERWRITE LOCAL DIRECTORY fits that perfectly. I wish there were a way to apply a custom delimiter to the content retrieved through the SelectHiveQL processor; since there isn't, I couldn't opt for it.
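For reference, this is the kind of delimited write that works from the Hive CLI (hypothetical path and delimiter; the ROW FORMAT clause for directory writes requires Hive 0.11 or later):

insert overwrite local directory '/tmp/export'
row format delimited fields terminated by '|'
select current_date from dual;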

I agree that writing to HDFS instead is a good option. I will attempt to modify the flow, but I still wonder why the write to the local filesystem doesn't work through NiFi.