Created 12-16-2016 10:12 PM
I cannot get the following query to run using the PutHiveQL processor; it fails with the permission denied exception below. I see that the processor emulates the same behavior as Beeline.
However, I have been able to run the query from the Hive CLI, and it writes to a file as expected. So we know the Hive shell is an option, but can you tell me whether there is any specific setting that causes this behavior in Beeline (Hive2), preventing writes to the local filesystem?
insert overwrite local directory '/tmp' select current_date from dual
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [xxxxx] does not have [WRITE] privilege on [/tmp] (state=42000,code=40000)
Created 12-19-2016 06:38 PM
Regardless of the reason you've decided to use PutHiveQL for a select statement (I would use SelectHiveQL and then a PutFile processor to store the result of the select)...
But back to the use case. Both PutHiveQL and Beeline are JDBC-based clients, so using "insert overwrite LOCAL directory" doesn't make much sense: you never know which LOCAL directory (on which node, etc.) will be in context. If temp files still need to be created, you can go with the following (see the sketch after this list):
1. PutHiveQL (insert overwrite directory, not local).
2. GetHDFS
... continue your processes/transformations with FlowFiles as usual.
Please note that the HDFS files will be deleted once GetHDFS finishes.
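For step 1, a minimal sketch of the non-local variant, reusing the query from the original post (the HDFS path /tmp/hive_export is a hypothetical example, not a confirmed one):

-- write the result set to an HDFS directory instead of a node-local one
INSERT OVERWRITE DIRECTORY '/tmp/hive_export'
SELECT current_date FROM dual;

GetHDFS would then be pointed at /tmp/hive_export to pull the result files into the flow.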
Created on 12-16-2016 10:25 PM - edited 08-18-2019 05:09 AM
It seems the nifi user doesn't have permission to write to the /tmp directory. You have two options:
1. Change the permissions on the /tmp folder to allow everyone to write to it.
2. If you have Ranger configured, make sure the resource-based policy for HDFS allows the nifi user access to all paths, or to the specific paths you want to write to.
Created 12-18-2016 07:25 AM
That Ranger policy is specific to HDFS. I am referring to the issue with Hive writing to the local file system; see the Hive statement in my summary above. And the /tmp directory is accessible by any user.
Created 12-16-2016 10:27 PM
The previous answer is right. If you do not have Ranger enabled, add the nifi user to the Linux group that owns /tmp; this is ACL-based security. If you have Ranger enabled, don't do this.
Created 12-18-2016 07:25 AM
Please see my response above
Created 12-19-2016 05:27 AM
Did you give the hive user access to the local file system directory /tmp?
Created 12-19-2016 11:37 PM
What I understand is that Hive temporarily writes to /tmp on HDFS and then copies the result over to the local directory. So, in Ranger, recursive access to the /tmp HDFS folder has been granted. But the issue still persists with NiFi.
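If that is the case, one quick check from Beeline is to print the staging location; hive.exec.scratchdir is the property that controls it (the value shown below is a common default, an assumption rather than a confirmed setting):

-- print the current HDFS scratch/staging directory used by Hive
SET hive.exec.scratchdir;
-- commonly resolves to something like hive.exec.scratchdir=/tmp/hive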
Created 12-20-2016 02:43 AM
Do you have Ranger audit enabled? If so, please provide what the log shows when nifi tries to hit /tmp.
Created 12-19-2016 11:43 PM
My use case requires writing in a delimited format, and INSERT OVERWRITE LOCAL fits perfectly for this. I wish there were a way to apply a custom delimiter to the content retrieved through the SelectHiveQL processor; since there isn't, I couldn't opt for it.
I agree that writing to HDFS instead is a good option. I will attempt to modify the process, but I still wonder why writing to the local filesystem through NiFi doesn't work.
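For what it's worth, if the HDFS route works out, Hive 0.11 and later also accept a row format on directory inserts, which may cover the delimiter requirement; a hedged sketch, again using the hypothetical /tmp/hive_export path and a '|' delimiter chosen only for illustration:

-- non-local directory insert with a custom field delimiter
INSERT OVERWRITE DIRECTORY '/tmp/hive_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT current_date FROM dual;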