Created 03-27-2017 06:24 PM
Hello,
I'm trying to use Ranger to enforce column-level permissions for users.
I am able to set table-level permissions by changing HDFS policies.
However, when I create Hive column-level policies and then query through the Hive CLI, those permissions are not enforced.
Please let me know what I am doing wrong and what I should be doing.
Thanks,
Marcy
Created 03-27-2017 06:42 PM
@Marcy The Hive CLI connects directly to the Hive Metastore and relies on Storage-based Authorization, so Ranger Hive policies (including column-level ones) are not applied. To take advantage of Ranger-based central security, Hortonworks recommends using Beeline instead of the Hive CLI: Beeline goes through HiveServer2, where the Ranger-based policies are applied. In fact, in production environments, administrators are often advised to disable the Hive CLI and require users to run command-line queries through Beeline. Here are some relevant links that you may find useful. As always, if you find this post useful, don't forget to upvote and/or accept the answer.
https://community.hortonworks.com/articles/10367/apache-ranger-and-hive-column-level-security.html
https://community.hortonworks.com/questions/10760/how-to-disable-hive-shell-for-all-users.html
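As a minimal sketch of the switch from the Hive CLI to Beeline, the Python helper below assembles a Beeline invocation that sends a query through HiveServer2. It assumes a hypothetical HiveServer2 host `hs2.example.com` listening on the default binary-transport port 10000, an unsecured (non-Kerberos) cluster, and a hypothetical table `sample_07`; the helper name `beeline_command` is also hypothetical. Only the standard Beeline options `-u` (JDBC URL), `-n` (user name) and `-e` (query) are used.

```python
import shlex


def beeline_command(host: str, user: str, query: str,
                    port: int = 10000, database: str = "default") -> list[str]:
    """Build a Beeline invocation that runs `query` through HiveServer2.

    Because the JDBC URL names a HiveServer2 host and port, the query is
    authorized by HiveServer2 (and therefore by Ranger Hive policies)
    rather than going through the Hive CLI's direct Metastore access.
    Assumes an unsecured (non-Kerberos) cluster; secured clusters need
    extra URL parameters that are not shown here.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    return ["beeline", "-u", url, "-n", user, "-e", query]


if __name__ == "__main__":
    # Hypothetical host, user and table names.
    cmd = beeline_command("hs2.example.com", "marcy", "SELECT * FROM sample_07 LIMIT 5")
    # Print a shell-safe version; pass `cmd` to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string lets `subprocess.run(cmd)` execute it without shell quoting issues.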
Created 03-27-2017 06:46 PM
Ok...
If I want users to run HiveQL, what are my options if I disable the Hive CLI?
What are the differences between the Hive CLI and Beeline?
Can I connect via Spark? RStudio? Python?
Thanks,
Marcia
Created 03-27-2017 06:53 PM
@Marcy If you disable the Hive CLI, your best and recommended option is to have users run HiveQL through Beeline. It is supported by Hortonworks and is the most popular client. You may also wish to explore the Ambari Hive View, a GUI-based tool included in Ambari (which gets even better in the upcoming HDP 2.6 release). The first link I included outlines the major differences between the Hive CLI and Beeline, but in a nutshell, Beeline goes through HiveServer2, so it respects Ranger-based authorization, whereas the Hive CLI is more of a brute-force direct connection that bypasses many of the security features.
All of the options you listed are possible. When evaluating different methods of accessing data in Hive, what you want to ensure is that they go through HiveServer2 so that Ranger-based security is respected. This is normally a Hadoop administrator's primary concern. Here is an additional link that goes over various Hive clients:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
In my experience, Beeline and the Ambari Hive View are where most Hadoopers start their journey, and they stay there until a use case comes along that requires additional technologies such as Spark, R, or Python.
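One way to audit whether a client goes through HiveServer2 is to look at the JDBC URL it is configured with. Per the Hive documentation on HiveServer2 clients, a bare `jdbc:hive2://` URL with no host starts Beeline in embedded mode, which runs Hive inside the client process (similar to the Hive CLI) instead of connecting to the cluster's HiveServer2 service, so policies enforced there should not be assumed to apply. The Python helper below, `uses_remote_hiveserver2`, is a hypothetical, simplified sketch under that assumption: it only parses the URL string and cannot confirm that Ranger is actually enabled on the server the URL names.

```python
from urllib.parse import urlsplit

HS2_PREFIX = "jdbc:hive2://"


def uses_remote_hiveserver2(jdbc_url: str) -> bool:
    """Return True if `jdbc_url` names a remote HiveServer2 host (or a
    ZooKeeper quorum used for HiveServer2 service discovery), and False
    for an embedded-mode URL with no host.

    Simplified sketch: it inspects the URL only and does not check
    whether Ranger authorization is enabled on that server.
    """
    if not jdbc_url.startswith(HS2_PREFIX):
        raise ValueError(f"not a HiveServer2 JDBC URL: {jdbc_url!r}")
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary "hive2://..." URL;
    # the authority (netloc) is the host:port list before the first "/".
    authority = urlsplit(jdbc_url[len("jdbc:"):]).netloc
    # An empty authority, as in a bare "jdbc:hive2://", means embedded mode.
    return authority != ""
```

For example, `jdbc:hive2://hs2.example.com:10000/default` (hypothetical host) counts as remote, while `jdbc:hive2://` does not.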
Created 03-27-2017 06:57 PM
Created 03-27-2017 07:30 PM
@Marcy All of them can work. Access to Hive from these tools is commonly done through Apache Zeppelin, a notebook tool included in the Hortonworks Data Platform. Hortonworks has many tutorials that show step by step how to connect them:
https://hortonworks.com/hadoop-tutorial/using-hive-with-orc-from-apache-spark/
https://hortonworks.com/hadoop-tutorial/getting-started-apache-zeppelin/
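For the Python case outside Zeppelin, one possible approach is the third-party PyHive package (an assumption: it is not mentioned in this thread and must be installed separately, e.g. `pip install 'pyhive[hive]'`). Its `hive.Connection` talks to HiveServer2 over Thrift, so Ranger Hive policies should apply just as they do for Beeline. The sketch below assumes an unsecured cluster with HiveServer2 on the default port 10000; the function name `query_hiveserver2` and the host and user values are hypothetical, and a Kerberos or LDAP cluster would also need PyHive's `auth` options.

```python
from typing import Any, Callable, Optional


def query_hiveserver2(sql: str, host: str, user: str, port: int = 10000,
                      database: str = "default",
                      connect: Optional[Callable[..., Any]] = None) -> list:
    """Run `sql` through HiveServer2 and return all rows.

    `connect` defaults to PyHive's hive.Connection (assumed installed);
    any DB-API style connection factory that accepts the same keyword
    arguments can be passed instead, e.g. a stand-in for testing.
    """
    if connect is None:
        # Imported lazily so this module loads even where PyHive is absent.
        from pyhive import hive
        connect = hive.Connection
    conn = connect(host=host, port=port, username=user, database=database)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        # Always release the HiveServer2 session, even if the query fails.
        conn.close()
```

Usage, with the same hypothetical names: `rows = query_hiveserver2("SELECT * FROM sample_07 LIMIT 5", "hs2.example.com", "marcy")`. The optional `connect` argument exists only so the function can be exercised without a cluster.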