I'm working through this procedure to upgrade Ambari from 2.1.1 to 2.2.0 before I start an HDP upgrade from 2.3.0 to 2.3.4:
It says to run the service checks on installed components first. They all passed except the Hive check and I get this access error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=ambari-qa, access=WRITE, inode="/apps/hive/warehouse":hive:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
The user:group on /apps/hive/warehouse is set to hive:hdfs. The service checks run as the ambari-qa user, and on the node where Ambari says the checks are being run, that user is ambari-qa:hadoop:
[jupstats@vmwhadnfs01 hive]$ id ambari-qa
uid=1001(ambari-qa) gid=501(hadoop) groups=501(hadoop),100(users)
So, ambari-qa is a member of the hadoop group, but that group has no write permission to Hive managed tables, which are owned by hive and allow read access only to hdfs users. I'm not sure what the Ambari service check is trying to do, but it is clearly trying to write something in that managed table space.
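To make the denial concrete, here is a rough Python sketch of the owner/group/other check as I understand it. It is a simplified model only: the function name `is_allowed` is made up for illustration, it ignores ACLs and the superuser shortcut, and it assumes the NameNode resolves the same groups that `id ambari-qa` shows on this node. The owner, group, and mode values come straight from the error message above.

```python
# Simplified model of the HDFS owner/group/other permission check.
# Assumptions: no ACLs, no superuser bypass; the real logic is in FSPermissionChecker.

def is_allowed(user, groups, owner, group, mode, access):
    """mode is the 9 permission characters (e.g. 'rwxr-xr-x'); access is 'r', 'w' or 'x'."""
    if user == owner:
        bits = mode[0:3]      # owner class
    elif group in groups:
        bits = mode[3:6]      # group class
    else:
        bits = mode[6:9]      # everyone else
    return access in bits

# Groups taken from the `id ambari-qa` output; inode is hive:hdfs:drwxr-xr-x.
ambari_qa_groups = {"hadoop", "users"}
can_write = is_allowed("ambari-qa", ambari_qa_groups, "hive", "hdfs", "rwxr-xr-x", "w")
can_read = is_allowed("ambari-qa", ambari_qa_groups, "hive", "hdfs", "rwxr-xr-x", "r")
print(can_write, can_read)
```

Under this model ambari-qa is neither the owner nor in the hdfs group, so it falls into the "other" class (r-x), which would explain why a read works but a WRITE is refused.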
As I understand it, the hadoop superuser is the user that starts the namenode, user "hdfs" in my case. So, my questions are:
1. Should "hdfs" really be the group for /apps/hive/warehouse or would it be better to have that be the "hadoop" group?
2. What are the best practice recommendations for the user:group permissions on /apps/hive/warehouse? For example, I have some Java and Python apps that run every 30 minutes to ingest data into Hive managed and external tables. Those processes run as a service user "jupstats" and group "ingest". My /apps/hive/warehouse/jupstats.db directory is where the managed tables live, and that directory is set to jupstats:ingest to restrict access appropriately. This seems right to me. Do you experts agree? The same goes for the directories where I also write some HDFS data that is accessed by external Hive tables. Those files are owned by jupstats:ingest.
3. I think I am generally lacking knowledge in how best to set up access to various Hive tables that are eventually going to need to be accessed by various users. My thought was that all my jupstats.db tables, which are readable only by group ingest, will be made readable by these users by adding those users to the "ingest" group. Does that approach seem reasonable?
4. This still leaves me with the question of how I set up Hive so that this Ambari service check can pass. Should I add ambari-qa to the "hdfs" group? That feels wrong and dangerous in that it is like adding ambari-qa to a root-like account, since user "hdfs" is the hadoop superuser and can whack a lot of stuff.
Thanks for any help/tips on this...
My installation differs from that link's instructions, but I did not change these from what the base install of HDP created. Why the discrepancy?
For "/apps/hive", your link says:
hdfs dfs -chown -R $HIVE_USER:$HDFS_USER /apps/hive
hdfs dfs -chmod -R 775 /apps/hive
My setup is:
drwxr-xr-x 3 hdfs hdfs 96 Sep 30 15:09 hive
For "/tmp/hive", your link says:
hdfs dfs -chmod -R 777 /tmp/hive
My setup is:
drwx-wx-wx 11 ambari-qa hdfs 352 Jan 15 22:13 hive
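To line up the octal modes from the link with the symbolic modes in my listings, a small Python sketch like this can help. The helper name `symbolic` is just for illustration; it uses the standard library's `stat.filemode`, and the octal values 755 and 733 are my own reading of the two listings above, not something the link states.

```python
import stat

def symbolic(octal_mode):
    """Render a directory's octal mode the way a listing shows it."""
    return stat.filemode(stat.S_IFDIR | octal_mode)

link_apps_hive = symbolic(0o775)   # mode the link sets on /apps/hive
my_apps_hive = symbolic(0o755)     # my reading of my /apps/hive listing
link_tmp_hive = symbolic(0o777)    # mode the link sets on /tmp/hive
my_tmp_hive = symbolic(0o733)      # my reading of my /tmp/hive listing

print(link_apps_hive, my_apps_hive)
print(link_tmp_hive, my_tmp_hive)
```

So compared with the link, my /apps/hive is missing group write, and my /tmp/hive is missing read for group and other.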
Ambari may be running some shell script to check if the Hive service is running. Can you tell which shell script is running from the stack trace? Also, can you run 'create table' and 'drop table' via the Hive CLI? If you can, the error you are facing might be due to a bug in the shell script that Ambari is running.
Interestingly, I just upgraded Ambari from 2.1 to 2.2 as part of my upgrade plans and the Hive service check now passes. The stack trace does show Ambari running various command scripts that implement this check.