Solution: Check ACLs for HDFS folders & set access for user account hive if required.
It seems that Hive 3.1.0 executes HiveQL commands as the hive user even if you launch them from the Linux command line with Beeline as another user (e.g. maria_dev). Thus, the hive account needs access to any HDFS resources required (i.e. folders & files), and Hive relies on access control lists (ACLs) for this (sometimes, just setting the right group access via chmod isn't enough).
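One way to confirm this behaviour on your own sandbox (the host/port below are the HDP Sandbox defaults, so adjust as needed) is to check the hive.server2.enable.doAs property from Beeline; when it is false, HiveServer2 runs every query as the hive service account rather than as the connected user:

```shell
# Connect to HiveServer2 as maria_dev
beeline -u jdbc:hive2://localhost:10000 -n maria_dev

# Inside the Beeline session, inspect the impersonation setting:
#   0: jdbc:hive2://localhost:10000> set hive.server2.enable.doAs;
# If it reports hive.server2.enable.doAs=false, queries execute as user hive,
# so HDFS permissions must be granted to hive, not to maria_dev.
```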
Use the following to check ACL permissions:
hdfs dfs -getfacl <some_hdfs_path>
Use the following to set ACL permissions:
hdfs dfs -setfacl -m user:hive:rwx <some_hdfs_path>
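Putting the two together, a typical grant-then-verify workflow looks like the following (the path is illustrative; also note that dfs.namenode.acls.enabled must be true on the NameNode for setfacl to succeed):

```shell
# Grant the hive user full access to a directory owned by maria_dev
hdfs dfs -setfacl -m user:hive:rwx /user/maria_dev/training

# Verify the new ACL entry; a '+' suffix in 'hdfs dfs -ls' permission
# strings also signals that extended ACLs are present
hdfs dfs -getfacl /user/maria_dev/training
# The listing should now include a line such as:
#   user:hive:rwx
```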
However, sometimes this still didn't work as expected. For example, after running the following to ensure any future files get the right ACLs, I'd find they'd randomly fail to be set:
hdfs dfs -setfacl -R -m default:user:hive:rwx <some_hdfs_path>
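One possible explanation for the "random" failures: a default ACL is only inherited by files and sub-directories created after it is set; anything that already exists under the path keeps its old ACLs. A sketch of a belt-and-braces approach (illustrative path) is to apply both the access ACL and the default ACL recursively:

```shell
# Access ACL: fixes files and directories that already exist under the path
hdfs dfs -setfacl -R -m user:hive:rwx /user/maria_dev/training

# Default ACL: inherited by anything created under the path from now on
hdfs dfs -setfacl -R -m default:user:hive:rwx /user/maria_dev/training
```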
Then I started getting several other issues while trying to learn Hive with DAS & Spark SQL with Zeppelin.
In Data Analytics Studio, I was able to create my training database and add tables to it, but I could never see any tables in the DAS UI (unless I ran the show tables command).
Also, YARN resource usage was hitting 100% while executing simple queries on a small DataFrame, just before services seemed to crash (the remote Zeppelin, Ambari & Shell-in-a-Box sessions all disconnected even though the VM still appeared to be running in VirtualBox). Then all services failed to restart this morning.
Thus, I deleted the VM & re-installed HDP Sandbox 3.0.1.
Straight after re-installing and re-creating my training database, I was able to view all tables created in it and Spark SQL is working fine.
While re-creating the VM is fine in this instance, if anyone can direct me to reading material on how folks handle such problems in live production environments, I'd appreciate it. I'm guessing re-installing the VM is a last resort...