This article will (hopefully) be helpful if you are trying to use LLAP/Hive Interactive, but it fails or gets stuck during startup. This is often caused by misconfiguration, especially with respect to sizing, but it could also be a bug. Ambari performs two major activities when (re)starting Hive Interactive: first it starts LLAP itself on the YARN cluster, then it starts HiveServer2 for Hive Interactive. Problems usually happen during LLAP startup.
Note: in any of the logs, the last error is sometimes an InterruptedException or a similar error thrown during component shutdown, after something else has already failed. If there are several errors at the end of the log, look at all of them, especially the initial exception. Also, please capture all the call stacks (or attach the entire log) when opening a bug.
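The advice in the note can be sketched as a small log filter. This is a minimal Python sketch, not part of any Hive or Ambari tooling: it assumes Java-style log text in which exception lines contain a class name ending in Exception or Error, and the function names and the list of "shutdown noise" exceptions are hypothetical choices made for illustration.

```python
import re

# Java-style exception names, e.g. java.io.IOException or OutOfMemoryError.
EXCEPTION_RE = re.compile(r"\b((?:[\w$]+\.)*[A-Z]\w*(?:Exception|Error))\b")

# Assumption: exceptions that usually show up only as shutdown side effects.
SHUTDOWN_NOISE = {"InterruptedException"}


def find_exceptions(log_text, tail_lines=500):
    """Return (line_number, exception_name) pairs from the last tail_lines lines."""
    lines = log_text.splitlines()
    start = max(0, len(lines) - tail_lines)
    found = []
    for offset, line in enumerate(lines[start:]):
        match = EXCEPTION_RE.search(line)
        if match:
            found.append((start + offset + 1, match.group(1)))
    return found


def likely_root_cause(log_text, tail_lines=500):
    """Return the first exception in the tail that is not shutdown noise, or None."""
    for line_number, name in find_exceptions(log_text, tail_lines):
        if name.split(".")[-1] not in SHUTDOWN_NOISE:
            return line_number, name
    return None
```

Calling likely_root_cause on the contents of a log returns the earliest non-shutdown exception near the end of the file, which is usually a better starting point than the very last error. Stack frames that mention exception classes can also match, so treat the result as a hint rather than a verdict.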
1. Look at the Ambari operation logs for HiveServer2 Interactive startup.
If the error is essentially a timeout waiting for the LLAP cluster, go to step 2 to determine why LLAP wasn’t able to start. On older HDP versions, this may also manifest as an InvalidACL error.
If the issue occurs after LLAP has started (i.e. in HiveServer2 startup) and there's no useful error message, see the hiveServer2Interactive logs on the HiveServer2 Interactive service machine. Try to make sense of the error, and/or file a bug or contact someone in dev (Hive).
Otherwise, try to make sense of the error, and/or file a bug or contact someone in dev (Ambari or Hive depending on the error).
2. In the ResourceManager UI, check that the LLAP YARN app has enough capacity. Skip this step if the app is in a failed state.
If the app is neither running nor complete (i.e. it’s queued, accepted, etc.), the cluster lacks the capacity to start the app. Check what is taking up space on the cluster; if nothing looks unexpected, consult the sizing doc to ensure LLAP is set up properly.
If the app is running (not failed), follow the “Tracking URL: Application Master” link and check whether it can start enough LLAP daemons on the cluster (Desired vs. Actual containers for the LLAP component). If not, check whether there’s enough capacity to start this number of containers, since something else might be taking up the space; if nothing looks unexpected, consult the sizing doc to ensure LLAP is set up properly.
3. If the app has failed, or there is enough capacity but the app doesn’t have enough containers, view the logs in the UI (or kill the app and download the YARN logs). If the Slider app status says “too many component failures” or a similar error, go directly to step 5; otherwise go to step 4.
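The decisions described so far for the LLAP YARN app can be summarized as a small decision function. This is only a sketch of the logic in the text: the function name, its parameters, and the returned strings are hypothetical, and the inputs are values you would read off the ResourceManager UI and the Application Master page by hand. Treating killed or finished apps the same way as failed ones is an assumption.

```python
# YARN application states in which the app has not started running yet.
WAITING_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED"}


def next_action(state, desired=0, actual=0, enough_capacity=True, status=""):
    """Map what the ResourceManager UI shows to the next troubleshooting action.

    state:           YARN application state, e.g. "ACCEPTED", "RUNNING", "FAILED"
    desired, actual: container counts for the LLAP component (running apps only)
    enough_capacity: whether the cluster could fit the missing containers
    status:          Slider app status or diagnostics text, if any
    """
    state = state.upper()
    if state in WAITING_STATES:
        return "check cluster capacity and LLAP sizing"
    if state == "RUNNING":
        if actual >= desired:
            return "LLAP has all its containers"
        if not enough_capacity:
            return "check cluster capacity and LLAP sizing"
    # Failed app (assumption: also killed or finished), or a running app with
    # spare capacity but too few containers: the logs are the next stop.
    if "too many component failures" in status.lower():
        return "step 5: daemon logs"
    return "step 4: slider.log"
```

For example, next_action("RUNNING", desired=4, actual=2, enough_capacity=True) points at slider.log, because capacity is available and yet the daemons are not coming up.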
4. Check slider.log for errors (in the UI, it is in the Slider AM container, usually the lowest-numbered one).
If the errors are container failures or there are no errors, go to step 5.
Otherwise, try to make sense of the error, and/or file a bug or contact someone in dev (YARN or Hive depending on the error).
5. If present, check the daemon logs in one of the failed containers for errors.
Start with llap-daemon*.log and llap-daemon*.out; the error is usually at the end. Try to make sense of the error, and/or contact the Hive team.
If those are not present, check slider-agent.log instead; try to make sense of the error, and/or file a bug or contact someone in dev (YARN or Hive depending on the error).
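The order in which to look at a failed container's logs can be sketched as follows. This is a minimal sketch that assumes the container's log files are already available as plain files under a local directory (how they get there, e.g. by copying them from the node, is up to you). The function names are hypothetical; the file name patterns are the ones mentioned above.

```python
import fnmatch
import os

# Daemon logs are preferred; slider-agent.log is used only if none are found.
DAEMON_PATTERNS = ("llap-daemon*.log", "llap-daemon*.out")
FALLBACK_NAME = "slider-agent.log"


def pick_logs(container_dir):
    """Return daemon log paths under container_dir, else slider-agent.log paths."""
    daemon, fallback = [], []
    for root, _dirs, files in os.walk(container_dir):
        for name in files:
            path = os.path.join(root, name)
            if any(fnmatch.fnmatch(name, pattern) for pattern in DAEMON_PATTERNS):
                daemon.append(path)
            elif name == FALLBACK_NAME:
                fallback.append(path)
    return sorted(daemon) or sorted(fallback)


def tail(path, lines=50):
    """Return the last few lines of a log, where the error usually is."""
    with open(path, errors="replace") as handle:
        return handle.readlines()[-lines:]
```

A typical use is to loop over pick_logs(some_directory) and print tail(path) for each file, then read the errors from the end upward until the initial exception is found.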