For a long time, we have been running Hive LLAP together with sparklyr, and it has been working fine. Today we upgraded to HDP 2.6.5, and since then we can't connect to Hive through sparklyr. SparkR still works fine with Hive LLAP. The easiest way to see the problem is to list all databases in Hive: after the upgrade, we only see the "default" database, and if I try to list the tables in it, I get an empty list. There are no errors, stack traces, or anything else that points at the problem.
This is the code we are running, which worked before the upgrade to HDP 2.6.5:
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
Sys.setenv(HIVE_CONF_DIR = "/etc/hive/conf")
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark2-client/")

library(argparse)
library(tidyverse)
library(sparklyr)
library(DBI)
library(plyr)
library(dplyr)

# Executor settings plus the Spark-LLAP assembly jar on the executor classpath
.config <- spark_config()
.config <- c(.config, list(
  "spark.executor.memory" = "2688M",
  "spark.shuffle.service.enabled" = "true",
  "spark.dynamicAllocation.enabled" = "true",
  "spark.executor.extraClassPath" = "/usr/hdp/2.6.5.0-292/spark_llap/spark-llap-assembly-1.0.0.2.6.5.0-292.jar"
))

sc <- spark_connect(master = "yarn-client", app_name = "sparklyr-test", config = .config)

# After the upgrade this only returns "default"
DBI::dbGetQuery(sc, 'show databases')
Does anybody have any information that could help us solve this problem?
Yes, authorization is handled by Ranger.
Everything else in the cluster is working fine. Ranger with Spark + LLAP works fine, and Zeppelin with R/Python + Spark + Livy + LLAP + Ranger works fine. The only thing that stopped working after the upgrade is sparklyr, so I don't think our problem is related to authorization.