Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Database not found when querying Hive with Spark on localhost

Highlighted

Database not found when querying Hive with Spark on localhost

New Contributor

Hello,

I'm trying to request Hive table using Spark/Scala code on IntelliJ. I have installed Hadoop and Hive on a virtual machine (localhost environment) and the database 'test' and the table 'employee' have been already created using the Hive console. The versions of my componants are the following :
- Spark 2.2.0
- Hive 1.1.0
- Hadoop 2.7.3

Here, the code i'm trying to run :

import org.apache.spark.sql.SparkSession

// warehouseLocation points to the default location for managed databases and tables
val warehouseLocation = "hdfs://localhost:9000/user/hive/warehouse"

val spark = SparkSession
 .builder()
 .appName("Spark Hive Local Connector")
 .config("spark.sql.warehouse.dir", warehouseLocation)
 .config("spark.master", "local")
 .enableHiveSupport()
 .getOrCreate()

spark.catalog.listDatabases().show(false)
spark.catalog.listTables().show(false)
spark.conf.getAll.mkString("\n")

import spark.implicits._
import spark.sql

sql("USE test")
sql("SELECT * FROM test.employee").show()

Hive access seems to be properly configured via the hive-site.xml file located into /usr/local/apache-hive-1.1.0-bin/conf directory (Hive have been installed with Derby).

When i'm running the code below, it shows the existing default database but it doesn't show the table that are inside default and test databases :

+-------+---------------------+-----------------------------------------+
|name   |description          |locationUri                              |
+-------+---------------------+-----------------------------------------+
|default|Default Hive database|hdfs://localhost:9000/user/hive/warehouse|
+-------+---------------------+-----------------------------------------+

18/11/20 22:32:51 INFO SparkSqlParser: Parsing command: USE test
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+

I join you the configurations of the hive-site.xml file i'm using :

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/usr/local/apache-hive-1.1.0-bin/metastore_db;create=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
<description>class implementing the jdo persistence</description>
</property>
</configuration>

Once the code has finished, it throws the following error :

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'test' not found;

I have tried many attempts and i still do not understand why i cannot access to the 'test' database. I tried to create the 'employee' table into the 'default' database but i obtained the same error. I also placed the hive-site.xml file into the /resource folder of IntelliJ and the result is the same.

What can be missing when accessing Hive ? Thank you for your contributions !

1 REPLY 1

Re: Database not found when querying Hive with Spark on localhost

New Contributor

Check property in spark for hive warehouse. spark.sql.warehouse.dir = /apps/hive/warehouse