Support Questions


Spark 2.3: pyspark.sql.utils.AnalysisException: u"Database 'test' not found;" - Only default Hive database is visible

Contributor

I have installed HDP 3.0 and am using Spark 2.3 and Hive 3.1.

When I try to access Hive tables through Spark (pyspark/spark-shell), I get the error below.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 71, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"Database 'test' not found;"

Only the default Hive database is visible in Spark.

>>> spark.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
+------------+
>>>
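
For reference, this is roughly how the session is created (a minimal sketch; in the pyspark shell the spark object already exists, so this mainly matters for submitted applications, and the app name is a placeholder):

# Minimal sketch of the session setup; enableHiveSupport() tells Spark
# to use the Hive metastore catalog instead of its built-in one.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-check")   # placeholder app name
         .enableHiveSupport()
         .getOrCreate())

spark.sql("show databases").show()   # only 'default' is listed here
spark.sql("use test")                # raises the AnalysisException above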

The content of hive-site.xml is not exactly the same in the spark conf and hive conf directories:

-rw-r--r-- 1 hive  hadoop 23600 Sep 14 09:21 /usr/hdp/current/hive-client/conf/hive-site.xml
-rw-r--r-- 1 spark spark   1011 Sep 14 12:02 /etc/spark2/3.0.0.0-1634/0/hive-site.xml

I even tried starting the Spark session with hive/conf/hive-site.xml, but this did not help either:

pyspark --files /usr/hdp/current/hive-client/conf/hive-site.xml

Should I copy the hive-site.xml file from the hive conf dir to the spark conf dir (or anywhere else as well)?

Or will changing a property in the Ambari UI work?
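
For what it's worth, these checks from inside the pyspark shell show which configuration the session actually resolved (a sketch; the catalogImplementation key is an assumption about what the SparkConf exposes):

# Inspect what the running session resolved. If spark.sql.warehouse.dir
# points at Spark's own warehouse rather than Hive's, the session is not
# picking up the Hive metastore configuration.
print(spark.conf.get("spark.sql.warehouse.dir"))
print(spark.sparkContext.getConf().get("spark.sql.catalogImplementation", "unset"))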

1 ACCEPTED SOLUTION

Contributor

The default database it was showing was Spark's own default database, whose location is '/apps/spark/warehouse', not the default database of Hive.

I was able to resolve this by copying hive-site.xml from the hive conf dir to the spark conf dir:

cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

Try running this query in your metastore database; in my case it is MySQL:

mysql> SELECT NAME, DB_LOCATION_URI FROM hive.DBS;

You will see 2 default databases there, one pointing to 'spark.sql.warehouse.dir' and the other to 'hive.metastore.warehouse.dir'. The locations will depend on the values you have set for these configuration properties.
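
After copying the file, restart the shell and confirm that the Hive databases are visible (a sketch; database names will vary with your cluster):

# A new pyspark session should now list the Hive metastore's databases
# instead of just Spark's own 'default'.
spark.sql("show databases").show()   # 'test' should now appear

# The catalog API also shows each database's location, which should match
# DB_LOCATION_URI from the metastore query above.
for db in spark.catalog.listDatabases():
    print(db.name, db.locationUri)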


2 REPLIES

Contributor

Hi,

I am trying to run a Spark application which needs access to Hive databases. But Hive databases such as FOODMART are not visible in the Spark session.

I ran spark.sql("show databases").show(); it does not show the FOODMART database, even though the Spark session has enableHiveSupport.


Below is what I've tried:

1)

cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

2)

Changed spark.sql.warehouse.dir in the Spark UI from /apps/spark/warehouse to /warehouse/tablespace/managed/hive

Even so, it is not working.
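
One more thing I can try is pointing the session at the metastore explicitly (a sketch; the thrift URI is a placeholder for the value of hive.metastore.uris in my hive-site.xml):

# Hypothetical explicit configuration; replace the thrift URI with the
# metastore host from hive-site.xml.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("show databases").show()   # FOODMART should appear if the session
                                     # reaches the Hive metastore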


Please let me know what configuration changes would be required to make this work.


Please note: the above works in HDP 2.6.5.