Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Spark 2.3 : pyspark.sql.utils.AnalysisException: u"Database 'test' not found;" - Only default hive database is visible


I have installed HDP 3.0 and am using Spark 2.3 with Hive 3.1.

When I try to access Hive tables through Spark (pyspark/spark-shell), I get the error below.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 71, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"Database 'test' not found;"
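The last frame of the traceback is PySpark unwrapping the Java-side error: roughly speaking (a simplified sketch, not the actual pyspark source), it splits the Py4J message on the first ': ' to recover the AnalysisException text. The Java message below is a made-up example of what Py4J returns:

```python
# Simplified illustration of the split on line 71 of pyspark/sql/utils.py above;
# the java_message string is an invented example, not captured from a real run.
java_message = "org.apache.spark.sql.AnalysisException: Database 'test' not found;"
error_text = java_message.split(": ", 1)[1]
print(error_text)  # Database 'test' not found;
```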

Only default hive database is visible in Spark.

>>> spark.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
+------------+
>>>

The contents of hive-site.xml are not the same in the spark/conf and hive/conf directories:

-rw-r--r-- 1 hive  hadoop 23600 Sep 14 09:21 /usr/hdp/current/hive-client/conf/hive-site.xml
-rw-r--r-- 1 spark spark   1011 Sep 14 12:02 /etc/spark2/3.0.0.0-1634/0/hive-site.xml
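As a quick way to see what differs, a small sketch (not from this thread; the XML snippets are made-up minimal examples, not the real HDP files) can compare the `<property>` names in the two copies of hive-site.xml:

```python
# Compare the <property> names in two hive-site.xml files to see which
# settings the Spark copy is missing. Inline strings stand in for the files.
import xml.etree.ElementTree as ET

def property_names(xml_text):
    """Return the set of <name> values under each <property> element."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name") for prop in root.iter("property")}

hive_conf = """<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://host:9083</value></property>
  <property><name>hive.metastore.warehouse.dir</name><value>/warehouse/tablespace/managed/hive</value></property>
</configuration>"""

spark_conf = """<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://host:9083</value></property>
</configuration>"""

missing = property_names(hive_conf) - property_names(spark_conf)
print(sorted(missing))  # properties in Hive's copy but absent from Spark's
```

Any property present in Hive's copy but missing from Spark's (typically the metastore connection settings) is a candidate cause.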

I even tried initiating the Spark session with hive/conf/hive-site.xml, but that did not help either.

pyspark --files /usr/hdp/current/hive-client/conf/hive-site.xml

Should I copy the hive-site.xml file from the Hive conf dir to the Spark conf dir (or anywhere else as well)?

Or would changing a property in the Ambari UI work?

1 ACCEPTED SOLUTION


The default database it was showing was Spark's own default database, whose location is '/apps/spark/warehouse', not Hive's default database.

I was able to resolve this by copying hive-site.xml from the Hive conf dir to the Spark conf dir:

cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

Try running this query in your metastore database; in my case it is MySQL:

mysql> SELECT NAME, DB_LOCATION_URI FROM hive.DBS;

You will see two default databases there, one pointing to 'spark.sql.warehouse.dir' and the other to 'hive.metastore.warehouse.dir'. The locations will depend on the values you have for these configuration properties.
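For reference, these two settings typically live in different files. A sketch using the paths mentioned in this thread (your values may differ):

```
# spark-defaults.conf — where Spark puts databases it creates itself
spark.sql.warehouse.dir /apps/spark/warehouse

# hive-site.xml — where the Hive metastore keeps managed tables (HDP 3 default)
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/warehouse/tablespace/managed/hive</value>
</property>
```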


2 REPLIES


Contributor

Hi,

I am trying to run a Spark application that needs access to Hive databases, but Hive databases like FOODMART are not visible in the Spark session.

I ran spark.sql("show databases").show(); it does not show the FOODMART database, even though the Spark session has enableHiveSupport.


Here is what I've tried:

1)

cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

2)

Changed spark.sql.warehouse.dir in the Spark UI from /apps/spark/warehouse to /warehouse/tablespace/managed/hive.

Neither of these worked.
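One thing worth noting (my understanding, not confirmed in this thread): spark.sql.warehouse.dir only controls where Spark creates new databases; to list the existing Hive databases, Spark also needs the metastore connection settings from a hive-site.xml in its conf dir. A minimal sketch, with a placeholder host:

```
<!-- hive-site.xml under Spark's conf dir; metastore-host:9083 is a placeholder -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```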


Please let me know what configuration changes are required to make this work.


Please note: the above works in HDP 2.6.5.