HDP 2.6 Spark can't create database - configuration issue?
Labels: Apache Hive
Created ‎05-16-2018 06:59 PM
Hi,
After installing HDP 2.6.3, I ran PySpark in the terminal, initiated a Spark session, and tried to create a new database (see the last line of code below):
$ pyspark
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.master("local").appName("test").enableHiveSupport().getOrCreate()
>>> spark.sql("show databases").show()
>>> spark.sql("create database if not exists NEW_DB")
However, PySpark threw an error where it was trying to create a database locally:
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to create database path file:/home/jdoe/spark-warehouse/new_db.db, failed to create database new_db);'
I wasn't trying to create a database locally. I was trying to create a database within Hive. Is there a configuration problem with HDP 2.6.3?
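For reference, the warehouse directory the session is using can be checked like this (a minimal sketch; both properties are standard Spark/Hive settings):
>>> # Spark 2 defaults to a local file: path under the current working directory
>>> spark.conf.get("spark.sql.warehouse.dir")
>>> # the Hive metastore warehouse directory, as seen by this session
>>> spark.sql("SET hive.metastore.warehouse.dir").show(truncate=False)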
Please advise. Thanks.
Created ‎05-16-2018 07:22 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@John Doe Could you try running in YARN client mode instead of local? I think this will help resolve the problem you have now.
$ pyspark --master yarn
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("test").enableHiveSupport().getOrCreate()
>>> spark.sql("show databases").show()
>>> spark.sql("create database if not exists NEW_DB")
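To confirm the shell actually picked up YARN, you can check the master from the running session (a quick sanity check; sparkContext.master is a standard Spark attribute):
>>> spark.sparkContext.master   # should report 'yarn' rather than 'local'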
Note: If you comment on this post, make sure you tag my name. And if you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.
HTH
Created ‎05-16-2018 08:58 PM
Hi @Felix Albani,
Thanks for your reply. Unfortunately, the suggestion didn't work. First, it took forever to launch PySpark with the YARN option:
$ pyspark --master yarn
(and I still don't understand why that option was needed). Also, when it did launch, it ultimately threw a bunch of Java errors.
Created ‎05-16-2018 09:12 PM
@John Doe Did it throw errors before or after running the code? I think it is expected to take longer, since it's launching an application on the cluster. Another option that may help you get past this issue is adding LOCATION to point to the directory where you would like the database to be created. Something like this:
CREATE DATABASE IF NOT EXISTS abc LOCATION '/user/zeppelin/abc.db'
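From PySpark, the same statement can be run and then verified (a minimal sketch; 'abc' and the path are just the example values above):
>>> spark.sql("CREATE DATABASE IF NOT EXISTS abc LOCATION '/user/zeppelin/abc.db'")
>>> # the 'Location' row should now show the HDFS path
>>> spark.sql("DESCRIBE DATABASE abc").show(truncate=False)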
HTH
Created ‎05-17-2018 10:56 AM
Do you have sufficient permissions on the directory /home/jdoe/spark-warehouse?
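A quick way to check from the same Python session (a minimal sketch; it also tests the parent directory, since spark-warehouse may not exist yet):
>>> import os
>>> os.path.exists("/home/jdoe/spark-warehouse")   # does the directory already exist?
>>> os.access("/home/jdoe", os.W_OK)               # can Spark create it under the home directory?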
Created ‎05-17-2018 04:05 PM
@Felix Albani
Thank you for your reply. That suggestion actually worked! However, I don't understand why it is necessary to specify the database location in HDFS. Why does that have to be done in HDP? In other Hadoop/Spark distributions, I haven't had to specify the database file path and name when creating Hive databases with Spark.
I still believe there is a configuration problem with Hive and Spark with HDP.
Created ‎05-17-2018 04:16 PM
According to this Hortonworks community URL, LOCATION is NOT mandatory. But it was the only way I was able to create a database.
Created ‎05-17-2018 04:32 PM
Hi @Felix Albani,
According to @Aditya Sirna's reply to a similar thread, Spark 2 (which is what I am using - NOT Spark 1) has a different warehouse location, which, I suppose, explains why LOCATION must be used.
@Aditya Sirna, if I want to create a Hive database with Spark, do I have to use the location statement? If so, what location statement should I use if I want to keep my databases and tables managed by the Hive metastore?
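For example, something like the following might keep the database under the Hive warehouse root (a sketch assuming the usual HDP default of /apps/hive/warehouse; check hive.metastore.warehouse.dir on your cluster):
>>> # new_db is a placeholder name; the path assumes the HDP default warehouse directory
>>> spark.sql("CREATE DATABASE IF NOT EXISTS new_db LOCATION '/apps/hive/warehouse/new_db.db'")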
Created ‎05-17-2018 04:40 PM
@John Doe Good to hear LOCATION helped. Please remember to mark the answer if you think it has helped you with the issue.
