
I'm trying to run a Spark job in YARN cluster mode and getting this error: ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)


New Contributor

I'm using the below command to run the job:

su root --command "/usr/hdp/2.5.3.0-37/spark2/bin/spark-submit --class com.kronos.research.svc.datascience.DataScienceApp --verbose --master yarn --deploy-mode cluster --jars dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar,dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar,ojdbc6.jar,sqljdbc4.jar --driver-memory 2G --executor-memory 6G --total-executor-cores 6 --conf spark.cores.max=6 --conf spark.executor.cores=6 --conf spark.scheduler.mode=FIFO --conf spark.sql.warehouse.dir --conf spark.cassandra.output.concurrent.writes=5 --conf spark.cassandra.output.batch.size.bytes=4096 --conf spark.cassandra.output.consistency.level=ALL --conf spark.cassandra.input.consistency.level=LOCAL_ONE --conf spark.driver.extraClassPath=sqljdbc4.jar:ojdbc6.jar --conf spark.executor.extraJavaOptions=\"-Duser.timezone=UTC \" --driver-class-path sqljdbc4.jar:ojdbc6.jar --driver-java-options \"-Dspark.ui.port=0 -Doracle.jdbc.timezoneAsRegion=false -Dspark.cassandra.connection.host=catalyst-1.int.kronos.com,catalyst-2.int.kronos.com,catalyst-3.int.kronos.com -Duser.timezone=UTC\" dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar -job.type=pipeline -pipeline=usagemon -jdbc.type=tenant -jdbc.tenant=10013 -date.start=\"2013-02-01\" -date.end=\"2016-05-01\" -labor.level=4 -output.type=console -jdbc.throttle=15 -log.level=INFO -jdbc.cache.dir=hdfs://kvs-in-merlin04:8020/etl"

I'm getting the exception below, along with some warnings related to the Hive Metastore. The job starts, but its progress never moves past 10%; it just stays stuck there. I'm also attaching the complete stderr file from one of the executors. Please help!

errorfile.txt

java.io.EOFException: End of File Exception between local host is: "kvs-in-merlin08/10.131.137.96"; destination host is: "kvs-in-merlin04.int.kronos.com":8020; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
5 Replies

Re: I'm trying to run a Spark job in YARN cluster mode and getting this error: ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

Expert Contributor

It's there in the stack trace of your job: the "default" database already exists, and the code is trying to create it again.

16/12/27 06:22:12 INFO audit: ugi=root	ip=unknown-ip-addr	cmd=create_database: Database(name:default, description:default database, locationUri:file:/hadoopdisk/hadoop/yarn/local/usercache/root/appcache/application_1482810683736_0015/container_e89_1482810683736_0015_01_000001/spark-warehouse, parameters:{})	
16/12/27 06:22:12 ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy28.create_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
	at com.sun.proxy.$Proxy29.createDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:309)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:309)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:309)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
	at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:308)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
	at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
	at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51)
	at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49)
	at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
	at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
	at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
	at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
	at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:238)
	at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:159)
	at com.kronos.research.pipeline.sources.JdbcSource$UsingColumn$$anonfun$toDF$1.apply(JdbcSource.scala:86)
	at com.kronos.research.pipeline.sources.JdbcSource$UsingColumn$$anonfun$toDF$1.apply(JdbcSource.scala:81)
	at scala.Option.getOrElse(Option.scala:121)
	at com.kronos.research.pipeline.sources.JdbcSource$UsingColumn.toDF(JdbcSource.scala:81)
	at com.kronos.research.usagemonitoring.UsageAudits.wfc_audit$lzycompute(UsageAudits.scala:11)
	at com.kronos.research.usagemonitoring.UsageAudits.wfc_audit(UsageAudits.scala:11)

Re: I'm trying to run a Spark job in YARN cluster mode and getting this error: ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

New Contributor

Hi Bikas,

Thanks for your reply. I'm a little confused here since I'm new to Spark. I haven't provided any Hive configuration for this Spark job, yet the logs show it trying to connect to the Hive Metastore. I'm not sure why that is happening or how to resolve the above exception.

Thanks

Shikhar

Re: I'm trying to run a Spark job in YARN cluster mode and getting this error: ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

Expert Contributor

Looks like there is already a JIRA filed for this:

https://issues.apache.org/jira/browse/SPARK-15345

Can you check whether the workaround given in the last comment addresses this issue for you?
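For reference, one commonly suggested workaround along those lines is to set spark.sql.warehouse.dir explicitly when building the SparkSession, so it points at a shared location rather than the per-container default (file:.../spark-warehouse seen in your log). A minimal sketch; the warehouse path and app name are placeholders, so substitute the values from your own hive-site.xml:

```
import org.apache.spark.sql.SparkSession

// Sketch only: point spark.sql.warehouse.dir at a shared HDFS location
// instead of the per-container local default. The path below is an
// assumption; use hive.metastore.warehouse.dir from your hive-site.xml.
val spark = SparkSession.builder()
  .appName("DataScienceApp")
  .config("spark.sql.warehouse.dir", "hdfs:///apps/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
```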

Re: I'm trying to run a Spark job in YARN cluster mode and getting this error: ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

New Contributor

Hi,

I've already tried that workaround on my sandbox but am still having the same issue. I'm also facing the exact same issue on my Hortonworks cluster, which has HDP 2.5 and Spark2 installed.

I'm now trying to run this job on Sandbox 2.5 with Spark2, using the following Spark settings:

spark-env.sh:

HADOOP_CONF_DIR=/etc/hadoop/conf:/etc/hive/conf

SPARK_EXECUTOR_INSTANCES=2

SPARK_EXECUTOR_CORES=1

SPARK_EXECUTOR_MEMORY=512M

SPARK_DRIVER_MEMORY=512M

spark-defaults.conf:

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
 
spark.driver.extraJavaOptions -Dhdp.version=2.5.0.0-817
spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.0.0-817
 
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
 
# Required: setting this parameter to 'false' turns off ATS timeline server for Spark
spark.hadoop.yarn.timeline-service.enabled false
 
#spark.history.fs.logDirectory hdfs:///spark-history
#spark.history.kerberos.keytab none
#spark.history.kerberos.principal none
#spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
#spark.history.ui.port 18080
 
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 200
spark.yarn.executor.memoryOverhead 200
#spark.yarn.historyServer.address sandbox.hortonworks.com:18080
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
 
spark.ui.port 4041

Any help would be appreciated.

Re: I'm trying to run a Spark job in YARN cluster mode and getting this error: ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

New Contributor

I see that --conf spark.sql.warehouse.dir is left empty in your spark-submit command. Can you point it to the proper directory based on your installation configuration?

For example: spark.sql.warehouse.dir=/user/hive/warehouse
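In other words, the flag needs a value on the command line. A sketch of the relevant fragment of the spark-submit invocation; the HDFS path is a placeholder, so use the warehouse location configured for your cluster:

```
# Config fragment (not a complete command): give spark.sql.warehouse.dir
# an explicit value instead of leaving it empty. The path below is an
# assumption; substitute your cluster's actual Hive warehouse directory.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.warehouse.dir=hdfs:///user/hive/warehouse \
  ...
```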