<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/ in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313620#M225622</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/86945"&gt;@zampJeri&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please let me know which user you are running the Spark application as? Also check that this user has permission to create files/directories under the /tmp/hive directory.&lt;/P&gt;</description>
    <pubDate>Tue, 23 Mar 2021 13:31:41 GMT</pubDate>
    <dc:creator>RangaReddy</dc:creator>
    <dc:date>2021-03-23T13:31:41Z</dc:date>
    <item>
      <title>User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313613#M225615</link>
      <description>&lt;P&gt;Hi community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We run Spark 2.3.2 on Hadoop 3.1.1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We use external ORC tables stored on HDFS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are encountering an issue in a job run under cron when issuing the command `sql("msck repair table db.some_table")`. The table is partitioned, and the issue is the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;21/03/22 22:44:13 WARN HiveConf: HiveConf of name hive.heapsize does not exist
21/03/22 22:44:13 WARN HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
21/03/22 22:44:13 WARN HiveConf: HiveConf of name hive.plan.serialization.format does not exist
Hive Session ID = 2625af79-e021-4b57-9435-e0fea4f00803
21/03/22 22:44:13 INFO SessionState: Hive Session ID = 2625af79-e021-4b57-9435-e0fea4f00803
21/03/22 22:44:13 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/2625af79-e021-4b57-9435-e0fea4f00803_resources;
org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/2625af79-e021-4b57-9435-e0fea4f00803_resources;
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
        at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
        at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.org$apache$spark$sql$hive$HiveSessionStateBuilder$$externalCatalog(HiveSessionStateBuilder.scala:39)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:53)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:53)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:90)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:90)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:237)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:176)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:400)
        at org.apache.spark.sql.catalyst.catalog.CatalogUtils$.getMetaData(ExternalCatalogUtils.scala:265)
        at org.apache.spark.sql.catalyst.catalog.CatalogUtils$.throwIfRO(ExternalCatalogUtils.scala:310)
        at org.apache.spark.sql.hive.HiveTranslationLayerCheck$$anonfun$apply$1.applyOrElse(HiveTranslationLayerStrategies.scala:117)
        at org.apache.spark.sql.hive.HiveTranslationLayerCheck$$anonfun$apply$1.applyOrElse(HiveTranslationLayerStrategies.scala:85)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
        at org.apache.spark.sql.hive.HiveTranslationLayerCheck.apply(HiveTranslationLayerStrategies.scala:85)
        at org.apache.spark.sql.hive.HiveTranslationLayerCheck.apply(HiveTranslationLayerStrategies.scala:83)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:124)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:118)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:103)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The same code does not produce any error in another environment, and none of the other flows that use this command have issues with it. As a side effect, the table that was populated before the partition repair also seems to produce duplicate entries for each new record.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not sure whether it's a permissions problem, though that would be quite unusual: none of the other flows has ever had problems with the same command, even where the command needed temporary files to store e.g. metastore information.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Might it be a problem with dependencies? HBase is involved initially to read from some sources; LLAP usage is avoided.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The code looks like:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df
.write
.format("orc")
.mode("append")
.partitionBy(singleColumn)
.option("compression", "snappy")
.save(hdfsPath)

sql(s"msck repair table $tableOfInterest") // $tableOfInterest = db.some_table&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2021 10:53:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313613#M225615</guid>
      <dc:creator>zampJeri</dc:creator>
      <dc:date>2021-03-23T10:53:15Z</dc:date>
    </item>
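Since the stack trace above fails while creating `/tmp/hive/&lt;session-id&gt;_resources` on the local filesystem, the permission side of the problem can be probed directly. A minimal sketch, assuming a POSIX shell; the `probe_dir` helper is illustrative and not part of the original job:

```shell
#!/bin/sh
# probe_dir: report whether the current user can create entries in a
# directory -- the same operation that fails in the stack trace.
probe_dir() {
  dir="$1"
  if mkdir "$dir/probe.$$" 2>/dev/null; then
    rmdir "$dir/probe.$$"
    echo "writable"
  else
    echo "not writable"
  fi
}

probe_dir /tmp    # usually "writable" thanks to the sticky /tmp mode
```

Running this as the submitting user against `/tmp/hive`, on the node where the ApplicationMaster runs, would show whether the IOException is purely a permissions issue.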
    <item>
      <title>Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313620#M225622</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/86945"&gt;@zampJeri&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please let me know which user you are running the Spark application as? Also check that this user has permission to create files/directories under the /tmp/hive directory.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2021 13:31:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313620#M225622</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2021-03-23T13:31:41Z</dc:date>
    </item>
    <item>
      <title>Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313621#M225623</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78612"&gt;@RangaReddy&lt;/a&gt; ,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I do a simple&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hdfs dfs -ls /tmp/hive&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I see:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ls: Permission denied: user={myUser}  access=READ_EXECUTE, inode="/tmp/hive":hive:hdfs:drwx-wx-wx&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I guess that msck repair is using that folder to store temporary files. Is it because the spark-submit specifies&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;--conf spark.datasource.hive.warehouse.load.staging.dir="/tmp"&lt;/LI-CODE&gt;&lt;P&gt;?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2021 13:47:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313621#M225623</guid>
      <dc:creator>zampJeri</dc:creator>
      <dc:date>2021-03-23T13:47:11Z</dc:date>
    </item>
    <item>
      <title>Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313637#M225631</link>
      <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/86945" target="_blank"&gt;@zampJeri&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;This /tmp is on the OS file system, not HDFS. It wants to create the _resources files there and is unable to. Does the user have permissions on /tmp/hive?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Mar 2021 17:30:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313637#M225631</guid>
      <dc:creator>mugdha</dc:creator>
      <dc:date>2021-03-23T17:30:17Z</dc:date>
    </item>
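The local-vs-HDFS distinction above is the easiest thing to mix up, since both filesystems have a `/tmp/hive`. A hedged sketch for checking the local side (the `describe_dir` helper is illustrative, and the GNU `stat -c` flags assume Linux coreutils):

```shell
#!/bin/sh
# describe_dir: print mode, owner, and name of a LOCAL directory, or note
# its absence. This inspects the OS filesystem that the *_resources error
# refers to; 'hdfs dfs -ls -d /tmp/hive' would check a separate namespace.
describe_dir() {
  if [ -d "$1" ]; then
    stat -c '%A %U %n' "$1"    # e.g. "drwxrwxrwt root /tmp"
  else
    echo "absent: $1"
  fi
}

describe_dir /tmp
describe_dir /tmp/hive   # often absent until Hive first creates it
```

The HDFS permissions seen earlier (`drwx-wx-wx` owned by `hive:hdfs`) are unrelated to whether this local directory is creatable by the job user.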
    <item>
      <title>Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313685#M225658</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/86945"&gt;@zampJeri&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, one of the operations (the write or the msck repair command) is using the temp directory, and the current user does not have permission to create directories there. Could you please grant the proper permissions and re-run the job?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Mar 2021 15:56:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313685#M225658</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2021-03-24T15:56:22Z</dc:date>
    </item>
    <item>
      <title>Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313686#M225659</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78612"&gt;@RangaReddy&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/36139"&gt;@mugdha&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi, thanks for the replies.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The user has all the permissions to write to /tmp and its subfolders.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are currently investigating other parts of the code, even though the exception points to the specific line of the msck repair command. As far as I knew, that command would throw an exception when dealing with non-partitioned tables, but the table in question is indeed partitioned. I'm not sure whether an empty table could cause trouble, but then other jobs should occasionally break just the same (especially the same code in a different environment, which should behave identically given the authentication files passed to the submit).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the beginning, we were using the Hive Warehouse Connector by means of&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;hive.execute("msck repair table etc...")&lt;/LI-CODE&gt;&lt;P&gt;but we were told to stay away from triggering unnecessary LLAP (which was generally giving us a lot of trouble), so we removed all instances of HWC, and all jobs run just fine with spark.sql.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Wed, 24 Mar 2021 16:08:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313686#M225659</guid>
      <dc:creator>zampJeri</dc:creator>
      <dc:date>2021-03-24T16:08:02Z</dc:date>
    </item>
    <item>
      <title>Re: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: Unable to create directory /tmp/hive/</title>
      <link>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313881#M225781</link>
      <description>&lt;P&gt;Ok, we found the very stupid issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This specific job, running standalone, was passing "hive-site.xml" as a file to the spark-submit, whereas all the other jobs run under Oozie and use a generic spark-submit that doesn't pass the "hive-site.xml" file. This file specifies /tmp/hive as the default directory for dumping temporary resources, and it turned out that our user still has issues with that folder; those issues are being investigated. The workaround so far is to not pass the hive-site.xml file, so the default directory is /tmp instead, where we can happily live without issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All in all, it was a stupid "mistake" that made us aware of other issues with our current system.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers and thanks to all for the support!&lt;/P&gt;</description>
      <pubDate>Wed, 31 Mar 2021 09:37:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/User-class-threw-exception-org-apache-spark-sql/m-p/313881#M225781</guid>
      <dc:creator>zampJeri</dc:creator>
      <dc:date>2021-03-31T09:37:01Z</dc:date>
    </item>
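For future readers: an alternative to dropping hive-site.xml entirely would be to keep the file but override the local scratch directory at submit time. `hive.exec.local.scratchdir` is the standard Hive property behind the `*_resources` path; the master/class/jar names below are placeholders, so treat this as a sketch rather than the thread's verified fix:

```shell
# Keep hive-site.xml but point Hive's local scratch dir at a directory the
# job user can write to. The spark.hadoop. prefix makes Spark forward the
# property into its Hadoop/Hive configuration.
spark-submit \
  --master yarn --deploy-mode cluster \
  --files hive-site.xml \
  --conf spark.hadoop.hive.exec.local.scratchdir=/tmp \
  --class com.example.SomeJob \
  some-job.jar
```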
  </channel>
</rss>

