<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark - Hive tables not found when running in YARN-Cluster mode in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98519#M11887</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt; That did the trick! &lt;span class="lia-unicode-emoji" title=":grinning_face_with_big_eyes:"&gt;😃&lt;/span&gt; I didn't notice that at first. I wasn't the one who set up our cluster, so I had no idea that the contents of those two files were different. It's a subtle thing, but it caused me a lot of trouble. Thank you very much!&lt;/P&gt;</description>
    <pubDate>Fri, 11 Dec 2015 10:35:42 GMT</pubDate>
    <dc:creator>latorres</dc:creator>
    <dc:date>2015-12-11T10:35:42Z</dc:date>
    <item>
      <title>Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98508#M11876</link>
      <description>&lt;P&gt;
	I have a Spark (version 1.4.1) application on HDP 2.3.2. It works fine when running it in YARN-Client mode. However, when running it in YARN-Cluster mode, none of my Hive tables can be found by the application.&lt;/P&gt;&lt;P&gt;
	I submit the application like so:&lt;/P&gt;
&lt;PRE&gt;  ./bin/spark-submit \
  --class com.myCompany.Main \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 10g \
  --executor-cores 1 \
  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar /home/spark/apps/YarnClusterTest.jar \
  --files /etc/hive/conf/hive-site.xml
&lt;/PRE&gt;&lt;P&gt;
	Here's an excerpt from the logs:&lt;/P&gt;
&lt;PRE&gt;15/12/02 11:05:13 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/12/02 11:05:14 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/12/02 11:05:14 INFO metastore.ObjectStore: ObjectStore, initialize called
15/12/02 11:05:14 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/12/02 11:05:14 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/02 11:05:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker2.xxx.com:34697 with 5.2 GB RAM, BlockManagerId(1, worker2.xxx.com, 34697)
15/12/02 11:05:16 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/12/02 11:05:16 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/12/02 11:05:17 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:17 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:18 INFO metastore.ObjectStore: Initialized ObjectStore
15/12/02 11:05:19 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/12/02 11:05:19 INFO metastore.HiveMetaStore: Added admin role in metastore
15/12/02 11:05:19 INFO metastore.HiveMetaStore: Added public role in metastore
15/12/02 11:05:19 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/12/02 11:05:19 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/12/02 11:05:19 INFO parse.ParseDriver: Parsing command: SELECT * FROM streamsummary
15/12/02 11:05:20 INFO parse.ParseDriver: Parse Completed
15/12/02 11:05:20 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
15/12/02 11:05:20 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=streamsummary
15/12/02 11:05:20 INFO HiveMetaStore.audit: ugi=spark  ip=unknown-ip-addr  cmd=get_table : db=default tbl=streamsummary   
15/12/02 11:05:20 DEBUG myCompany.Main$: no such table streamsummary; line 1 pos 14
&lt;/PRE&gt;&lt;P&gt;
	I basically run into the same 'no such table' problem any time my application needs to read from or write to the Hive tables.&lt;/P&gt;&lt;P&gt;
	Thanks in advance!&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;UPDATE:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	I tried submitting the spark application with the --files parameter provided before --jars as per &lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt;'s suggestion, but doing so now gives me an exception saying that the HiveMetastoreClient could not be instantiated.&lt;/P&gt;&lt;P&gt;
	&lt;EM&gt;spark-submit:&lt;/EM&gt;&lt;/P&gt;
&lt;PRE&gt;  ./bin/spark-submit \
  --class com.myCompany.Main \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 1g \
  --executor-memory 11g \
  --executor-cores 1 \
  --files /etc/hive/conf/hive-site.xml \
  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar \
  /home/spark/apps/YarnClusterTest.jar
&lt;/PRE&gt;&lt;P&gt;
	&lt;EM&gt;code:&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;// core.scala
/**
 * This trait should be mixed in by every other class or trait that depends on `sc`.
 */
trait Core extends java.io.Serializable {
  val sc: SparkContext
  lazy val sqlContext = new HiveContext(sc)
}

// yarncore.scala
/**
 * This trait initializes the SparkContext with YARN as the master.
 */
trait YarnCore extends Core {
  val conf = new SparkConf().setAppName("my app").setMaster("yarn-cluster")
  val sc = new SparkContext(conf)
}

// main.scala
object Test {
  def main(args: Array[String]) {

    /** initialize the spark application **/
    val app = new YarnCore // initializes the SparkContext in YARN mode
      with sqlHelper       // provides SQL functionality
      with Transformer     // provides UDF's for transforming the dataframes into the marts

    /** initialize the logger **/
    val log = Logger.getLogger(getClass.getName)

    val count = app.sqlContext.sql("SELECT COUNT(*) FROM streamsummary")

    log.info(s"streamsummary has ${count.first().getLong(0)} records")

    /** shut down the spark app **/
    app.sc.stop()
  }
}&lt;/PRE&gt;&lt;P&gt;
	&lt;EM&gt;exception:&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;15/12/11 09:34:55 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/12/11 09:34:56 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
	at org.apache.spark.sql.hive.client.ClientWrapper.&amp;lt;init&amp;gt;(ClientWrapper.scala:117)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:165)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:163)
	at org.apache.spark.sql.hive.HiveContext.&amp;lt;init&amp;gt;(HiveContext.scala:170)
	at com.epldt.core.Core$class.sqlContext(core.scala:13)
	at com.epldt.Test$anon$1.sqlContext$lzycompute(main.scala:17)
	at com.epldt.Test$anon$1.sqlContext(main.scala:17)
	at com.epldt.Test$.main(main.scala:26)
	at com.epldt.Test.main(main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.yarn.ApplicationMaster$anon$2.run(ApplicationMaster.scala:486)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.&amp;lt;init&amp;gt;(RetryingMetaStoreClient.java:62)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
	... 19 more
Caused by: java.lang.NumberFormatException: For input string: "1800s"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1258)
	at org.apache.hadoop.hive.conf.HiveConf.getIntVar(HiveConf.java:1211)
	at org.apache.hadoop.hive.conf.HiveConf.getIntVar(HiveConf.java:1220)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:293)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.&amp;lt;init&amp;gt;(HiveMetaStoreClient.java:214)
	... 24 more&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 16:27:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98508#M11876</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-10T16:27:19Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98509#M11877</link>
      <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/746-log.txt"&gt;log.txt&lt;/A&gt; Uploading a copy of the log excerpt in a text file because it won't format properly in the post&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 16:34:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98509#M11877</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-10T16:34:59Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98510#M11878</link>
      <description>&lt;P&gt;Do you have Kerberos enabled on this cluster? Also - are you using HDP 2.3.0 or HDP 2.3.2?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 20:10:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98510#M11878</guid>
      <dc:creator>agillan</dc:creator>
      <dc:date>2015-12-10T20:10:43Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98511#M11879</link>
      <description>&lt;P&gt;Could you share the code from the com.myCompany.Main class?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 20:21:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98511#M11879</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2015-12-10T20:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98512#M11880</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1196/latorres.html" nodeid="1196"&gt;@Luis Antonio Torres&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I did a few tests and I think you just need to change the location of --files; it must come before your .jar file.&lt;/P&gt;&lt;P&gt;Find my sample class here:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/gbraccialli/SparkUtils/blob/master/src/main/scala/com/github/gbraccialli/spark/HiveCommand.scala"&gt;https://github.com/gbraccialli/SparkUtils/blob/master/src/main/scala/com/github/gbraccialli/spark/HiveCommand.scala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Project is here:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/gbraccialli/SparkUtils"&gt;https://github.com/gbraccialli/SparkUtils&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Sample spark-submit with hive commands as parameters:&lt;/P&gt;&lt;PRE&gt;git clone &lt;A href="https://github.com/gbraccialli/SparkUtils" target="_blank"&gt;https://github.com/gbraccialli/SparkUtils&lt;/A&gt;
cd SparkUtils/
mvn clean package
spark-submit \
  --class com.github.gbraccialli.spark.HiveCommand \
  --master yarn-cluster \
  --num-executors 1 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
 target/SparkUtils-1.0.0-SNAPSHOT.jar "show tables" "select * from sample_08"&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 Dec 2015 20:46:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98512#M11880</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-10T20:46:36Z</dc:date>
    </item>
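A note on why the flag order above matters: spark-submit stops parsing its own options at the first positional argument (the application jar) and forwards every later token to the application's main method as an application argument, so a --files placed after the jar never reaches spark-submit at all. The bash sketch below illustrates that parsing rule; it is an illustration, not spark-submit's actual code, and the "first *.jar token" heuristic and file names are assumptions made for the example.

```shell
# Sketch of spark-submit's ordering rule: flags before the app jar belong to
# spark-submit; everything after the jar is passed through to the application.
parse() {
  local -a submit_flags=() app_args=()
  local jar="" seen_jar=0 tok
  for tok in "$@"; do
    if [ "$seen_jar" -eq 1 ]; then
      app_args+=("$tok")            # after the jar: application arguments
    elif [[ "$tok" == *.jar ]]; then
      jar="$tok"; seen_jar=1        # first *.jar token ends flag parsing (simplification)
    else
      submit_flags+=("$tok")        # before the jar: spark-submit flags
    fi
  done
  echo "flags: ${submit_flags[*]}"
  echo "jar:   $jar"
  echo "app:   ${app_args[*]}"
}

# --files placed after the jar is swallowed as an application argument:
parse --master yarn-cluster app.jar --files hive-site.xml
```

Running the last line shows --files hive-site.xml landing in the application-argument bucket rather than among the spark-submit flags, which matches the behavior the original post ran into.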
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98513#M11881</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/5798/spark-hive-tables-not-found-when-running-in-yarn-c.html#"&gt;@Guilherme Braccialli&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for your reply. I tried your suggestion of putting the --files parameter before --jars when submitting, but now I'm running into an exception saying the HiveMetastoreClient could not be instantiated. I'll update my post with the code and new stack trace.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 09:52:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98513#M11881</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T09:52:08Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98514#M11882</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1196/latorres.html" nodeid="1196"&gt;@Luis Antonio Torres&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It worked for me. Can you check the content of the /usr/hdp/current/spark-client/conf/hive-site.xml you are using?&lt;/P&gt;&lt;P&gt;Mine is like this:&lt;/P&gt;&lt;PRE&gt;  &amp;lt;configuration&amp;gt;
    &amp;lt;property&amp;gt;
      &amp;lt;name&amp;gt;hive.metastore.uris&amp;lt;/name&amp;gt;
      &amp;lt;value&amp;gt;thrift://sandbox.hortonworks.com:9083&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;
  &amp;lt;/configuration&amp;gt;
&lt;/PRE&gt;</description>
      <pubDate>Fri, 11 Dec 2015 09:55:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98514#M11882</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-11T09:55:40Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98515#M11883</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/214/agillan.html" nodeid="214"&gt;@Ana Gillan&lt;/A&gt; we're using 2.3.2, and Kerberos is disabled.&lt;/P&gt;&lt;P&gt;@Jonas Straub I've updated the post with some of the simpler sample code I've used to try and test things out. Even a simple select statement is giving me the same errors... though I can't be sure if my use of the cake pattern could be resulting in some unwanted side-effects.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:07:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98515#M11883</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T10:07:49Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98516#M11884</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1196/latorres.html" nodeid="1196"&gt;@Luis Antonio Torres&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please do not use the hive-site.xml from Hive.&lt;/P&gt;&lt;P&gt;You need a clean hive-site.xml for Spark; your hive-site.xml for Spark should contain only this:&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
    &amp;lt;property&amp;gt;
      &amp;lt;name&amp;gt;hive.metastore.uris&amp;lt;/name&amp;gt;
      &amp;lt;value&amp;gt;thrift://sandbox.hortonworks.com:9083&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;
  &amp;lt;/configuration&amp;gt;
&lt;/PRE&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:12:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98516#M11884</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-11T10:12:07Z</dc:date>
    </item>
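For readers who hit the NumberFormatException: "1800s" shown in the updated stack trace: the trace passes through HiveConf.getIntVar inside HiveMetaStoreClient.open, which is consistent with the full Hive configuration containing time values with unit suffixes that the older Hive 0.13 classes bundled with Spark 1.4 parse via Integer.parseInt and reject. This is a plausible reconstruction rather than something confirmed from the poster's cluster; the fragment below is a hypothetical example of the kind of entry that would trigger it, and the clean, metastore-only hive-site.xml recommended in this thread avoids the problem by leaving such entries out.

```xml
<!-- Hypothetical example of an entry in the full /etc/hive/conf/hive-site.xml
     that a Hive-0.13-based client cannot parse as an integer. The property
     name and value here are assumptions for illustration only. -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>
```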
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98517#M11885</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I just want to start by thanking you for your quick responses. I've been struggling with this problem for a while now, and actually I've also asked this on stackoverflow but no luck.&lt;/P&gt;&lt;P&gt;As for /usr/hdp/current/spark-client/conf/hive-site.xml, the content is pretty much the same as yours:&lt;/P&gt;&lt;PRE&gt;&amp;lt;configuration&amp;gt;
   
  &amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;hive.metastore.uris&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;thrift://host.xxx.com:9083&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
   
  &amp;lt;/configuration&amp;gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:20:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98517#M11885</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T10:20:44Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98518#M11886</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/1196/latorres.html" nodeid="1196"&gt;@Luis Antonio Torres&lt;/A&gt;&lt;P&gt;Check your command; you are using /etc/hive/conf/hive-site.xml instead of /usr/hdp/current/spark-client/conf/hive-site.xml.&lt;/P&gt;&lt;P&gt;I think this is the issue.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:24:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98518#M11886</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-11T10:24:17Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98519#M11887</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt; That did the trick! &lt;span class="lia-unicode-emoji" title=":grinning_face_with_big_eyes:"&gt;😃&lt;/span&gt; I didn't notice that at first. I wasn't the one who set up our cluster, so I had no idea that the contents of those two files were different. It's a subtle thing, but it caused me a lot of trouble. Thank you very much!&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:35:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98519#M11887</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T10:35:42Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98520#M11888</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1196/latorres.html" nodeid="1196"&gt;@Luis Antonio Torres&lt;/A&gt; I'm glad it worked! It took me a while to get it working too.&lt;/P&gt;&lt;P&gt;You can also check this Scala code I created, where you can pass Hive commands from the command line instead of hard-coding them:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/gbraccialli/SparkUtils/blob/master/src/main/scala/com/github/gbraccialli/spark/HiveCommand.scala"&gt;https://github.com/gbraccialli/SparkUtils/blob/master/src/main/scala/com/github/gbraccialli/spark/HiveCommand.scala&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 10:39:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98520#M11888</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-11T10:39:09Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98521#M11889</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I did look at your code, and it's an interesting approach. However, in my actual application the SQL commands don't need to be parameterized, since the app already performs ETL on a specific set of data. Still, I'll keep it in mind. I have been trying to come up with a utility mixin that provides wrapper methods around the calls to hiveContext.sql, so that all my other Spark apps that need to call it need only provide the column and table names, as well as the where conditions.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 11:18:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98521#M11889</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T11:18:57Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98522#M11890</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt; &lt;/P&gt;&lt;P&gt;One thing about your code that got me curious, though, is how you instantiated your SparkContext... I've been following the Spark &lt;A href="http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark"&gt;programming guide&lt;/A&gt;, and that's why I set the app name and master on the SparkConf before initializing `sc`... In your code you don't set the master, so in this case does `SparkContext` pick up the master from the `--master` arg of the submit command?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 11:22:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98522#M11890</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T11:22:41Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98523#M11891</link>
      <description>&lt;P&gt;Yes, it picks it up automatically. You can use the same code to run in yarn-cluster, yarn-client, or standalone mode.&lt;/P&gt;&lt;P&gt;If you want, you can define the app name as well:&lt;/P&gt;&lt;PRE&gt;val sparkConf = new SparkConf().setAppName("app-name")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
&lt;/PRE&gt;&lt;P&gt;I copied my code and pom.xml from &lt;A rel="user" href="https://community.cloudera.com/users/157/rgelhausen.html" nodeid="157"&gt;@Randy Gelhausen&lt;/A&gt; one:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/randerzander/HiveToPhoenix/blob/master/src/main/scala/com/github/randerzander/HiveToPhoenix.scala"&gt;https://github.com/randerzander/HiveToPhoenix/blob/master/src/main/scala/com/github/randerzander/HiveToPhoenix.scala&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 11:28:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98523#M11891</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-11T11:28:41Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - Hive tables not found when running in YARN-Cluster mode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98524#M11892</link>
      <description>&lt;P&gt;I see! They probably could have phrased the documentation better, IMHO. &lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;you will not want to hardcode &lt;CODE&gt;master&lt;/CODE&gt; in the program,
but rather &lt;A href="http://spark.apache.org/docs/latest/submitting-applications.html"&gt;launch the application with &lt;CODE&gt;spark-submit&lt;/CODE&gt;&lt;/A&gt; and
receive it there&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;The above quote from the &lt;A href="http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark"&gt;documentation&lt;/A&gt; was never actually clear to me and I thought that I had to "receive" the master URL by reading it in the code from some configuration or parameter then setting master. Thanks for clearing that up!&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2015 14:07:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Hive-tables-not-found-when-running-in-YARN-Cluster/m-p/98524#M11892</guid>
      <dc:creator>latorres</dc:creator>
      <dc:date>2015-12-11T14:07:33Z</dc:date>
    </item>
  </channel>
</rss>

