
Spark hive program in eclipse

New Contributor

I have installed cloudera-quickstart-vm-5.4.2-virtualbox and I am trying to debug a Spark-Hive program using Eclipse.

I am getting a "table not found" exception. It looks like I am missing some configuration; any help is highly appreciated.

Below are the error log and a snippet of my code. I have created the transaction table under the default database.


import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public static void main(String[] args) {
    // Build the configuration first, then create the contexts from it
    SparkConf conf = new SparkConf().setMaster("spark://quickstart.cloudera:7077").setAppName("HdfsToSolr").setSparkHome("/usr/lib/spark/");
    JavaSparkContext sc = new JavaSparkContext(conf);
    HiveContext hiveContext = new HiveContext(JavaSparkContext.toSparkContext(sc));
    Row[] results = hiveContext.sql("select * from transaction").collect();
}
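For this to compile in Eclipse, the spark-hive artifact needs to be on the project classpath. A sketch of the Maven dependency (the version matches the Spark 1.3.0 shown in the error log below; the exact coordinates for a CDH build are an assumption):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.10</artifactId>
  <version>1.3.0</version>
</dependency>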

 

 

--------------------Error Log-----------------------------------------------

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/18 16:09:47 INFO SparkContext: Running Spark version 1.3.0
16/04/18 16:09:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/18 16:09:50 INFO SecurityManager: Changing view acls to: cloudera
16/04/18 16:09:50 INFO SecurityManager: Changing modify acls to: cloudera
16/04/18 16:09:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
16/04/18 16:09:51 INFO Slf4jLogger: Slf4jLogger started
16/04/18 16:09:52 INFO Remoting: Starting remoting
16/04/18 16:09:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@quickstart.cloudera:44941]
16/04/18 16:09:52 INFO Utils: Successfully started service 'sparkDriver' on port 44941.
16/04/18 16:09:52 INFO SparkEnv: Registering MapOutputTracker
16/04/18 16:09:52 INFO SparkEnv: Registering BlockManagerMaster
16/04/18 16:09:52 INFO DiskBlockManager: Created local directory at /tmp/spark-9b48dce3-7a5b-4242-b39e-5ff721a0752b/blockmgr-9d73920f-238c-4be9-8978-8f47bf570758
16/04/18 16:09:52 INFO MemoryStore: MemoryStore started with capacity 500.1 MB
16/04/18 16:09:53 INFO HttpFileServer: HTTP File server directory is /tmp/spark-4842de15-b3e4-4d09-acf6-9ca619371bf2/httpd-a8cbf9ca-28d6-4bfa-9e45-dad3f448dfd0
16/04/18 16:09:53 INFO HttpServer: Starting HTTP Server
16/04/18 16:09:53 INFO Server: jetty-8.y.z-SNAPSHOT
16/04/18 16:09:53 INFO AbstractConnector: Started SocketConnector@0.0.0.0:46262
16/04/18 16:09:53 INFO Utils: Successfully started service 'HTTP file server' on port 46262.
16/04/18 16:09:54 INFO SparkEnv: Registering OutputCommitCoordinator
16/04/18 16:10:02 INFO Server: jetty-8.y.z-SNAPSHOT
16/04/18 16:10:02 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/04/18 16:10:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/04/18 16:10:02 INFO SparkUI: Started SparkUI at http://quickstart.cloudera:4040
16/04/18 16:10:03 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@quickstart.cloudera:7077/user/Master...
16/04/18 16:10:04 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160418161003-0010
16/04/18 16:10:04 INFO AppClient$ClientActor: Executor added: app-20160418161003-0010/0 on worker-20160418091429-quickstart.cloudera-7078 (quickstart.cloudera:7078) with 1 cores
16/04/18 16:10:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160418161003-0010/0 on hostPort quickstart.cloudera:7078 with 1 cores, 512.0 MB RAM
16/04/18 16:10:04 INFO AppClient$ClientActor: Executor updated: app-20160418161003-0010/0 is now LOADING
16/04/18 16:10:04 INFO AppClient$ClientActor: Executor updated: app-20160418161003-0010/0 is now RUNNING
16/04/18 16:10:04 INFO NettyBlockTransferService: Server created on 57879
16/04/18 16:10:04 INFO BlockManagerMaster: Trying to register BlockManager
16/04/18 16:10:04 INFO BlockManagerMasterActor: Registering block manager quickstart.cloudera:57879 with 500.1 MB RAM, BlockManagerId(<driver>, quickstart.cloudera, 57879)
16/04/18 16:10:04 INFO BlockManagerMaster: Registered BlockManager
16/04/18 16:10:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/04/18 16:10:10 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/04/18 16:10:11 INFO ObjectStore: ObjectStore, initialize called
16/04/18 16:10:12 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/04/18 16:10:12 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/04/18 16:10:21 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/04/18 16:10:21 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
16/04/18 16:10:29 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@quickstart.cloudera:51417/user/Executor#-1856620862] with ID 0
16/04/18 16:10:30 INFO BlockManagerMasterActor: Registering block manager quickstart.cloudera:49467 with 267.3 MB RAM, BlockManagerId(0, quickstart.cloudera, 49467)
16/04/18 16:10:31 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/04/18 16:10:31 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/04/18 16:10:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/04/18 16:10:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/04/18 16:10:34 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/04/18 16:10:34 INFO ObjectStore: Initialized ObjectStore
16/04/18 16:10:35 INFO HiveMetaStore: Added admin role in metastore
16/04/18 16:10:35 INFO HiveMetaStore: Added public role in metastore
16/04/18 16:10:36 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/04/18 16:10:37 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
16/04/18 16:10:39 INFO ParseDriver: Parsing command: select * from transaction
16/04/18 16:10:40 INFO ParseDriver: Parse Completed
16/04/18 16:10:44 INFO HiveMetaStore: 0: get_table : db=default tbl=transaction
16/04/18 16:10:44 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_table : db=default tbl=transaction
16/04/18 16:10:45 ERROR Hive: NoSuchObjectException(message:default.transaction table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy12.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy13.getTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:976)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:180)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:252)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:252)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:175)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:187)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:182)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:186)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:207)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:236)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:192)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:177)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:182)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:172)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:1071)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:1071)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:1069)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92)


Re: Spark hive program in eclipse

Contributor

My code:

SparkConf conf = new SparkConf().setAppName("HdfsToSolr");
JavaSparkContext sc = new JavaSparkContext(conf);
HiveContext hiveContext = new HiveContext(JavaSparkContext.toSparkContext(sc));
hiveContext.sql("select * from test1").show();

And it displayed the records in the table. You should not specify the master in the program; instead, pass it to spark-submit. Note that spark-submit treats everything after the application JAR as application arguments, so --master must come before the JAR path:

spark-submit --class SparkSQLTest --master spark://10.0.2.15:7077 /my/jar/path.jar

For your case, I can see:

16/04/18 16:10:21 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".

Check your Hive configuration, specifically the Hive Metastore Database Type; ideally it should be MySQL.

Try to submit the above code from the shell where the VM is running.

Re: Spark hive program in eclipse

New Contributor

Thanks for your response. I am using the default configuration and did not make any changes to the config files.

The hive-site.xml file in both the /usr/lib/spark/conf and /usr/lib/hive/conf folders has the same configuration, shown below:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>cloudera</value>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://127.0.0.1:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
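A quick sanity check (a sketch using the stock Hive CLI on the quickstart VM) confirms whether the table is actually registered in this metastore:

hive -e "show tables in default;"
hive -e "describe default.transaction;"

If the table shows up there but not from the Eclipse-launched driver, the driver is likely building its own local metastore because it never loaded this hive-site.xml; the "Opening raw store" and "MySQL check failed" lines in the first error log point in that direction.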

 

Is there any way to configure and debug the application in Eclipse?

However, I also submitted the job using spark-submit and got the error below.

-------------------------------------ERROR LOG------------------------------------------------------------------

[cloudera@quickstart conf.dist]$ spark-submit --class com.america.spark.TestSparkHive /home/cloudera/data/sparktosolr.jar --master spark://10.0.2.15:7077
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/04/19 07:08:08 INFO spark.SparkContext: Running Spark version 1.3.0
16/04/19 07:08:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/19 07:08:10 INFO spark.SecurityManager: Changing view acls to: cloudera
16/04/19 07:08:10 INFO spark.SecurityManager: Changing modify acls to: cloudera
16/04/19 07:08:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
16/04/19 07:08:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/04/19 07:08:11 INFO Remoting: Starting remoting
16/04/19 07:08:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@quickstart.cloudera:59160]
16/04/19 07:08:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@quickstart.cloudera:59160]
16/04/19 07:08:11 INFO util.Utils: Successfully started service 'sparkDriver' on port 59160.
16/04/19 07:08:11 INFO spark.SparkEnv: Registering MapOutputTracker
16/04/19 07:08:11 INFO spark.SparkEnv: Registering BlockManagerMaster
16/04/19 07:08:11 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-b88c2e0a-363f-4d92-a907-ffaf5a7556f6/blockmgr-84d8c7ac-a0d0-4808-a7fd-6a89eaf44c95
16/04/19 07:08:11 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
16/04/19 07:08:12 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-8d66910b-97a1-42cc-852a-fdb20118909c/httpd-86cc9ba7-be73-4880-a4d2-360602458c1d
16/04/19 07:08:12 INFO spark.HttpServer: Starting HTTP Server
16/04/19 07:08:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/19 07:08:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38399
16/04/19 07:08:12 INFO util.Utils: Successfully started service 'HTTP file server' on port 38399.
16/04/19 07:08:12 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/04/19 07:08:41 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/19 07:08:41 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/04/19 07:08:41 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/04/19 07:08:41 INFO ui.SparkUI: Started SparkUI at http://quickstart.cloudera:4040
16/04/19 07:08:41 INFO spark.SparkContext: Added JAR file:/home/cloudera/data/sparktosolr.jar at http://10.0.2.15:38399/jars/sparktosolr.jar with timestamp 1461074921546
16/04/19 07:08:41 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.0.2.15:7077/user/Master...
16/04/19 07:09:01 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.0.2.15:7077/user/Master...
16/04/19 07:09:21 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.0.2.15:7077/user/Master...
16/04/19 07:09:42 WARN cluster.SparkDeploySchedulerBackend: Application ID is not initialized yet.
16/04/19 07:09:42 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
16/04/19 07:09:42 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.

Re: Spark hive program in eclipse

Contributor

For debugging your application in an IDE, you need to set up remote debugging; this article is suitable for you.
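A typical setup (a sketch; the debug port is an assumption, while the jar path and class name reuse the ones from your spark-submit log) passes the JDWP agent options to the driver JVM and then attaches Eclipse as a "Remote Java Application" on that port:

spark-submit --class com.america.spark.TestSparkHive \
  --master spark://quickstart.cloudera:7077 \
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  /home/cloudera/data/sparktosolr.jar

With suspend=y the driver waits for the debugger to attach before running, so breakpoints set in Eclipse beforehand will be hit.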

 

I can see in your logs:

 

16/04/19 07:08:41 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@10.0.2.15:7077/user/Master...

I think there is a typo carried over from my last reply; because of it, your application is unable to connect to the master. Use your master's actual address instead. Also, which spark-submit command did you use?
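Something along these lines (a sketch; the hostname comes from your first driver log, where the master was reachable as quickstart.cloudera):

spark-submit --class com.america.spark.TestSparkHive \
  --master spark://quickstart.cloudera:7077 \
  /home/cloudera/data/sparktosolr.jar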

Re: Spark hive program in eclipse

New Contributor

I had the same issue using existing Hive tables from a Spark application. I resolved it by adding the hive-site.xml file to the submit command.

spark-submit ... --files hive-site.xml

If I'm not mistaken, there was an issue with the Spark driver finding this configuration file, so this is a workaround for it.
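For example (a sketch; the jar path and class name come from earlier posts in this thread, and the hive-site.xml location is an assumption based on the folders mentioned above):

spark-submit --class com.america.spark.TestSparkHive \
  --master spark://quickstart.cloudera:7077 \
  --files /usr/lib/hive/conf/hive-site.xml \
  /home/cloudera/data/sparktosolr.jar

Alternatively (also a sketch, assuming the thrift URI from the hive-site.xml posted above), the metastore location can be set on the HiveContext directly, which helps when launching from Eclipse where hive-site.xml is not on the driver's classpath:

HiveContext hiveContext = new HiveContext(JavaSparkContext.toSparkContext(sc));
// Point the driver at the remote metastore explicitly instead of relying on hive-site.xml
hiveContext.setConf("hive.metastore.uris", "thrift://quickstart.cloudera:9083");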
