How to load Hive data into spark-shell
Labels: Cloudera Manager, Cloudera Navigator
Created on 06-29-2017 10:34 PM - edited 09-16-2022 04:52 AM
I have downloaded the Cloudera QuickStart 5.10 VM for VirtualBox, but it is not loading Hive data into Spark:
import org.apache.spark.sql.hive.HiveContext
import sqlContext.implicits._
val hiveObj = new HiveContext(sc)
hiveObj.refreshTable("db.table") // if you have upgraded your Hive, do this to refresh the tables
val sample = sqlContext.sql("select * from table").collect()
sample.foreach(println)
I am still getting the error "table not found" (it is not accessing the metastore).
What should I do? Can anyone please help me?
(In the Cloudera QuickStart VM we are unable to copy hive-site.xml into spark/conf.)
Created 06-30-2017 07:04 AM
You have to declare the variable sqlContext before you import, as follows, but you are using hiveObj instead. Once you have done the steps below, you can use sqlContext to interact with Hive:
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._
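Putting it together with the query from the question, the full sequence looks like this (a sketch; db.table stands for the placeholder names used in the original post):
import org.apache.spark.sql.hive.HiveContext
// declare the HiveContext-backed sqlContext first...
val sqlContext = new HiveContext(sc)
// ...then import its implicits
import sqlContext.implicits._
// sqlContext now resolves tables through the Hive metastore
val sample = sqlContext.sql("select * from db.table").collect()
sample.foreach(println)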
Created 07-02-2017 10:36 PM
Thank you for your reply, but it is still not loading; it gives the error "table not found".
Could you please show how to load Hive data in spark-shell? (I am using Cloudera QuickStart 5.10 in VirtualBox.)
Created 07-03-2017 03:55 AM
Replace this line in your code:
val sample = hiveObj.sql("select * from table").collect()
You need to use hiveObj instead of sqlContext.
Let me know if that helps.
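In full, keeping everything on one context object, the snippet from the question becomes (a sketch, reusing the placeholder names db.table from the original post):
import org.apache.spark.sql.hive.HiveContext
val hiveObj = new HiveContext(sc)
// refresh and query through the same HiveContext instance
hiveObj.refreshTable("db.table")
val sample = hiveObj.sql("select * from db.table").collect()
sample.foreach(println)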
Created 07-03-2017 08:42 AM
Try this, it should work. (Note: log in to Hive and make sure the table exists.)
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
val conf = new SparkConf().setAppName("Test").setMaster("yarn-client")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._
val resultDF = sqlContext.sql("select * from test.emp where empid=100")
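To materialize and inspect the result of the query above, for example:
// print the rows matched by the query
resultDF.show()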
Created on 07-03-2017 11:17 PM - edited 07-03-2017 11:19 PM
Thank you for the quick replies, Saranvisa and csguna. Please see below.
The table exists, and it loads as:
scala> val l = sc.textFile("/user/hive/warehouse/cloudera.db/test1").collect().foreach(println)
1|Raj|200
2|Rahul|300
3|Ram|400
4|Sham|250
5|John|500
l: Unit = ()
scala>
But the following is not working; could you please check it? (It also shows a metastore warning in the second step.) I have tried different ways, but it still says "table not found":
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new HiveContext(sc)
17/07/03 22:47:30 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@36bcf0b6
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> val r = sqlContext.sql("select * from cloudera.test1")
org.apache.spark.sql.AnalysisException: Table not found: `cloudera`.`test1`; line 1 pos 23
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC.<init>(<console>:52)
at $iwC.<init>(<console>:54)
at <init>(<console>:56)
at .<init>(<console>:60)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:35)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Created 07-04-2017 12:09 AM
As mentioned in my post, replace the line
scala> val r = sqlContext.sql("select * from cloudera.test1")
with
val sample = hiveObj.sql("select * from cloudera.test1").collect()
You need to use hiveObj; this will fix the error.
Created on 07-04-2017 02:18 AM - edited 07-04-2017 02:19 AM
Hi Guna,
I did as you said, but the same thing repeats. Please see below:
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> val hiveObj = new HiveContext(sc)
17/07/04 02:10:55 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
hiveObj: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3474ddfe
scala> hiveObj.refreshTable("cloudera.test1")
scala> val s = hiveObj.sql("select * from cloudera.test1").collect()
org.apache.spark.sql.AnalysisException: Table not found: `cloudera`.`test1`; line 1 pos 23
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
Created 07-04-2017 04:53 AM
After running the statement below:
scala> hiveObj.refreshTable("cloudera.test1")
try to see if you can list the tables in the database after the refresh:
val df = hiveObj.sql("show tables in database_name"); df.show()
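You could also check which databases the HiveContext can see at all:
// list every database visible to this HiveContext
val dbs = hiveObj.sql("show databases")
dbs.show()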
Created 07-04-2017 10:31 PM
Hi Guna,
I tried as you said. Both the default and cloudera databases have tables, but it is not showing them.
And when I try "show databases" it does not show both databases; it shows only default. Please see below:
scala> val df = hiveObj.sql("show tables in cloudera")
df: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]
scala> df.show()
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
scala> val df1 = hiveObj.sql("show tables in default")
df1: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]
scala> df1.show()
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
scala> val df2 = hiveObj.sql("show databases")
df2: org.apache.spark.sql.DataFrame = [result: string]
scala> df2.show()
+-------+
| result|
+-------+
|default|
+-------+
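For anyone hitting the same wall: these symptoms (only an empty default database visible, together with the earlier "Failed to get database default" warning) usually mean the HiveContext never connected to the real Hive metastore. When hive-site.xml is not on spark-shell's classpath, Spark falls back to a fresh local Derby metastore that contains nothing but an empty default database. Since copying hive-site.xml into spark/conf was not possible here (as noted in the question), one workaround to try, assuming the standard CDH QuickStart layout where the Hive client config lives under /etc/hive/conf, is to relaunch the shell with that directory on the driver classpath (spark-shell --driver-class-path /etc/hive/conf) and then re-run the check from this thread:
import org.apache.spark.sql.hive.HiveContext
// after restarting spark-shell with hive-site.xml on the classpath,
// the real metastore databases (e.g. cloudera) should appear alongside default
val hiveObj = new HiveContext(sc)
hiveObj.sql("show databases").show()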
