<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to read table into Spark using the Hive tablename, not HDFS filename? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121570#M34311</link>
    <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/44407/how-to-read-table-into-spark-using-the-hive-tablen.html#"&gt;@Greg Polanchyck&lt;/A&gt; if you have an existing ORC table in the Hive metastore, and you want to load the whole table into a Spark DataFrame, you can use the sql method on the hiveContext to run:&lt;/P&gt;&lt;PRE&gt;val test_enc_orc = hiveContext.sql("select * from test_enc_orc")&lt;/PRE&gt;</description>
    <pubDate>Mon, 11 Jul 2016 05:02:37 GMT</pubDate>
    <dc:creator>slachterman</dc:creator>
    <dc:date>2016-07-11T05:02:37Z</dc:date>
    <item>
      <title>How to read table into Spark using the Hive tablename, not HDFS filename?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121569#M34310</link>
      <description>&lt;P&gt;I successfully worked through Tutorial-400 (Using Hive with ORC from Apache Spark). But what I would really like to do is read established Hive ORC tables into Spark without having to know the HDFS path and filenames. I created an ORC table in Hive, then ran the following commands from the tutorial in Scala, but from the exception it appears that the read/load expects an HDFS filename. How do I read directly from the Hive table, not from HDFS? I searched, but could not find an existing answer.&lt;/P&gt;&lt;P&gt;Thanks much!&lt;/P&gt;&lt;P&gt;-Greg&lt;/P&gt;&lt;PRE&gt;hive&amp;gt; create table test_enc_orc stored as ORC as select * from test_enc;
hive&amp;gt; select count(*) from test_enc_orc; 
OK 
10

spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val test_enc_orc = hiveContext.read.format("orc").load("test_enc_orc")

java.io.FileNotFoundException: File does not exist: 
hdfs://sandbox.hortonworks.com:8020/user/xxxx/test_enc_orc
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1319)
&lt;/PRE&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:29:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121569#M34310</guid>
      <dc:creator>gpolanch</dc:creator>
      <dc:date>2022-09-16T10:29:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to read table into Spark using the Hive tablename, not HDFS filename?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121570#M34311</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/44407/how-to-read-table-into-spark-using-the-hive-tablen.html#"&gt;@Greg Polanchyck&lt;/A&gt; if you have an existing ORC table in the Hive metastore, and you want to load the whole table into a Spark DataFrame, you can use the sql method on the hiveContext to run:&lt;/P&gt;&lt;PRE&gt;val test_enc_orc = hiveContext.sql("select * from test_enc_orc")&lt;/PRE&gt;</description>
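      <!--
      A minimal end-to-end sketch of the approach in the answer above, assuming the Spark 1.x shell from the thread (where `sc` is the predefined SparkContext) and the `test_enc_orc` table from the question:

      ```scala
      // Spark 1.x: build a HiveContext on the existing SparkContext so Spark
      // can resolve table names through the Hive metastore.
      import org.apache.spark.sql.hive.HiveContext

      val hiveContext = new HiveContext(sc)

      // Load the whole table by its Hive name; no HDFS path is required.
      val test_enc_orc = hiveContext.sql("select * from test_enc_orc")
      test_enc_orc.count()
      ```

      This requires a live Hive metastore, so no expected output is shown.
      -->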
      <pubDate>Mon, 11 Jul 2016 05:02:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121570#M34311</guid>
      <dc:creator>slachterman</dc:creator>
      <dc:date>2016-07-11T05:02:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to read table into Spark using the Hive tablename, not HDFS filename?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121571#M34312</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/11295/slachterman.html"&gt;slachterman&lt;/A&gt; Thank you very much! That worked well! -Greg&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jul 2016 10:03:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121571#M34312</guid>
      <dc:creator>gpolanch</dc:creator>
      <dc:date>2016-07-11T10:03:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to read table into Spark using the Hive tablename, not HDFS filename?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121572#M34313</link>
      <description>&lt;P&gt;I am also having the same problem; it gives this error:&lt;/P&gt;&lt;P&gt;INFO PerfLogger: &amp;lt;/PERFLOG method=OrcGetSplits start=1492763204120 end=1492763204592 duration=472 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl&amp;gt; &lt;/P&gt;&lt;P&gt;Exception in thread "main" java.util.NoSuchElementException: next on empty iterator&lt;/P&gt;&lt;P&gt;at scala.collection.Iterator$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64)
at scala.collection.IterableLike$class.head(IterableLike.scala:91)
at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$super$head(ArrayOps.scala:108)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120)
at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:108)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1422)
at org.apache.spark.sql.DataFrame.first(DataFrame.scala:1429)
at com.apollobit.jobs.TestData$.main(TestData.scala:32)
at com.apollobit.jobs.TestData.main(TestData.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)&lt;/P&gt;&lt;P&gt;Can anybody please help?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2017 20:03:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121572#M34313</guid>
      <dc:creator>tusharn184</dc:creator>
      <dc:date>2017-04-21T20:03:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to read table into Spark using the Hive tablename, not HDFS filename?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121573#M34314</link>
      <description>&lt;P&gt;I like this better:&lt;/P&gt;&lt;PRE&gt;val test_enc_orc = hiveContext.table("test_enc_orc")&lt;/PRE&gt;</description>
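      <!--
      A side note on the `.table` call shown here: in Spark 2.x and later, HiveContext is superseded by SparkSession, which spark-shell predefines as `spark`. A sketch of the equivalent calls, assuming Hive support is enabled:

      ```scala
      // Spark 2.x+: the SparkSession (bound to `spark` in spark-shell)
      // replaces both SQLContext and HiveContext.
      val byName = spark.table("test_enc_orc")              // same as hiveContext.table(...)
      val bySql  = spark.sql("select * from test_enc_orc")  // same as hiveContext.sql(...)
      ```

      Both resolve the table name through the Hive metastore, so they need a running cluster and are shown without expected output.
      -->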
      <pubDate>Thu, 27 Jul 2017 21:45:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-table-into-Spark-using-the-Hive-tablename-not/m-p/121573#M34314</guid>
      <dc:creator>eptakaktak</dc:creator>
      <dc:date>2017-07-27T21:45:16Z</dc:date>
    </item>
  </channel>
</rss>

