How to read table into Spark using the Hive tablename, not HDFS filename?

New Contributor

I successfully worked through Tutorial 400 (Using Hive with ORC from Apache Spark), but what I would really like to do is read established Hive ORC tables into Spark without having to know the HDFS path and filenames. I created an ORC table in Hive, then ran the following commands from the tutorial in Scala, but from the exception it appears that read/load expects an HDFS filename. How do I read directly from the Hive table, not from HDFS? I searched but could not find an existing answer.

Thanks much!

-Greg

hive> create table test_enc_orc stored as ORC as select * from test_enc;
hive> select count(*) from test_enc_orc; 
OK 
10

spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val test_enc_orc = hiveContext.read.format("orc").load("test_enc_orc")

java.io.FileNotFoundException: File does not exist: 
hdfs://sandbox.hortonworks.com:8020/user/xxxx/test_enc_orc
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1319)
1 ACCEPTED SOLUTION

@Greg Polanchyck, if you have an existing ORC table in the Hive metastore and you want to load the whole table into a Spark DataFrame, you can use the sql method on the hiveContext. (read.format("orc").load("test_enc_orc") treats its argument as a filesystem path, not a table name, which is why Spark resolved it against your HDFS home directory; going through the metastore avoids paths entirely.) Run:

val test_enc_orc = hiveContext.sql("select * from test_enc_orc")
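As a quick sanity check that the read went through the metastore rather than a path (a minimal sketch; the expected count comes from the Hive session above):

test_enc_orc.printSchema()   // schema comes from the metastore definition
test_enc_orc.count()         // should return 10, matching count(*) in Hive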

REPLIES

New Contributor

I like this better:

val test_enc_orc = hiveContext.table("test_enc_orc")
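table() asks the catalog for the table directly and returns it as a DataFrame, so there is no SQL string to parse. For anyone on Spark 2.x, where a Hive-enabled SparkSession replaces HiveContext, the equivalent would be (a sketch, assuming the session is built with enableHiveSupport()):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
val test_enc_orc = spark.table("test_enc_orc")   // resolves the name via the metastore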

New Contributor

@slachterman Thank you very much! That worked well! -Greg

New Contributor

I'm also having the same problem; it gives this error:

INFO PerfLogger: </PERFLOG method=OrcGetSplits start=1492763204120 end=1492763204592 duration=472 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>

Exception in thread "main" java.util.NoSuchElementException: next on empty iterator

at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64)
at scala.collection.IterableLike$class.head(IterableLike.scala:91)
at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:108)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120)
at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:108)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1422)
at org.apache.spark.sql.DataFrame.first(DataFrame.scala:1429)
at com.apollobit.jobs.TestData$.main(TestData.scala:32)
at com.apollobit.jobs.TestData.main(TestData.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Can anybody help, please?
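
Judging from the trace, the ORC read itself succeeded (the PerfLogger line shows OrcGetSplits completing); the NoSuchElementException comes from DataFrame.first() at TestData.scala:32, which throws this "next on empty iterator" error when the DataFrame has no rows. A defensive pattern is to test for rows before taking the first one (a sketch; the table name is only an example):

val df = hiveContext.table("test_enc_orc")
// take(1) checks for emptiness without scanning the whole table
if (df.take(1).isEmpty) {
  println("Query returned no rows")
} else {
  println(df.first())
}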