Support Questions


Spark cannot read hive orc table

Explorer

Hello all,

 

I cannot read data from a Hive ORC table into a DataFrame. If anyone knows the cause, could you help me fix it? Below is my script:

 

from pyspark.sql import SparkSession

# enableHiveSupport() is enough on Spark 2.x+; HiveContext and SQLContext
# are deprecated, so spark.sql() is used directly
spark = SparkSession.builder.appName("Testing").enableHiveSupport().getOrCreate()

df_pgw = spark.sql("select * from orc_table")

Hive Session ID = 79c9e6c0-1649-41dc-9aea-493c0f62d046
22/07/20 11:50:52 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
22/07/20 11:50:56 WARN HiveMetastoreCatalog: Unable to infer schema for table orc_table from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.

 

df_pgw.show()

 

 

=> No data is shown; the DataFrame comes back empty.

 

Thanks,

5 REPLIES

Master Collaborator

Hi @mala_etl 

 

You didn't mention whether you are running the application on CDH, HDP, or CDP. Could you please share your Hive script, and check that Spark is using the Hive catalog rather than the in-memory catalog?
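One quick way to check which catalog Spark is using is to read the `spark.sql.catalogImplementation` setting. A minimal sketch (the helper names are mine, not from the original post; the session must be created on the cluster):

```python
def is_hive_catalog(catalog_impl):
    """True when spark.sql.catalogImplementation points at the Hive metastore."""
    return catalog_impl == "hive"


def check_catalog():
    """Create a session and report the catalog in use; run this on the cluster."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("catalog-check")
        .enableHiveSupport()
        .getOrCreate()
    )
    # "hive" means the Hive metastore is used; "in-memory" means Spark's
    # built-in catalog, which cannot see Hive tables at all
    impl = spark.conf.get("spark.sql.catalogImplementation")
    print("catalogImplementation =", impl)
    return is_hive_catalog(impl)
```

If this prints `in-memory`, Spark was started without Hive support and no Hive table will be visible, regardless of its format.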

 

Explorer

Hello @RangaReddy, I run it on Hortonworks (HDP), and the Hive table is in ORC format.

What do you mean by the Hive catalog vs. the in-memory catalog?

Master Collaborator

Hi @mala_etl 

 

You can find information about the two catalogs at the link below:

 

https://stackoverflow.com/questions/59894454/spark-and-hive-in-hadoop-3-difference-between-metastore...

 

Could you please confirm whether the table is an internal or an external table in Hive, and also verify that the data is visible from Hive itself.
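You can check whether the table is internal or external from Spark with `DESCRIBE FORMATTED`. A sketch, assuming a live session; the helper names are mine:

```python
def table_type(describe_rows):
    """Extract the Table Type (MANAGED_TABLE / EXTERNAL_TABLE) from the rows
    returned by DESCRIBE FORMATTED <table>."""
    for row in describe_rows:
        # Each row is (col_name, data_type, comment); the label sits in col_name
        if row[0] and row[0].strip().startswith("Table Type"):
            return row[1].strip() if row[1] else None
    return None


def describe_table(spark, table="orc_table"):
    """Run DESCRIBE FORMATTED on a live SparkSession and report the table type."""
    rows = [tuple(r) for r in spark.sql(f"DESCRIBE FORMATTED {table}").collect()]
    return table_type(rows)
```

A `MANAGED_TABLE` result matters here because, on HDP 3.x, managed Hive tables are not directly readable from plain Spark SQL.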

Explorer

It is an internal table. The data in Hive is fine: I can select/update/delete it via OpenQuery from SQL Server, and I can query it from DBeaver.

Master Collaborator
What is the HDP version? If it is HDP 3.x, then you need to use the Hive Warehouse Connector (HWC), because Spark and Hive use separate catalogs there and Hive-managed (internal) tables are only reachable through HWC.