Created on 07-20-2022 12:33 AM - last edited on 07-20-2022 01:48 AM by VidyaSargur
Hello all,
I cannot read data from a Hive ORC table and load it into a DataFrame. If anyone knows how, could you help me fix it? Below is my script:
from pyspark.sql import SparkSession
from pyspark.sql import HiveContext, SQLContext  # deprecated since Spark 2.0; spark.sql can be used directly

spark = SparkSession.builder.appName("Testing....").enableHiveSupport().getOrCreate()
hive_context = HiveContext(spark)
sqlContext = SQLContext(spark)
df_pgw = hive_context.sql("select * from orc_table")
Hive Session ID = 79c9e6c0-1649-41dc-9aea-493c0f62d046
22/07/20 11:50:52 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
22/07/20 11:50:56 WARN HiveMetastoreCatalog: Unable to infer schema for table orc_table from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.
df_pgw.show()
=> No data is shown; the DataFrame comes back empty.
Thanks,
Created 08-30-2022 04:33 AM
Hi @mala_etl
You didn't mention whether you are running the application on CDH, HDP, or CDP. Could you please share your Hive script, and check whether you are using the Hive catalog rather than the in-memory catalog?
Created 08-30-2022 06:45 PM
Hello @RangaReddy, I am running on Hortonworks (HDP), and the Hive table is in ORC format.
What do you mean by Hive catalog and in-memory catalog?
Created 08-30-2022 06:49 PM
Hi @mala_etl
You can find the catalog information at the link below:
Could you please confirm whether the table is an internal or an external table in Hive, and also verify the data in Hive?
Created 08-30-2022 10:05 PM
It is an internal table. The data in Hive is normal: it can be selected/updated/deleted via OPENQUERY from SQL Server, and it can be queried from DBeaver.
Created 08-30-2022 11:18 PM