Spark cannot read Hive managed table

BrianChan — Fri, 31 Mar 2023 05:50:27 GMT

Hi all, I am practicing Spark.

When using PySpark to query tables in Hive, I can retrieve data from an external table but cannot query an internal (managed) table.

Here is the error message:

>>> spark.read.table("exams").count()
23/03/30 22:28:50 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = eb0a9583-da34-4c85-9a1b-db790d126fb1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/readwriter.py", line 301, in table
    return self._df(self._jreader.table(tableName))
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'\nSpark has no access to table `default`.`exams`. Clients can access this table only ifMANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE.\nThis table may be a Hive-managed ACID table, or require some other capability that Spark\ncurrently does not implement;'

I know that Spark cannot read an ACID Hive table. Is there any workaround?

Thanks in advance.

Re: Spark cannot read Hive managed table

ChethanYM — Fri, 31 Mar 2023 09:43:02 GMT

Hi,

Can you set the "spark.sql.htl.check=false" parameter in your Spark job and give it a try?

Regards,

Chethan YM

Re: Spark cannot read Hive managed table

BrianChan — Mon, 03 Apr 2023 02:40:33 GMT

@ChethanYM Thank you for your reply.

I tried your suggestion by recreating the Spark session:

>>> conf = spark.sparkContext._conf.setAll([('spark.sql.htl.check','false'), ('mapreduce.input.fileinputformat.input.dir.recursive','true')])
>>> spark.sparkContext.stop()
>>> spark = SparkSession.builder.config(conf=conf).getOrCreate()

It works fine. Thank you very much.
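For anyone who would rather not stop and rebuild the session, the same settings can probably be supplied when the shell or job is launched. The snippet below is only a sketch that assembles such a command line with the standard `--conf key=value` flag. The helper name `launch_command` is made up for illustration. Since spark-submit generally ignores keys that do not start with `spark.`, the Hadoop key is written with the `spark.hadoop.` prefix here; that forwarding is an assumption and was not confirmed in this thread.

```python
import shlex

# spark.sql.htl.check comes from the reply above.
# The spark.hadoop. prefix on the second key is an assumption (see lead-in).
SETTINGS = {
    "spark.sql.htl.check": "false",
    "spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive": "true",
}


def launch_command(program, settings):
    """Return an argv list for `program` with one --conf pair per setting."""
    argv = [program]
    for key, value in settings.items():
        argv += ["--conf", f"{key}={value}"]
    return argv


command = launch_command("pyspark", SETTINGS)
# Print a copy-pasteable shell command.
print(shlex.join(command))
```

The same list works for `spark-submit` by changing the first argument and appending the application file.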

Re: Spark cannot read Hive managed table

BrianChan — Mon, 03 Apr 2023 03:11:59 GMT

@ChethanYM Could you please explain further why Spark can read a Hive managed table when this parameter is passed? Thank you very much.

Re: Spark cannot read Hive managed table

cardozogp — Wed, 31 Jan 2024 14:16:07 GMT

Hello ChethanYM,

Could you please provide a link to documentation about this setting (spark.sql.htl.check=false)?

I could not find anything at https://spark.apache.org/doc

Regards,

Guilherme C P

Re: Spark cannot read Hive managed table

ggangadharan — Fri, 30 Aug 2024 06:09:58 GMT

You need to use Hive Warehouse Connector (HWC) to query Hive managed tables from Spark.

Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
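A rough sketch of what an HWC read might look like from PySpark, in case it helps. It assumes the HWC jar and its Python package (`pyspark_llap`) were added when the session was launched and that the cluster-specific HWC settings from the linked page are already in place; none of that is shown here. `build_query` and `read_managed_table` are hypothetical helper names, and the exact session API should be checked against the HWC documentation for your CDP version.

```python
import re

# Accepts `table` or `database.table` made of plain identifier characters.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_query(table_name):
    """Return a SELECT statement for a validated table name."""
    if not _TABLE_NAME.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"SELECT * FROM {table_name}"


def read_managed_table(spark, table_name):
    """Read a Hive managed table through HWC and return a DataFrame."""
    # Imported lazily: pyspark_llap is only available when HWC is on the path.
    from pyspark_llap import HiveWarehouseSession

    hive = HiveWarehouseSession.session(spark).build()
    return hive.sql(build_query(table_name))
```

In the shell from the original question this would be called as `read_managed_table(spark, "default.exams").count()`.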

Re: Spark cannot read Hive managed table

kartheekb — Thu, 19 Sep 2024 04:19:36 GMT

This solution eliminated the error, but no data is fetched from the table.

An empty DataFrame is shown.