Support Questions

BrianChan · ‎03-30-2023

Hi all, I am practicing Spark.

When using PySpark to query tables in Hive, I can retrieve data from an external table but cannot query an internal (managed) table.

Here is the error message:

>>> spark.read.table("exams").count()
23/03/30 22:28:50 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = eb0a9583-da34-4c85-9a1b-db790d126fb1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/readwriter.py", line 301, in table
    return self._df(self._jreader.table(tableName))
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'\nSpark has no access to table `default`.`exams`. Clients can access this table only ifMANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE.\nThis table may be a Hive-managed ACID table, or require some other capability that Spark\ncurrently does not implement;'

I know that Spark cannot read an ACID Hive table. Is there any workaround?

Thanks in advance.

ChethanYM · ‎03-31-2023

Hi,

Can you set the "spark.sql.htl.check=false" parameter in your Spark job and give it a try?

Regards,

Chethan YM

BrianChan · ‎04-02-2023

@ChethanYM Thank you for your reply.

I tried your suggestion by recreating the Spark session:

>>> conf = spark.sparkContext._conf.setAll([('spark.sql.htl.check','false'), ('mapreduce.input.fileinputformat.input.dir.recursive','true')])
>>> spark.sparkContext.stop()
>>> spark = SparkSession.builder.config(conf=conf).getOrCreate()

It works fine. Thank you very much.
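
For reference, the same two settings can also be applied when the session is first built rather than by editing the running context. The following is only a minimal sketch under assumptions: the names `HIVE_MANAGED_READ_CONF`, `merged_conf`, and `build_session` are hypothetical helpers, and `spark.sql.htl.check` is taken from this thread as-is, so its behavior should be treated as specific to the Cloudera build in use.

```python
# Settings copied from the post above. Their effect is an assumption based on
# this thread only; verify them against your own cluster before relying on them.
HIVE_MANAGED_READ_CONF = {
    "spark.sql.htl.check": "false",
    "mapreduce.input.fileinputformat.input.dir.recursive": "true",
}


def merged_conf(extra_conf=None):
    """Return the thread's settings merged with any caller-supplied overrides."""
    conf = dict(HIVE_MANAGED_READ_CONF)
    conf.update(extra_conf or {})
    return conf


def build_session(extra_conf=None, app_name="read-managed-table"):
    """Create a Hive-enabled SparkSession with the settings applied up front."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in merged_conf(extra_conf).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Note that `getOrCreate()` returns an already running session if one exists, which is why the post above stops the context before rebuilding it.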

kartheekb · ‎09-18-2024

This solution eliminated the error, but no data is fetched from the table.

An empty DataFrame is shown.

BrianChan · ‎04-02-2023

@ChethanYM Could you please explain why Spark can read a Hive managed table once this parameter is set? Thank you very much.

ChethanYM · ‎01-31-2024

Hello ChethanYM,

Could you please provide a link to documentation for this conf (spark.sql.htl.check=false)?

I could not find anything at https://spark.apache.org/doc

Regards,

Guilherme C P

ggangadharan · ‎08-29-2024

You need to use the Hive Warehouse Connector (HWC) to query Hive managed tables from Spark.

Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/integrating-hive-and-bi/topics/hive_hivewareh...
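
As a rough illustration of the HWC route, the sketch below reads a managed table through the connector's Python wrapper. It assumes the job was launched with the HWC jar and Python zip attached and with the HiveServer2 connection settings from the linked docs already configured. `read_managed_table` is a hypothetical helper name, and the use of `HiveWarehouseSession.session(spark).build()` followed by `table()` is an assumption about the HWC version installed, so check it against the docs for your release.

```python
def read_managed_table(spark, table_name):
    """Read a Hive managed table through HWC and return the resulting DataFrame."""
    # pyspark_llap is provided by the HWC Python package; it is only importable
    # when the job was started with the connector attached (assumption).
    from pyspark_llap import HiveWarehouseSession

    # Build an HWC session on top of the existing SparkSession.
    hive = HiveWarehouseSession.session(spark).build()
    # Reads go through HWC instead of Spark's native Hive catalog.
    return hive.table(table_name)
```

In the PySpark shell this would be called as `read_managed_table(spark, "exams").count()`, using the table name from the original question.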

Cloudera Community

Spark cannot read Hive managed table