
Spark cannot read Hive managed table

Rising Star

Hi all, I am practicing Spark.

When using pyspark to query tables in Hive, I can retrieve data from an external table but cannot query an internal (managed) table.

 

Here is the error message:

>>> spark.read.table("exams").count()
23/03/30 22:28:50 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = eb0a9583-da34-4c85-9a1b-db790d126fb1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/readwriter.py", line 301, in table
    return self._df(self._jreader.table(tableName))
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
  File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'\nSpark has no access to table `default`.`exams`. Clients can access this table only if they have the following capabilities: MANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE.\nThis table may be a Hive-managed ACID table, or require some other capability that Spark\ncurrently does not implement;'

 

I know that Spark cannot read an ACID Hive table directly. Is there any workaround?

Thanks in advance.

1 ACCEPTED SOLUTION

Master Collaborator

Hi,

Can you set the "spark.sql.htl.check=false" parameter in your Spark job and give it a try?
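
For example, you could set it when building the session (a minimal sketch; this key is a Cloudera-specific setting, so the exact behavior may depend on your CDH/CDP version):

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder \
...     .config("spark.sql.htl.check", "false") \
...     .getOrCreate()
>>> spark.read.table("exams").count()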

 

Regards,

Chethan YM


6 REPLIES


Rising Star

@ChethanYM Thank you for your reply.

I tried your suggestion by recreating the Spark session:

>>> from pyspark.sql import SparkSession
>>> conf = spark.sparkContext._conf.setAll([('spark.sql.htl.check', 'false'), ('mapreduce.input.fileinputformat.input.dir.recursive', 'true')])
>>> spark.sparkContext.stop()
>>> spark = SparkSession.builder.config(conf=conf).getOrCreate()

It works fine. Thank you very much.

New Contributor

This solution worked for eliminating the error, but no data is fetched from the table; the query returns an empty DataFrame.

Rising Star

@ChethanYM Could you please explain further why Spark can read a Hive managed table when this parameter is passed? Thank you very much.

Explorer

Hello ChethanYM,

Could you please provide a link to documentation for this conf (spark.sql.htl.check=false)?

I could not find anything at https://spark.apache.org/doc

Regards,

Guilherme C P
Super Collaborator

You need to use Hive Warehouse Connector (HWC) to query Hive managed tables from Spark.

Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/integrating-hive-and-bi/topics/hive_hivewareh...
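
For example, a managed-table read through HWC typically looks like the sketch below (assuming the HWC jar is on the classpath and the connection settings, e.g. spark.sql.hive.hiveserver2.jdbc.url, are already configured for your cluster):

>>> # HWC routes reads through HiveServer2, which understands ACID tables
>>> from pyspark_llap import HiveWarehouseSession
>>> hive = HiveWarehouseSession.session(spark).build()
>>> hive.table("exams").count()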