Created 03-30-2023 10:50 PM
Hi all, I am practicing Spark.
When using PySpark to query tables in Hive, I can retrieve data from an external table but not from a managed (internal) table.
Here is the error message:
>>> spark.read.table("exams").count()
23/03/30 22:28:50 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = eb0a9583-da34-4c85-9a1b-db790d126fb1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/readwriter.py", line 301, in table
return self._df(self._jreader.table(tableName))
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
File "/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.30990532/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'\nSpark has no access to table `default`.`exams`. Clients can access this table only ifMANAGEDINSERTWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE.\nThis table may be a Hive-managed ACID table, or require some other capability that Spark\ncurrently does not implement;'
I know that Spark cannot read a Hive ACID table directly. Is there any workaround?
Thanks in advance.
Created 03-31-2023 02:43 AM
Hi,
Can you try setting the "spark.sql.htl.check=false" parameter in your Spark job?
Regards,
Chethan YM
Created 04-02-2023 07:40 PM
@ChethanYM Thank you for your reply.
I tried your suggestion by recreating the spark session
>>> conf = spark.sparkContext._conf.setAll([('spark.sql.htl.check','false'), ('mapreduce.input.fileinputformat.input.dir.recursive','true')])
>>> spark.sparkContext.stop()
>>> spark = SparkSession.builder.config(conf=conf).getOrCreate()
It works fine. Thank you very much.
Created 09-18-2024 09:19 PM
This solution eliminated the error, but no data is fetched from the table: an empty DataFrame is returned.
Created 04-02-2023 08:11 PM
@ChethanYM Could you please explain further why Spark can read a Hive managed table once this check is bypassed with that parameter? Thank you very much.
Created 01-31-2024 06:16 AM
Hello ChethanYM,
Could you please provide some link where I can find documentation about this conf (spark.sql.htl.check=false) ?
I could not find anything in https://spark.apache.org/doc
Regards,
Guilherme C P
Created 08-29-2024 11:09 PM
You need to use Hive Warehouse Connector (HWC) to query Hive managed tables from Spark.
Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/integrating-hive-and-bi/topics/hive_hivewareh...
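A hedged sketch of what HWC usage looks like from PySpark, per Cloudera's HWC documentation. The jar/zip paths in the comment are placeholders that depend on your CDP parcel version, and the job assumes an HWC-enabled spark-submit; treat this as an outline, not a drop-in script.

```python
# Sketch only: requires the HWC assembly jar and pyspark_hwc zip on the job's
# classpath, e.g. (paths are placeholders for your parcel layout):
#   spark-submit \
#     --jars .../hive_warehouse_connector/hive-warehouse-connector-assembly.jar \
#     --py-files .../hive_warehouse_connector/pyspark_hwc.zip \
#     your_job.py
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-read").getOrCreate()

# Build an HWC session on top of the SparkSession.
hive = HiveWarehouseSession.session(spark).build()

# Read the managed ACID table through the connector.
df = hive.executeQuery("SELECT * FROM exams")
print(df.count())
```

Unlike the spark.sql.htl.check workaround above, HWC goes through HiveServer2, so it can actually return rows from ACID tables rather than an empty DataFrame.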