About RangaReddy

jaris · ‎04-16-2025

Hi @RangaReddy, is something similar also applicable to Structured Streaming? Thanks.

jaris · ‎03-17-2025

Hi @haridjh,

Thanks for the reply. The procedure you described uses HDFS as the store for the JAR files used by the Spark job. We have no problem using HDFS in a Spark job; the problem occurs when the job tries to access the Ozone file system (e.g. ofs) and is submitted via Livy.

1. Access files on Ozone in the Spark job, e.g.:

    df = spark.read.parquet("ofs://ozone-service/volume/bucket/parquet")

2. Submit the Python job via Livy:

    kinit user
    curl --negotiate -k -v -u : -X POST \
      -H "Content-Type: application/json" \
      --data '{ "file": "ozone_access.py"}' \
      https://livy:28998/batches

3. The job fails with:

    Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

When we access Ozone directly via spark-shell or spark-submit, everything works fine, e.g.:

    spark-shell \
      --keytab ${KEY_TAB} \
      --principal ${PRINCIPAL} \
      --conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862

Setting the keytab and principal is not possible when submitting the job via Livy, because we use proxy users with Livy.

Thanks.
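
Editor's sketch for the post above: Livy's POST /batches request body accepts a "conf" map of Spark properties alongside "file" and an optional "proxyUser", so the same spark.yarn.access.hadoopFileSystems setting used with spark-shell could in principle be passed there. Whether that resolves the token error is not confirmed anywhere in this thread. The helper name build_livy_batch and the value "ofs://ozone-service" (taken from the URI in step 1) are assumptions for illustration only.

```python
import json


def build_livy_batch(file, proxy_user=None, spark_conf=None):
    """Build the JSON body for a POST to Livy's /batches endpoint.

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    payload = {"file": file}
    if proxy_user:
        # Livy impersonates this user when it launches the job.
        payload["proxyUser"] = proxy_user
    if spark_conf:
        # Spark properties, the equivalent of --conf key=value.
        payload["conf"] = dict(spark_conf)
    return payload


# Assumed value: the Ozone service from the ofs:// URI in the post.
body = build_livy_batch(
    "ozone_access.py",
    spark_conf={"spark.yarn.access.hadoopFileSystems": "ofs://ozone-service"},
)
print(json.dumps(body, indent=2))
```

The printed JSON can replace the --data argument of the curl command in step 2.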

ekeid · ‎12-07-2024

I am using the following versions: Spark 2.4.8 and Python 3.6.8. I get this error only when running spark-submit from NiFi or Oozie; it works fine when run from the shell. Is there a solution, or a configuration I missed?

from pyspark.sql import SparkSession
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)
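
Editor's sketch for the post above: this TypeError in cloudpickle is commonly reported when Spark 2.4.x runs on Python 3.8 or later, and the "<frozen zipimport>" frames in the trace may also point to a newer interpreter than the 3.6.8 seen in the shell. That reading is an assumption, not something confirmed in this thread. One way to check is to run a small script through each launcher (shell, NiFi, Oozie) and compare what it reports. The function name interpreter_report is hypothetical; PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the standard environment variables Spark reads to pick the Python executable.

```python
import os
import sys


def interpreter_report(environ=os.environ):
    """Collect the interpreter details that matter for PySpark start-up."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Environment variables Spark consults to choose the Python binary.
        "PYSPARK_PYTHON": environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": environ.get("PYSPARK_DRIVER_PYTHON"),
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

If the version printed under NiFi or Oozie differs from the one printed in the shell, pointing PYSPARK_PYTHON at the 3.6.8 interpreter for those launchers would be the next thing to try.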

sde_20241 · ‎10-25-2024

We’re attempting to run a basic Spark job to read/write data from Solr, using the following versions:

CDP version: 7.1.9
Spark: Spark3
Solr: 8.11
Spark-Solr Connector: opt/cloudera/parcels/SPARK3/lib/spark3/spark-solr/spark-solr-3.9.3000.3.3.7191000.0-78-shaded.jar

When we attempt to interact with Solr through Spark, the execution stalls indefinitely without any errors or results (similar to the issue which @hadoopranger mentioned). Other components, such as Hive and HBase, integrate smoothly with Spark, and we are using a valid Kerberos ticket that successfully connects with other Hadoop components. Additionally, testing REST API calls via both curl and Python’s requests library confirms we can access Solr and retrieve data using the Kerberos ticket.

The issue seems isolated to Solr’s connection with Spark, as we have had no problems with other systems. Has anyone encountered a similar issue or have suggestions for potential solutions? @RangaReddy @hadoopranger

Bartlomiej · ‎10-08-2024

Yes, upgrading Spark to the newest Spark version, SPARK3-3.3.2.3.3.7190.5-2-1.p0.54391297, fixed the issue.

VidyaSargur · ‎10-01-2024

@ayukus0705, did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will help others find the answer more easily in the future.

zzeng · ‎09-10-2024

In CDP Public Cloud CDW Impala, access is only possible over HTTP+SSL, so you have to edit the ODBC driver's config file:

C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini

[Driver]
AllowHostNameCNMismatch = 0
CheckCertRevocation = 0
TransportMode = http
AuthMech=3

https://community.cloudera.com/t5/Community-Articles/How-to-Connect-to-CDW-Impala-VW-Using-the-Power-BI-Desktop/ta-p/393013#toc-hId-1805728480

Gopinath · ‎08-18-2024

PySpark 3.5.2 supports Python >= 3.8 and <= 3.11. Ref: https://pypi.org/project/pyspark/3.5.2/
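
Editor's sketch for the post above: the range can be checked up front before a job starts. The bounds come from the post; the function name python_in_range and the constants are hypothetical, and the check compares only the major and minor version.

```python
import sys

# Bounds quoted in the post for PySpark 3.5.2 (inclusive on both ends).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_in_range(version=None, low=MIN_PYTHON, high=MAX_PYTHON):
    """Return True if the (major, minor) version lies within [low, high]."""
    major_minor = tuple((version or sys.version_info)[:2])
    return low <= major_minor <= high


if not python_in_range():
    print("This interpreter is outside the 3.8 to 3.11 range quoted above.")
```

Calling python_in_range() with no arguments checks the running interpreter; passing a tuple such as (3, 7, 5) checks an arbitrary version.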

GaneshLad · ‎08-07-2024

Thank you...

nagababu · ‎07-15-2024

Hi @RangaReddy,

Exception stack trace:

Currently we run our Spark jobs on YARN using the same code and we never get this issue. Could it be caused by a lack of memory?

2. We didn't hard-code the client mode anywhere. It was working fine on YARN, but not with Kubernetes.

3. We tried providing the following, but it didn't work. We also downloaded these JARs and placed them in the jars folder, but no luck.

    --packages org.apache.hadoop:hadoop-aws:3.3.4 \
    --packages com.amazonaws:aws-java-sdk-bundle:1.12.262 \
    --packages org.apache.spark:spark-hadoop-cloud_2.12:3.5.1 \
    --packages org.apache.hadoop:hadoop-client-api:3.3.4 \
    --packages org.apache.hadoop:hadoop-client-runtime:3.3.4 \
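
Editor's sketch for the post above: spark-submit documents --packages as a single comma-separated list of Maven coordinates, and when the flag is repeated it is likely that only the last occurrence takes effect. Whether that is the cause of the failure on Kubernetes is an assumption, not something confirmed in this thread. The helper name packages_args is hypothetical; the coordinates are the ones from the post.

```python
def packages_args(coordinates):
    """Return a single --packages argument pair for spark-submit."""
    # Drop duplicates while preserving the order given by the caller.
    unique = list(dict.fromkeys(coordinates))
    return ["--packages", ",".join(unique)]


coords = [
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "com.amazonaws:aws-java-sdk-bundle:1.12.262",
    "org.apache.spark:spark-hadoop-cloud_2.12:3.5.1",
    "org.apache.hadoop:hadoop-client-api:3.3.4",
    "org.apache.hadoop:hadoop-client-runtime:3.3.4",
]
print(" ".join(packages_args(coords)))
```

The printed text is one flag followed by one comma-joined value, which can be pasted into the spark-submit command in place of the five separate flags.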

Last Visited	‎08-29-2024 03:41 AM

Member Since	‎06-02-2020 05:25 AM
Posts	331
Kudos received	68

Cloudera Community

Re: Icebreg on CDP private cloud 7.1.9

Re: How to set default time zone/local time for Sp...

Re: Load Iceberg Table on PowerBI Desktop

Re: NoClassDefFoundError due to Incompatible Spark...

Re: Creating Iceberg table

Re: Spark Streaming Graceful Shutdown - Part2

Re: Spark Ozone Integration in CDP

Re: Spark Python Integration Test Result Exception...

Re: How to integrate Apache Spark with Solr Framew...

Re: How to set default time zone/local time for Sp...

Re: How to read hexadecimal escape sequences from ...

Re: Load Iceberg Table on PowerBI Desktop

Re: Spark Python Supportability Matrix

Re: Spark Memory Management

Re: Spark submit configuration --deploy-mode has b...