Member since: 06-02-2020
331 Posts | 66 Kudos Received | 49 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 2128 | 07-11-2024 01:55 AM |
| | 6091 | 07-09-2024 11:18 PM |
| | 4839 | 07-09-2024 04:26 AM |
| | 4415 | 07-09-2024 03:38 AM |
| | 4357 | 06-05-2024 02:03 AM |
04-16-2025
05:20 AM
Hi @RangaReddy, is something similar also applicable to Structured Streaming? Thanks.
03-17-2025
03:46 AM
Hi @haridjh, thanks for the reply. The procedure you described uses HDFS as the store for the JAR files used by the Spark job. Using HDFS from a Spark job is not the problem; the problem is accessing the Ozone filesystem (ofs) when the job is submitted via Livy.

1. Accessing files on Ozone in the Spark job, e.g.:

```
df = spark.read.parquet("ofs://ozone-service/volume/bucket/parquet")
```

2. Python job submitted via Livy:

```
kinit user
curl --negotiate -k -v -u : -X POST \
  -H "Content-Type: application/json" \
  --data '{ "file": "ozone_access.py"}' \
  https://livy:28998/batches
```

3. The job fails with:

```
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
```

When we access Ozone directly via spark-shell or spark-submit, everything works fine, e.g.:

```
spark-shell \
  --keytab ${KEY_TAB} \
  --principal ${PRINCIPAL} \
  --conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862
```

Setting a keytab and principal is not possible when submitting the job via Livy, because we use proxy users with Livy. Thanks.
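One possible direction to try (a sketch, not verified against this cluster): the Livy batches API accepts a `conf` map in the POST body, so the list of filesystems Spark needs a delegation token for can be passed per job even though `--keytab`/`--principal` cannot. The property name differs by Spark version, and the `ofs://` URI below is an example to adapt:

```python
import json

# Hypothetical helper: build a Livy /batches payload that asks Spark/YARN
# to obtain a delegation token for the Ozone filesystem as well as HDFS.
# "spark.yarn.access.hadoopFileSystems" is the Spark 2 property name;
# Spark 3 renamed it to "spark.kerberos.access.hadoopFileSystems".
def build_livy_batch(app_file, extra_filesystems):
    return {
        "file": app_file,
        "conf": {
            "spark.yarn.access.hadoopFileSystems": ",".join(extra_filesystems),
        },
    }

payload = build_livy_batch("ozone_access.py",
                           ["ofs://ozone-service"])  # example Ozone service id
print(json.dumps(payload, indent=2))
```

The resulting JSON would replace the `--data '{ "file": ... }'` body in the curl call above; whether the Livy proxy-user setup honors the token request still needs to be confirmed on the cluster.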
12-07-2024
10:17 PM
We are using the following versions: Spark 2.4.8 and Python 3.6.8. We get the error below only when running spark-submit from NiFi or Oozie; it works fine when run from a shell. Is there a solution or a configuration I missed?

```
from pyspark.sql import SparkSession
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
  File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)
```
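For what it's worth, `_make_cell_set_template_code` raising `TypeError: an integer is required (got type bytes)` is the classic symptom of Spark 2.4's bundled cloudpickle running under Python 3.8 or newer, and the `<frozen importlib._bootstrap>` line numbers in the trace (e.g. 991) match CPython 3.8 rather than 3.6. A launcher like NiFi or Oozie can resolve a different interpreter than the interactive shell does. A hedged check (the 3.6 binary path below is an example, not a known location on this cluster):

```python
import os
import sys

# Log the interpreter this job is really running under; a launcher such as
# NiFi or Oozie can pick up a different python than your interactive shell.
print("running under:", sys.version.split()[0], "at", sys.executable)
print("PYSPARK_PYTHON =", os.environ.get("PYSPARK_PYTHON", "<unset>"))

# If the launcher resolves to Python 3.8+, pinning both driver and executor
# interpreters to a 3.6/3.7 binary (example path) avoids the cloudpickle error.
# setdefault only fills the variable when it is not already set.
os.environ.setdefault("PYSPARK_PYTHON", "/usr/bin/python3.6")
os.environ.setdefault("PYSPARK_DRIVER_PYTHON", "/usr/bin/python3.6")
```

The same two variables can also be exported in the NiFi/Oozie action environment instead of being set in code.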
10-25-2024
06:57 AM
We're attempting to run a basic Spark job to read/write data from Solr, using the following versions:

- CDP version: 7.1.9
- Spark: Spark3
- Solr: 8.11
- Spark-Solr connector: opt/cloudera/parcels/SPARK3/lib/spark3/spark-solr/spark-solr-3.9.3000.3.3.7191000.0-78-shaded.jar

When we attempt to interact with Solr through Spark, the execution stalls indefinitely without any errors or results (similar to the issue @hadoopranger mentioned). Other components, such as Hive and HBase, integrate smoothly with Spark, and we are using a valid Kerberos ticket that successfully connects to other Hadoop components. Additionally, testing REST API calls via both curl and Python's requests library confirms we can access Solr and retrieve data using the Kerberos ticket. The issue seems isolated to Solr's connection with Spark, as we have had no problems with other systems.

Has anyone encountered a similar issue, or does anyone have suggestions for potential solutions? @RangaReddy @hadoopranger
10-08-2024
03:57 AM
Yes, upgrading Spark to the newest Spark 3 version (SPARK3-3.3.2.3.3.7190.5-2-1.p0.54391297) fixed the issue.
10-01-2024
03:39 AM
1 Kudo
@ayukus0705, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
09-10-2024
05:34 PM
1 Kudo
In CDP Public Cloud CDW Impala, only HTTP + SSL access is supported, so you have to edit the driver config file to set the transport mode:

`C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini`

```
[Driver]
AllowHostNameCNMismatch = 0
CheckCertRevocation = 0
TransportMode = http
AuthMech = 3
```

Reference: https://community.cloudera.com/t5/Community-Articles/How-to-Connect-to-CDW-Impala-VW-Using-the-Power-BI-Desktop/ta-p/393013#toc-hId-1805728480
08-18-2024
10:11 PM
PySpark 3.5.2 requires Python >= 3.8 and <= 3.11. Ref: https://pypi.org/project/pyspark/3.5.2/
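A tiny guard based on that support range can fail fast with a readable message instead of an obscure import-time error (the 3.8-3.11 range is taken from the PyPI page above; the function name is just illustrative):

```python
import sys

# PySpark 3.5.2 documents support for Python 3.8 through 3.11 inclusive.
def pyspark_352_supports(major, minor):
    return (3, 8) <= (major, minor) <= (3, 11)

if not pyspark_352_supports(*sys.version_info[:2]):
    print("warning: Python %d.%d is outside PySpark 3.5.2's 3.8-3.11 range"
          % sys.version_info[:2])
```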
07-15-2024
08:20 PM
Hi @RangaReddy,

Exception stack trace:

1. Currently we run our Spark jobs on YARN using the same code and we never hit this issue. Could it be caused by a lack of memory?
2. We didn't hard-code the client mode anywhere. It was working fine on YARN, just not with Kubernetes.
3. We tried providing the following, but it didn't work. We also downloaded these JARs and placed them in the jars folder, but no luck.

```
--packages org.apache.hadoop:hadoop-aws:3.3.4 \
--packages com.amazonaws:aws-java-sdk-bundle:1.12.262 \
--packages org.apache.spark:spark-hadoop-cloud_2.12:3.5.1 \
--packages org.apache.hadoop:hadoop-client-api:3.3.4 \
--packages org.apache.hadoop:hadoop-client-runtime:3.3.4 \
```
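One thing worth checking (an assumption about the submit command, since the full command isn't shown): spark-submit treats `--packages` as a single-valued option, so repeating the flag keeps only the last value and silently drops the earlier artifacts. All coordinates should be passed as one comma-separated list, e.g. built like this:

```python
# Repeating --packages on spark-submit is not cumulative: each occurrence
# replaces the previous one, so only the last artifact list is resolved.
# Join every coordinate into ONE comma-separated value instead.
coords = [
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "com.amazonaws:aws-java-sdk-bundle:1.12.262",
    "org.apache.spark:spark-hadoop-cloud_2.12:3.5.1",
    "org.apache.hadoop:hadoop-client-api:3.3.4",
    "org.apache.hadoop:hadoop-client-runtime:3.3.4",
]
packages_arg = "--packages " + ",".join(coords)
print(packages_arg)
```

Equivalently, the same list can be set once via `--conf spark.jars.packages=<comma-separated list>`.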