Member since 09-20-2022 | 4 posts | 0 kudos received | 0 solutions
10-02-2023
06:29 AM
Hi again @RangaReddy,

Sorry for the long delay in replying. Unfortunately, this triggered a lengthy discussion between us and the AD team. In the end we managed to get hold of a keytab file, and we confirmed it works by running the following command manually:

kinit -k -t /path/to/keytab/file.keytab username

However, when we run the same command through a BashOperator in an Airflow DAG, we still get the same error:

py4j.protocol.Py4JJavaError: An error occurred while calling o32.csv.
: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

Thank you,
Mario
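One commonly suggested workaround, sketched here under assumptions, is to run kinit and spark-submit inside the same BashOperator command, so both steps share one shell environment (and therefore the same Kerberos ticket cache and Hadoop client settings) instead of relying on state left behind by an earlier task. The helper name kerberized_spark_command, the /etc/hadoop/conf default (a common client-configuration location on Cloudera edge nodes, but verify yours), and the optional KRB5CCNAME value are illustrative assumptions, not something confirmed in this thread.

```python
import shlex
from typing import Optional


def kerberized_spark_command(script: str, principal: str, keytab: str,
                             hadoop_conf_dir: str = "/etc/hadoop/conf",
                             ccache: Optional[str] = None) -> str:
    """Build one shell command that authenticates and submits in the same shell.

    Running every step in a single BashOperator means spark-submit inherits
    exactly the environment (and ticket cache) that kinit used.
    """
    # Assumed client-config location; check the path on your edge node.
    parts = ["export HADOOP_CONF_DIR=" + shlex.quote(hadoop_conf_dir)]
    if ccache:
        # Optionally pin the ticket cache so kinit and spark-submit share it.
        parts.append("export KRB5CCNAME=" + shlex.quote(ccache))
    # Obtain a ticket from the keytab, as in the manual command above.
    parts.append(shlex.join(["kinit", "-k", "-t", keytab, principal]))
    parts.append(shlex.join(["spark-submit", script]))
    # '&&' stops at the first failing step, so spark-submit never runs
    # if kinit fails.
    return " && ".join(parts)


if __name__ == "__main__":
    # Hypothetical paths; replace with your own.
    print(kerberized_spark_command("/path/to/script.py", "username",
                                   "/path/to/keytab/file.keytab"))
```

With the example arguments this prints export HADOOP_CONF_DIR=/etc/hadoop/conf && kinit -k -t /path/to/keytab/file.keytab username && spark-submit /path/to/script.py, a string that could be passed as the BashOperator's bash_command. On YARN, spark-submit's own --principal and --keytab options are another way to let Spark log in from the keytab directly.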
08-31-2023
07:16 AM
Hi @RangaReddy, is there a way to generate the keytab file myself, or do I need to contact our Active Directory administrators for that? Thank you
08-30-2023
05:25 AM
Hi everyone,

I've inherited a Kerberized Cloudera cluster and I'm learning as I go. Right now I'm trying to get Airflow to run our Spark jobs, without success. As I understand it, our OS team installed Airflow only after Cloudera had configured the cluster. It runs on the edge node from which we run our jobs.

My test DAG uses bash operators for the following tasks:

Task 1: Run kinit for the user that runs the script: "echo 'password' | kinit user@domain"
Task 2: Download some files from some location.
Task 3: spark-submit /path/to/script.py

Tasks 1 and 2 work fine, but task 3 fails with the following:

py4j.protocol.Py4JJavaError: An error occurred while calling o32.csv.
: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

This confuses me, because authenticating the user is the very first step. The exact same workflow runs fine when I execute it manually on the command line. Has anyone dealt with a similar issue? Any input would be appreciated, as we need to transition to Airflow.

Many thanks,
Mario
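The error text says the Hadoop client attempted SIMPLE (non-Kerberos) authentication. Hadoop's default for hadoop.security.authentication is simple, so this usually means the process did not load a client configuration that sets it to kerberos. One hedged way to narrow this down is to compare what an Airflow task sees with what your interactive shell sees. The sketch below prints a few relevant environment variables plus the OS user, and reads hadoop.security.authentication from core-site.xml in HADOOP_CONF_DIR; the variable list and function names are assumptions for illustration. Because spark-submit may also set HADOOP_CONF_DIR itself (for example via spark-env.sh), an unset variable here is a clue rather than proof.

```python
import getpass
import os
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union

# Assumed list of variables worth comparing between an interactive shell and
# an Airflow task; extend it for your setup.
VARS = ("HADOOP_CONF_DIR", "YARN_CONF_DIR", "SPARK_HOME", "SPARK_CONF_DIR",
        "KRB5CCNAME", "JAVA_HOME", "PATH")


def auth_mode_from_xml(xml_text: Union[str, bytes]) -> str:
    """Return hadoop.security.authentication from core-site.xml content.

    Falls back to "simple", Hadoop's default, when the property is absent.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "hadoop.security.authentication":
            return (prop.findtext("value") or "simple").strip()
    return "simple"


def hadoop_auth_mode(conf_dir: Optional[str]) -> str:
    """Read the auth mode from conf_dir/core-site.xml, or "simple" if missing."""
    if not conf_dir:
        return "simple"
    path = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(path):
        return "simple"
    with open(path, "rb") as fh:
        return auth_mode_from_xml(fh.read())


def current_user() -> str:
    """OS user the process runs as (the default ticket cache is per user)."""
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        return "<unknown>"


def snapshot() -> Dict[str, str]:
    """Collect the settings that influence how spark-submit authenticates."""
    info = {name: os.environ.get(name, "<unset>") for name in VARS}
    info["user"] = current_user()
    info["hadoop.security.authentication"] = hadoop_auth_mode(
        os.environ.get("HADOOP_CONF_DIR"))
    return info


if __name__ == "__main__":
    for key, value in snapshot().items():
        print(f"{key}={value}")
```

Running it once from a BashOperator task and once from your own shell, then diffing the two outputs, should show whether the Airflow task runs as a different user or without the cluster's client configuration.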
09-30-2022
12:46 AM
Hi,

We've been experiencing a peculiar issue over the past few days: our Python scripts started failing when run with spark-submit. At first glance, our logs show the scripts hitting syntax errors wherever we use code related to modules, but further troubleshooting showed that the modules themselves are the actual issue. When we comment out the lines that throw syntax errors, we get import errors ("No module named...") instead.

We have two versions of Python on our cluster, but spark-submit appears to be using the correct version, the one with all our modules installed. Our scripts run fine through pyspark; for some reason, however, spark-submit does not recognize the imported modules when we run scripts through it.

Moreover, YARN doesn't seem to register these jobs as failed. They are not logged at all, probably because they crash as soon as the imports run, so we have no YARN logs for them.

Any insight would be greatly appreciated. Thanks.
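Syntax errors on otherwise valid code can indicate that a different interpreter (for example, Python 2 instead of Python 3) is parsing the script, even when the default looks correct. A minimal diagnostic, assuming the two-interpreter setup described above, is a tiny script that reports which interpreter runs it and which modules it can import; submitting it with spark-submit and also running it through plain python lets you compare the two. It deliberately avoids Python 3-only syntax so it still parses under Python 2. The file name check_env.py and the module list are hypothetical, and the script only inspects the driver process, not the executors.

```python
# check_env.py (hypothetical name): submit with spark-submit to see which
# interpreter runs the driver. Uses only syntax that parses under both
# Python 2 and 3, since a syntax error can itself be the symptom.
import sys

# Hypothetical list: replace with the modules your jobs import.
MODULES = ["json", "numpy", "pandas"]


def check_modules(names):
    """Map each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except Exception:
            # Any failure during import counts as "not usable here".
            status[name] = False
    return status


def main():
    print("executable: %s" % sys.executable)
    print("version: %s" % sys.version.split()[0])
    for name, ok in sorted(check_modules(MODULES).items()):
        print("%-10s %s" % (name, "ok" if ok else "MISSING"))


if __name__ == "__main__":
    main()
```

If spark-submit turns out to use the wrong interpreter, the PYSPARK_PYTHON environment variable or the spark.pyspark.python configuration property can point it at the intended one.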
Labels:
- Apache Spark
- Apache YARN