Created 08-30-2023 05:25 AM
Hi everyone,
So I've inherited a kerberized Cloudera cluster and I'm learning as I go. Right now I'm trying to get Airflow to work with our Spark jobs, but without success. As I understand it, Airflow was installed by our OS team only after the cluster was configured by Cloudera. It runs on our edge node, from which we run our jobs.
Basically I'm using bash operators for my test DAG with the following tasks:
Task 1:
Kinit the user that is running the script:
"echo 'password' | kinit user@domain"
Task 2:
Download some files from some location.
Task 3:
spark-submit /path/to/script.py
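Assuming a recent Airflow (2.x), the three tasks above might be wired up roughly like this. This is only a sketch: the dag_id, dates, and the download command are placeholders I've made up, not values from this thread.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="spark_kerberos_test",   # placeholder name
    start_date=datetime(2023, 8, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Task 1: obtain a Kerberos ticket for the submitting user
    kinit = BashOperator(
        task_id="kinit",
        bash_command="echo 'password' | kinit user@domain",
    )

    # Task 2: stage the input files (placeholder command)
    download = BashOperator(
        task_id="download_files",
        bash_command="echo 'download files here'",
    )

    # Task 3: submit the Spark job
    spark_submit = BashOperator(
        task_id="spark_submit",
        bash_command="spark-submit /path/to/script.py",
    )

    kinit >> download >> spark_submit
```

Note that each BashOperator runs in its own subprocess, so if the tasks end up on different workers or with different ticket-cache locations, the ticket from Task 1 may not be visible to Task 3 — which is why the replies below suggest either doing kinit in the same task as the submit, or letting spark-submit handle the keytab itself.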
Tasks 1 and 2 work fine, but task 3 fails with the following:
Created 08-31-2023 01:20 AM
Hi @imule
In step 3, could you please pass --keytab <key_tab_path> --principal <principal_name> to the spark-submit command.
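If it helps, a small helper can build that spark-submit invocation as an argument list, which keeps quoting safe if any path contains spaces. This is a sketch only — the keytab path and principal used in the example are placeholders, not values from this thread.

```python
def spark_submit_cmd(script, keytab=None, principal=None):
    """Return a spark-submit command as an argument list."""
    cmd = ["spark-submit"]
    if keytab and principal:
        cmd += ["--keytab", keytab, "--principal", principal]
    elif keytab or principal:
        # Spark expects both options together
        raise ValueError("keytab and principal must be supplied together")
    cmd.append(script)
    return cmd

# Example: join for a BashOperator bash_command, or pass to subprocess.run
print(" ".join(spark_submit_cmd(
    "/path/to/script.py",
    keytab="/path/to/user.keytab",   # placeholder path
    principal="user@domain",         # placeholder principal
)))
```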
Note: In CDP, Airflow integration is not yet supported.
Created 08-31-2023 07:16 AM
Hi @RangaReddy ,
Is there a way to generate the file myself or do I need to contact our Active Directory administrators for that?
Thank you
Created 09-03-2023 09:53 PM
Hi @imule
You can follow the steps below to generate the keytab; if you don't have permission, please check with your admin team.
https://docs.cloudera.com/data-hub/cloud/access-clusters/topics/dh-retrieving-keytabs.html
Created 09-20-2023 07:02 AM
@imule, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur
Created 10-02-2023 06:29 AM
Hi again @RangaReddy ,
I'm sorry for the huge delay in replying; unfortunately, this triggered a lengthy discussion between us and the AD team.
In the end we managed to get our hands on a keytab file, and we confirmed it works fine by manually submitting the below command:
Created 10-02-2023 10:58 PM
There are two solutions you can try.
1. Create one more shell operator that performs kinit, and after that submit your Spark job.
2. Pass the keytab and principal to spark-submit.
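For reference, the two options translate into bash commands along these lines (the keytab path and principal are placeholders — substitute your own):

```python
# Placeholder values; substitute your own keytab path and principal.
KEYTAB = "/path/to/user.keytab"
PRINCIPAL = "user@domain"
SCRIPT = "/path/to/script.py"

# Option 1: kinit and spark-submit chained in a single shell operator,
# so the ticket cache is created in the same shell that submits the job.
option1 = f"kinit -kt {KEYTAB} {PRINCIPAL} && spark-submit {SCRIPT}"

# Option 2: let spark-submit acquire (and renew) the ticket itself.
option2 = f"spark-submit --keytab {KEYTAB} --principal {PRINCIPAL} {SCRIPT}"

print(option1)
print(option2)
```

Option 2 is generally preferable for long-running jobs, since Spark can renew the ticket from the keytab, whereas a ticket obtained once via kinit will eventually expire.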