Issue running Spark jobs with Airflow
Created 08-30-2023 05:25 AM
Hi everyone,
So I've inherited a kerberized Cloudera cluster and I'm learning as I go. Right now I'm trying to get Airflow to work with our Spark jobs, but without success. As I understand it, Airflow was installed by our OS team only after the cluster was configured by Cloudera; it runs on our edge node, from which we run our jobs.
Basically I'm using BashOperators for my test DAG, with the following tasks (a sketch of the DAG follows below):
Task 1:
kinit as the user that runs the script:
"echo 'password' | kinit user@domain"
Task 2:
Download some files from some location.
Task 3:
spark-submit /path/to/script.py
Tasks 1 and 2 work fine, but task 3 fails with the following error:
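For context, here is a minimal sketch of a DAG along the lines described above. This is a reconstruction under assumptions, not the original code: the DAG id, schedule, paths, and principal are all placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical reconstruction of the test DAG described above.
# All task ids, paths, and the principal are placeholders.
with DAG(
    dag_id="spark_submit_test",        # placeholder name
    start_date=datetime(2023, 8, 1),
    schedule=None,                     # Airflow 2.4+; use schedule_interval on older 2.x
    catchup=False,
) as dag:
    # Task 1: obtain a Kerberos ticket by piping the password to kinit,
    # as in the original post.
    kinit = BashOperator(
        task_id="kinit",
        bash_command="echo 'password' | kinit user@domain",
    )

    # Task 2: stand-in for "download some files from some location".
    download = BashOperator(
        task_id="download_files",
        bash_command="echo 'download step placeholder'",
    )

    # Task 3: submit the Spark job; this is the task that fails.
    spark_submit = BashOperator(
        task_id="spark_submit",
        bash_command="spark-submit /path/to/script.py",
    )

    kinit >> download >> spark_submit

Note that each BashOperator runs its command in a fresh shell, which can matter for Kerberos ticket caches (depending on the cache type and on which user and host execute each task); this is relevant to the replies below.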
Created 08-31-2023 01:20 AM
Hi @imule
In task 3, could you please pass --keytab <keytab_path> --principal <principal_name> to the spark-submit command?
Note: In CDP, Airflow integration is not yet supported.
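Concretely, that would change task 3 to something like the following sketch (the keytab path and principal are placeholders, not from the original thread):

from airflow.operators.bash import BashOperator

# Task 3 with keytab-based authentication: spark-submit performs its own
# Kerberos login from the keytab instead of relying on an earlier kinit.
spark_submit = BashOperator(
    task_id="spark_submit",
    bash_command=(
        "spark-submit "
        "--keytab /path/to/user.keytab "
        "--principal user@DOMAIN "
        "/path/to/script.py"
    ),
)

With --keytab and --principal, the job no longer depends on a ticket cache populated by a previous task, and YARN can renew tickets for long-running applications.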
Created 08-31-2023 07:16 AM
Hi @RangaReddy,
Is there a way to generate the keytab file myself, or do I need to contact our Active Directory administrators for that?
Thank you
Created 09-03-2023 09:53 PM
Hi @imule
You can follow the steps below to generate the keytab; if you don't have permission, please check with your admin team.
https://docs.cloudera.com/data-hub/cloud/access-clusters/topics/dh-retrieving-keytabs.html
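If you do generate it yourself and have the account password, MIT Kerberos tools on the edge node can create one. This is a sketch under assumptions: the realm, paths, and encryption type are placeholders, and with Active Directory the key version number (-k) and encryption types must match what AD holds for the account, so the linked documentation or your AD team take precedence. Run ktutil and enter at its prompt:

addent -password -p user@DOMAIN.COM -k 1 -e aes256-cts-hmac-sha1-96
wkt /path/to/user.keytab
quit

Then verify the keytab before wiring it into the DAG:

kinit -kt /path/to/user.keytab user@DOMAIN.COM
klist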
Created 09-20-2023 07:02 AM
@imule, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution; that will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur, Community Manager
Created 10-02-2023 06:29 AM
Hi again @RangaReddy,
I'm sorry for the huge delay in replying; unfortunately, this triggered a lengthy discussion between us and the AD team.
In the end we managed to get our hands on a keytab file, and we confirmed it works fine by manually submitting the command below:
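A command along these lines would match that description (a reconstruction for illustration only; the paths and principal are placeholders, not the command actually used):

spark-submit --keytab /path/to/user.keytab --principal user@DOMAIN /path/to/script.py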
Unfortunately, when we run the same command from a BashOperator in an Airflow DAG, we get the same error:
Created 10-02-2023 10:58 PM
There are two solutions you can try:
1. Create one more shell operator that performs kinit and then submits your Spark job, so both run in the same shell session.
2. Pass the keytab and principal to spark-submit.
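A sketch of option 1 with BashOperator, assuming the keytab from earlier in the thread (paths and principal are placeholders): running kinit and spark-submit in the same task means both share one shell session and therefore one ticket cache.

from airflow.operators.bash import BashOperator

# Option 1: kinit and spark-submit in a single BashOperator, so the
# ticket obtained by kinit is visible to spark-submit in the same shell.
kinit_and_submit = BashOperator(
    task_id="kinit_and_submit",
    bash_command=(
        "kinit -kt /path/to/user.keytab user@DOMAIN && "
        "spark-submit /path/to/script.py"
    ),
)

Option 2 is the --keytab/--principal form shown earlier in the thread.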
