Member since: 12-18-2016
Posts: 3
Kudos Received: 1
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 5265 | 12-18-2016 06:48 AM |
12-15-2017 09:50 PM
Airflow maintainer here. Let me list some of the great things about Airflow that set it apart.

1. Configuration as code. Airflow uses Python for the definition of DAGs (i.e. workflows). This gives you the full power and flexibility of a programming language, along with a wealth of modules (see the sketch at the end of this post).
2. DAGs are testable and versionable. Because they are code, you can integrate your workflow definitions into your CI/CD pipeline.
3. Ease of setup and local development. While Airflow gives you horizontal and vertical scalability, it also lets your developers test and run everything locally, all from a single `pip install apache-airflow`. This greatly enhances productivity and reproducibility.
4. Real data sucks. Airflow knows that, so we have features for retrying and SLAs.
5. Changing history. After a year you find out that you need to put a task into a DAG, but it needs to run 'in the past'. Airflow allows you to do backfills, giving you the opportunity to rewrite history. And guess what: you need that more often than you think.
6. Great debuggability. There are logs for everything, nicely tied to the unit of work that produced them: scheduler logs, DAG parsing/processing logs, task logs. Being in Python, the hurdle is quite low to jump in and do a fix yourself if needed.
7. A wealth of connectors that let you run tasks on Kubernetes, Docker, Spark, Hive, Presto, Druid, etc.
8. A very active community.

As to your question: there is no particular dependency between HDP and Airflow. If you have Ambari deploy the client libraries on your Airflow workers, it will work just fine.
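To make point 1 concrete, here is a minimal sketch of a DAG defined in plain Python. The DAG id, task ids, owner, and schedule are illustrative placeholders, not anything from the post:

```python
# A minimal "configuration as code" sketch: a two-task DAG in plain Python.
# All names (example_etl, extract, load, data-eng) are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 3,                        # point 4: built-in retrying
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_etl",
    default_args=default_args,
    start_date=datetime(2017, 1, 1),     # a past start date enables backfills
    schedule_interval="@daily",
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> load                      # dependencies are ordinary Python
```

Because the file is ordinary Python, it can be linted, reviewed, and versioned like any other source file.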
12-15-2017 09:49 PM
1 Kudo
Airflow maintainer here. I know this question is a bit dated, but it still turns up in searches. Airflow and NiFi both have their strengths and weaknesses. Let me list some of the great things about Airflow that set it apart.

1. Configuration as code. Airflow uses Python for the definition of DAGs (i.e. workflows). This gives you the full power and flexibility of a programming language, along with a wealth of modules.
2. DAGs are testable and versionable. Because they are code, you can integrate your workflow definitions into your CI/CD pipeline (a test sketch follows at the end of this post).
3. Ease of setup and local development. While Airflow gives you horizontal and vertical scalability, it also lets your developers test and run everything locally, all from a single `pip install apache-airflow`. This greatly enhances productivity and reproducibility.
4. Real data sucks. Airflow knows that, so we have features for retrying and SLAs.
5. Changing history. After a year you find out that you need to put a task into a DAG, but it needs to run 'in the past'. Airflow allows you to do backfills, giving you the opportunity to rewrite history. And guess what: you need that more often than you think.
6. Great debuggability. There are logs for everything, nicely tied to the unit of work that produced them: scheduler logs, DAG parsing/processing logs, task logs. Being in Python, the hurdle is quite low to jump in and do a fix yourself if needed.
7. A wealth of connectors that let you run tasks on Kubernetes, Docker, Spark, Hive, Presto, Druid, etc.
8. A very active community.
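To illustrate point 2, here is a minimal sketch of a pytest-style check that the DAG folder parses cleanly and that a given DAG contains the expected tasks. The DAG and task ids (example_etl, extract, load) are hypothetical placeholders:

```python
# A minimal DAG-testing sketch; run it in CI like any other pytest module.
# The ids below (example_etl, extract, load) are hypothetical.
from airflow.models import DagBag


def test_dags_load_without_errors():
    dagbag = DagBag()                    # parses the configured dags_folder
    assert not dagbag.import_errors      # no DAG file failed to import

    dag = dagbag.get_dag("example_etl")
    assert dag is not None
    assert {"extract", "load"} <= {t.task_id for t in dag.tasks}
```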
12-18-2016 06:48 AM
You need a really recent FreeIPA to support --maxlife=0 (https://git.fedorahosted.org/cgit/freeipa.git/commit/?id=d2cb9ed327ee4003598d5e45d80ab7918b89eeed). If you are on supported Red Hat or CentOS, you probably don't have it, unless you rolled your own. You can find out by checking the krbPasswordExpiration attribute of the user: with --maxlife=0 in effect, it shouldn't be there. If it is, you can try to set it yourself (http://www.therebel.eu/2015/08/setting-password-expiry-in-ipa/) or update your password policy to a lifetime of, say, 10 years (don't go beyond 2038).
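For checking that attribute, here is a minimal sketch using python-ldap. The server URI, bind DN, and user DN are placeholders for your environment, and python-ldap must be installed separately:

```python
# A minimal sketch for inspecting krbPasswordExpiration via python-ldap.
# Every URI/DN/password below is a hypothetical placeholder.
import ldap

conn = ldap.initialize("ldaps://ipa.example.com")
conn.simple_bind_s("uid=admin,cn=users,cn=accounts,dc=example,dc=com", "admin-password")

results = conn.search_s(
    "uid=someuser,cn=users,cn=accounts,dc=example,dc=com",
    ldap.SCOPE_BASE,
    attrlist=["krbPasswordExpiration"],
)
for dn, attrs in results:
    # If --maxlife=0 took effect, the attribute should be absent.
    print(attrs.get("krbPasswordExpiration", "no expiration set"))
```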