Member since 05-03-2016
Posts: 15 | Kudos received: 4 | Solutions: 0
04-07-2020 10:53 AM
Hi team, I have upgraded to Spark 2.2.1, but setting spark.sql.codegen.wholeStage=false doesn't give any performance improvement.
12-15-2017 09:49 PM | 1 Kudo
Airflow maintainer here. I know this question is a bit dated, but it still turns up in searches. Airflow and NiFi both have their strengths and weaknesses. Let me list some of the things that set Airflow apart:

1. Configuration as code. Airflow uses Python to define DAGs (i.e. workflows). This gives you the full power and flexibility of a programming language, with a wealth of modules.
2. DAGs are testable and versionable. Because they are code, you can integrate your workflow definitions into your CI/CD pipeline.
3. Easy setup and local development. While Airflow gives you horizontal and vertical scalability, it also lets your developers test and run locally, all from a single pip install apache-airflow. This greatly improves productivity and reproducibility.
4. Real data sucks. Airflow knows that, so we have features for retries and SLAs.
5. Changing history. After a year you find out that you need to add a task to a DAG, but it needs to run ‘in the past’. Airflow lets you do backfills, giving you the opportunity to rewrite history. And guess what: you need it more often than you think.
6. Great debuggability. There are logs for everything, and they are nicely tied to the unit of work they belong to: scheduler logs, DAG parsing/processing logs, and task logs. Since it is all Python, the hurdle to jump in and make a fix yourself, if needed, is quite low.
7. A wealth of connectors that let you run tasks on Kubernetes, Docker, Spark, Hive, Presto, Druid, etc.
8. A very active community.
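To make points 1 and 2 concrete without depending on Airflow itself, here is a stdlib-only Python sketch (Python 3.9+ for graphlib). The workflow is declared as plain Python data (each task mapped to its upstream tasks), and because it is code it can be checked by an ordinary unit test: it must contain no cycles, and every task must run after its dependencies. The task names and the run_order / is_acyclic helpers are hypothetical; this is not Airflow's DAG API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task name maps to the set of upstream tasks it depends on.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def run_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """True if the workflow can be scheduled at all, i.e. it contains no cycle."""
    try:
        run_order(dag)
        return True
    except CycleError:
        return False


def test_upstream_tasks_run_first():
    # The kind of check that can live in a CI pipeline next to the workflow definition.
    order = run_order(PIPELINE)
    for task, upstream in PIPELINE.items():
        for dep in upstream:
            assert order.index(dep) < order.index(task)


if __name__ == "__main__":
    test_upstream_tasks_run_first()
    print(run_order(PIPELINE))
```

The same idea carries over to real Airflow DAG files: since they are Python modules kept under version control, a test suite can import them and fail the build when a definition is broken.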
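Point 4 mentions retries. In Airflow you configure these declaratively on a task (operators accept retries and retry_delay arguments), but the behavior underneath is roughly what this stdlib-only Python sketch does. The run_with_retries and make_flaky helpers and the exponential backoff factor are assumptions for illustration, not Airflow code.

```python
import time


def run_with_retries(task, retries=3, retry_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call task(); on an exception, retry up to `retries` more times.

    Waits retry_delay seconds before the first retry and multiplies the delay
    by `backoff` after each failure. `sleep` is injectable so tests need not wait.
    """
    delay = retry_delay
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= backoff


def make_flaky(failures):
    """Hypothetical task that fails `failures` times, then returns "ok"."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient failure")
        return "ok"

    return task
```

For example, run_with_retries(make_flaky(2), sleep=waits.append) succeeds on the third attempt after recording waits of 1.0 and 2.0 seconds, which is the point of retries: transient failures in "real data" do not have to fail the whole run.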
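Point 5 in miniature: a backfill runs a (possibly newly added) task once for every schedule interval in a past window, as if it had been there all along. This stdlib-only Python sketch assumes a daily schedule and a hypothetical task(logical_date) callable; Airflow itself also records a run and its state for each interval, which this sketch ignores.

```python
from datetime import date, timedelta


def backfill_dates(start, end, step=timedelta(days=1)):
    """Yield every logical date from start to end inclusive, one per schedule interval."""
    current = start
    while current <= end:
        yield current
        current += step


def backfill(task, start, end):
    """Run task(logical_date) once per past interval, oldest first, and collect results."""
    return [task(d) for d in backfill_dates(start, end)]
```

Because each run receives its own logical date rather than "now", the task can compute exactly the slice of data that belongs to that interval, which is what makes rewriting history safe.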
07-07-2016 07:23 PM
Yup. The "EXPORT ... FOR REPLICATION" command, which is used on the source cluster, was only added in Hive 1.2.0. The change to IMPORT semantics that allows "import-only-if-newer", which HiveDR uses to apply table updates on the destination cluster, was also only added in 1.2.0. Thus, you will need 1.2.0 or later on both clusters.
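A toy Python model of the "import-only-if-newer" idea, assuming each export carries a monotonically increasing replication state ID. The Export and DestinationCluster classes and the state_id field are hypothetical illustrations, not Hive's implementation: the destination applies an incoming export only if it is newer than what it already holds, so a stale or duplicate export is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Export:
    table: str
    state_id: int  # hypothetical: increases with every change on the source cluster
    rows: list


@dataclass
class DestinationCluster:
    # table name -> (state_id, rows) currently held on the destination
    tables: dict = field(default_factory=dict)

    def import_if_newer(self, export):
        """Apply the export only if it is newer than the table's current state.

        Returns True if applied, False if skipped as stale or duplicate.
        """
        current = self.tables.get(export.table)
        if current is not None and current[0] >= export.state_id:
            return False
        self.tables[export.table] = (export.state_id, list(export.rows))
        return True
```

Because re-applying the same export changes nothing, the destination can safely retry or replay a replication batch, which is why both the source-side EXPORT and the destination-side IMPORT need to support these semantics.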