
Why use Syncsort instead of NiFi?


New Contributor

I recently read this news announcement, http://hortonworks.com/partner/syncsort/, and am wondering why Syncsort is proposed instead of NiFi.

4 REPLIES

Re: Why use Syncsort instead of NiFi?

Super Guru
@Georg Heiler

Syncsort is for ETL in Hadoop and for moving bulk data between Hadoop and large databases; it also supports mainframes. NiFi is a data flow tool: it helps acquire data at the source where it is generated and move it across the enterprise, including between cloud and on-prem systems (notice I didn't say Hadoop clusters; NiFi is agnostic of the data destination). However, it is not an ETL tool. You can do some massaging of the data as it flows using the 170+ available processors, but not heavy ETL (think COALESCE, dropping or concatenating columns across hundreds of millions or billions of rows, or conditions like "CASE WHEN" in a SELECT). Syncsort fits that space really well.
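To make the "heavy ETL" operations named above a bit more concrete, here is a minimal sketch of COALESCE, column concatenation, and CASE WHEN in a single SELECT. It uses SQLite from Python's standard library purely as a stand-in so it runs anywhere; the table and column names are hypothetical, and a real workload of this kind would run over far more rows in Hive, Spark, or an ETL tool.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(first_name TEXT, last_name TEXT, nickname TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Ada", "Lovelace", None, 1200.0),
        ("Alan", "Turing", "Al", 50.0),
    ],
)

rows = conn.execute(
    """
    SELECT
        first_name || ' ' || last_name AS full_name,      -- concatenate columns
        COALESCE(nickname, first_name) AS display_name,   -- fall back when NULL
        CASE WHEN balance >= 1000 THEN 'high' ELSE 'low' END AS tier
    FROM customers
    ORDER BY last_name
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

On two rows this is trivial; the point of the reply above is that the same logic applied to billions of rows is batch ETL work rather than something to do inside a data flow.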

Re: Why use Syncsort instead of NiFi?

@Georg Heiler

Syncsort also provides a high-performance bulk data import utility called Data Funnel. Data Funnel runs import jobs in parallel, which can significantly accelerate large data loads. Bulk data loading is not a good use case for NiFi.
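As a rough illustration of the general idea of running import jobs in parallel (one job per table), here is a small Python sketch. It is not how Data Funnel is implemented and does not use any Syncsort API; the table names, the in-memory "source", and the helper functions `import_table` and `parallel_import` are all hypothetical.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical source tables; a real job would read from a database.
SOURCE_TABLES = {
    "orders": [(1, "widget"), (2, "gadget"), (3, "gizmo")],
    "customers": [(10, "Ada"), (20, "Alan")],
}


def import_table(name, rows, target_dir):
    """Copy one table's rows to its own CSV file and return the row count."""
    path = Path(target_dir) / f"{name}.csv"
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return name, len(rows)


def parallel_import(tables, target_dir, workers=4):
    """Run one import job per table concurrently and collect row counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(import_table, name, rows, target_dir)
            for name, rows in tables.items()
        ]
        return dict(future.result() for future in futures)


with tempfile.TemporaryDirectory() as tmp:
    counts = parallel_import(SOURCE_TABLES, tmp)

print(counts)
```

Each table is independent of the others, so the jobs can overlap their I/O; that independence is what makes bulk loads a natural fit for parallel import.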

Re: Why use Syncsort instead of NiFi?

New Contributor

So what would Syncsort offer over plain Spark SQL, then?

Re: Why use Syncsort instead of NiFi?

Super Guru

@Georg Heiler

Spark SQL is a simple way to write SQL within your Spark programs and to access Hive tables. You would not use Spark SQL on its own for heavy ETL processing in Hadoop, although you could use it as part of Spark programs written for ETL. Syncsort, on the other hand, is an ETL tool. When writing Spark or MapReduce programs, or even Hive queries, seems too complex, or when offloading data from multiple source systems (especially when one of them is a mainframe), many companies choose an off-the-shelf ETL tool like Syncsort to save development time and simplify maintenance.
