Alternative to repeatedly running spark-submit jobs

Hi,

Below is the scenario I need suggestions on.

Scenario:

Data is ingested through NiFi into Hive tables.

A Spark program has to perform ETL operations and complex joins on the data in Hive.

Since NiFi ingests the data as a continuous stream, I would like the Spark jobs to run every 1 or 2 minutes on the ingested data.

Which is the best option to use?

  1. Trigger spark-submit jobs every minute using a scheduler?

    How do we reduce the overhead and time lag of repeatedly submitting the job to the Spark cluster? Is there a better way to run a single program repeatedly?

  2. Run a Spark Streaming job?

    Can a Spark Streaming job be triggered automatically every minute and process the data from Hive?
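
To make the "single program" idea in option 1 concrete, here is a rough Python sketch of what I have in mind: one long-lived driver that calls a batch function on a fixed interval, instead of paying the spark-submit startup cost for every run. The names `run_every`, `job`, and `max_runs` are made up for illustration, and the Spark work itself is only indicated in a comment (the table name there is hypothetical), so this is an assumption about how it could look, not a tested solution.

```python
import time
from typing import Callable, Optional


def run_every(
    interval_s: float,
    job: Callable[[], None],
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call job() repeatedly inside one process, aiming for one run per interval.

    max_runs=None loops forever; sleep and clock are injectable for testing.
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = clock()
        job()  # one ETL pass over whatever has been ingested so far
        runs += 1
        elapsed = clock() - started
        if max_runs is None or runs < max_runs:
            # Sleep only for the rest of the interval; if the job overran,
            # the next run starts immediately.
            sleep(max(0.0, interval_s - elapsed))
    return runs


# Hypothetical usage inside a single spark-submit'ed application
# (assumes pyspark is available; `staging.events` is an invented table name):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
#   run_every(60.0, lambda: spark.sql("SELECT COUNT(*) FROM staging.events").show())
```

The session would be created once and reused across runs, so only the first run pays the submission and startup overhead. What I am unsure about is whether this is reasonable compared with a streaming job.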

Is there any other efficient mechanism to handle such a scenario?

Thanks in advance.
