We need to schedule an import of 200 tables from Teradata into Hive/HDFS. Each table can be imported in parallel, so I would like to know which of the following approaches is better:
1) Single workflow with one fork/join, launching in parallel all the imports.
2) Single workflow with several fork/join pairs in sequence, splitting the table imports into batches of 10 tables per fork/join (is 10 a good number? how can I decide?).
3) Create a workflow for each table and launch it from a coordinator.
Which should I choose? Are there better alternatives?
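For context, here is a rough sketch of what I imagine option 2 would look like as an Oozie workflow. All names (`fork-batch-1`, `import-table-1`, the JDBC connect string, properties like `${tdHost}`) are placeholders I made up, and I have truncated it to one batch with two actions:

```xml
<workflow-app name="teradata-import" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-batch-1"/>

    <!-- One fork/join pair per batch of tables; forks chained in sequence -->
    <fork name="fork-batch-1">
        <path start="import-table-1"/>
        <path start="import-table-2"/>
        <!-- ... up to N sqoop actions per batch ... -->
    </fork>

    <action name="import-table-1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.4">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:teradata://${tdHost}/DATABASE=${tdDb} --table TABLE_1 --hive-import</command>
        </sqoop>
        <ok to="join-batch-1"/>
        <error to="fail"/>
    </action>
    <!-- import-table-2 is analogous -->

    <!-- The join would point at the next batch's fork, and the last one at "end" -->
    <join name="join-batch-1" to="end"/>

    <kill name="fail">
        <message>Import failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

With 200 tables this means 20 such fork/join pairs written out by hand (or generated), which is part of why I am unsure it is the right approach.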