While working with Spark 2.3, Hive and YARN (HDP 3.1), every job runs fine and completes gracefully. But the overall Spark job takes roughly the same wall-clock time on very small data as it does on larger data.
For example, scheduling a job on YARN takes some fixed time whether the data is large or small. So even simple Spark queries on small data take 45 seconds to over a minute to finish (which, I guess, includes YARN's scheduling and resource-management overhead), while a database runs the same query in only a few seconds.
Can we reduce this time with Spark on HDP 3.1 (a 6-machine cluster)? Or is there another mode to run Spark against small data in Hive in less time, at least for testing purposes?
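For testing on small data, one common approach (a sketch, not HDP-specific advice) is to bypass YARN entirely and run Spark in local mode, which avoids the container-allocation and scheduling delay while still reading Hive tables, provided hive-site.xml is available on the Spark classpath (e.g. in $SPARK_HOME/conf):

```shell
# Run spark-shell in local mode, using all cores of one machine.
# This skips YARN scheduling and executor container startup entirely.
spark-shell --master "local[*]" \
  --conf spark.sql.catalogImplementation=hive

# Equivalent setting for a test environment's spark-defaults.conf:
#   spark.master  local[*]
```

Another option, if queries must stay on the cluster, is to reuse a long-lived Spark application (for example the Spark Thrift Server, or a kept-alive session via Livy) so the YARN startup cost is paid once rather than per query.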