I have two scenarios and would like advice on whether they are feasible.
Running Spark applications (PySpark) on CDH6:
Our current CDH6 enterprise cluster runs a combination of batch jobs followed by analytic queries.
According to the documentation, Cloudera CDH6 supports running Spark applications under the YARN cluster manager.
Is it sustainable to run Spark applications (distributed workloads) on CDH6 without impacting the rest of the cluster? In other words, will the workload manager differentiate between the batch loads, user queries, and Spark applications?
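One common way to keep Spark workloads from crowding out batch jobs and user queries is to submit them to a dedicated YARN queue with capped resources. The sketch below is illustrative, not a recommendation for your cluster: the queue name `spark_apps`, the resource numbers, and the file name `my_app.py` are all assumptions to be replaced with values from your own capacity-scheduler configuration.

```shell
# Submit a PySpark application to a dedicated YARN queue so the
# scheduler can isolate it from batch jobs and analytic queries.
# Queue name and resource caps below are placeholder assumptions.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue spark_apps \
  --num-executors 4 \
  --executor-memory 4g \
  --executor-cores 2 \
  my_app.py
```

With a hard capacity set on the `spark_apps` queue in the YARN scheduler configuration, Spark applications can only consume resources up to that cap, which is the usual mechanism for the workload isolation asked about above.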
Running Python applications on CDH6:
Python applications can also be deployed on CDH6; however, they execute on a single thread. I am aware this may not be a good approach in the long run, but I would like to check whether there would be any impact on the cluster. We would be limiting the YARN resources for these applications.
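On the single-threaded point: a plain CPython process runs CPU-bound work on one core, so even inside a generously sized YARN container it will use a single core. As a minimal sketch (independent of YARN, and with a purely illustrative work function), the standard-library `multiprocessing` module can spread CPU-bound work across the cores of one node without involving Spark:

```python
from multiprocessing import Pool

def cpu_bound_task(n):
    # Illustrative CPU-bound work: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100, 200, 300, 400]
    # A Pool fans the tasks out across worker processes, sidestepping
    # the single-thread limit of a plain Python script on one node.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound_task, inputs)
    print(results)
```

This only parallelizes within one node; for truly distributed execution the application would still need to be rewritten for PySpark (scenario one above).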