Is it best practice to install Spark/Pig clients manually on an HDF cluster? Our scenario is that we want to execute a Spark job from the HDF cluster. Can we build a custom processor for this instead of installing the clients?
We already have the clients on our HDP cluster; is it safe, or best practice, to install them again on the HDF cluster?
When I try to install the Spark client using hdp.repo, it wants to remove the hdf-select rpm, but removing that would also remove dependent rpms such as Ranger and NiFi on that machine. From what I can observe, only one of hdp-select or hdf-select can be installed on a node, not both.
Any help is highly appreciated and thanks in advance.
Are you planning to use an ExecuteScript or ExecuteStreamCommand processor on your NiFi node? In that case, the Spark client libraries need to be installed on the NiFi node from which you will run spark-submit.
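If the Spark client is installed locally on the NiFi node, the spark-submit call is usually wrapped in a small script that ExecuteStreamCommand invokes. A minimal sketch follows; the script path, jar path, main class, and YARN settings are all hypothetical placeholders, not values from this thread:

```shell
#!/usr/bin/env bash
# submit_job.sh -- placed on the NiFi node (requires a local Spark client).
# In NiFi, point ExecuteStreamCommand at this script, e.g.:
#   Command Path:      /opt/scripts/submit_job.sh   (hypothetical path)
#   Command Arguments: any job-specific arguments
set -euo pipefail

# Illustrative values only; substitute your own jar and class.
JOB_JAR="/opt/jobs/my-job.jar"
JOB_CLASS="com.example.MyJob"

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class "$JOB_CLASS" \
  "$JOB_JAR" "$@"
```

Any arguments NiFi passes to the processor are forwarded to the job via "$@".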
Thanks for the quick response.
Is there a way to call the Spark client running on a remote edge node from NiFi without installing the Spark client libraries on the NiFi node? HDP and HDF are separate clusters managed by different Ambari instances. If the client libraries need to be installed, it would have to be a manual install. Please suggest an alternative solution.
What you can do is create two shell scripts: one on the NiFi node and one on the remote Spark client node. Make sure passwordless SSH is set up from the NiFi node to the remote Spark client server. Then, using an ExecuteStreamCommand processor, trigger the shell script on the NiFi node.
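A sketch of that two-script setup is below. The hostnames, users, and paths are hypothetical, and it assumes key-based (passwordless) SSH from the NiFi service user to the edge node:

```shell
#!/usr/bin/env bash
# trigger_spark_job.sh -- lives on the NiFi node; invoked by ExecuteStreamCommand.
# Assumes passwordless SSH from the NiFi service user to the edge node.
set -euo pipefail

EDGE_NODE="sparkuser@edge-node.example.com"       # hypothetical user@host
REMOTE_SCRIPT="/home/sparkuser/run_spark_job.sh"  # hypothetical remote path

# BatchMode=yes makes ssh fail fast instead of prompting for a password,
# which is what you want when NiFi runs this unattended.
ssh -o BatchMode=yes "$EDGE_NODE" "bash $REMOTE_SCRIPT" "$@"

# -------------------------------------------------------------------
# run_spark_job.sh -- lives on the edge node where the Spark client is
# installed (shown here as a comment for completeness; deploy it as a
# separate file on the edge node):
#
#   #!/usr/bin/env bash
#   set -euo pipefail
#   spark-submit --master yarn --deploy-mode cluster \
#     --class com.example.MyJob /home/sparkuser/jobs/my-job.jar "$@"
```

With this layout, NiFi never needs the Spark client libraries: ExecuteStreamCommand only runs the local trigger script, and the actual spark-submit happens on the edge node that already has the client.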