Created on 11-28-2019 12:03 AM - last edited on 11-28-2019 12:38 AM by VidyaSargur
Hello All,
We have applications in our data organization like Oracle Data Integrator and Jupyter Hub.
These applications require up-to-date jars, core-site.xml, hdfs-site.xml, etc. to integrate properly with the cluster.
Our Application Architects suggest we should install those applications' clients and/or agents on edge nodes of our cluster.
I don't want to do this because:
*I might need to deal with conflicting environment requirements.
*I will have limited control over server resource consumption.
*Application-level operations such as upgrades might have adverse effects on the cluster.
I am considering offering the jars/conf directories of the edge nodes to the application servers as read-only mounts.
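Whichever way the files reach an application server (a read-only mount or a deployed copy), a small check can confirm they still match the edge node. The following is only a minimal sketch: the function names and the list of files are assumptions for illustration, and the directory paths are supplied by the caller.

```python
import hashlib
from pathlib import Path

# Hypothetical list of client config files to compare; adjust as needed.
CONFIG_FILES = ["core-site.xml", "hdfs-site.xml"]


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(reference_dir, app_dir, names=CONFIG_FILES):
    """Return the names of files that are missing from app_dir or whose
    contents differ from the copy in reference_dir.

    reference_dir is assumed to hold every file in names; a missing
    reference file raises FileNotFoundError.
    """
    drifted = []
    for name in names:
        ref = Path(reference_dir) / name
        app = Path(app_dir) / name
        if not app.is_file() or sha256_of(ref) != sha256_of(app):
            drifted.append(name)
    return drifted
```

An empty list from find_drift means the application server sees the same configuration as the edge node.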
Any suggestions and evaluations about both architectures or alternatives are more than welcome.
Best regards
Created 11-28-2019 12:47 AM
I agree with you: applications and third-party tools/components should definitely be installed outside the cluster, or on a separate new node, to avoid major performance impacts.
How to manage those components when the Hadoop version changes is, I feel, more of a DevOps question.
You always need to keep an inventory of the applications running alongside your ecosystem components, together with their dependencies.
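As a rough illustration of such an inventory, here is a minimal sketch that records which client artifact each application has deployed and flags the ones that no longer match the cluster. The class name, function name, artifact name, and version numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientDependency:
    """One inventory entry: an application and a client artifact it ships."""
    application: str
    artifact: str
    deployed_version: str


def outdated(inventory, cluster_versions):
    """Return the entries whose deployed version differs from the
    cluster's version of the same artifact.

    Artifacts that are absent from cluster_versions are not reported.
    """
    return [
        dep for dep in inventory
        if dep.artifact in cluster_versions
        and dep.deployed_version != cluster_versions[dep.artifact]
    ]


# Hypothetical inventory and cluster versions, for illustration only.
inventory = [
    ClientDependency("Oracle Data Integrator", "hadoop-client", "3.0.0"),
    ClientDependency("Jupyter Hub", "hadoop-client", "3.1.1"),
]
cluster_versions = {"hadoop-client": "3.1.1"}

for dep in outdated(inventory, cluster_versions):
    print(f"{dep.application}: {dep.artifact} {dep.deployed_version} needs updating")
```

In practice the inventory would probably live in a file under version control, so that a pipeline job could run the same comparison after every cluster upgrade.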
In addition, you can use Nexus as a centralized repository from which to fetch the new versions that need to be deployed on your application side (i.e. Oracle Data Integrator and Jupyter Hub), with the help of Jenkins or some other deployment tool.
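To give an idea of what a deployment job would fetch, here is a minimal sketch that builds the download URL of a release jar. It assumes a Maven2-format repository on Nexus Repository 3, where repositories are served under /repository/<name>/ and follow the standard Maven directory layout. The host name and repository name are hypothetical, and the function name is an assumption for the example.

```python
def maven_artifact_url(base_url, repository, group_id, artifact_id, version,
                       extension="jar"):
    """Build the URL of a release artifact in a Maven2-layout repository.

    Assumes the Nexus Repository 3 URL scheme <base>/repository/<name>/...
    SNAPSHOT versions use timestamped file names and are not handled here.
    """
    group_path = group_id.replace(".", "/")
    filename = f"{artifact_id}-{version}.{extension}"
    return (
        f"{base_url.rstrip('/')}/repository/{repository}/"
        f"{group_path}/{artifact_id}/{version}/{filename}"
    )


# Hypothetical host, repository name, and version, for illustration only.
url = maven_artifact_url(
    "https://nexus.example.internal",
    "hadoop-clients",
    "org.apache.hadoop",
    "hadoop-client",
    "3.1.1",
)
print(url)
```

A Jenkins or GitLab job could then download that URL onto the application server and restart the application, so the cluster nodes themselves are never touched.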
In my experience, installing applications on edge nodes leads to resource-related problems, so I would suggest that it is not a good idea.
Do reply if you have further points to highlight.
Created 11-29-2019 12:17 AM
First of all, thanks for the reply.
We are at the very beginning of our journey, and most of our workload right now consists of architectural decisions like the one in my question.
We were also considering building a DevOps pipeline; Nexus, as you mentioned, is being considered, as well as GitLab with or without Jenkins.
But it had not occurred to me, before you mentioned it, that we could cover this problem with a DevOps pipeline setup.
Your suggestion has been more than helpful to me.
Best regards
Created 11-29-2019 01:38 AM
Thank you for the response and appreciation.
I will be happy to contribute and share my experiences going forward. Thank you for accepting the answer.