
Central management of connection strings - Sqoop - HDP


Hello everyone!

Context: data ingestion with Sqoop on an HDP cluster.

I would like to know whether there is a way, and if so what the best way is, to manage the connection strings that Sqoop uses to connect to databases. My case is as follows:
I have to run many data ingestions from RDBMSs into my company's data lake. We have many databases, and their hosts change fairly often. When that happens, I have to change the JDBC connection string in my .properties file (which is passed to Sqoop) and redeploy it to keep my ingestion working.
I would like to externalize these JDBC strings to something owned by our operations team. Ideally I would refer to each string only by an alias or similar, and the operations team would make sure it points to the right host/VIP when Sqoop opens a connection.
Maybe a set of entries in a configuration file managed in Ambari could store these JDBC strings?

Cloudera Employee

As far as I know, this is not something that Ambari or Sqoop supports.

To achieve your goal, you could do one of the following:

  • Prepare sh scripts that refer to your JDBC string as a variable
  • Prepare an Oozie workflow and pass the JDBC string as a variable

At that point you could have an external tool (e.g. Jenkins) maintain the list of JDBC strings and take responsibility for specifying the desired one.

In solution 1, Jenkins would SSH to the node, set the variable to the JDBC string, and launch the script.
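As a rough illustration of solution 1, here is a minimal sketch of such a script. The variable names (JDBC_URL, DB_USER, DB_PASSWORD_FILE), the default values, and the target directory layout are all assumptions made for the example, not something Sqoop or HDP prescribes. Only `sqoop import` and its `--connect`, `--username`, `--password-file`, `--table` and `--target-dir` options are standard.

```shell
#!/bin/sh
# Sketch of solution 1: the JDBC string is a variable supplied by the caller
# (e.g. exported by Jenkins over SSH), not hard-coded in a deployed file.
# All variable names, defaults and paths below are hypothetical.

run_import() {
    table="$1"

    # Refuse to run if the caller did not supply the connection string.
    if [ -z "${JDBC_URL:-}" ]; then
        echo "JDBC_URL is not set" >&2
        return 1
    fi

    sqoop import \
        --connect "$JDBC_URL" \
        --username "${DB_USER:-ingest}" \
        --password-file "${DB_PASSWORD_FILE:-/user/ingest/.db_password}" \
        --table "$table" \
        --target-dir "/data/raw/$table"
}
```

The external tool would then set JDBC_URL to the current host/VIP and call `run_import CUSTOMERS`. When a database moves, only the value held by that tool changes and the script itself is not redeployed.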

In solution 2, Jenkins would use the Oozie API to start the workflow, specifying the desired variable value.
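A possible sketch of solution 2, using the `oozie` command-line client (which talks to the same Oozie web service) rather than raw HTTP calls. It assumes a hypothetical alias file maintained by the operations team, with one `alias=jdbc-string` entry per line, and a workflow whose Sqoop action reads a property that is called `jdbcUrl` here. The file path, the property name, and OOZIE_URL are illustrative assumptions.

```shell
#!/bin/sh
# Sketch of solution 2: resolve an alias to a JDBC string, then start the
# Oozie workflow with that string passed in as a property.
# JDBC_REGISTRY, OOZIE_URL, job.properties and jdbcUrl are hypothetical names.

submit_ingestion() {
    alias_name="$1"
    registry="${JDBC_REGISTRY:-/etc/ingestion/jdbc-aliases.properties}"

    # Take everything after the first "alias=" so that JDBC strings that
    # themselves contain "=" (e.g. "?useSSL=true") are kept intact.
    jdbc_url=$(awk -v k="$alias_name" \
        'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' \
        "$registry")

    if [ -z "$jdbc_url" ]; then
        echo "no JDBC string registered for alias '$alias_name'" >&2
        return 1
    fi

    # -D sets or overrides a job property for this submission only.
    oozie job -oozie "$OOZIE_URL" -config job.properties \
        -D "jdbcUrl=$jdbc_url" -run
}
```

With this layout the ingestion job refers only to an alias such as `sales_db`, and the operations team edits the alias file when a host or VIP changes.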

Solution 2 is much better than solution 1, since it relies on a distributed, highly available service (Oozie).

Regards