Support Questions (Cloudera Community)

As we add new nodes to production, new firewall rules must be added for Sqoop. Is there an alternative solution?

Contributor

In our current production environment we have 20 data nodes. We use Sqoop to import data from Netezza into Hadoop, and we opened the firewall between the Netezza server and those 20 data nodes so that Sqoop would work.

We are planning to add 40 new data nodes. To keep Sqoop from breaking, we would need to open new firewall rules for every new node. We are also getting requests to import data from other databases, such as Teradata and Oracle, into Hadoop. With firewalls in place, it is hard to maintain rules between each database and every individual data node. Are there any alternative solutions to this problem, for example using a gateway node?

Accepted Solution

Re: As we add new nodes to production, new firewall rules must be added for Sqoop. Is there an alternative solution?

New Contributor

I've never tried this approach, so think of it as a science experiment.

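Set up a YARN node label, apply it to the 20 existing hosts, and create a Capacity Scheduler queue whose default node label expression is that label; then submit Sqoop jobs only to that queue. Sqoop's map tasks, which are what open the JDBC connections to the database, will then run only on those 20 nodes, so the firewall rules you already have remain sufficient.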
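
A minimal Python sketch of that setup, assuming a Capacity Scheduler cluster with node labels already enabled in yarn-site.xml (`yarn.node-labels.enabled=true` plus a `yarn.node-labels.fs-store.root-dir`). It only builds the `yarn rmadmin` commands, the capacity-scheduler.xml properties, and the Sqoop argument list; it does not run anything. The label `sqoop_egress`, the queue `sqoop`, the 20% queue capacity, the hostnames, and the JDBC URL are hypothetical, and the property names should be checked against the Capacity Scheduler documentation for your Hadoop version.

```python
import shlex
from xml.sax.saxutils import escape

# Hypothetical names; substitute your own.
LABEL = "sqoop_egress"
QUEUE = "sqoop"


def rmadmin_commands(label, hosts):
    """Argument lists for the two `yarn rmadmin` calls: define the label, then
    attach it to the hosts whose IPs the database firewall already allows.
    exclusive=false lets other queues borrow these nodes while Sqoop is idle."""
    return [
        ["yarn", "rmadmin", "-addToClusterNodeLabels", f"{label}(exclusive=false)"],
        ["yarn", "rmadmin", "-replaceLabelsOnNode",
         " ".join(f"{host}={label}" for host in hosts)],
    ]


def capacity_scheduler_props(queue, label, queue_capacity):
    """capacity-scheduler.xml properties: a root-level queue that owns the whole
    labeled partition and sends its containers there by default, next to the
    existing `default` queue, which keeps the rest of the unlabeled capacity."""
    root = "yarn.scheduler.capacity.root"
    return {
        f"{root}.queues": f"default,{queue}",
        f"{root}.default.capacity": str(100 - queue_capacity),
        f"{root}.{queue}.capacity": str(queue_capacity),
        f"{root}.accessible-node-labels.{label}.capacity": "100",
        f"{root}.{queue}.accessible-node-labels": label,
        f"{root}.{queue}.accessible-node-labels.{label}.capacity": "100",
        f"{root}.{queue}.default-node-label-expression": label,
    }


def to_hadoop_xml(props):
    """Render properties in Hadoop's <configuration><property> XML layout."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append(
            f"  <property><name>{escape(name)}</name>"
            f"<value>{escape(value)}</value></property>"
        )
    lines.append("</configuration>")
    return "\n".join(lines)


def sqoop_import_args(queue, connect, table, target_dir):
    """Sqoop import routed to the labeled queue. Generic Hadoop -D options
    must come right after the tool name, before Sqoop's own arguments."""
    return [
        "sqoop", "import", f"-Dmapreduce.job.queuename={queue}",
        "--connect", connect, "--table", table, "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    hosts = [f"dn{i:02d}.example.com" for i in range(1, 21)]  # hypothetical
    for cmd in rmadmin_commands(LABEL, hosts):
        print(shlex.join(cmd))
    print(to_hadoop_xml(capacity_scheduler_props(QUEUE, LABEL, 20)))
    print(shlex.join(sqoop_import_args(
        QUEUE, "jdbc:netezza://netezza.example.com:5480/SALES",
        "ORDERS", "/data/netezza/orders")))
```

The label is created with `exclusive=false` so that the 20 labeled nodes can still run ordinary jobs from the `default` queue while no Sqoop import is running; with `exclusive=true` they would be reserved for the `sqoop` queue alone. After editing capacity-scheduler.xml, `yarn rmadmin -refreshQueues` reloads the queue configuration.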

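You could also go narrower and label only one host to do the imports. Be careful, though: HDFS writes the first replica of each block to the local DataNode, so disk usage on that host will grow much faster than on the others unless you run the balancer.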
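
To put a rough number on "much faster", here is a back-of-the-envelope Python model. It assumes (a simplification) that each block's first replica stays on the node whose map task wrote it and that the remaining replicas spread evenly over all other nodes, ignoring rack awareness and data already on disk; the 60-node cluster, the 1 TB import, and the replication factor of 3 are illustrative.

```python
def expected_usage(data, nodes, writers, replication=3):
    """Expected data stored per node after an import of `data` units.

    Simplified model (an assumption; ignores racks and existing data):
    - writers share the import evenly, and each block's first replica stays
      on the node whose map task wrote it;
    - the other replication - 1 replicas land uniformly on the nodes - 1
      remaining nodes.
    Returns (amount per writer node, amount per non-writer node).
    """
    if nodes < 2 or not 1 <= replication <= nodes:
        raise ValueError("need nodes >= 2 and 1 <= replication <= nodes")
    if not 1 <= writers <= nodes:
        raise ValueError("need 1 <= writers <= nodes")
    remote_share = (replication - 1) / (nodes - 1)
    local = data / writers  # first replicas kept by each writer
    # A writer also receives remote replicas of blocks written by other writers.
    writer_node = local + (data - local) * remote_share
    other_node = data * remote_share
    return writer_node, other_node


if __name__ == "__main__":
    for writers in (1, 20):
        w, o = expected_usage(1.0, nodes=60, writers=writers)  # 1 TB import
        print(f"{writers:>2} writer(s): {w:.3f} TB per writer, {o:.3f} TB per other node")
```

Under this model a single importing host in a 60-node cluster ends up holding a full copy of everything imported, roughly 30 times what each other node receives, while spreading the imports over 20 labeled hosts narrows the gap to roughly 2.4 times. Running `hdfs balancer -threshold 10` periodically moves blocks until each DataNode's utilization is within 10 percentage points of the cluster average.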

Re: As we add new nodes to production, new firewall rules must be added for Sqoop. Is there an alternative solution?

New Contributor

To simplify the firewall rules, I would set up one edge host as a gateway and use SSH tunnels, iptables, or another network forwarding package so that all requests reach the database from that host's IP only. Alternatively, you can ask your network team to assign a NAT to your hosts so that they all appear as the same IP address when making outgoing requests.
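
The gateway idea can be sketched as a small TCP relay running on the edge host: Sqoop's `--connect` URL points at the edge host (for example a hypothetical `jdbc:netezza://edge.example.com:5480/SALES`), and the relay opens the real connection to the database, so the database-side firewall only has to allow the edge host's IP. This is a minimal asyncio sketch for illustration, not a substitute for iptables DNAT, an `ssh -L` tunnel, or a hardened proxy; the function names and addresses are assumptions.

```python
import asyncio


async def _pipe(reader, writer):
    """Copy bytes one way until EOF or a connection error, then close."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def _handle(client_reader, client_writer, target_host, target_port):
    """Open one upstream connection per client and relay in both directions."""
    try:
        up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # database unreachable: drop the client
        return
    await asyncio.gather(
        _pipe(client_reader, up_writer),
        _pipe(up_reader, client_writer),
    )


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Listen on the edge host and forward every TCP connection to the database."""
    return await asyncio.start_server(
        lambda r, w: _handle(r, w, target_host, target_port),
        listen_host, listen_port,
    )


async def serve(listen_host, listen_port, target_host, target_port):
    """Run the forwarder until the process is stopped."""
    server = await start_forwarder(listen_host, listen_port, target_host, target_port)
    async with server:
        await server.serve_forever()
```

On the edge host this would be started with something like `asyncio.run(serve("0.0.0.0", 5480, "10.0.0.5", 5480))`, where both addresses are hypothetical. Because every import now funnels through one host, that host's network bandwidth caps the aggregate transfer rate and it becomes a single point of failure; that is the trade-off against maintaining per-node rules.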