As we add new nodes to production, new firewall rules have to be added for Sqoop. Is there an alternative solution?

Rising Star

In our current production environment we have 20 data nodes. We use Sqoop to import data from Netezza into Hadoop, and because Sqoop's map tasks connect to the database directly from whichever node they run on, we opened firewall rules between the Netezza server and all 20 data nodes.

We are planning to add 40 new data nodes, and to keep Sqoop from breaking we would have to open new firewall rules for every one of them. We are also getting requests to import data from other databases, such as Teradata and Oracle, into Hadoop. With firewalls in place, maintaining rules between each database and every individual data node is hard. Are there alternative solutions to this problem, for example using a gateway node?

1 ACCEPTED SOLUTION

Contributor

I've never tried this approach, so think of it as a science experiment.

Set up a YARN node label, apply it to the 20 existing hosts, and create a queue whose default node label expression is that label; then submit Sqoop jobs only to that queue. Your Sqoop jobs will then run only on the existing 20 nodes.
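A minimal sketch of that setup, written as a Python helper that only prints the commands and properties rather than running anything. The `yarn rmadmin` options, the `yarn-site.xml` and Capacity Scheduler property names, and `mapreduce.job.queuename` are standard Hadoop/Sqoop ones; the label `sqoop_fw`, the queue `sqoop`, the hostnames, the label store path, the JDBC URL and the 80/20 capacity split are hypothetical placeholders. On a managed cluster these settings are usually applied through the cluster manager rather than by hand.

```python
"""Sketch: print the YARN node-label commands and Capacity Scheduler
properties that pin Sqoop jobs to the hosts already opened in the firewall.

Hypothetical placeholders: label "sqoop_fw", queue "sqoop", hostnames,
the node-label store path, the JDBC URL and the 80/20 capacity split."""

import shlex
from typing import Dict, List


def label_commands(label: str, hosts: List[str]) -> List[List[str]]:
    """yarn rmadmin calls that create the label and attach it to each host."""
    mapping = " ".join(f"{host}={label}" for host in hosts)
    return [
        # exclusive=false: other queues may still borrow idle capacity here
        ["yarn", "rmadmin", "-addToClusterNodeLabels", f"{label}(exclusive=false)"],
        ["yarn", "rmadmin", "-replaceLabelsOnNode", mapping],
    ]


def yarn_site_properties(store_dir: str) -> Dict[str, str]:
    """yarn-site.xml settings that turn node labels on."""
    return {
        "yarn.node-labels.enabled": "true",
        "yarn.node-labels.fs-store.root-dir": store_dir,
    }


def capacity_scheduler_properties(label: str, queue: str, share: int) -> Dict[str, str]:
    """capacity-scheduler.xml settings for a queue that defaults to the label."""
    root = "yarn.scheduler.capacity.root"
    return {
        f"{root}.queues": f"default,{queue}",
        # Capacities of root's children on unlabelled nodes must sum to 100.
        f"{root}.default.capacity": str(100 - share),
        f"{root}.{queue}.capacity": str(share),
        # Only this queue is given the labelled partition, so it gets all of it.
        f"{root}.{queue}.accessible-node-labels": label,
        f"{root}.{queue}.accessible-node-labels.{label}.capacity": "100",
        # Containers from this queue request the label unless told otherwise.
        f"{root}.{queue}.default-node-label-expression": label,
    }


def sqoop_import_command(queue: str, jdbc_url: str, table: str, target_dir: str) -> List[str]:
    """A Sqoop import submitted to the labelled queue (credentials omitted)."""
    return [
        "sqoop", "import",
        # Generic Hadoop -D options must come right after the tool name.
        f"-Dmapreduce.job.queuename={queue}",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    hosts = [f"dn{i:02d}.example.com" for i in range(1, 21)]  # the 20 firewalled hosts
    for cmd in label_commands("sqoop_fw", hosts):
        print(shlex.join(cmd))
    props = {**yarn_site_properties("hdfs:///yarn/node-labels"),
             **capacity_scheduler_properties("sqoop_fw", "sqoop", 20)}
    for key, value in props.items():
        print(f"{key}={value}")
    print(shlex.join(sqoop_import_command(
        "sqoop", "jdbc:netezza://netezza.example.com:5480/SALES", "ORDERS", "/data/orders")))
```

Running it prints the two `yarn rmadmin` commands and the `key=value` pairs to place in `yarn-site.xml` and `capacity-scheduler.xml`; after editing the latter, `yarn rmadmin -refreshQueues` reloads the queue configuration. Making the label non-exclusive is a design choice: the default (exclusive) label would reserve the 20 hosts for the Sqoop queue alone, while `exclusive=false` lets other jobs keep using their idle capacity.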

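You could also go narrower and label only one host to do the imports. Be careful, though: when the writer runs on a DataNode, HDFS places the first replica of each block on that local DataNode, so HDFS usage on that host will become much higher than on the others unless you rebalance (for example with the HDFS balancer).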
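To put rough numbers on that warning, here is a back-of-the-envelope estimate. It is a simplified model, not exact HDFS placement: it assumes every writer host is also a DataNode, the first replica of each block stays on the writer's local DataNode, and the remaining replicas spread evenly over all other DataNodes (rack awareness and free-space checks are ignored). The 10 TB import size is a hypothetical figure; 60 nodes matches the cluster after the planned expansion.

```python
"""Rough estimate of how HDFS usage skews when only a few hosts run the
import mappers. Simplifications (assumptions, not exact HDFS behaviour):
every writer host is also a DataNode, the first replica of each block stays
on the writer's local DataNode, and the remaining replicas spread evenly over
all other DataNodes (rack awareness and free-space checks are ignored)."""

from typing import Tuple


def expected_usage_tb(data_tb: float, replication: int,
                      nodes: int, writers: int) -> Tuple[float, float]:
    """Return (TB stored on each writer host, TB on each non-writer host)."""
    extra = replication - 1          # replicas placed on nodes other than the writer
    local = data_tb / writers        # first replicas kept by each writer host
    # A writer also receives extra replicas of blocks written by the other writers.
    writer_tb = local + extra * (data_tb - local) / (nodes - 1)
    other_tb = extra * data_tb / (nodes - 1)
    return writer_tb, other_tb


if __name__ == "__main__":
    # Hypothetical import: 10 TB, replication 3, 60 data nodes after the expansion.
    for writers in (1, 20):
        w, o = expected_usage_tb(10.0, 3, 60, writers)
        print(f"{writers:>2} writer host(s): {w:.2f} TB each, others {o:.2f} TB each")
```

With a single writer host the model puts 10.00 TB on that host against 0.34 TB on every other node; with the 20 labelled hosts each writer holds about 0.82 TB, so spreading the imports over more labelled hosts keeps the skew much smaller.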

2 REPLIES

New Contributor

To simplify the firewall rules, I would create one edge host to act as a gateway and forward the database requests through it with SSH tunnels, iptables, or another network forwarding package, so that the databases only ever see that host's IP. You can also approach your network team about putting your hosts behind NAT, so that they all appear as the same IP address when making outgoing requests.
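As a rough illustration of the gateway idea, here is a minimal userspace TCP relay in Python, standing in for the iptables or SSH forwarding mentioned above (it is not either of those tools). Data nodes connect to the edge host, which opens its own connection to the database, so the database firewall only has to admit the edge host's IP. The hostnames and port numbers are hypothetical placeholders; 5480 is assumed as the Netezza port.

```python
"""Sketch of the gateway idea: a tiny TCP relay run on the edge host.

Data nodes connect to the edge host, which opens its own connection to the
database, so the database firewall only has to admit the edge host's IP.
Hostnames and ports are hypothetical placeholders."""

import asyncio

BUFFER_SIZE = 64 * 1024  # bytes copied per read


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then half-close the destination."""
    try:
        while True:
            data = await reader.read(BUFFER_SIZE)
            if not data:  # the sender has finished
                break
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        # Connection reset or similar: closing this writer makes the reader on
        # the same socket see EOF, which ends the opposite pipe as well.
        writer.close()


async def handle(client_reader: asyncio.StreamReader,
                 client_writer: asyncio.StreamWriter,
                 db_host: str, db_port: int) -> None:
    """Relay one data-node connection to the database through this host."""
    try:
        db_reader, db_writer = await asyncio.open_connection(db_host, db_port)
    except OSError:
        client_writer.close()  # database unreachable: drop the client
        return
    try:
        await asyncio.gather(
            pipe(client_reader, db_writer),  # data node -> database
            pipe(db_reader, client_writer),  # database -> data node
        )
    finally:
        client_writer.close()
        db_writer.close()


async def start_gateway(listen_host: str, listen_port: int,
                        db_host: str, db_port: int) -> asyncio.AbstractServer:
    """Accept connections on listen_host:listen_port and relay them to the database."""
    return await asyncio.start_server(
        lambda r, w: handle(r, w, db_host, db_port), listen_host, listen_port)


async def main() -> None:
    # Hypothetical: listen on the edge host's port 5480 and forward to Netezza.
    server = await start_gateway("0.0.0.0", 5480, "netezza.example.com", 5480)
    async with server:
        await server.serve_forever()
```

On the edge host, `asyncio.run(main())` starts the relay, and Sqoop's `--connect` string then points at the edge host (for example the hypothetical `jdbc:netezza://edge.example.com:5480/SALES`) instead of the database. The trade-off is that every imported byte now passes through the edge host, so its network link bounds the combined throughput of all mappers.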