Sqoop running on closed cluster (only edge server has access)

Explorer

Hi all,

I have a small cluster (10 machines for now): one edge server with two network cards (one on the internal network 142.39.41.*, the other on the cluster network 10.1.1.*), a management server, and 8 data nodes, all on the 10.1.1.* network.

Sqoop is installed on the edge server, but when I try to import a single table from a SQL Server database (on 142.39.41.*):

sqoop import \
  --connect 'jdbc:sqlserver://dbserver;DatabaseName=MyDB;user=XXXXXXXXX;password=XXXXXXX;port=1433' \
  --table=dbo.Asset \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  -m 1

I get:

Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host devsql94, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".

When I try to list the tables in the database using Sqoop:

sqoop list-tables --connect 'jdbc:sqlserver://dbserver;DatabaseName=MyDB;user=XXXXXXXXX;password=XXXXXXX;port=1433' --driver com.microsoft.sqlserver.jdbc.SQLServerDriver

it works fine and lists all my tables, so JDBC access from the edge server to the database works.

Trying to wrap my head around the problem, I started to think that Sqoop might be sending the job to another node to handle the database reads. But which node?

So I tried forwarding local port 1433 on the edge server to port 1433 on the SQL Server database using "nc", as per this site, but that didn't work either.

Can anyone figure this one out? Is my architecture wrong in allowing only the edge server to see the corporate network?

1 ACCEPTED SOLUTION

Super Collaborator

Hi @Marc Mazerolle. Sqoop runs an import by executing the JDBC queries in mappers on the data nodes, not on the edge node. That's what makes it so fast: multiple hosts pulling data in parallel. It also explains why list-tables works for you, since that command runs on the edge server itself. So the data nodes need to be able to talk to your SQL Server on port 1433. I'm not sure about your port forwarding workaround, but I assume you understand that using netcat (nc) for port forwarding is not a good solution for anything beyond a proof of concept, for multiple reasons. Also, with netcat, I think it would only work for a single mapper, since netcat can only handle one connection at a time; if you had multiple mappers, they couldn't all get through. I think you may need to open the port between your data nodes and the SQL Server.
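To confirm that this is the cause, a quick reachability check along the following lines could be run on each data node (for example over ssh). This is only a sketch: the host name `dbserver` and port 1433 come from the question, while the `can_connect` helper and the `DB_HOST`/`DB_PORT` variables are hypothetical names introduced here. It assumes `bash` and the coreutils `timeout` command are available on the node.

```shell
#!/usr/bin/env bash
# Placeholders: override DB_HOST / DB_PORT with your own SQL Server host and port.
DB_HOST="${DB_HOST:-dbserver}"
DB_PORT="${DB_PORT:-1433}"

# can_connect HOST PORT (hypothetical helper)
# Returns 0 if a TCP connection to HOST:PORT opens within 5 seconds,
# non-zero otherwise. It relies on bash's built-in /dev/tcp redirection,
# so no extra client tools are needed on the node.
can_connect() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report whether a mapper running on this node could reach the database.
if can_connect "$DB_HOST" "$DB_PORT"; then
  echo "$(uname -n): can reach ${DB_HOST}:${DB_PORT}"
else
  echo "$(uname -n): cannot reach ${DB_HOST}:${DB_PORT}"
fi
```

If the edge server reports "can reach" while the data nodes report "cannot reach", that would be consistent with the mappers being the ones that fail to connect.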

2 REPLIES

Explorer

Or bridge the edge server.
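One possible reading of this suggestion, for a proof of concept only, is a TCP relay on the edge server that accepts connections from the cluster network and passes them on to the database. The sketch below assumes `socat` is installed; unlike a single `nc` listener, its `fork` option serves each incoming connection in a separate child process, so several mappers could connect at once. The address `10.1.1.10` and the variable names are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: adjust to your own network before use.
EDGE_CLUSTER_IP="${EDGE_CLUSTER_IP:-10.1.1.10}"  # edge server address on the cluster network
DB_HOST="${DB_HOST:-dbserver}"                   # SQL Server host on the internal network
DB_PORT="${DB_PORT:-1433}"

# Listen on the cluster-side address only; "fork" spawns a child per
# connection and "reuseaddr" allows a quick restart of the listener.
listen_spec="TCP-LISTEN:${DB_PORT},bind=${EDGE_CLUSTER_IP},fork,reuseaddr"
# Forward every accepted connection to the database.
target_spec="TCP:${DB_HOST}:${DB_PORT}"

# Print the command by default; set RUN_RELAY=1 to actually start the relay.
if [ "${RUN_RELAY:-0}" = "1" ]; then
  exec socat "$listen_spec" "$target_spec"
else
  echo "socat $listen_spec $target_spec"
fi
```

With such a relay running, the `--connect` URL in the import would need to name the edge server's cluster-side address instead of `dbserver`. All import traffic then funnels through one machine, so opening port 1433 between the data nodes and the SQL Server, as suggested in the accepted solution, is likely the cleaner long-term setup.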