Can't connect to Impala through JDBC on Amazon EMR

Explorer

Hi all,

I'm trying to connect to Impala on a cluster set up through Amazon EMR, but it doesn't work. It's a three-node cluster with Impala installed and working. I've done the following:
  • Set up an SSH tunnel to the master node like this: ssh -ND 21050 hadoop@master-node-external-dns-hostname
  • Downloaded the correct JDBC drivers from here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/impala-jdbc.html
  • Tried to set up a connection in SQuirreL SQL and SQL Workbench/J using the downloaded drivers and the following connection string: jdbc:hive2://localhost:21050/;auth=noSasl
  • Result: Could not establish connection to jdbc:hive2://localhost:21050/;auth=noSasl: null
  • I checked whether Impala works by running impala-shell on the master node. I can show tables, run queries, etc.
  • I checked whether the port is forwarded through the tunnel by telnetting to localhost on port 21050.
  • I checked with beeline on the master node whether it's possible at all to connect to Impala through JDBC on that port. That works just fine.
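The telnet check in the list above can be reproduced with a small bash sketch (the port_open function name is our own, not from the thread). Note what such a check can and cannot show: a successful connection only proves that something is listening on localhost:21050. With ssh -ND 21050, that listener is the local ssh client acting as a SOCKS proxy, so the check passes even though a JDBC client that connects there directly is talking to the proxy, not to Impala.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: exit status 0 if a TCP connection to HOST:PORT succeeds.
# Uses bash's /dev/tcp pseudo-device, so run it with bash, not plain sh.
port_open() {
  local host="$1" port="$2"
  # Open fd 3 in a subshell; the fd is closed when the subshell exits.
  # Errors such as "Connection refused" are discarded.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Success only says that *something* accepts connections. With
# "ssh -ND 21050 ...", that something is the local ssh SOCKS proxy, so this
# passes even though plain JDBC traffic sent to it never reaches Impala.
if port_open localhost 21050; then
  echo "localhost:21050 accepts connections (not necessarily Impala)"
else
  echo "nothing is listening on localhost:21050"
fi
```

To prove that Impala itself is reachable, an actual JDBC connection (for example from the SQL tool) is still needed.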
Am I missing something? Can someone shed some light on this?

Thanks!

4 REPLIES

Master Collaborator

Daan,

I think you'd need to ask Amazon about this; it provides support for Impala on EMR.

Explorer
I asked over on the Amazon EMR forums as well, but no answers so far 😞 I thought this might be a general Impala JDBC issue that people have seen before.

New Contributor (accepted solution)
Hi Daan,

I tried a different SSH tunnel and it worked for me:

ssh -L 12345:localhost:21050 your_user_name@your_node.compute.amazonaws.com

This opens port 12345 on your local machine and forwards it to port 21050 on the Hadoop node.
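A minimal bash sketch of the same setup, assuming the placeholder user and host names from the reply and the arbitrary local port 12345. The key difference from the ssh -ND 21050 tunnel in the question is that -L forwards one local port directly to Impala's port 21050 on the node, while -D only opens a SOCKS proxy locally. The JDBC client must therefore use the local port in its connection string, here jdbc:hive2://localhost:12345/;auth=noSasl. The --start option, the helper function names, and the LOCAL_PORT/SSH_TARGET variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (or start) the -L tunnel and the matching JDBC URL.
# SSH_TARGET and LOCAL_PORT are placeholders; override them via the environment.
set -euo pipefail

LOCAL_PORT="${LOCAL_PORT:-12345}"   # any free port on your local machine
IMPALA_PORT=21050                   # Impala JDBC port used in this thread
SSH_TARGET="${SSH_TARGET:-your_user_name@your_node.compute.amazonaws.com}"

# -L LOCAL:HOST:REMOTE; HOST is resolved on the ssh server, so "localhost"
# here means the EMR node itself, not your machine.
forward_spec() {
  printf '%s:localhost:%s' "$LOCAL_PORT" "$IMPALA_PORT"
}

# Connection string for SQuirreL SQL, SQL Workbench/J, etc.:
# point the client at the local end of the tunnel.
jdbc_url() {
  printf 'jdbc:hive2://localhost:%s/;auth=noSasl' "$LOCAL_PORT"
}

if [[ "${1:-}" == "--start" ]]; then
  # -N: run no remote command, just hold the forward open (stop with Ctrl-C).
  exec ssh -N -L "$(forward_spec)" "$SSH_TARGET"
fi

echo "Tunnel:   ssh -N -L $(forward_spec) $SSH_TARGET"
echo "JDBC URL: $(jdbc_url)"
```

Run it without arguments to print the ssh command and URL, or with --start to open the tunnel in the foreground; the JDBC tool can then connect to the printed URL for as long as the tunnel stays open.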

More info here: http://marcelkrcah.net/blog/how-to-wire-pandas-to-impala/

Explorer

Thanks Marcel! That does seem to work, at least with Tableau and Impyla. Apparently the tunnel setup instructions on the Amazon website don't work that well. Tomorrow I'll try whether this tunnel also works with SQuirreL and other generic JDBC tools.