Can't connect to Impala through JDBC on Amazon EMR

Explorer

Hi all,

I'm trying to connect to Impala on a cluster set up through Amazon EMR, but it doesn't work. It's a three-node cluster with Impala installed and working. I've done the following:

  • Set up an SSH tunnel to the master node like this: ssh -ND 21050 hadoop@master-node-external-dns-hostname
  • Downloaded the correct JDBC drivers from here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/impala-jdbc.html
  • Tried to set up a connection in SQuirreL SQL and SQL Workbench/J using the downloaded drivers and the following connection string: jdbc:hive2://localhost:21050/;auth=noSasl
  • Result: Could not establish connection to jdbc:hive2://localhost:21050/;auth=noSasl: null
  • I checked whether Impala works by running impala-shell on the master node. I can show tables, run queries, etc.
  • I checked whether the port is forwarded through the tunnel by telnetting to localhost on port 21050.
  • I checked with Beeline on the master node whether it's possible at all to connect to Impala through JDBC on that port. That works just fine.
Am I missing something? Can someone shed some light on this?

Thanks!
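For reference, a telnet-style check only proves that something on the local port accepts TCP connections. With ssh -ND 21050, that something is ssh's own SOCKS proxy, so the check passes even though a JDBC driver sending HiveServer2 traffic straight to that port will most likely be turned away. A minimal Python equivalent of the check, as a sketch (the function name port_accepts_connections is hypothetical):

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like `telnet host port`.

    Hypothetical helper. It only tests the TCP handshake and says nothing about
    the protocol on the other end: an `ssh -D` SOCKS listener passes just as
    well as a real Impala daemon would.
    """
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: nothing is accepting connections.
        return False
```

While the -D tunnel is up, port_accepts_connections("localhost", 21050) should return True, which is why the telnet test looked fine even though the JDBC connection failed.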
Accepted solution

New Contributor
Hi Daan,

I tried a different SSH tunnel and it worked for me:

ssh -L 12345:localhost:21050 your_user_name@your_node.compute.amazonaws.com

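This opens port 12345 on your local machine and forwards it to port 21050 on the Hadoop node. Unlike -D, which turns the local port into a SOCKS proxy, -L forwards plain TCP, so a JDBC client can connect to localhost:12345 as if it were Impala itself (e.g. jdbc:hive2://localhost:12345/;auth=noSasl).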

More info here: http://marcelkrcah.net/blog/how-to-wire-pandas-to-impala/
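The point of the -L tunnel is that the JDBC URL must target the local end of the tunnel (12345 in the answer above), not Impala's own port 21050. A minimal Python sketch that derives the ssh command and the JDBC URL from the same port so they can't drift apart; the helper names are hypothetical, the user, host and ports are the placeholders from the answer, and auth=noSasl is carried over from the original connection string:

```python
import shlex
from typing import List

IMPALA_PORT = 21050  # Impala's JDBC (HiveServer2) port on the node, as used in this thread


def local_forward_command(user: str, host: str, local_port: int,
                          remote_port: int = IMPALA_PORT) -> List[str]:
    """Build `ssh -L local_port:localhost:remote_port user@host` as an argv list.

    Hypothetical helper. "localhost" is resolved on the remote side, i.e. the
    node you ssh into, so traffic to local_port on your machine ends up at
    remote_port on that node.
    """
    return ["ssh", "-L", f"{local_port}:localhost:{remote_port}", f"{user}@{host}"]


def impala_jdbc_url(local_port: int) -> str:
    """Hypothetical helper: hive2 JDBC URL pointing at the local end of the tunnel."""
    return f"jdbc:hive2://localhost:{local_port}/;auth=noSasl"


LOCAL_PORT = 12345
print(shlex.join(local_forward_command(
    "your_user_name", "your_node.compute.amazonaws.com", LOCAL_PORT)))
print(impala_jdbc_url(LOCAL_PORT))
```

This prints the same ssh command as the answer, followed by jdbc:hive2://localhost:12345/;auth=noSasl, the URL to use in SQuirreL SQL or SQL Workbench/J once the tunnel is running.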

Master Collaborator

Daan,

I think you'd need to ask Amazon about this; it provides support for Impala on EMR.

Explorer
I asked over at the Amazon EMR forums as well. No answers so far 😞 I thought that maybe this was a general Impala JDBC issue that people had seen before.

Explorer

Thanks Marcel! That does seem to work, at least with Tableau and Impyla. Apparently the tunnel setup instructions on the Amazon website don't work that well. Tomorrow I'll try whether this tunnel also works with SQuirreL SQL and other generic JDBC tools.