I set up a Kerberos cluster with Cloudera Manager 5.7.0 and it works fine.
The following picture shows the architecture of my cluster.
However, when I try to connect to my cluster with the impyla API to run some queries from an external network:
connect(host='10.36.174.38', port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')
it fails with this error:
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server impala/10.36.174.38@ROBINISME2.COM not found in Kerberos database)
As the error says, my cluster doesn't have the principal "impala/10.36.174.38@ROBINISME2.COM". I do have "impala/dn-3-1@ROBINISME2.COM" and "impala/dn-3-2@ROBINISME2.COM", but I can't connect to hosts dn-3-1 and dn-3-2 directly.
To reach a datanode running an Impala daemon, I set up HAProxy on my proxy server:
listen impala :21050
server dn-3-1 dn-3-1:21050
server dn-3-2 dn-3-2:21050
I also followed all the instructions on this page,
but the error remains. What can I do to reach my goal?
Is it possible to connect to a Kerberos cluster with the impyla API and run queries from an external network?
What problem are you having? Are you seeing "not found in Kerberos database"?
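The cause of the problem in the original description is that an IP address was used to make a connection that required Kerberos authentication. The client builds the service principal it requests from the host string you pass, which is why the error mentions "impala/10.36.174.38@ROBINISME2.COM". With Kerberos, you need to specify the fully qualified hostname here, so that the requested principal matches one that exists in the Kerberos database.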
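To make that concrete, here is a rough Python sketch of the idea, not a definitive implementation. The helper names (expected_principal, is_ip_literal, kerberos_connect_args, open_connection) and the host "proxy.example.com" are made up for illustration; only the connect() arguments come from the original post. The principal format shown is a simplification, since a real Kerberos client may also canonicalize the hostname through DNS.

```python
import ipaddress


def expected_principal(service, host, realm):
    """Simplified view of the principal the client asks the KDC for.

    Assumption: no DNS canonicalization; the host string is used as given.
    """
    return "{0}/{1}@{2}".format(service, host, realm)


def is_ip_literal(host):
    """Return True if host is an IPv4/IPv6 address rather than a hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def kerberos_connect_args(host, port=21050, service="impala"):
    """Build the keyword arguments for impyla's connect(), rejecting IP literals."""
    if is_ip_literal(host):
        raise ValueError(
            "Use a fully qualified hostname, not an IP address: %r" % host
        )
    return {
        "host": host,
        "port": port,
        "auth_mechanism": "GSSAPI",
        "kerberos_service_name": service,
    }


def open_connection(host):
    """Open the connection; needs impyla installed and a valid Kerberos ticket."""
    from impala.dbapi import connect  # imported here so the helpers work without impyla
    return connect(**kerberos_connect_args(host))


# The principal requested when an IP address is passed, as seen in the error.
print(expected_principal("impala", "10.36.174.38", "ROBINISME2.COM"))
```

Calling open_connection("proxy.example.com") (a hypothetical name) would then request a principal for that hostname, so a matching principal has to exist in the Kerberos database for the connection to succeed.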
If you are having any other trouble, feel free to start your own thread.