Member since: 09-29-2015
Posts: 42
Kudos Received: 34
Solutions: 11
My Accepted Solutions
Views | Posted
---|---
1409 | 10-06-2017 06:59 PM
1220 | 01-19-2017 05:08 PM
1354 | 07-22-2016 01:26 PM
2099 | 07-12-2016 11:34 AM
1393 | 06-19-2016 01:52 PM
01-16-2017
04:45 PM
The gateway/knox_sample path assumes a topology named knox_sample.xml exists in your {GATEWAY_HOME}/conf/topologies directory. If it doesn't exist, you will get a 404. As mentioned by others, if you are using Ambari to make the changes, then you are working with the default.xml topology, since that is the only topology that Ambari is aware of.
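For illustration (the gateway host and port here are hypothetical), the path segment after /gateway/ is simply the topology file name:

```
{GATEWAY_HOME}/conf/topologies/default.xml      ->  https://knoxhost:8443/gateway/default/...
{GATEWAY_HOME}/conf/topologies/knox_sample.xml  ->  https://knoxhost:8443/gateway/knox_sample/...
```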
07-22-2016
01:26 PM
1 Kudo
There are a number of ways that you can do this. Personally, I would opt for using Apache Knox rather than pulling the Hadoop client jars and config into your application. This will allow you to use JDBC to HiveServer2 and the HBase REST server API instead. Assuming that you authenticate the end user in your web application, you can then propagate the user identity via the Pre-authenticated SSO provider in Knox [1]. Coupled with mutual authentication with SSL [2], you have a trusted proxy that is able to authenticate to HiveServer2 via Kerberos and act on behalf of your end users, who are authenticated in your web application.

[1] - http://knox.apache.org/books/knox-0-9-0/user-guide.html#Preauthenticated+SSO+Provider

[2] - http://knox.apache.org/books/knox-0-9-0/user-guide.html#Mutual+Authentication+with+SSL
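As a rough sketch of what the web application side might look like (the host, port, topology name, and user are placeholders, and in a real deployment the client would also be configured for two-way SSL), an Apache HttpClient call would simply forward the already-authenticated identity in the header the Pre-authenticated SSO provider expects:

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class PreAuthSsoExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical Knox host/port and topology; adjust for your deployment.
    String url = "https://knoxhost:8443/gateway/webapp/webhdfs/v1/tmp?op=LISTSTATUS";

    // In practice this client should use an SSLContext configured with a client
    // certificate and the Knox truststore so the gateway can trust the proxy.
    try (CloseableHttpClient client = HttpClients.createDefault()) {
      HttpGet get = new HttpGet(url);
      // SM_USER is the default header name for the Pre-authenticated SSO
      // provider; it is configurable via preauth.custom.header in the topology.
      get.addHeader("SM_USER", "alice");
      try (CloseableHttpResponse response = client.execute(get)) {
        System.out.println(response.getStatusLine());
        System.out.println(EntityUtils.toString(response.getEntity()));
      }
    }
  }
}
```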
07-12-2016
11:34 AM
1 Kudo
The JCEKS credential provider leverages the Sun/Oracle proprietary keystore format to protect the credentials within the store. The proprietary algorithm used by Sun/Oracle is based on password-based encryption but uses 3-key triple DES (instead of DES) in CBC mode with PKCS #5 padding. This has an effective cryptographic strength of 112 bits, although the key is 168 bits plus 24 parity bits for a total of 192 bits. The key (and the initialization vector) is derived from a password using a proprietary MD5-based algorithm. Normally, deriving the initialization vector from the key would defeat the purpose, but each entry also has a unique salt for key derivation, which means that the derived key and initialization vector are unique to each entry.

The JCEKS provider does require a password for the keystore. There are a couple of ways to specify this password:

1. An environment variable
2. A password file protected by file permissions, with its location specified in configuration
3. The default password of "none", with the keystore itself protected by file permissions

The first two are perfectly viable, but each requires that both the credential administrator and the runtime consumer of the credential have access to the same keystore password. This is usually non-trivial. In addition, #2 depends solely on file permissions, and the keystore password sits in clear text in the password file. #3 uses a hardcoded password, but this means it is available to both administrator and consumer, and the credentials themselves are not stored in clear text. Combined with appropriate file permissions, this approach is arguably the best way to use the JCEKS provider.

It is important to understand that the Credential Provider API is pluggable, and other providers can be implemented for a more secure approach. A credential server that authenticates the requesting user instead of requiring a password, for instance, would be a good way to remove the keystore password issue. Incidentally, there are Apache docs for this which will be published once Hadoop 2.8.3 and later are released.
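As a sketch of how a runtime consumer resolves a credential through this API (the provider path and alias below are made up for illustration), note that the keystore password is picked up from the HADOOP_CREDSTORE_PASSWORD environment variable, the configured password file, or the "none" default described above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.alias.CredentialProviderFactory;

public class CredentialLookupExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical JCEKS store, created beforehand with something like:
    //   hadoop credential create ssl.server.keystore.password \
    //       -provider jceks://file/etc/security/creds.jceks
    conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
        "jceks://file/etc/security/creds.jceks");

    // getPassword() consults the configured credential providers first and
    // falls back to a clear-text config property if no provider has the alias.
    char[] secret = conf.getPassword("ssl.server.keystore.password");
    if (secret != null) {
      // ... use the credential, then clear it from memory.
      java.util.Arrays.fill(secret, '\0');
    }
  }
}
```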
07-11-2016
08:47 PM
In theory, the hadoop-auth provider in Knox could be used with KnoxSSO in order to accept the Kerberos ticket. It would assume that the Kerberos ticket is presented to Knox via the SPNEGO challenge from hadoop-auth, and that the presented ticket is for Knox and comes from the same realm that Knox is configured for, or from a trusted realm. There are a good number of maybes in there, and it is certainly not something that has been tested. I would be interested in hearing the results. Again, this has not been tested and is not a supported use case for HDP.
07-03-2016
02:48 PM
Zac is correct. The topologies have always been considered to be cluster definitions, thus the cluster name is the topology name. Unfortunately, this sometimes gets confusing for folks who are using Ambari, where you name your cluster as well. They are different things.
06-19-2016
01:52 PM
I assume that you mean that Knox will be deployed within a DMZ of sorts, between two firewalls. The challenge will be to make sure that the appropriate hosts and ports inside the cluster are open to Knox so that it can access the Hadoop components.
05-12-2016
06:19 PM
This seems really odd. Would you happen to have multiple instances of Knox gateway running behind a load-balancer?
05-06-2016
08:06 PM
1 Kudo
Hi Pavel - From what I can tell from your description, you are developing a Java application to consume the WebHDFS APIs through Knox. We have samples that do this in the {GATEWAY_HOME}/samples directory. The Groovy scripts are based on Java classes that leverage HttpClient to handle the HTTP Basic authentication challenge. Look at samples/ExampleWebHdfsPutGet.groovy for an example that does a PUT and then a subsequent GET of a given file.

The following shows you how HttpClient is used: https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/shell/Hadoop.java

The following shows the implementation of the HDFS Put command in our client shell classes, which leverages the execute method of the AbstractRequest base class to interact with Knox through the above Hadoop class as the session: https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/shell/hdfs/Put.java

Here is the base class that uses the Hadoop session class: https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/shell/AbstractRequest.java

Bottom line: I would suggest that you use HttpClient to do the interactions. HTH. --larry
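P.S. In case it helps, here is a rough standalone sketch of the HttpClient approach. The gateway host, port, topology, and credentials are placeholders for your environment, and it does a simple WebHDFS LISTSTATUS rather than the full PUT/GET flow shown in the Groovy sample:

```java
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class KnoxWebHdfsExample {
  public static void main(String[] args) throws Exception {
    // Placeholder gateway address, topology, and credentials.
    String url = "https://knoxhost:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS";

    // HttpClient answers Knox's HTTP Basic challenge with these credentials.
    CredentialsProvider creds = new BasicCredentialsProvider();
    creds.setCredentials(AuthScope.ANY,
        new UsernamePasswordCredentials("guest", "guest-password"));

    // Note: the Knox certificate must be trusted by the JVM (import it into the
    // truststore or configure a custom SSLContext) for HTTPS to succeed.
    try (CloseableHttpClient client = HttpClients.custom()
            .setDefaultCredentialsProvider(creds)
            .build()) {
      HttpGet get = new HttpGet(url);
      try (CloseableHttpResponse response = client.execute(get)) {
        System.out.println(response.getStatusLine());
        System.out.println(EntityUtils.toString(response.getEntity()));
      }
    }
  }
}
```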
04-26-2016
11:50 AM
Yes, that looks like an incorrect setting and it should be changed. I don't think that there is any way to override that from the curl side of things. Even if there were, WebHCat would still be configured incorrectly for your deployment and for any other clients that need to access it.

Your questions:

1. No.
2. I'm not sure about this, actually. However, one of the beauties of using the REST API is that you don't need client-side config files, so it would do you no good anyway.
3. HiveServer2 is accessed via ODBC/JDBC or the beeline client (which uses JDBC). You can certainly use HiveServer2 to insert values; its primary purpose is to provide a server for executing SQL against Hive tables. See http://hortonworks.com/hadoop-tutorial/secure-jdbc-odbc-clients-access-hiveserver2-using-apache-knox/ and http://knox.apache.org/books/knox-0-9-0/user-guide.html#Hive for details of accessing it through Knox; a minimal JDBC sketch follows below this list. You may also look at the samples in {GATEWAY_HOME}/samples/hive.
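For question 3, a minimal JDBC sketch along the lines of the tutorial above might look like this. The gateway host, topology, truststore path, and credentials are placeholders to adjust for your environment:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KnoxHiveJdbcExample {
  public static void main(String[] args) throws Exception {
    // Requires the Hive JDBC driver on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Hive JDBC over HTTP through the Knox gateway; values are placeholders.
    String url = "jdbc:hive2://knoxhost:8443/;ssl=true;"
        + "sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;"
        + "trustStorePassword=changeit;"
        + "transportMode=http;httpPath=gateway/sandbox/hive";

    try (Connection conn = DriverManager.getConnection(url, "guest", "guest-password");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```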
04-26-2016
12:31 AM
1 Kudo
This is probably not in any way related to your use of Knox. You could try the same access going directly to WebHCat to test it out, if you like. I believe that you should be able to manage the settings for templeton.libjars in Ambari under Advanced webhcat-site.xml; that is where those settings are for me. As an example, here are my current settings: /usr/hdp/${hdp.version}/zookeeper/zookeeper.jar,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar

Adjust the filenames to match what you have in your environment and see if that helps. If you are not using Ambari, then you will need to find webhcat-site.xml, edit it manually, and restart WebHCat. Additionally, you may want to consider using HiveServer2 for SQL access; I believe that WebHCat is generally used for metadata-related queries.