Member since: 09-29-2015
Posts: 42
Kudos Received: 34
Solutions: 11

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 572 | 10-06-2017 06:59 PM
 | 506 | 01-19-2017 05:08 PM
 | 760 | 07-22-2016 01:26 PM
 | 967 | 07-12-2016 11:34 AM
 | 583 | 06-19-2016 01:52 PM
05-31-2018
01:02 PM
While we still do not have a solution for this, the following discussion on the Apache Knox user@ list may be of interest: http://mail-archives.apache.org/mod_mbox/knox-user/201802.mbox/%3cCAMvr1bgHTmeT2C0PyC_NUj_TZvBcGXBEqgXjHo2XmF7gnTZdag@mail.gmail.com%3e
05-22-2018
06:29 PM
If nothing has changed in the configuration, then I would assume that something about the users has changed relative to the configuration. For instance, users may no longer be members of the "users" group, or the Knox hosts may have moved to IP addresses other than those configured.
01-29-2018
07:31 PM
What version of Oozie and HDP is this support available in?
12-20-2017
07:11 PM
2 Kudos
The KnoxShell release artifact provides a small-footprint client environment that removes all unnecessary server dependencies, configuration, binary scripts, etc. It is made up of a few different pieces that empower different sorts of users:
- A set of SDK-type classes for providing access to Hadoop resources over HTTP
- A Groovy based DSL for scripting access to Hadoop resources based on the underlying SDK classes
- Token based sessions to provide a CLI SSO session for executing multiple scripts

While testing the KnoxShell examples for the 0.14.0 Apache Knox release, I realized that using the KnoxShell for access to HiveServer2 was not easily done. This is because we are leveraging the knoxshell executable jar, which makes it difficult to add additional classes and jars to the classpath for the executing script. I needed to create a launch script that calls the main class of the executable jar while also being able to set the classpath with additional jars for the Apache Hive clients.

This article goes over the creation of a simple SQL client, which we will call "knoxline", using the KnoxShell Groovy based DSL. It should work with the 0.14.0 knoxshell download and with previous gateway server releases as well. We will show how to use a simple Groovy script to write a SQL client that can do something like the following:

Download

In the 0.14.0 release, you may get to the knoxshell download through the Apache Knox site. From that page, click the Gateway client binary archive link or just use the one here. Unzip this file into your preferred location, which will result in a knoxshell-0.14.0 directory; we will refer to that location as {GATEWAY_HOME}.

cd {GATEWAY_HOME}

You should see something similar to the following:
bash-3.2$ ls -l
total 160
-r--r--r--@  1 larry  staff  71714 Dec  6 18:32 LICENSE
-r--r--r--@  1 larry  staff    164 Dec  6 18:32 NOTICE
-rw-r--r--@  1 larry  staff   1452 Dec  6 18:32 README
drwxr-xr-x@  6 larry  staff    204 Dec 14 18:06 bin
drwxr--r--@  3 larry  staff    102 Dec 14 18:06 conf
drwxr-xr-x@ 19 larry  staff    646 Dec 14 18:06 samples

Directory | Description
---|---
bin | contains the main knoxshell jar and related shell scripts
conf | only contains log4j config
logs | contains the knoxshell.log file
samples | has numerous examples to help you get started
Setup Truststore for Client

Get/setup the truststore for the target Knox instance or fronting load balancer:

- If you have access to the server, you may use the command:
  knoxcli.sh export-cert --type JKS
  and copy the resulting gateway-client-identity.jks to your user home directory.
- You may also ask your Knox administrator to provide you with the public cert for the gateway and create your own truststore within your user home directory.

NOTE: if you see errors related to SSL and PKIX, your truststore is not properly set up.

Add Hive Client Libraries
In order to add the client libraries that provide the HiveDriver and others, we will add an additional directory to the above structure:

Directory | Description
---|---
lib | contains external jars to add to the classpath for things like the HiveDriver
Next, we will download the Hive standalone client jar, which contains nearly everything we need. For this article, we will download the Hive 1.2.1 standalone jar and copy it to the newly created lib directory. You can use whatever version of the client jar is appropriate for your Hive deployment.

Add Commons Logging Jar

Download the commons-logging jar and copy it to the lib directory as well.

Add Launch Script

As I mentioned earlier, we need to add a launch script that will execute the main class of the knoxshell executable jar while allowing us to set additional jars on the classpath.
Save the following to a file named knoxline.sh within the bin directory:

java -Dlog4j.configuration=conf/knoxshell-log4j.properties -cp bin/knoxshell.jar:lib/* org.apache.hadoop.gateway.shell.Shell bin/hive2.groovy "$@"
and save the following to another script in the bin directory called hive2.groovy:
import java.sql.DriverManager
import java.sql.SQLException
import org.apache.hadoop.gateway.shell.Credentials
gatewayHost = "localhost";
gatewayPort = 8443;
trustStore = System.getProperty('user.home') + "/gateway-client-trust.jks";
trustStorePassword = "changeit";
contextPath = "gateway/sandbox/hive";
sql = ""
if (args.length == 0) {
// accept defaults
System.out.println(String.format("\nDefault connection args: %s, %d, %s, %s, %s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath))
} else if (args[0] == "?" || args[0] == "help") {
System.out.println("\nExpected arguments: {host, port, truststore, truststore-pass, context-path}\n")
System.exit(0);
} else if (args.length == 5) {
gatewayHost = args[0];
gatewayPort = args[1].toInteger();
trustStore = args[2];
trustStorePassword = args[3];
contextPath = args[4];
System.out.println(String.format("\nProvided connection args: %s, %d, %s, %s, %s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath))
} else if (args.length > 0) {
System.out.println("\nERROR: Expected arguments: NONE for defaults or {host, port, truststore, truststore-pass, context-path}\n")
System.exit(1);
}
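// Build a HiveServer2 JDBC URL that uses HTTP transport and SSL so the
// connection is routed through the Knox gateway's Hive service.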
connectionString = String.format( "jdbc:hive2://%s:%d/;ssl=true;sslTrustStore=%s;trustStorePassword=%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/%s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath );
System.out.println(" _ _ _ ");
System.out.println("| | ___ __ _____ _| (_)_ __ ___ ");
System.out.println("| |/ / '_ \\ / _ \\ \\/ / | | '_ \\ / _ \\");
System.out.println("| <| | | | (_) > <| | | | | | __/");
System.out.println("|_|\\_\\_| |_|\\___/_/\\\\_\\_|_|_| |_|\\\\___|");
System.out.println("powered by Apache Knox");
System.out.println("");
credentials = new Credentials()
credentials.add("ClearInput", "Enter username: ", "user")
.add("HiddenInput", "Enter pas" + "sword: ", "pass")
credentials.collect()
user = credentials.get("user").string()
pass = credentials.get("pass").string()
// Load Hive JDBC Driver
Class.forName( "org.apache.hive.jdbc.HiveDriver" );
// Configure JDBC connection
connection = DriverManager.getConnection( connectionString, user, pass );
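// Simple interactive loop: prompt for a SQL statement, execute it through
// the gateway, and render any ResultSet as a basic text table.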
while(1) {
sql = System.console().readLine 'knoxline> '
if (!sql.equals("")) {
System.out.println(sql)
rs = true;
statement = connection.createStatement();
try {
if (statement.execute( sql )) {
resultSet = statement.getResultSet()
int colcount = 0
colcount = resultSet.getMetaData().getColumnCount();
row = 0
header = "| "
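// Walk the rows; on the first row, size each column against its label
// and build the header line before printing the data rows.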
while ( resultSet.next() ) {
line = "| "
for (int i = 1; i <= colcount; i++) {
colvalue = resultSet.getString( i )
if (colvalue == null) colvalue = ""
colsize = colvalue.length()
headerSize = resultSet.getMetaData().getColumnLabel( i ).length()
if (headerSize > colsize) colsize = headerSize
if (row == 0) {
header += resultSet.getMetaData().getColumnLabel( i ).center(colsize) + " | ";
}
line += colvalue.center(colsize) + " | ";
}
if (row == 0) {
System.out.println("".padLeft(header.length()-1, "="))
System.out.println(header);
System.out.println("".padLeft(header.length()-1, "="))
}
System.out.println(line);
row++
}
System.out.println("\nRows: " + row + "\n");
resultSet.close();
}
}
catch(SQLException e) {
//e.printStackTrace()
System.out.println("SQL Exception encountered... " + e.getMessage())
if (e.getMessage().contains("org.apache.thrift.transport.TTransportException")) {
System.out.println("reconnecting... ")
connection = DriverManager.getConnection( connectionString, user, pass );
}
}
statement.close();
}
}
connection.close();
Execute SQL Commands using KnoxLine

Enter the knoxline SQL client at the command line. I will use the defaults for the arguments in the script:

Default connection args: localhost, 8443, /Users/larry/gateway-client-trust.jks, changeit, gateway/sandbox/hive

Depending on your deployment, you may want to set the above arguments on the CLI below:

knoxshell-0.14.0 larry$ bin/knoxline.sh

Let's check for existing tables:

knoxline> show tables

Let's create a table by loading a file from the local disk of the cluster machine:

knoxline> CREATE TABLE logs(column1 string, column2 string, column3 string, column4 string, column5 string, column6 string, column7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '

Show the created table:

knoxline> show tables

Show the table description:

knoxline> desc logs

Load the data from the sample.log file in /tmp (copy the sample.log file from ./samples/hive/sample.log to the /tmp directory on your hiveserver2 host):

knoxline> LOAD DATA LOCAL INPATH '/tmp/sample.log' OVERWRITE INTO TABLE logs

Do a select from the table:

knoxline> select * from logs where column2='20:11:56' and column4='[TRACE]'
Some things to note about this sample:

- The gateway URL defaults to the sandbox topology; an alternative would be passing it as an argument to the script.
- Credential collectors are used to gather credentials or other input from various sources. In this sample, the HiddenInput and ClearInput collectors prompt the user for input with the provided prompt text, and the values are acquired by a subsequent get call with the provided name value.
- The standard Java classes for JDBC are used rather than the Hadoop session object used for access to the pure REST APIs.
- The resultSet is rendered in the familiar table format of other command line interfaces, but the sample shows how to access it for whatever scripting needs you have.
- Error handling is more or less non-existent in this example.

I hope to bring "knoxline" to the KnoxShell module in a future release as a simple way to do some quick queries from your KnoxShell environment.
12-20-2017
07:09 PM
An example of using KnoxShell to develop a simple SQL client for access to HiveServer2 through Apache Knox Gateway.
10-09-2017
03:32 PM
The JOBTRACKER and NAMENODE services are a bit odd at first glance. Don't confuse their use with proxying of RPC services. These are used in order to realize the rewrite requirements for Oozie. Oozie responses include some host and port information that we need to be able to identify internally through the ServiceRegistryService gateway service.
10-06-2017
06:59 PM
Apache Knox is an HTTP Gateway - it doesn't proxy RPC calls to platform components. It will proxy REST API calls for many of the same components and provide you access to resources that you otherwise couldn't reach due to authentication requirements, etc. I assume by data encryption in this context you mean on the wire. Wire-level encryption for Knox interactions is based on TLS/SSL.
09-13-2017
08:11 PM
This is a great article and is immediately relevant to work going on in the community for KIP-8! https://cwiki.apache.org/confluence/display/KNOX/KIP-8+Service+Discovery+and+Topology+Generation
03-22-2017
09:02 PM
2 Kudos
@Roland Simonis - I believe that @Robert Levas is generally correct, but you are talking about a keytab being compromised. In that case, I believe it is generally game over. Keytab management is extremely important. Keytabs should only be readable by root and not even backed up. If you want to protect clusters from keytabs that are compromised on other clusters, then the clusters should use different realms - IMO.
02-24-2017
04:46 PM
This is a great article, Sandeep!
01-19-2017
05:08 PM
1 Kudo
The PROXY_USER_NAME is actually poorly named. This value is only populated if principal mapping within identity assertion is done to map the authenticated user to another username to proxy to the backend service. It should probably be called MAPPED_USER or something like that. See: http://knox.apache.org/books/knox-0-11-0/user-guide.html#Audit
01-18-2017
06:14 PM
1 Kudo
More than likely, you do not have group lookup configured in Knox. If you check the {GATEWAY_HOME}/logs/gateway-audit.log you will likely notice an empty array "[]" for groups with the authentication entries. The groups need to be looked up by the Knox code and made available to the Ranger Knox plugin. The plugin doesn't do its own group lookup. HTH
01-16-2017
04:45 PM
The gateway/knox_sample path assumes a topology named knox_sample.xml in your {GATEWAY_HOME}/conf/topologies directory. If that doesn't exist then you will get a 404. As mentioned by others, if you are using Ambari to make the changes then you are using the default.xml topology, since that is the only topology that Ambari is aware of.
07-22-2016
01:26 PM
1 Kudo
There are a number of ways that you can do this. Personally, I would opt for using Apache Knox rather than pulling in the client jars and config for Hadoop. This will allow you to use JDBC to HiveServer2 and the HBase REST server API instead. Assuming that you will authenticate the end user in your web application, you can then propagate the user identity via the Pre-authenticated SSO provider in Knox [1]. Coupled with mutual authentication with SSL [2], you have a trusted proxy that is able to authenticate to HiveServer2 via Kerberos and act on behalf of your end users, who are authenticated in your web application.

[1] - http://knox.apache.org/books/knox-0-9-0/user-guide.html#Preauthenticated+SSO+Provider
[2] - http://knox.apache.org/books/knox-0-9-0/user-guide.html#Mutual+Authentication+with+SSL
07-12-2016
11:34 AM
1 Kudo
The JCEKS credential provider leverages the Sun/Oracle proprietary keystore format to protect the credentials within the store. The proprietary algorithm used by Sun/Oracle is based on password based encryption but uses 3-key triple DES (instead of DES) in CBC mode with PKCS #5 padding. This has an effective cryptographic strength of 112 bits, although the key is 168 bits plus 24 parity bits for a total of 192 bits. This key (and the initialization vector) is derived from a password using a proprietary MD5-based algorithm. Normally, deriving the initialization vector from the key would defeat the purpose, but each entry also has a unique salt for key derivation. This means that the derived key and initialization vector are unique to each entry.

The JCEKS provider does require a password for the keystore. There are a couple of ways to specify this password:

1. Environment variable
2. Password file protected with file permissions - location specified within configuration
3. Default password of "none" with the keystore protected with file permissions

The first two are perfectly viable, but each requires that both the credential administrator and the runtime consumer of the credential have access to the same keystore password. This is usually non-trivial. In addition, #2 is solely dependent on file permissions and therefore the credential is in clear text in the password file. #3 has a hardcoded password, but this means that it is available to both administrator and consumer and the credential is not stored in clear text. Combined with appropriate file permissions, this approach is arguably the best for using the JCEKS provider.

It is important to understand that the Credential Provider API is a pluggable API and other providers can be implemented in order to have a more secure approach. A credential server that authenticated the requesting user instead of requiring a password, for instance, would be a good way to remove the keystore password issue. Incidentally, there are Apache docs for this which will be published once Hadoop 2.8.3 and later are released.
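For readers who want to see the runtime consumer side of the Credential Provider API, here is a minimal Groovy sketch; the jceks path and alias name below are made up for illustration. Configuration.getPassword() resolves the alias through any configured providers and falls back to the plain configuration value if no provider has it:

import org.apache.hadoop.conf.Configuration

// Point the Credential Provider API at a JCEKS store on the local filesystem.
// The path and alias are illustrative only - use your own.
Configuration conf = new Configuration()
conf.set("hadoop.security.credential.provider.path",
         "jceks://file/home/larry/mykeystore.jceks")

// Returns null if the alias cannot be resolved by any provider or the config itself.
char[] secret = conf.getPassword("my.password.alias")
if (secret != null) {
  System.out.println("resolved credential of length " + secret.length)
}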
07-11-2016
08:47 PM
In theory, the hadoop-auth provider in Knox could be used with KnoxSSO in order to accept the kerberos ticket. It would assume that the kerberos ticket would be presented to Knox via the SPNEGO challenge from hadoop-auth and that the result would be a ticket that is for Knox and from the same realm as (or a realm trusted by) the one Knox is configured for. There are a good number of maybes in there and it is certainly not something that has been tested. I would be interested in hearing the results. Again, this has not been tested and is not a supported usecase for HDP.
07-03-2016
02:48 PM
Zac is correct. The topologies have always been considered to be cluster definitions, thus the cluster name is the topology name. Unfortunately, this sometimes gets confusing when folks are using Ambari, where you name your cluster as well. They are different things.
06-19-2016
01:52 PM
I assume that you mean that Knox will be deployed within a DMZ of sorts between two firewalls. The challenges will be to make sure that the appropriate hosts and ports are available to Knox for accessing the Hadoop components inside the cluster.
05-12-2016
06:19 PM
This seems really odd. Would you happen to have multiple instances of Knox gateway running behind a load-balancer?
05-06-2016
08:06 PM
1 Kudo
Hi Pavel - From what I can tell from your description, it seems that you are developing a java application to consume WebHDFS APIs through Knox. We have samples that do this available in {GATEWAY_HOME}/samples directory. The Groovy scripts are based on java classes that leverage HttpClient to handle the basic challenge. Look at samples/ExampleWebHdfsPutGet.groovy for an example that does a PUT and then a subsequent GET of a given file. The following shows you how HttpClient is used: https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/shell/Hadoop.java The following shows you the implementation of the HDFS Put command in our client shell classes which leverages the execute method of the AbstractRequest base class to interact with Knox through the above Hadoop class as the session: https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/shell/hdfs/Put.java Here is the base class that uses the Hadoop session class: https://github.com/apache/knox/blob/master/gateway-shell/src/main/java/org/apache/hadoop/gateway/shell/AbstractRequest.java Bottom line: I would suggest that you use HttpClient to do the interactions. HTH. --larry
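To make that concrete, here is a minimal sketch of the PUT-then-GET flow using the KnoxShell Groovy DSL, along the lines of samples/ExampleWebHdfsPutGet.groovy; the gateway URL, credentials, and paths below are placeholders for your environment:

import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.hdfs.Hdfs

gateway = "https://localhost:8443/gateway/sandbox"   // placeholder gateway URL
username = "guest"                                    // placeholder credentials
password = "guest-password"

// Establish a session with the gateway; this handles the HTTP Basic challenge.
session = Hadoop.login( gateway, username, password )

// PUT a local file into HDFS through WebHDFS via Knox, then GET it back.
Hdfs.put( session ).file( "README" ).to( "/tmp/example/README" ).now()
text = Hdfs.get( session ).from( "/tmp/example/README" ).now().string
println text

session.shutdown()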
04-26-2016
11:50 AM
Yes, that looks like an incorrect setting and it should be changed. I don't think that there is any way to override that from the curl side of things. Even if there were, webhcat would still be configured incorrectly for your deployment and for any other clients that need to access it.

Your questions:

1. No
2. I'm not sure about this actually. However, one of the beauties of using the REST API is that you don't need client side config files. Therefore, it would do you no good anyway.
3. HiveServer2 is accessed via ODBC/JDBC or the beeline client (which uses JDBC). You can certainly use HiveServer2 to insert values. Its primary purpose is to provide a server for executing SQL against hive tables, etc. See: http://hortonworks.com/hadoop-tutorial/secure-jdbc-odbc-clients-access-hiveserver2-using-apache-knox/ and http://knox.apache.org/books/knox-0-9-0/user-guide.html#Hive for details of accessing it through Knox. You may also look at the samples in {GATEWAY_HOME}/samples/hive.
04-26-2016
12:31 AM
1 Kudo
This is probably not in any way related to your use of Knox. You could try the same access going directly to webhcat to test it out, if you like. I believe that you should be able to manage the settings for templeton.libjars in Ambari under Advanced webhcat-site.xml. That seems to be where those settings are for me. As an example, here are my current settings: /usr/hdp/${hdp.version}/zookeeper/zookeeper.jar,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar
Adjust the filenames to match what you have in your environment and see if that helps. If you are not using Ambari then you will need to find webhcat-site.xml, manually edit it, and restart webhcat. Additionally, you may want to consider using HiveServer2 for SQL access. I believe that WebHCat is generally used for metadata related queries.
04-07-2016
09:30 PM
1 Kudo
See the Apache docs here for HiveServer2 HA support: http://knox.apache.org/books/knox-0-8-0/user-guide.html#HiveServer2+HA
02-04-2016
09:48 PM
4 Kudos
Knox is composed of a number of Gateway Services at its core. Among these are one for setting up the Jetty SSL listener (JettySSLService) and another for protecting various credentials (AliasService) in order to keep them out of clear text config. These services are able to leverage each other for various things. The JettySSLService has the AliasService injected in order to get to the protected gateway-identity-passphrase. This is a password that is stored at the gateway level as opposed to the topology or cluster level. While mixing the two concerns would be possible with the functionality of JCEKS, it would inappropriately couple the two services' implementations together. There have always been plans to look for a more central and secure credential store. The addition of the Credential Provider API in Hadoop opens up this possibility. Currently, the primary credential provider in Hadoop is still a JCEKS provider. We need a server that sits over secure storage and that you would need to authenticate to in order to get the protected passwords. Once this is available, the current design in Knox will allow us to transition to a central credential server without the JettySSLService even being aware.
01-25-2016
06:32 PM
4 Kudos
Hi Vendant - SAML v2 is available in the Apache Knox 0.7.0 release through the PicketLink federation provider. It is, however, not documented since the PicketLink project itself is being reworked into a larger identity solution. The Apache Knox 0.8.0 release, which should be finalized in the next week or so, will leverage the pac4j provider that has recently been committed. The pac4j provider supports SAML v2 as well as various other integrations and is the focus of the 0.8.0 release. While I haven't tested it explicitly with OpenSSO, it has been successfully tested with Shibboleth and Okta for SAML prior to commit. Given an environment with OpenSSO deployed, a professional services organization would be a great place to contribute such testing - you don't get much closer to the real world and the customers than that!
01-18-2016
09:18 PM
That seems to indicate that the Knox certificate is not correctly set as trusted. I am not sure how to do this for ODBC. In the following blog, the truststore and related password are provided in the connect string - for JDBC. The ODBC connect string may have the same idea. You will need to import the public cert for the Knox server into whatever the truststore is.
01-18-2016
03:44 AM
1 Kudo
Apache Knox provides the same REST APIs for PUTting files into HDFS. Please see the Apache docs for examples: http://knox.apache.org/books/knox-0-7-0/user-guide.html#WebHDFS+Examples Note that the link will take you to examples of using the groovy scripting capabilities of Knox as well as examples of using curl to do the same things.
01-04-2016
06:23 PM
1 Kudo
There is currently no end-to-end API management story for hosting APIs on Hadoop. I have spent some time thinking and talking about this idea, and with some customer driven usecases we may be able to get it on the roadmap. In the meantime, I would consider using Slider to deploy a service that you can host on tomcat or jetty, etc. I know that there has been some recent work for deploying tomcat to YARN via Slider. This may be worth looking into. See the work done as part of https://issues.apache.org/jira/browse/SLIDER-1012. If you would like to bring a more holistic API management usecase to the Knox community - one that would incorporate the use of Slider to deploy and publish APIs to the YARN registry, along with facilities for discovery and subscription that would be useful for your deployment scenarios - then please engage the Knox community on the dev@ list. We would love to get your insights and perspective!
12-21-2015
05:41 PM
1 Kudo
I would also offer that this mechanism would limit the usecases in which your custom service can be used to those where authentication is based on username/password. There are a number of existing and upstream authentication/federation providers that do not involve providing a password to Knox. Your service will not work with KnoxSSO, HeaderPreAuth (SiteMinder, etc.), OAuth, SAML, CAS, etc. I would suggest that you bring your usecase to the dev@ list for Apache Knox so that we can determine the best approach for services like the one you have in mind.
10-28-2015
06:28 PM
4 Kudos
To put a bit of a finer point on this topic, we should describe the type of solutions that integrate easily with the pre-authenticated SSO provider in Knox. There are a number of solutions in the enterprise that follow a particular pattern for integration. This pattern requires all traffic to resources that participate in the SSO to be routed through a proxy or gateway in order to access those services. What this enables is the ability to inject headers into the request as it flows through the network to represent the authenticated user and, in some cases, the groups associated with that user. Apache Knox has the ability to flow in the identity of the end user through the use of these HTTP headers by using the header based pre-authenticated SSO provider. It defaults to header names that are often used for SiteMinder integration - SM_USER and SM_GROUPS. The header names can be overridden to match those used in different environments. Tivoli Access Manager and other solutions follow this same pattern. The provider can also be configured to only accept requests from specific IP addresses (or a range of them) as well as to require mutual authentication with SSL client certificates. These help to mitigate the risk of some other party circumventing the SSO solution and asserting an arbitrary identity for resource access. It is important to understand that the SSO solution and network security provisions need to ensure that there is no way to circumvent the SSO provider's proxy and go directly to Knox.
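To illustrate the header injection pattern, here is a hypothetical Groovy sketch (using Apache HttpClient) of what a request arriving at Knox from the fronting SSO proxy might look like; the host, topology, and header values are placeholders, and the very ease of setting these headers is why the IP validation and mutual SSL mentioned above are so important:

import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients

// Simulate the request a fronting SSO proxy would send to Knox after it has
// authenticated the user: identity and groups are asserted via injected headers.
def client = HttpClients.createDefault()
def request = new HttpGet("https://knox-host:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS")
request.addHeader("SM_USER", "guest")     // authenticated end user
request.addHeader("SM_GROUPS", "users")   // optional group assertion

def response = client.execute(request)
println response.statusLine
client.close()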