Created on 05-23-2017 01:59 AM
When Knox is configured for perimeter security, end users typically depend heavily on the cURL tool or a browser to access the Hadoop services exposed via Knox.
Hive queries can likewise be submitted through the WebHCat (Templeton) service via Knox, and users can set the various parameters required for the Hive job to run correctly.
Here's the cURL command syntax which can be used to submit a Hive Job via Knox:
$ curl -ivk -u <username>:<password> -d <Hive parameter> [-d ...] "https://<knox-server-FQDN>:8443/gateway/<topology>/templeton/v1/hive"
The complete list of Hive parameters can be found in the WebHCat cURL Command Reference.
The most important Hive parameters are:

execute OR file
Specifies a Hive query string ('execute') or the HDFS file name of a Hive program to run ('file'). It is mandatory to provide either the 'execute' or the 'file' option.

define
Any Hive configuration value, such as 'hive.execution.engine', can be set using 'define'. Multiple 'define' options can be provided on the cURL command line.
One caveat: cURL does not seem to process the double equals sign in "define=NAME=VALUE" correctly; it erroneously converts it into "defineNAME=VALUE". The fix is to escape the second equals sign with its URL-encoded equivalent. In other words, any 'define' should be provided like this: -d define="hive.execution.engine%3Dmr"
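If escaping the equals sign by hand feels error-prone, cURL's --data-urlencode option can do the encoding for you: it URL-encodes everything after the first '=' in "name=content". The sketch below reproduces that encoding in plain shell so you can see exactly what goes over the wire; the host, topology, and credentials in the commented command are the example values used in this article.

```shell
# What --data-urlencode "define=hive.execution.engine=mr" sends:
DEFINE="define=hive.execution.engine=mr"
NAME=${DEFINE%%=*}                  # part before the first '='  -> define
VALUE=${DEFINE#*=}                  # part after the first '='   -> hive.execution.engine=mr
# '=' is the only reserved character here, so it becomes %3D:
ENCODED="$NAME=$(printf '%s' "$VALUE" | sed 's/=/%3D/g')"
echo "$ENCODED"                     # define=hive.execution.engine%3Dmr

# Against a live cluster (example host/credentials from this article):
#   curl -ivk -u hr1:passw0rd \
#        -d execute="select+*+from+hivetest;" \
#        -d statusdir="/user/hr1/hive.output7" \
#        --data-urlencode "$DEFINE" \
#        "https://knox-server.domain.com:8443/gateway/default/templeton/v1/hive"
```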
statusdir
Specifies an HDFS location where the output (and errors) of the Hive job execution will be written. Once the job is finished (with either success or failure), this location can be checked for the stdout, stderr, and exit code of the Hive query or program.
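After the job completes, the files under statusdir can be fetched over WebHDFS through the same Knox gateway. A minimal sketch, assuming the default topology also exposes WebHDFS and reusing the example host, credentials, and statusdir from this article:

```shell
# Build the WebHDFS-via-Knox URL for the job's stdout (hypothetical values):
KNOX="https://knox-server.domain.com:8443/gateway/default"
STATUSDIR="/user/hr1/hive.output7"
STDOUT_URL="$KNOX/webhdfs/v1$STATUSDIR/stdout?op=OPEN"
echo "$STDOUT_URL"

# Against a live cluster (-L follows the WebHDFS redirect):
#   curl -ivkL -u hr1:passw0rd "$STDOUT_URL"                            # query output
#   curl -ivkL -u hr1:passw0rd "$KNOX/webhdfs/v1$STATUSDIR/stderr?op=OPEN"  # errors
```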
With this knowledge, here's a working example cURL command that submits a Hive SELECT query as a job to the cluster via Knox. The output is a YARN job ID, which can be used to track the job's progress in the Resource Manager UI.
# curl -ivk -u hr1:passw0rd -d execute="select+*+from+hivetest;" -d statusdir="/user/hr1/hive.output7" -d define="hive.execution.engine%3Dmr" "https://knox-server.domain.com:8443/gateway/default/templeton/v1/hive"
* About to connect() to knox-server.domain.com port 8443 (#0)
*   Trying 127.0.0.1... connected
* Connected to knox-server.domain.com (127.0.0.1) port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=knox-server.domain.com,OU=Test,O=Hadoop,L=Test,ST=Test,C=US
*   start date: Apr 07 23:02:54 2017 GMT
*   expire date: Apr 07 23:02:54 2018 GMT
*   common name: knox-server.domain.com
*   issuer: CN=knox-server.domain.com,OU=Test,O=Hadoop,L=Test,ST=Test,C=US
* Server auth using Basic with user 'hr1'
> POST /gateway/default/templeton/v1/hive HTTP/1.1
> Authorization: Basic aHIxOkJhZc3Mjmq==
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: knox-server.domain.com:8443
> Accept: */*
> Content-Length: 98
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 200 OK
< Date: Fri, 19 May 2017 02:13:58 GMT
< Set-Cookie: JSESSIONID=1k52mpj6ot9rm1nwi2dc9qcvu;Path=/gateway/default;Secure;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Set-Cookie: rememberMe=deleteMe; Path=/gateway/default; Max-Age=0; Expires=Thu, 18-May-2017 02:13:58 GMT
< Content-Type: application/json; charset=UTF-8
< Server: Jetty(7.6.0.v20120127)
< Content-Length: 31
<
* Connection #0 to host knox-server.domain.com left intact
* Closing connection #0
{"id":"job_1495157584958_0016"}
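The returned job ID can then be polled through WebHCat's jobs endpoint via the same gateway. A minimal sketch, reusing the example host, topology, credentials, and the job ID from the output above:

```shell
# Build the WebHCat job-status URL via Knox (hypothetical values):
KNOX="https://knox-server.domain.com:8443/gateway/default"
JOB_ID="job_1495157584958_0016"
STATUS_URL="$KNOX/templeton/v1/jobs/$JOB_ID"
echo "$STATUS_URL"

# Against a live cluster:
#   curl -ivk -u hr1:passw0rd "$STATUS_URL"
# The JSON response includes the job state; once the job is complete,
# check the statusdir for its stdout, stderr, and exit code.
```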
Hope this helps you out!
Created on 05-23-2017 07:19 PM
This is very helpful! Thanks for this article!
Created on 10-24-2018 09:19 PM
How can I see the output of the Hive command using the job_id?
Also, can you please help me understand a single command that submits the query and stores the query output in a file?