Created on 05-23-2017 01:59 AM
When Knox is configured for perimeter security, end users typically depend heavily on the cURL tool or a browser to access the Hadoop services exposed via Knox.
Hive queries can likewise be submitted through the WebHCat (Templeton) service via Knox, and users can set the various parameters required for the Hive job to run correctly.
Here's the cURL command syntax which can be used to submit a Hive Job via Knox:
$ curl -ivk -u <username>:<password> -d <Hive parameter> [-d ...] "https://<knox-server-FQDN>:8443/gateway/<topology>/templeton/v1/hive"
The complete list of Hive parameters can be found in the WebHCat cURL Command Reference.
The most important Hive parameters are:

execute OR file
Specifies a Hive query string ('execute') or the HDFS file name of a Hive program to run ('file'). It is mandatory to provide either the 'execute' or the 'file' option.

define
Any Hive configuration value, such as 'hive.execution.engine', can be set using 'define'. Multiple 'define' options can be provided on the cURL command line.
One caveat: cURL does not seem to process the double equals sign in "define=NAME=VALUE" correctly; it erroneously converts it into "defineNAME=VALUE". The fix is to escape the second equals sign with its URL-encoded equivalent. In other words, any 'define' should be provided like this: -d define="hive.execution.engine%3Dmr"
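If escaping the equals sign by hand feels error-prone, cURL's --data-urlencode option can do the encoding for you: it URL-encodes everything after the first '=' in "name=content". The sketch below reproduces that encoding in plain shell so you can see exactly what goes over the wire; the host, topology, and credentials in the commented command are the example values used in this article.

```shell
# What --data-urlencode "define=hive.execution.engine=mr" sends:
DEFINE="define=hive.execution.engine=mr"
NAME=${DEFINE%%=*}                  # part before the first '='  -> define
VALUE=${DEFINE#*=}                  # part after the first '='   -> hive.execution.engine=mr
# '=' is the only reserved character here, so it becomes %3D:
ENCODED="$NAME=$(printf '%s' "$VALUE" | sed 's/=/%3D/g')"
echo "$ENCODED"                     # define=hive.execution.engine%3Dmr

# Against a live cluster (example host/credentials from this article):
#   curl -ivk -u hr1:passw0rd \
#        -d execute="select+*+from+hivetest;" \
#        -d statusdir="/user/hr1/hive.output7" \
#        --data-urlencode "$DEFINE" \
#        "https://knox-server.domain.com:8443/gateway/default/templeton/v1/hive"
```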
statusdir
Specifies an HDFS location where the output (and errors) of the Hive job execution will be written. Once the job is finished (with either success or failure), this location can be checked for the stdout, stderr, and exit code of the Hive query or program.
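After the job completes, the files under statusdir can be fetched over WebHDFS through the same Knox gateway. A minimal sketch, assuming the default topology also exposes WebHDFS and reusing the example host, credentials, and statusdir from this article:

```shell
# Build the WebHDFS-via-Knox URL for the job's stdout (hypothetical values):
KNOX="https://knox-server.domain.com:8443/gateway/default"
STATUSDIR="/user/hr1/hive.output7"
STDOUT_URL="$KNOX/webhdfs/v1$STATUSDIR/stdout?op=OPEN"
echo "$STDOUT_URL"

# Against a live cluster (-L follows the WebHDFS redirect):
#   curl -ivkL -u hr1:passw0rd "$STDOUT_URL"                            # query output
#   curl -ivkL -u hr1:passw0rd "$KNOX/webhdfs/v1$STATUSDIR/stderr?op=OPEN"  # errors
```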
With this knowledge, here's a working example cURL command that submits a Hive SELECT query as a job to the cluster via Knox. The output is a YARN job ID, which can be used to track the job's progress in the Resource Manager UI.
# curl -ivk -u hr1:passw0rd -d execute="select+*+from+hivetest;" -d statusdir="/user/hr1/hive.output7" -d define="hive.execution.engine%3Dmr" "https://knox-server.domain.com:8443/gateway/default/templeton/v1/hive"
* About to connect() to knox-server.domain.com port 8443 (#0)
*   Trying 127.0.0.1... connected
* Connected to knox-server.domain.com (127.0.0.1) port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=knox-server.domain.com,OU=Test,O=Hadoop,L=Test,ST=Test,C=US
*   start date: Apr 07 23:02:54 2017 GMT
*   expire date: Apr 07 23:02:54 2018 GMT
*   common name: knox-server.domain.com
*   issuer: CN=knox-server.domain.com,OU=Test,O=Hadoop,L=Test,ST=Test,C=US
* Server auth using Basic with user 'hr1'
> POST /gateway/default/templeton/v1/hive HTTP/1.1
> Authorization: Basic aHIxOkJhZc3Mjmq==
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: knox-server.domain.com:8443
> Accept: */*
> Content-Length: 98
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 200 OK
< Date: Fri, 19 May 2017 02:13:58 GMT
< Set-Cookie: JSESSIONID=1k52mpj6ot9rm1nwi2dc9qcvu;Path=/gateway/default;Secure;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Set-Cookie: rememberMe=deleteMe; Path=/gateway/default; Max-Age=0; Expires=Thu, 18-May-2017 02:13:58 GMT
< Content-Type: application/json; charset=UTF-8
< Server: Jetty(7.6.0.v20120127)
< Content-Length: 31
<
* Connection #0 to host knox-server.domain.com left intact
* Closing connection #0
{"id":"job_1495157584958_0016"}
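The returned job ID can then be polled through WebHCat's jobs endpoint via the same gateway. A minimal sketch, reusing the example host, topology, credentials, and the job ID from the output above:

```shell
# Build the WebHCat job-status URL via Knox (hypothetical values):
KNOX="https://knox-server.domain.com:8443/gateway/default"
JOB_ID="job_1495157584958_0016"
STATUS_URL="$KNOX/templeton/v1/jobs/$JOB_ID"
echo "$STATUS_URL"

# Against a live cluster:
#   curl -ivk -u hr1:passw0rd "$STATUS_URL"
# The JSON response includes the job state; once the job is complete,
# check the statusdir for its stdout, stderr, and exit code.
```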
Hope this helps you out!
Created on 05-23-2017 07:19 PM
This is very helpful! Thanks for this article!
Created on 10-24-2018 09:19 PM
How can I see the output of the Hive command using the job_id?
Also, can you please help me understand a single command that submits the query and stores the query output in a file?