How do you enable URL rewriting for Knox for WebHDFS?

We have added Knox to our cluster and are proxying WebHDFS calls through it. We would like Knox to rewrite the WebHDFS URLs so that all subsequent calls to WebHDFS are also proxied through Knox, but it is unclear how to enable the URL rewriting.

Currently, I can send an HTTPS request to Knox via curl to 'OPEN' a file, and Knox returns a Location header that I can then use to download the file from HDFS.

$ curl -s -i -k -H "Authorization: Basic c3NpdGFyYW06elNtM0JvVyE=" -X GET 'https://api01.qa:8443/quasar/jupstats/webhdfs/v1/user/rchapin/output_directory/000001_0?op=OPEN'

HTTP/1.1 307 Temporary Redirect
Set-Cookie: JSESSIONID=1qbldz84z20s9li4l0nz4hdkw;Path=/quasar/jupstats;Secure;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache
Expires: Tue, 24 May 2016 01:53:57 GMT
Date: Tue, 24 May 2016 01:53:57 GMT
Pragma: no-cache
Expires: Tue, 24 May 2016 01:53:57 GMT
Date: Tue, 24 May 2016 01:53:57 GMT
Pragma: no-cache
Location: http://dn04.qa.quasar.local:50075/webhdfs/v1/user/rchapin/output_directory/000001_0?op=OPEN&user.nam...
Server: Jetty(6.1.26.hwx)
Content-Type: application/octet-stream
Content-Length: 0

The problem is that the URL returned in the Location header points directly at one of the data nodes rather than at the Knox server. According to the Knox documentation, Knox should rewrite the Location header so that the request is proxied through Knox itself, and it should encrypt the original query parameters.
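For anyone who wants to check this without eyeballing the headers, here is a minimal Python sketch that takes the raw header block printed by `curl -i` and reports whether the Location header points back at the gateway. The function name `location_is_proxied` and its arguments are made up for illustration; it only compares scheme, host, and port, and does not verify anything about the rewritten path or the encrypted query parameters.

```python
from urllib.parse import urlsplit


def location_is_proxied(raw_headers: str, gateway_url: str) -> bool:
    """Return True if the Location header targets the gateway's scheme/host/port.

    raw_headers: the response header block as printed by `curl -i`.
    gateway_url: base URL of the Knox gateway (an assumption supplied by the caller).
    """
    location = None
    for line in raw_headers.splitlines():
        # Split "Name: value" at the first colon; the status line has no colon.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            location = value.strip()
            break
    if location is None:
        raise ValueError("no Location header found")

    gw = urlsplit(gateway_url)
    loc = urlsplit(location)
    return (loc.scheme, loc.hostname, loc.port) == (gw.scheme, gw.hostname, gw.port)
```

With the response above, `location_is_proxied(headers, "https://api01.qa:8443")` would return False, because the Location header names the data node on port 50075 over plain HTTP.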

While trying to figure out how to enable rewriting, I read the section on Provider Configuration, but I was unable to find any further information about how to configure the rewrite provider, or an example of what a provider configuration block for rewrites looks like.

Any assistance on how to configure Knox to enable URL rewriting would be greatly appreciated.

The Knox topology file is as follows:

<topology>
            <gateway>

                <provider>
                    <role>authentication</role>
                    <name>ShiroProvider</name>
                    <enabled>true</enabled>
                    <param>
                        <name>sessionTimeout</name>
                        <value>30</value>
                    </param>
                    <param>
                        <name>main.ldapRealm</name>
                        <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.userDnTemplate</name>
                        <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.contextFactory.url</name>
                        <value>ldap://{{knox_host_name}}:33389</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                        <value>simple</value>
                    </param>
                    <param>
                        <name>urls./**</name>
                        <value>authcBasic</value>
                    </param>
                </provider>

                <provider>
                    <role>identity-assertion</role>
                    <name>Default</name>
                    <enabled>true</enabled>
                </provider>

                <provider>
                    <role>authorization</role>
                    <name>XASecurePDPKnox</name>
                    <enabled>true</enabled>
                </provider>

            </gateway>

            <service>
                <role>NAMENODE</role>
                <url>hdfs://{{namenode_host}}:{{namenode_rpc_port}}</url>
            </service>

            <service>
                <role>JOBTRACKER</role>
                <url>rpc://{{rm_host}}:{{jt_rpc_port}}</url>
            </service>

            <service>
                <role>WEBHDFS</role>
                <url>http://{{namenode_host}}:{{namenode_http_port}}/webhdfs</url>
            </service>

            <service>
                <role>WEBHCAT</role>
                <url>http://{{webhcat_server_host}}:{{templeton_port}}/templeton</url>
            </service>

            <service>
                <role>OOZIE</role>
                <url>http://{{oozie_server_host}}:{{oozie_server_port}}/oozie</url>
            </service>

            <service>
                <role>WEBHBASE</role>
                <url>http://{{hbase_master_host}}:{{hbase_master_port}}</url>
            </service>

            <service>
                <role>HIVE</role>
                <url>http://{{hive_server_host}}:{{hive_http_port}}/{{hive_http_path}}</url>
            </service>

            <service>
                <role>RESOURCEMANAGER</role>
                <url>http://{{rm_host}}:{{rm_port}}/ws</url>
            </service>
        </topology>

1 ACCEPTED SOLUTION

At this point, I believe the problem was of my own making, so I'll answer my own question.

We had reconfigured the cluster for HA, but I had not updated the Knox configuration to match.

I updated the topology file as follows, adding HA configurations for both WEBHDFS and HIVE and updating the NAMENODE service to use the HA service name:

        <topology>

            <gateway>

                <provider>
                    <role>ha</role>
                    <name>HaProvider</name>
                    <enabled>true</enabled>
                    <param>
                        <name>WEBHDFS</name>
                        <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
                    </param>
                    <param>
                        <name>HIVE</name>
                        <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
                    </param>
                </provider>


                <provider>
                    <role>authentication</role>
                    <name>ShiroProvider</name>
                    <enabled>true</enabled>
                    <param>
                        <name>sessionTimeout</name>
                        <value>30</value>
                    </param>
                    <param>
                        <name>main.ldapRealm</name>
                        <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.userDnTemplate</name>
                        <value>CN={0},OU=Network Architecture and Planning,OU=Network Operations Users,DC=qa,DC=hnops,DC=net</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.contextFactory.url</name>
                        <value>ldap://qa.hnops.net:389</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                        <value>simple</value>
                    </param>
                    <param>
                        <name>urls./**</name>
                        <value>authcBasic</value>
                    </param>
                </provider>


                <provider>
                    <role>identity-assertion</role>
                    <name>Default</name>
                    <enabled>true</enabled>
                </provider>

                <provider>
                    <role>authorization</role>
                    <name>AclsAuthz</name>
                    <enabled>true</enabled>
                </provider>


            </gateway>


            <service>
                <role>NAMENODE</role>
                <url>hdfs://quasar</url>
            </service>


            <service>
                <role>JOBTRACKER</role>
                <url>rpc://nn01.qa.quasar.local:8050</url>
            </service>


            <service>
                <role>WEBHDFS</role>
                <url>http://nn02.qa.quasar.local:50070/webhdfs</url>
                <url>http://nn01.qa.quasar.local:50070/webhdfs</url>
            </service>


            <service>
                <role>WEBHCAT</role>
                <url>http://sn02.qa.quasar.local:50111/templeton</url>
            </service>


            <service>
                <role>OOZIE</role>
                <url>http://sn02.qa.quasar.local:11000/oozie</url>
            </service>


            <service>
                <role>WEBHBASE</role>
                <url>http://None:8080</url>
            </service>


            <service>
                <role>HIVE</role>
                <url>http://sn02.qa.quasar.local:10001/cliservice</url>
                <url>http://sn01.qa.quasar.local:10001/cliservice</url>
            </service>


            <service>
                <role>RESOURCEMANAGER</role>
                <url>http://nn01.qa.quasar.local:8088/ws</url>
            </service>
        </topology>

Knox is now properly rewriting the Location header and proxying the requests:

$ curl -s -i -k -H "Authorization: Basic cmNoYXBpbjphYmMxMjMhQCM=" -X GET 'https://api01.qa:8443/quasar/jupstats/webhdfs/v1/user/rchapin/output_directory/000001_0?op=OPEN'

HTTP/1.1 307 Temporary Redirect
Set-Cookie: JSESSIONID=jssiado2ozvrd7q2emics1c2;Path=/quasar/jupstats;Secure;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache
Expires: Wed, 25 May 2016 15:31:46 GMT
Date: Wed, 25 May 2016 15:31:46 GMT
Pragma: no-cache
Expires: Wed, 25 May 2016 15:31:46 GMT
Date: Wed, 25 May 2016 15:31:46 GMT
Pragma: no-cache
Location: https://api01.qa:8443/quasar/jupstats/webhdfs/data/v1/webhdfs/v1/user/rchapin/output_directory/00000...
Server: Jetty(6.1.26.hwx)
Content-Type: application/octet-stream
Content-Length: 0
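
To catch this kind of mismatch between the cluster and the topology earlier, a rough sanity check can be scripted. The sketch below assumes that any service listing more than one <url> is meant to be HA and should therefore have an enabled entry under the HaProvider; that rule is my own heuristic, and `services_missing_ha` is a hypothetical helper, not something Knox provides.

```python
import xml.etree.ElementTree as ET


def services_missing_ha(topology_xml: str) -> list:
    """List service roles that have several <url> entries but no enabled HA param."""
    root = ET.fromstring(topology_xml)  # root element is <topology>

    # Collect the roles that an enabled "ha" provider marks as enabled=true.
    ha_roles = set()
    for provider in root.findall("./gateway/provider"):
        if provider.findtext("role", "").strip() != "ha":
            continue
        if provider.findtext("enabled", "").strip().lower() != "true":
            continue
        for param in provider.findall("param"):
            settings = {}
            # The value is a ';'-separated list of key=value pairs.
            for item in param.findtext("value", "").split(";"):
                key, sep, val = item.partition("=")
                if sep:
                    settings[key.strip()] = val.strip()
            if settings.get("enabled", "").lower() == "true":
                ha_roles.add(param.findtext("name", "").strip())

    # Flag multi-URL services that the HA provider does not cover.
    missing = []
    for service in root.findall("./service"):
        role = service.findtext("role", "").strip()
        if len(service.findall("url")) > 1 and role not in ha_roles:
            missing.append(role)
    return missing
```

Run against the topology above, this should return an empty list, since both WEBHDFS and HIVE have two URLs and matching HaProvider params. It cannot detect the opposite case I originally hit, where the cluster is HA but the topology still lists a single URL.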
