How do you enable URL rewriting for Knox for WebHDFS?

We have added Knox to our cluster and are proxying WebHDFS calls through it. We would like Knox to rewrite the WebHDFS URLs so that all subsequent calls to WebHDFS are also proxied through Knox, but it is unclear how to enable the URL rewriting.

Currently, I can send an HTTPS request to Knox via curl with op=OPEN for a file, and it returns a Location header that I can then use to download the file from HDFS.

$ curl -s -i -k -H "Authorization: Basic c3NpdGFyYW06elNtM0JvVyE=" -X GET 'https://api01.qa:8443/quasar/jupstats/webhdfs/v1/user/rchapin/output_directory/000001_0?op=OPEN'

HTTP/1.1 307 Temporary Redirect
Set-Cookie: JSESSIONID=1qbldz84z20s9li4l0nz4hdkw;Path=/quasar/jupstats;Secure;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache
Expires: Tue, 24 May 2016 01:53:57 GMT
Date: Tue, 24 May 2016 01:53:57 GMT
Pragma: no-cache
Expires: Tue, 24 May 2016 01:53:57 GMT
Date: Tue, 24 May 2016 01:53:57 GMT
Pragma: no-cache
Location: http://dn04.qa.quasar.local:50075/webhdfs/v1/user/rchapin/output_directory/000001_0?op=OPEN&user.nam...
Server: Jetty(6.1.26.hwx)
Content-Type: application/octet-stream
Content-Length: 0

The problem is that the URL returned in the Location header points directly at one of the data nodes rather than at the Knox server. According to the Knox documentation, Knox should rewrite the Location header so that the follow-up request is proxied through Knox, and it should encrypt the original query parameters.
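To make the symptom easy to re-check after each configuration change, here is a minimal sketch that reads response headers on stdin and tests whether the Location header targets the gateway. The function names `location_authority` and `redirect_is_proxied` are hypothetical helpers written for this post, not part of Knox or curl, and `api01.qa:8443` is simply the gateway host and port from the curl command above. It only diagnoses the redirect; it does not change any Knox setting.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the host:port of the Location header found in
# the HTTP response headers read from stdin (prints nothing if absent).
location_authority() {
  # Strip carriage returns, pick the first Location header (case-insensitive),
  # then drop the scheme and everything from the first "/" or "?" onward.
  tr -d '\r' \
    | awk 'tolower($1) == "location:" { print $2; exit }' \
    | sed -E 's#^[a-zA-Z][a-zA-Z0-9+.-]*://##; s#[/?].*$##'
}

# Hypothetical helper: succeed (exit 0) only if the redirect read from stdin
# points at the gateway authority given as the first argument.
redirect_is_proxied() {
  local gateway_authority="$1"
  local actual
  actual="$(location_authority)"
  [ -n "$actual" ] && [ "$actual" = "$gateway_authority" ]
}

# Example usage (assumes the same request as above, saved to headers.txt):
#   redirect_is_proxied 'api01.qa:8443' < headers.txt \
#     && echo "redirect goes through Knox" \
#     || echo "redirect bypasses Knox"
```

With the response shown above, the extracted authority would be dn04.qa.quasar.local:50075, so the check fails, which matches the behavior described.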

While trying to figure out how to enable rewriting, I read the section on Provider Configuration, but I could not find any further information about how to configure the rewrite provider, or an example of what a provider configuration block for rewrites looks like.

Any assistance on how to configure Knox to enable URL rewriting would be greatly appreciated.

The Knox topology file is as follows:

<topology>
            <gateway>

                <provider>
                    <role>authentication</role>
                    <name>ShiroProvider</name>
                    <enabled>true</enabled>
                    <param>
                        <name>sessionTimeout</name>
                        <value>30</value>
                    </param>
                    <param>
                        <name>main.ldapRealm</name>
                        <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.userDnTemplate</name>
                        <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.contextFactory.url</name>
                        <value>ldap://{{knox_host_name}}:33389</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                        <value>simple</value>
                    </param>
                    <param>
                        <name>urls./**</name>
                        <value>authcBasic</value>
                    </param>
                </provider>

                <provider>
                    <role>identity-assertion</role>
                    <name>Default</name>
                    <enabled>true</enabled>
                </provider>

                <provider>
                    <role>authorization</role>
                    <name>XASecurePDPKnox</name>
                    <enabled>true</enabled>
                </provider>

            </gateway>

            <service>
                <role>NAMENODE</role>
                <url>hdfs://{{namenode_host}}:{{namenode_rpc_port}}</url>
            </service>

            <service>
                <role>JOBTRACKER</role>
                <url>rpc://{{rm_host}}:{{jt_rpc_port}}</url>
            </service>

            <service>
                <role>WEBHDFS</role>
                <url>http://{{namenode_host}}:{{namenode_http_port}}/webhdfs</url>
            </service>

            <service>
                <role>WEBHCAT</role>
                <url>http://{{webhcat_server_host}}:{{templeton_port}}/templeton</url>
            </service>

            <service>
                <role>OOZIE</role>
                <url>http://{{oozie_server_host}}:{{oozie_server_port}}/oozie</url>
            </service>

            <service>
                <role>WEBHBASE</role>
                <url>http://{{hbase_master_host}}:{{hbase_master_port}}</url>
            </service>

            <service>
                <role>HIVE</role>
                <url>http://{{hive_server_host}}:{{hive_http_port}}/{{hive_http_path}}</url>
            </service>

            <service>
                <role>RESOURCEMANAGER</role>
                <url>http://{{rm_host}}:{{rm_port}}/ws</url>
            </service>
</topology>