Member since 02-21-2019

69 Posts · 45 Kudos Received · 11 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1900 | 06-06-2018 02:51 PM |
| | 5471 | 10-12-2017 02:48 PM |
| | 1797 | 08-01-2017 08:58 PM |
| | 35106 | 06-12-2017 02:36 PM |
| | 5982 | 02-16-2017 04:58 PM |
01-08-2019 11:35 PM

<namenode> is the issue
06-06-2018 02:51 PM · 1 Kudo

Hi @Thiago Charchar, you can use the HBase REST service that comes by default in the package, you only have to start it - the init script is located under /usr/hdp/current/hbase-master/etc/rc.d/hbase-rest. These will be the endpoints offered: https://hbase.apache.org/1.1/apidocs/org/apache/hadoop/hbase/rest/package-summary.html

You can start it on the HBase Master nodes (usually 2 of them), but if you need it to scale, I guess you can start it on as many nodes as required; it's just a Java app that offers the REST service and connects to HBase in the backend.

You can also tune it a little bit, for example by setting the number of threads (in Custom hbase-site):

hbase.rest.threads.max=200
hbase.rest.threads.min=10
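
As a quick sketch, starting and checking the service could look like the following (the 8080 port is an assumption - use whatever hbase.rest.port resolves to on your cluster):

# Start the REST server via the init script mentioned above, then check that it answers.
sudo /usr/hdp/current/hbase-master/etc/rc.d/hbase-rest start
# The storage cluster version resource is an easy liveness check:
curl "http://$(hostname -f):8080/version/cluster"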
						
					
02-08-2018 11:31 PM · 6 Kudos
							 Introduction 
 OpenID Connect (OIDC) is an authentication layer on top of OAuth 2.0, an authorization framework. 
 It uses JSON Web Tokens (JWT), an open standard for securely transmitting information as a JSON object. These tokens are normally signed with an RSA key and contain information such as whether the user was authenticated, along with the user's id and email. More information and examples can be found here: https://auth0.com/docs/tokens/concepts/jwts 
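 As a side note, the claims inside a JWT can be inspected directly from a shell; a minimal sketch, assuming you already have a token at hand (the value below is just a placeholder):

# A JWT is three base64url-encoded parts separated by dots: header.payload.signature
TOKEN='<paste a JWT here>'
# Take the payload (2nd part) and convert base64url to base64, restoring the '=' padding:
PAYLOAD=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
PADDING=$(printf '%*s' $(( (4 - ${#PAYLOAD} % 4) % 4 )) '' | tr ' ' '=')
# Decode the JSON claims (sub, email, etc.):
printf '%s%s' "$PAYLOAD" "$PADDING" | base64 -d
echo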
 Knox, together with pac4j, a Java security engine, uses OpenID Connect (and also SAML, CAS, OAuth) to enable Single Sign On using a variety of 3rd party Identity Providers that support these protocols. 
 For example, Knox can be integrated with a service such as https://auth0.com to provide users authenticated to auth0 access to Hadoop resources (without a need to enter their credentials again or even integrate Knox with an LDAP). 
 The following is an example of how Knox can integrate with auth0 using OpenID Connect. 
 Setup auth0 
 Sign up 
 The first step is to sign up to https://auth0.com and create an account to manage the identity provider. 
 Since this is a public SaaS-based service, it needs a unique identifier to distinguish between clients (it will form its own unique subdomain). 
 Add the first user 
 Once the account is created, add a new user in the auth0 internal database. This user will be used to log in to Knox. auth0 can integrate with a variety of sources (including AD/LDAP) and social providers (Google, Facebook); however, for this test, we define the user in the default auth0 internal database. 
 Go to the Users section and create your first user. 
 Define the client 
 Once we have a new user, the next step is to define a new client, in this case Knox, which is a Regular Web Application. 
 Client id and secret 
 Once the application is created, go to the Settings tab and note down the information that Knox will use to authenticate itself, as a client application, to auth0 (these will later be used in Knox's configuration). 
 Callback URLs 
 Then there is one setting that needs to be defined here - Allowed Callback URLs - which is the Knox URL that auth0 will redirect the user back to after successful authentication. This URL is in the following format: 
 https://<KNOX>:8443/gateway/knoxsso/api/v1/websso?pac4jCallback=true&client_name=OidcClient
 
 So for our example, we can use https://00.000.000.000:8443/gateway/knoxsso/api/v1/websso?pac4jCallback=true&client_name=OidcClient. 
 discoveryUri 
 Lastly, one other piece of configuration required by Knox is discoveryUri, which can be found at the end of the configuration page, in the Show Advanced Settings section -> Endpoints -> OpenID Configuration (typically in the format https://<IDP-FQDN>/.well-known/openid-configuration): 
 https://anghelknoxtest.eu.auth0.com/.well-known/openid-configuration
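
 A quick sanity check (just a sketch) is to fetch this document from the Knox host; it lists the issuer, the authorization/token endpoints and the JWKS URI that Knox will rely on:

# The discovery document is plain JSON served over HTTPS:
curl -s https://anghelknoxtest.eu.auth0.com/.well-known/openid-configuration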
 
    
 This process - creating an application, getting its ID and secret, configuring the allowed callback URL and finding out the discoveryUri - applies to any OpenID Connect identity provider. These steps have also been tested with NetIQ, for instance. 
 Knox Configuration 
 To enable any SSO type authentication in Knox (be it OpenID Connect, SAML or other pac4j IdP), the Knox default topology must be configured to use the SSOCookieProvider and then the knoxsso.xml topology must be configured to use the OpenID Connect provider. 
 default topology 
 Set the following in the default topology (Advanced topology in Ambari) so that it uses the SSOCookieProvider. Replace any other authentication providers (by default it's the ShiroProvider) with the ones below and set sso.authentication.provider.url to the correct Knox IP/FQDN: 
   <provider>
    <role>webappsec</role>
    <name>WebAppSec</name>
    <enabled>true</enabled>
    <param>
      <name>cors.enabled</name>
      <value>true</value>
    </param>
  </provider>
  <provider>
    <role>federation</role>
    <name>SSOCookieProvider</name>
    <enabled>true</enabled>
    <param>
      <name>sso.authentication.provider.url</name>
      <value>https://00.000.000.000:8443/gateway/knoxsso/api/v1/websso</value>
    </param>
  </provider>
 
 knoxsso topology 
 And then configure the pac4j provider for OpenID Connect in your knoxsso.xml topology (Advanced knoxsso-topology in Ambari) - as per documentation.   
   Set the following: 
 
 pac4j.callbackUrl should point to the Knox IP or FQDN in this format: https://<KNOX>:8443/gateway/knoxsso/api/v1/websso 
 clientName to OidcClient 
 oidc.id to the Client ID from the auth0 Client configuration 
 oidc.secret to the Client Secret from the auth0 Client configuration 
 oidc.discoveryUri to the oidc.discoveryUri from the auth0 Client configuration 
 oidc.preferredJwsAlgorithm to RS256 
 knoxsso.redirect.whitelist.regex should include the IP or FQDN of Knox. 
 
   The following is the full topology definition for knoxsso.xml with the example values from the previous points: 
       <topology>
          <gateway>
              <provider>
                  <role>webappsec</role>
                  <name>WebAppSec</name>
                  <enabled>true</enabled>
                  <param><name>xframe.options.enabled</name><value>true</value></param>
              </provider>
              <provider>
                  <role>federation</role>
                  <name>pac4j</name>
                  <enabled>true</enabled>
                  <param>
                    <name>pac4j.callbackUrl</name>
                    <value>https://00.000.000.000:8443/gateway/knoxsso/api/v1/websso</value>
                  </param>
                  <param>
                    <name>clientName</name>
                    <value>OidcClient</value>
                  </param>
                  <param>
                    <name>oidc.id</name>
                    <value>8CD7789Nyl5QZd0Owuyamb7E0Qi29F9t</value>
                  </param>
                  <param>
                    <name>oidc.secret</name>
                    <value>CSIR3VtIdEdhak6LWYgPEv69P4J0P7ZcMOVnQovMoAnZGVOtCjcEEWyPOQoUxRh_</value>
                  </param>
                  <param>
                    <name>oidc.discoveryUri</name>
                    <value>https://anghelknoxtest.eu.auth0.com/.well-known/openid-configuration</value>
                  </param>
                  <param>
                    <name>oidc.preferredJwsAlgorithm</name>
                    <value>RS256</value>
                  </param>
              </provider>
          </gateway>
          <application>
            <name>knoxauth</name>
          </application>
          <service>
              <role>KNOXSSO</role>
              <param>
                  <name>knoxsso.cookie.secure.only</name>
                  <value>false</value>
              </param>
              <param>
                  <name>knoxsso.token.ttl</name>
                  <value>3600000</value>
              </param>
              <param>
                 <name>knoxsso.redirect.whitelist.regex</name>
                 <value>^https?:\/\/(localhost|00\.000\.000\.000|127\.0\.0\.1|0:0:0:0:0:0:0:1|::1):[0-9].*$</value>
              </param>
          </service>
      </topology>
 
 Test 
 Now restart Knox and test if it works. 
 To test, go to any Knox page, like https://<KNOX>:8443/gateway/default/templeton/v1/status or https://<KNOX>:8443/gateway/default/webhdfs/v1/tmp 
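 As a quick check from a shell (a sketch, reusing this example's placeholder IP), an unauthenticated request should be answered with a redirect towards the SSO flow rather than with the resource itself:

# Show the status code and the redirect target for an unauthenticated request:
curl -ks -o /dev/null -w 'HTTP %{http_code} -> %{redirect_url}\n' \
  "https://00.000.000.000:8443/gateway/default/templeton/v1/status"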
 For example, going to https://00.000.000.000:8443/gateway/default/templeton/v1/status should redirect to the auth0 authentication page. 
 And once you input the details of the user that was previously created, it should redirect back to the Knox page that was originally requested. 
 The user id 
 There is one issue remaining: the user identifier retrieved by Knox, which is then used to communicate with Hadoop services and Ranger. 
 If we look in the gateway-audit.log file, we can see the following entry for the above request: 
 WEBHCAT|auth0|5a79dd769bf9bc6ee2539f67|||access|uri|/gateway/default/templeton/v1/status|success|Response status: 200
 
 From the log, we can see that the user Knox actually "sees" is auth0|5a79dd769bf9bc6ee2539f67, which is the user id from auth0. 
 This is not exactly useful if we want to apply Ranger policies for example, or if we care about the user that Knox proxies to Hadoop services. 
 To understand better, if we enable DEBUG logging in Knox, we would see the following entry after the user authenticates: 
 DEBUG filter.Pac4jIdentityAdapter (Pac4jIdentityAdapter.java:doFilter(70)) - User authenticated as: <OidcProfile> | id: auth0|5a79dd769bf9bc6ee2539f67 |attributes: {sub=auth0|5a79dd769bf9bc6ee2539f67, email_verified=true, updated_at=2018-02-08T12:47:56.061Z, nickname=aanghel, name=aanghel@hortonworks.com, picture=https://s.gravatar.com/avatar/7baaabe6020925809d0650e9d4cefe9c?s=480&r=pg&d=https%3A%2F%2Fcdn.auth0.com%2Favatars%2Faa.png, email=aanghel@hortonworks.com} | roles: [] | permissions: [] | isRemembered: false |
 
 From the above, we can see that the OpenID Connect implementation in Knox uses a Profile, which has an id and many attributes. To be able to use any of these attributes, Knox 0.14 (or at least the KNOX-1119 patch) is required; it adds a new configuration parameter, pac4j.id_attribute, which allows us to pick the attribute we want from the session above. We can define this configuration in the knoxsso topology, after pac4j.callbackUrl: 
                   <param>
                    <name>pac4j.id_attribute</name>
                    <value>nickname</value>
                  </param>
 
 With the above, Knox 0.14 will use the actual username (the nickname attribute in this example) when communicating with Hadoop or Ranger. 
10-12-2017 04:07 PM

@balalaika ^^
10-12-2017 03:20 PM · 1 Kudo

Glad it worked, but I didn't say to use ls -l (which outputs the additional stuff), but ls -1 (as in the number 1), which outputs only the filename, one per line.
10-12-2017 02:48 PM · 1 Kudo

Hi @balalaika

It would be better to just output the file listing with one filename per line and then have a SplitText processor, followed by a FetchFile. You can do this with the -1 parameter for ls:

ls -1 /home/user/test | grep ".zip"

SplitText will generate one flowfile for each file, which is then easily picked up by FetchFile. You will still need to add an ExtractText between them, but with a simple (.*) rule (this transfers the flowfile content - which is the actual filename - into an attribute that can be used by FetchFile as the filename).
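
As a small variation (a sketch, assuming the same test directory), letting the shell expand the pattern outputs absolute paths, which FetchFile can then use directly as the file to fetch:

# One absolute path per line; same SplitText/ExtractText handling as above:
ls -1 /home/user/test/*.zip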
						
					
10-11-2017 06:22 PM · 1 Kudo

@Foivos A The output.stream relation from ExecuteStreamCommand contains the stdout of the command executed. Unless you do cat <unzipped_file> at the end of your script, you won't see anything on that relation - and this would only work if you have one unzipped file, of course.

The way I did this was to have the script "echo" at the end the names of the local files, one per line. This output goes to the output.stream relation, and from there you can do a SplitText to split the output by line, followed by a FetchFile -> PutHDFS.

If you're still interested, I can share my flow and the scripts, but as Abdelkrim mentioned, UnpackContent should do the job, even for very large files, as UnpackContent followed by PutHDFS is streamed and so will not affect the NiFi heap.
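
For reference, a minimal sketch of such a wrapper script (the paths and the single archive argument are assumptions):

#!/usr/bin/env bash
# Unzip the incoming archive to a work dir and print each extracted file, one per line,
# so output.stream can feed SplitText -> ExtractText -> FetchFile -> PutHDFS.
ARCHIVE="$1"
WORKDIR="/tmp/unzipped/$(basename "$ARCHIVE" .zip)"
mkdir -p "$WORKDIR"
unzip -o -q "$ARCHIVE" -d "$WORKDIR"
find "$WORKDIR" -type f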
						
					
09-29-2017 07:15 PM

Hi @Jacqualin jasmin

You have to use the API or the configs.sh helper command:

/var/lib/ambari-server/resources/scripts/configs.sh -port 8080 set localhost <CLUSTER_NAME> ranger-env ranger_admin_log_dir "/var/log/hdp/ranger/admin"

/var/lib/ambari-server/resources/scripts/configs.sh -port 8080 set localhost <CLUSTER_NAME> ranger-env ranger_usersync_log_dir "/var/log/hdp/ranger/usersync"

/var/lib/ambari-server/resources/scripts/configs.sh -port 8080 set localhost <CLUSTER_NAME> ranger-env ranger.tagsync.logdir "/var/log/hdp/ranger/tagsync"
						
					
08-01-2017 08:58 PM · 1 Kudo
Hi @Gerd Koenig

NiFi needs two configuration passwords, nifi.security.encrypt.configuration.password and nifi.sensitive.props.key. You can pass those using nifi-ambari-config in the blueprint. The passwords also need to be at least 12 characters in length. Here's a simple blueprint that normally works for me:

{
  "configurations" : [
    {
      "nifi-ambari-config" : {
        "nifi.node.ssl.port": "9091",
        "nifi.node.port": "9090",
        "nifi.security.encrypt.configuration.password": "AsdQwe123456",
        "nifi.sensitive.props.key": "AsdQwe123456"
      }
    },
    {
      "nifi-env" : {
        "nifi_group" : "nifi",
        "nifi_user" : "nifi"
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "mytestcluster-singlenode",
      "configurations" : [ ],
      "components" : [
        { "name" : "ZOOKEEPER_CLIENT" },
        { "name" : "INFRA_SOLR_CLIENT" },
        { "name" : "ZOOKEEPER_SERVER" },
        { "name" : "NIFI_MASTER" },
        { "name" : "AMBARI_SERVER" },
        { "name" : "INFRA_SOLR" },
        { "name" : "METRICS_COLLECTOR" },
        { "name" : "METRICS_GRAFANA" },
        { "name" : "METRICS_MONITOR" }
        
      ]
    }
  ],
  "Blueprints" : {
    "stack_name" : "HDF",
    "stack_version" : "3.0"
  }
}
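
For reference, this is roughly how the blueprint gets registered and instantiated through the Ambari REST API (a sketch; the host, credentials, blueprint/cluster names and file names are placeholders):

# Register the blueprint definition:
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d @blueprint.json http://localhost:8080/api/v1/blueprints/hdf-singlenode
# Create the cluster from it, using a cluster creation template that maps hosts to host_groups:
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d @cluster-template.json http://localhost:8080/api/v1/clusters/mytestcluster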
Regarding the folders, yes, NiFi will recreate them based on the configuration variables. These can also be set in the blueprint:

    {
      "nifi-ambari-config" : {
          "nifi.internal.dir" : "/var/lib/nifi",
          "nifi.content.repository.dir.default" : "/var/lib/nifi/content_repository",
          "nifi.state.dir" : "{nifi_internal_dir}/state/local",
          "nifi.flow.config.dir" : "{nifi_internal_dir}/conf",
          "nifi.config.dir" : "{nifi_install_dir}/conf",
          "nifi.flowfile.repository.dir" : "/var/lib/nifi/flowfile_repository",
          "nifi.provenance.repository.dir.default" : "/var/lib/nifi/provenance_repository",
          "nifi.database.dir" : "/var/lib/nifi/database_repository"
      }
    }
07-03-2017 07:32 AM

Hi @suresh krish, this is now fixed in Ambari 2.5.1: https://issues.apache.org/jira/browse/AMBARI-20868